Re: password file problem.
Ugh, thanks, everyone. What a Monday I came back to after the long weekend. If I'd had my head screwed on straight... I already knew all this but forgot it in the heat of unrelated issues. Again, thanks!

Matt Anderson

On Monday 26 November 2001 2:53 pm, you wrote:
Hello everyone. I can't seem to get the --password-file= option to work correctly. I'm using ssh as the transport. I've got the file at mode 0600, containing only the password with no carriage return. Can someone provide an example of its use? Here is my try:

    rsync -rtvvuz -e ssh --password-file=file /home/matt/* remotecomputer:/home/matt

It still asks for a password, and if I type it manually, it then runs fine. Thanks! Matt Anderson
Re: Rsync: Re: patch to enable faster mirroring of large filesystems
Dear all, here's my own (renewed) pitch to throw in a --files-from patch. As Dave has suggested in the past, transferring a list of files can be accomplished using --include and --exclude, and he has called for people to test the performance gains of his old optimization when using these options (see his original mail below). I've finally decided to bite the bullet and try this out on a real-life case: the synchronization of a directory tree containing just over 1 million files in 400 directories. Currently the whole directory tree is rsynced to our mirror sites, although only a subset of the files (about 720,000) is really used in production. Therefore, having a good and simple way to specify the list of files to be synchronized may save time and disk space.

In order to do the test, I built the list of files to be transferred and then fed it to rsync 2.3.2 using --include followed by an --exclude '*', which should trigger the include optimization Dave has talked about. Not being 100% sure that this was the right way to do it, I also created a second list which contained the directories in addition to the plain filenames, as explained in http://lists.samba.org/pipermail/rsync/2001-January/003372.html

Either way, the results show that the include/exclude mechanism is highly inefficient: the regular rsync over the whole directory tree of 1 million files takes about 15 minutes, while the include/exclude solution takes over 2 hours in one case (no directories) and simply hangs in the other. It's true that when using include/exclude you have to account for the additional transfer of the file list from client to server, but bandwidth is clearly not the bottleneck here, since both machines are on the same gigabit LAN. By trussing the processes I noticed that building the local include/exclude structure is very slow, but I haven't looked into the details.
My guess is that having to deal with regexps, file matching, and continuous reallocation of memory for the include/exclude file structure takes its toll on rsync. As far as I can tell, the overwhelming majority of the time is spent manipulating the include/exclude lists rather than actually performing operations on files. Here are the numbers:

    adstree-17: wc /tmp/bib.list /tmp/bib-dir.list
      722941  722941 13012938 /tmp/bib.list
      723277  723277 13014618 /tmp/bib-dir.list
     1446218 1446218 26027556 total
    adstree-18: time rsync-2.3.2 -avvn rsync://adsfore.harvard.edu/text-257/. .
    receiving file list ... done
    wrote 75 bytes  read 15233741 bytes  16460.09 bytes/sec
    total size is 947471650  speedup is 62.20
    83.88u 364.48s 15:25.50 48.4%
    adstree-19: time rsync-2.3.2 -avvn --include-from /tmp/bib.list --exclude '*' rsync://adsfore.harvard.edu/text-257/. .
    receiving file list ... done
    wrote 16627723 bytes  read 72 bytes  2100.13 bytes/sec
    total size is 0  speedup is 0.00
    3618.25u 33.40s 2:11:56.90 46.1%
    adstree-20: time rsync-2.3.2 -avvn --include-from /tmp/bib-dir.list --exclude '*' rsync://adsfore.harvard.edu/text-257/. .
    Mon Nov 26 09:27:58 EST 2001
    receiving file list ... ^C
    3633.04u 61.41s 23:20:19.32 4.3%

In message [EMAIL PROTECTED], Dave Dykstra writes:
On Tue, Nov 20, 2001 at 11:45:44AM +, Lachlan Cranswick wrote:
Is there any chance this can be added into the distribution, as it sounds really nifty?

I exchanged some off-list email with the patch author, and besides the fact that it adds too many options, I object to it because it only supports copying from the local side to remote, not also from remote to local. His option is essentially the same as the --files-from option that was discussed last January.
See the thread in the archives beginning at http://lists.samba.org/pipermail/rsync/2001-January/003368.html

In summary, he can do pretty much what he wants by making an --include-from list that lists all the parent directories of the files he wants plus all the files he wants, and ending it with an --exclude '*'. But before rsync 2.4.0 I had an optimization (which I put in when I officially maintained rsync) that would directly read the included files in that situation rather than recurse through all the directories. The author of rsync, Andrew Tridgell, took that optimization out in 2.4.0 because he thought it was confusing that the optimization didn't require explicitly listing the parent directories like an --exclude '*' otherwise does, and I couldn't prove that recursing through the directories had a significant performance impact. Later, people argued that a new --files-from option would be worth doing just for convenience even if not for performance, but I said I still wanted people to do some performance testing before I'd implement it. I wanted people to run version 2.3.2 on their systems and compare the time difference between running with and without my optimization, which you can force by simply putting in a single wildcard in one included
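Dave's recipe above is an --include-from list that names every parent directory as well as every file, ended by --exclude '*'. That is also what Alberto's second list contains. A minimal C sketch of producing it mechanically from a plain file list follows. The function names are hypothetical. The input is assumed to be one relative path per line in byte order (for example sorted with LC_ALL=C sort). Directories are written with a trailing slash, like the install/ and install/kickstart/ patterns in the hard-link thread.

```c
#include <stdio.h>
#include <string.h>

/* Write each parent directory of `path` (with a trailing '/') that `prev`
 * does not already share, then `path` itself.  Assumes relative,
 * '/'-separated paths arriving in byte-sorted order, so that every
 * directory prefix is emitted exactly once (hypothetical helper). */
static void emit_with_parents(FILE *out, const char *prev, const char *path)
{
    for (const char *p = strchr(path, '/'); p != NULL; p = strchr(p + 1, '/')) {
        size_t len = (size_t)(p - path) + 1;   /* prefix including the '/' */
        if (strncmp(prev, path, len) != 0)     /* not on the previous line */
            fprintf(out, "%.*s\n", (int)len, path);
    }
    fprintf(out, "%s\n", path);
}

/* Read one path per line (assumed shorter than 4096 bytes) from `in` and
 * write the files plus their parent directories to `out`. */
static void add_parent_dirs(FILE *in, FILE *out)
{
    char prev[4096] = "";
    char line[4096];

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';      /* strip the newline */
        if (line[0] == '\0')
            continue;                          /* ignore blank lines */
        emit_with_parents(out, prev, line);
        strcpy(prev, line);
    }
}
```

Called from a small main() as add_parent_dirs(stdin, stdout), it would turn a bib.list-style file into a bib-dir.list-style one, to be passed with --include-from and followed by --exclude '*'. Because sorting keeps every path with a given prefix contiguous, comparing against the previous line alone is enough to avoid duplicate directory entries.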
Re: rsync server over SSH [includes code patches]
Hi --

Sorry for the delay getting back to you; the Thanksgiving holiday intervened and I'm only now catching up on my email backlog.

1. You're entirely right about the --remote-user option. I'll remove that.
2. I'll merge with the latest version from CVS.
3. I'll do that; I prefer -u myself.

That's better. I read over the patch a little more closely this time and have a few more comments:

1. The --remote-user option is unnecessary because you can instead specify '-e ssh -l user'.
2. Please post the patch again against the latest development version of rsync out of CVS (http://www.samba.org/samba/cvs.html or rsync://rsync.samba.org/ftp/unpacked/rsync/) because that's the form it will need to be in order to get it in. I'll test it out then and look at it closely.
3. Please post the next patch with GNU diff's -u option. It's easier to read.

- Dave Dykstra
Re: rsync server over SSH [includes code patches]
Actually, my patch already has that in rsync_module():

    if (is_a_socket(f_in)) {
        addr = client_addr(f_in);
        host = client_name(f_in);
    } else {
        char *ssh_client = getenv("SSH_CLIENT");
        addr = ssh_client ? ssh_client : "n/a";
        host = "remote shell connection";
    }

The problem is that I was only looking at string usages of SSH_CLIENT (mostly for debugging) rather than its use in allow_access(). That shouldn't be hard to fix; we just need to truncate a copy of the SSH_CLIENT string at the first whitespace. I'll see if I can get that into the next version of my patch.

JD

On Mon, 26 Nov 2001, Martin Pool wrote:
On 25 Nov 2001, Jeremy Hansen [EMAIL PROTECTED] wrote:
OK, I have the patch working; things seem to work, except that using hosts allow in the rsyncd.conf seems to break things.

What an interesting bug. :-) The proximate connection to the rsync server will be from the sshd process, which is running on the server host, so stdin will probably be a unix-domain socket. In other words, because rsync is not directly connected to the client, it can't use the usual mechanism to find the client's address. Perhaps we can get rsync to look at $SSH_CLIENT, which contains the necessary information. We need to think carefully to make sure this is secure, though.

-- Martin
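The C sketch below shows JD's proposed fix, truncating a copy of SSH_CLIENT at the first whitespace. The function name ssh_client_addr() is hypothetical, and the sample value uses a documentation address. OpenSSH's sshd sets SSH_CLIENT to the client address, client port and server port, separated by spaces. As Martin notes, whether that value can be trusted for hosts allow checks still needs separate thought; the sketch only does the string handling.

```c
#include <stddef.h>
#include <string.h>

/* Copy the first whitespace-separated field of an SSH_CLIENT value
 * (e.g. "192.0.2.10 51234 22") into `buf`.  Returns `buf`, or NULL if the
 * value is missing, empty, or does not fit.  The caller would pass
 * getenv("SSH_CLIENT"); the environment string itself is left untouched,
 * which is why a copy is truncated instead (hypothetical helper). */
static char *ssh_client_addr(const char *ssh_client, char *buf, size_t bufsize)
{
    if (ssh_client == NULL)
        return NULL;
    size_t len = strcspn(ssh_client, " \t");   /* length of the address field */
    if (len == 0 || len >= bufsize)
        return NULL;
    memcpy(buf, ssh_client, len);
    buf[len] = '\0';
    return buf;
}
```

In the else branch above, the result would be what gets handed to the host-access check instead of the raw environment string.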
Not all files synched - hard link problems???
I am sorry if this has been covered before: I have done a couple of futile searches in the bug-reporting system. Is there any way to search the archive?

I am having a strange symptom. I am synching directories that have very long file names (by the time the full path is specified) and a lot of hard links. It seems that the directory is being copied piecemeal; that is, if I run rsync enough times, the entire contents ultimately get copied. It seems like I am running into some hard limit in the size of the file list or something. I am running 2.4.6 on Linux; the source directory is remote-mounted on Solaris, and the destination is Linux.

For instance, I have a directory in the tree that contains 173 files at the source, most of which are hard links. Here is the output of ls | wc on the destination after five successive identical runs of rsync (this is in a subdirectory of one of the directories in the command below). The directory did not exist before running the sync.

    [root@ks-s0-107-1- SC]# ls | wc
         49      49    1286
    [root@ks-s0-107-1- SC]# ls | wc
         85      85    2234
    [root@ks-s0-107-1- SC]# ls | wc
        120     120    3243
    [root@ks-s0-107-1- SC]# ls | wc
        152     152    4112
    [root@ks-s0-107-1- SC]# ls | wc
        173     173    4739

So it seems to have synched 49, then 36, then 35, then 32, then finally the last 21 files in the directory. (The increment seems to vary if I try it again, e.g. from 49 to 90.) I get no error messages (that I can see). In fact, the first time I run the program, it seems to notice all the files and produces 163 messages that file blah-de-blah is a hard link, but then doesn't seem to make the link for file blah-de-blah; this behavior remains constant with each successive run.
Here is the rsync command (generated by a Perl script and broken into little pieces by my mailer):

    rsync -e 'ssh -l root -p 8989' --rsync-path /usr/bin/rsync --stats --progress -tpgoCHr -v -v --include install/ --include install/kickstart/ --include install/kickstart/zantaz_upgrade/ --include install/kickstart/zantaz_upgrade/20011121/ --include install/redhat-7.1/ --include install/redhat-7.1/zantaz_rpm_upgrade/ --include install/redhat-7.1/zantaz_rpm_upgrade/DS05_00_00-SAM-SUN-20011121/ --include install/redhat-7.1/zantaz_rpm_upgrade/DS05_00_00-SAM-SUN-20011121-devel/ --include install/kickstart/zantaz_upgrade/20011121/** --include install/redhat-7.1/zantaz_rpm_upgrade/DS05_00_00-SAM-SUN-20011121/** --include install/redhat-7.1/zantaz_rpm_upgrade/DS05_00_00-SAM-SUN-20011121-devel/** --exclude * /net/swdev/staging/* [EMAIL PROTECTED]:/

Thanks,
Dave
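ls | wc counts names but says nothing about whether they were linked. A small C helper could report both numbers for one directory: how many regular files it holds, and how many of those have a link count above one. That would make run-to-run progress easier to compare between source and destination. This is a hypothetical sketch (the function name is invented). It only looks at a single directory level and does not follow symlinks.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Count the regular files directly inside `dir`, and how many of them have
 * a link count above one.  Returns 0 on success, -1 if `dir` can't be
 * opened (hypothetical helper). */
static int count_linked(const char *dir, int *files, int *linked)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    struct stat st;
    char path[4096];

    if (d == NULL)
        return -1;
    *files = *linked = 0;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (lstat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;                          /* skip subdirectories, symlinks */
        (*files)++;
        if (st.st_nlink > 1)
            (*linked)++;
    }
    closedir(d);
    return 0;
}
```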
Rsync: Re: patch to enable faster mirroring of large filesystems
Date: Tue, 27 Nov 2001 10:49:11 -0600
From: Dave Dykstra [EMAIL PROTECTED]

Thank you very much for doing the test, Alberto. I didn't have any set of files that large on which I could do a test, and as I said, when I tested the worst case I could think of with my application, I couldn't measure an appreciable difference. First, I want to make sure that you really did get the optimization turned on. [ . . . 3 paragraphs of clues on verifying optimization omitted . . . ]

I know you're trying to get reliable statistics so it's clear what sort of performance we're talking about here. But may I respectfully suggest that -having- to be so careful about whether optimization actually got turned on is a clue that there is still a big problem here? Seriously, even if --files-from= was -not- as efficient as the optimized case, if it's so difficult to ensure that you -are- in the optimized case, what's the point? If 90% of the users get it wrong---and 90% of -those- can't even figure out how to -tell-, even if they're trying to be careful---then clearly the optimization isn't as useful as it might be.

(And btw, if it's that hard to figure out, there should be a debugging switch that -tells- the user whether it got turned on. Yet another out-of-control command-line option, or perhaps an addition to one of the verbose modes, but not one that forces the user to drown in lots of other output, or causes unpatched rsyncs to hang, or... People shouldn't have to patch their local rsync just to be sure this is happening.)

Meanwhile, people are tying themselves in knots trying to figure out how to specify which files to transfer. As I pointed out months ago when this subject first came up, it seemed that about half the traffic on the list was from people who were confused about how to specify the list of files that rsync was supposed to handle.
Letting them use other tools (e.g., find, or some Perl script they just wrote) that were more transparent and with which they were more familiar seemed like it would dramatically decrease their learning curve.

I would propose that, -whether or not- the use of --files-from= was a performance-killer, rsync should have it. It -would- allow people to quickly debug a working setup. -If- for some reason its performance was bad compared to include/exclude, -then- they could go from a known-working configuration that might not run at full speed to a more-difficult-to-debug one that did. This is the right direction. (If life were really that bad, it might not be hard for the statistics from a run to indicate how much time was spent traversing the file system vs. moving files over the connection, which would be a clue that it was time to move to the optimized case. But it'd be nice to just avoid having to think about this hair in the first place.)

And, of course, if the data we've seen -was- generated with optimization, then obviously there's no downside to --files-from=. It seems pretty clear that the data presented paints a bad picture. It's hard to believe that --files-from= could be worse.

P.S. Would --files-from= reduce rsync's large memory consumption as well, or does it still imply rsync caching some info about every file it sees during its entire run, and never flushing this info until the end? Not remembering something about each file for the entire run would alone be a powerful reason to include it---there are some tasks for which finishing -at all- is more important than waiting a while. It sucks to tell the user, "You can't use the slower approach at all because we think you should always be fast. Go buy more memory instead"---if the machine is under your control, can take more memory in the first place, etc. I don't recall whether -both- ends of the connection are so memory-intensive; if so, this is even more important.
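Lenny's request for a way to tell whether the optimization got turned on can at least be approximated from the rules themselves. Dave's description gives two conditions. The include list must end with --exclude '*', and, judging from his truncated last sentence, no include may contain a wildcard. A hypothetical C predicate for those conditions could look like this. The struct and function names are invented, and this is not the code removed in 2.4.0. It uses the three characters rsync treats as wildcards in patterns (*, ? and [).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One filter rule: an include or an exclude pattern (hypothetical type). */
struct rule {
    bool include;
    const char *pattern;
};

/* True if the pattern uses any of rsync's wildcard characters. */
static bool has_wildcard(const char *pattern)
{
    return strpbrk(pattern, "*?[") != NULL;
}

/* Hypothetical check: every rule but the last is a wildcard-free include,
 * and the last rule is exclude "*".  Only then could the listed names be
 * looked up directly instead of recursing through every directory. */
static bool direct_lookup_possible(const struct rule *rules, size_t n)
{
    if (n < 2)
        return false;
    const struct rule *last = &rules[n - 1];
    if (last->include || strcmp(last->pattern, "*") != 0)
        return false;
    for (size_t i = 0; i + 1 < n; i++)
        if (!rules[i].include || has_wildcard(rules[i].pattern))
            return false;
    return true;
}
```

Reporting the result of such a check in one of the verbose modes is roughly the kind of switch Lenny asks for, though whether it matches the old optimization's exact trigger is an assumption.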
ERROR
I have had rsync set up for 3 years, running fine. Suddenly I am getting an error:

    @ERROR: Unknown module 'root'

I haven't changed the versions, any of the configs, or the OS. The module is clearly in there and worked yesterday. What could be causing this?

rsync v2.3.1, Solaris 7

    [root]
    path = /
    comment = root
    hosts allow =
    uid = 0
    gid = 0
    auth users = ghddfs
    secrets file = /etc/rsync.secret
Re: Rsync: Re: patch to enable faster mirroring of large filesystems
On Tue, Nov 27, 2001 at 02:34:22PM -0500, Lenny Foner wrote:
... I know you're trying to get reliable statistics so it's clear what sort of performance we're talking about here. But may I respectfully suggest that -having- to be so careful about whether optimization actually got turned on is a clue that there is still a big problem here?

No, the difficulty of turning on the optimization is irrelevant, because the optimization is no longer in the current version of rsync. It is only needed to do the performance test, which is a one-time thing. You seem to be missing my point. I agree that --files-from is useful even if it has no impact or even a negative impact on performance. Nevertheless, I want to know what the impact on performance will be compared to using an explicit include-from list, and I am bartering my volunteer effort of developing the code for someone else's volunteer effort of doing performance tests of the old optimized case, which I expect to be practically identical to the performance of --files-from. I personally don't need --files-from because the --include-from list is working fine for me, so I need extra motivation to put some time into it. I think it has to be done much like that optimization was done, and since I wrote the optimization in the first place, I expect it will probably be more efficient for me to do it than for somebody else; otherwise I'd probably just say forget it and wait for somebody else to write the code.

... [ranting deleted] ...

P.S. Would --files-from= reduce rsync's large memory consumption as well, or does it still imply rsync caching some info about every file it sees during its entire run, and never flushing this info until the end?

I'm pretty sure that rsync won't use up memory for excluded files, so it would make no difference.

- Dave Dykstra
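Both the old optimization and the proposed --files-from come down to looking up named files directly rather than recursing through every directory. The C sketch below is hypothetical and illustrates that access pattern only; it is not rsync's file-list code. It reads a list produced by find or a script and stats each entry, so the work grows with the length of the list rather than with the size of the tree.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Read one path per line (assumed shorter than 4096 bytes) from `list` and
 * stat each path directly, without walking any directory.  Returns how many
 * of the listed paths exist; the others are reported on stderr
 * (hypothetical helper). */
static long stat_listed_files(FILE *list)
{
    char line[4096];
    struct stat st;
    long found = 0;

    while (fgets(line, sizeof line, list) != NULL) {
        line[strcspn(line, "\n")] = '\0';      /* strip the newline */
        if (line[0] == '\0')
            continue;                          /* ignore blank lines */
        if (stat(line, &st) == 0)
            found++;
        else
            fprintf(stderr, "cannot stat %s\n", line);
    }
    return found;
}
```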
Re: Not all files synched - hard link problems???
Unfortunately there is no way to search the archive. That would be very useful. I haven't heard of any similar problems reported with hard links before, and I've been following this list closely for several years. I notice your command line looks pretty complicated, so I suggest that you try to narrow it down to the smallest reproducible case, preferably one that you can completely describe to someone else how to reproduce starting from scratch. Often such an exercise alone will reveal a solution, but if not, at least it allows somebody else to debug it.

- Dave Dykstra
FW: ERROR
On Tue, Nov 27, 2001 at 01:22:43PM -0800, Simison, Matthew wrote:
Turns out I have an rsync virtual site builder script, which creates the new user conf file with perm 400. And apparently someone thought that the main rsyncd.conf file needed to be the same, and that killed all connections. Sorry for bothering.

Matt

-----Original Message-----
From: Dave Dykstra [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 27, 2001 1:13 PM
To: Simison, Matthew
Subject: Re: ERROR

Does it show up when you list the modules? That is, if you do just rsync servername:: ?

- Dave Dykstra
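Dave's check, rsync servername::, asks the daemon for its module list. The C sketch below is a hypothetical offline counterpart. It reads an rsyncd.conf-style file and prints the [section] names. If the file cannot be opened by the user running it, for example after a chmod 400 leaves it readable only by its owner, it reports the reason instead. The function and macro names are invented for illustration.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_MODULES 32
#define MAX_NAME 64

/* Copy up to `max` [section] names from an rsyncd.conf-style stream into
 * `names` and return the total number of sections seen.  Every line that
 * does not start (after leading whitespace) with '[' is skipped, so
 * comments and "key = value" settings are ignored. */
static int list_modules(FILE *conf, char names[][MAX_NAME], int max)
{
    char line[1024];
    int count = 0;

    while (fgets(line, sizeof line, conf) != NULL) {
        char *p = line;
        while (isspace((unsigned char)*p))
            p++;
        char *end = (*p == '[') ? strchr(p, ']') : NULL;
        if (end == NULL)
            continue;
        if (count < max) {
            size_t len = (size_t)(end - p - 1);
            if (len >= MAX_NAME)
                len = MAX_NAME - 1;            /* truncate very long names */
            memcpy(names[count], p + 1, len);
            names[count][len] = '\0';
        }
        count++;
    }
    return count;
}

/* Print the module names in the file at `path`, or explain on stderr why
 * it could not be opened (e.g. EACCES when the caller lacks read access). */
static int print_modules(const char *path)
{
    FILE *conf = fopen(path, "r");
    if (conf == NULL) {
        fprintf(stderr, "%s: %s\n", path, strerror(errno));
        return -1;
    }
    char names[MAX_MODULES][MAX_NAME];
    int n = list_modules(conf, names, MAX_MODULES);
    fclose(conf);
    for (int i = 0; i < n && i < MAX_MODULES; i++)
        printf("%s\n", names[i]);
    return n;
}
```

Running it as the same user the daemon reads its config as would separate "the module is missing from the file" from "the file can't be read at all".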
Re: Not all files synched - hard link problems???
On 27 Nov 2001, Dave Dykstra [EMAIL PROTECTED] wrote:
Unfortunately there is no way to search the archive. That would be very useful.

Just use Google and search for site:lists.samba.org rsync mbp prototype, or whatever.

-- Martin
How to avoid copying empty directories?
    rsync -avu --include 'tmp1/*/*.c' --include */ --exclude * tmp1 tmp2

The above command copies all the empty directories under tmp1/. Is there any way to avoid it?
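No answer appears in this thread. One workaround, sketched here in C as a post-processing step rather than an rsync feature, is to remove the directories left empty in the destination (tmp2 above) after the copy. Note that this would also remove directories that were already empty on the source side. The function name is hypothetical.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Remove every empty directory below `dir`, depth first, leaving `dir`
 * itself in place.  A directory that only contained empty directories is
 * removed too.  Returns the number of entries left in `dir`, or -1 if it
 * cannot be read (hypothetical helper). */
static int prune_empty_dirs(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    struct stat st;
    char path[4096];
    int remaining = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode)
            && prune_empty_dirs(path) == 0 && rmdir(path) == 0)
            continue;                  /* subdirectory was empty; now gone */
        remaining++;                   /* a file, or a non-empty directory */
    }
    closedir(d);
    return remaining;
}
```

Only the entry just returned by readdir() is ever removed, which keeps the directory scan well-defined while entries disappear.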
Re: rsync server over SSH [includes code patches]
On 27 Nov 2001, Dave Dykstra [EMAIL PROTECTED] wrote:
2.4.7 isn't released yet. Martin has put a lot of changes into CVS in the last week, and when I tried it yesterday it didn't compile anywhere but Linux. Today it looks a bit better, but I still have problems on all my platforms except Linux and SGI. Anyway, the point is that it's probably better for you to wait until he does make a release, to save yourself some work, because it isn't done changing. Martin, the problem on all my platforms is that it can't find AF_INET6 when compiling lib/inet_pton.c (I compile on the oldest releases of each type that are still in active use so I can rely on upward compatibility; for example, it works on Solaris 7 but not Solaris 5.5.1).

Yes, the IPv6 patch, which was supposed to not break anything unless --enabled, actually turned out to break most things aside from BSD/Linux, and I have been gradually trying to sort out their header files. Other good news is that make check now seems to work on most platforms, except for Redhat/Insure++, which generates some apparently spurious warnings.

AF_INET6 __P(())

Thanks, I know about those two. I'm going to try to fix them today. I'm not sure if you've seen this: http://build.samba.org:80/build.pl?tree=rsyncfunction=Recent+Builds It's a pretty good tool, since I don't have direct access to (for example) a CRAY. The downside is that it requires changes to be checked in to HEAD for them to be tested. It's not easy being green!

That's a simple fix, but going beyond that, getaddrinfo.c has a problem because __P isn't defined.

__P is an idiom for handling old compilers that can't cope with ANSI prototypes. We don't support them, so I just need to get rid of it.

-- Martin
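The C sketch below illustrates the two portability idioms discussed here. The first is the __P macro that Martin plans to drop: it expands to the full prototype for ANSI compilers and to an empty parameter list for pre-ANSI ones. The second is guarding IPv6-only code with #ifdef AF_INET6, so that it still compiles on platforms whose headers lack that constant. The function family_supported() is hypothetical and is not taken from rsync's lib/inet_pton.c or getaddrinfo.c.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Traditional __P idiom: for ANSI compilers __P((int af)) expands to the
 * prototype "(int af)", for pre-ANSI ones to "()".  Many system headers
 * already define it, hence the guard. */
#ifndef __P
#  if defined(__STDC__) || defined(__cplusplus)
#    define __P(protos) protos
#  else
#    define __P(protos) ()
#  endif
#endif

/* Old-style declaration through the macro... */
static int family_supported __P((int af));

/* ...and the plain ANSI prototype that replaces it once __P is dropped. */
static int family_supported(int af);

/* Returns 1 for address families this build knows about.  The IPv6 case
 * is compiled only when the system headers define AF_INET6. */
static int family_supported(int af)
{
    if (af == AF_INET)
        return 1;
#ifdef AF_INET6
    if (af == AF_INET6)
        return 1;
#endif
    return 0;
}
```

Once only ANSI compilers are supported, the macro form can be deleted and the plain prototype kept, which is the change Martin describes.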
More - cause - not all files synched - program hangs in select
The bug has something to do with verbosity: it works fine without verbosity on. The irony, of course, is that one turns verbosity on to fix these things. (I have applied the hang patch, BTW.) The last thing I see is the match_report message; then the program hangs in select.

Here are the symptoms. I am syncing a number of parallel directories that contain large numbers of hard links. At the source:

    dir a: f1 f2 f3 f4 f5
    dir b: f1 f2 f3 f4 f5
    dir c: f1 f2 f3 f4 f5

f1 in each directory is a hard link to every other f1, etc. (Note: there are actually about 10 directories of 100 files each.)

After my first run of rsync, I get something like this:

    dir a: f1 (link count 3) f2 (3) f3 (1) f4 (1) f5 (1)
    dir b: f1 (3) f2 (3)
    dir c: f1 (3) f2 (3)

After my second run of rsync, I get something like this:

    dir a: f1 (3) f2 (3) f3 (3) f4 (1) f5 (1)
    dir b: f1 (3) f2 (3) f3 (3)
    dir c: f1 (3) f2 (3) f3 (3)

After my third run of rsync, I get something like this:

    dir a: f1 (3) f2 (3) f3 (3) f4 (3) f5 (1)
    dir b: f1 (3) f2 (3) f3 (3) f4 (3)
    dir c: f1 (3) f2 (3) f3 (3) f4 (3)

etc. So dir a is completely copied the first time, but only the beginnings of dirs b and c. With each successive run, dirs b and c (d, e, f) fill up, until it really is done. Again, this doesn't happen if I turn verbosity entirely off.

Perhaps somebody is closing some file descriptor (I noticed that the program was hanging in select on the remote box), and then somebody else (probably having to do with making links) is trying to write to it. Thus the program hangs, but the timing is not always the same, and sometimes a few more or fewer links are made before it hangs. It seems that making the hard links is done in a second phase after the real file transfers are done, at which time something must be closed. I could look through the source code and fix it myself, but maybe by the time I get in tomorrow somebody who knows it well will have done so.
Dave
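Dave Dykstra's advice was to reduce the problem to the smallest reproducible case. The layout described in this message can be generated from scratch with a short C sketch. The directory names d0, d1, ... and the function name are hypothetical stand-ins for dirs a, b and c. The sizes are parameters, so both the 3x5 example and the roughly 10x100 real case can be built. Syncing such a tree with -H and -v -v, then checking link counts on the destination, would show whether the problem reproduces outside the original setup.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build under `base` directories d0 .. d{ndirs-1}, each holding files
 * f1 .. f{nfiles}, where every fN is a hard link to d0/fN (so each file
 * ends up with a link count of ndirs).  `base` must not exist yet.
 * Returns 0 on success and -1 at the first failure (hypothetical helper). */
static int make_hardlink_tree(const char *base, int ndirs, int nfiles)
{
    char dir[4096], path[4096], orig[4096];

    if (mkdir(base, 0755) != 0)
        return -1;
    for (int d = 0; d < ndirs; d++) {
        snprintf(dir, sizeof dir, "%s/d%d", base, d);
        if (mkdir(dir, 0755) != 0)
            return -1;
        for (int f = 1; f <= nfiles; f++) {
            snprintf(path, sizeof path, "%s/f%d", dir, f);
            if (d == 0) {
                /* The first directory holds the real files. */
                FILE *fp = fopen(path, "w");
                if (fp == NULL)
                    return -1;
                fprintf(fp, "file %d\n", f);
                if (fclose(fp) != 0)
                    return -1;
            } else {
                /* Every other directory gets hard links to them. */
                snprintf(orig, sizeof orig, "%s/d0/f%d", base, f);
                if (link(orig, path) != 0)
                    return -1;
            }
        }
    }
    return 0;
}
```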