Re: Rsync decides to copy all old files to WinXP based server
Alex Janssen writes: > I've been using rsync to create backup copies of all my data files on my > Linux laptop to my Windows XP Home based desktop for about 6 months > now. Been working as it should, copying only files that changed since > the last backup. The first backup I ran after the time change to > Daylight Saving Time it wanted to copy all of the files regardless of > the timestamp. It copied old files that had not changed as well as the > files that had changed. All of the timestamps on the destination ended > up correctly set after the copy occurred, but they appeared to be the same > before the copy began as well. I am stumped. > > I don't know what the system time has to do with it seeing as it is > comparing file timestamps. FAT file systems store time stamps in local time, so they change with DST. See JW Schultz's excellent write-up: http://www.cygwin.com/ml/cygwin/2003-10/msg00995.html Craig -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
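A minimal sketch of the effect described above: because FAT stores local time, an unchanged file's apparent mtime shifts by exactly one hour across a DST switch, so a strict timestamp compare sees every file as changed. A tolerance of +/- 3600 seconds (which is what rsync's --modify-window option provides) masks the shift. The UTC offsets below are hypothetical example values, not anything rsync computes.

```python
# Sketch: why FAT mtimes appear to shift at a DST change, and how a
# +/- 1 hour comparison tolerance (like rsync's --modify-window) hides it.
# The UTC offsets are hypothetical example values.

def fat_mtime(utc_mtime, utc_offset):
    """FAT stores local time, so the stored stamp depends on the current offset."""
    return utc_mtime + utc_offset

utc = 1_000_000_000                  # the file's true (UTC) modification time
winter = fat_mtime(utc, -5 * 3600)   # stamp as read before the DST switch
summer = fat_mtime(utc, -4 * 3600)   # same unchanged file, read after the switch

diff = abs(summer - winter)
print(diff)                          # 3600: every unchanged file now "differs"

def same_mtime(a, b, modify_window=0):
    """rsync-style quick check: equal within +/- modify_window seconds."""
    return abs(a - b) <= modify_window

print(same_mtime(winter, summer))          # strict compare: looks changed
print(same_mtime(winter, summer, 3600))    # tolerant compare: unchanged
```

Running with --modify-window=3600 for the backup after each time change is a common workaround when the FAT drive cannot be reformatted.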
Re: What would cause an unexpected massive transfer
Harry Putnam writes: > Yeah, nice write-up. Am I correct in thinking that since I've gone > thru the long backup I'm now good till the next time change? Yes. > Further, if I converted the fs on the external drive to NTFS or created > an ext3 partition, this would never have happened? Yes. Craig
Re: What would cause an unexpected massive transfer
Harry Putnam writes: > I've rsynced two directory structures back and forth a few times. > > [snip] > > The file systems involved are (xp)NTFS on one end and Fat32 on the > external drive. This is the DST problem with how Fat32 represents mtime. Fat32 uses localtime, so the unix-derived (UTC) mtime on Fat32 changes with DST. Sad, huh? See the excellent write-up by the late JW Schultz: http://www.cygwin.com/ml/cygwin/2003-10/msg00995.html Craig
Re: can I preserve UIDs/GIDs when transferring with rsync?
Tomasz Chmielewski writes: > I noticed that rsync can preserve most of the file's characteristics > when it is used with "-a" option (it includes -o and -g flags for > preserving owners and groups). > > However, when I transfer data between systems, it affects my UIDs/GIDs, > making the data hard to recover. > > [snip] > > Is it possible to transfer files with rsync, and to preserve the *exact* > UIDs and GIDs, rather than usernames/groupnames (which in turn point to > invalid UIDs and GIDs)? rsync --numeric-ids Craig
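To illustrate the two strategies, here is a minimal sketch of the id-mapping choice (simplified: real rsync resolves names through the system user database; the dict and uid values below are hypothetical):

```python
# Sketch of rsync's two id-mapping strategies (simplified model; the
# receiver_passwd dict stands in for the receiver's user database).

receiver_passwd = {"craig": 1001, "backup": 1002}   # name -> uid on receiver

def map_by_name(sender_uid, sender_name):
    """Default behavior: prefer the receiver's uid for the same user name."""
    return receiver_passwd.get(sender_name, sender_uid)

def map_numeric(sender_uid, sender_name):
    """--numeric-ids: use the sender's uid verbatim; names are ignored."""
    return sender_uid

# A file owned by uid 500, user name "craig" on the sender:
print(map_by_name(500, "craig"))   # 1001: remapped to the receiver's "craig"
print(map_numeric(500, "craig"))   # 500: the exact uid is preserved
```

Note that when a sender name has no match on the receiver, the default behavior already falls back to the numeric id; --numeric-ids simply forces that for every file.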
Re: rsyncd server daemon not allowing connections
[EMAIL PROTECTED] writes: > Gang, I've read the manual(s), surfed google, spent about 5 hours on this, > to no avail > > I'm trying to run rsync in server mode and it appears to start normally, > but it refuses all connections (refuses connection when I tried telnetting > in on localhost 873!). > > I've turned off all firewalls on this server (do I dare tell you guys > that?...), which is fine: it is on a local network. > > I used the following command: > > rsync --daemon --server --config-file=/etc/rsyncd.conf . > It responds normally: @RSYNC 28 You should not use the --server option; it is an internal option that rsync uses when it starts the remote end of a transfer, not for running a daemon. Drop it (and the trailing "."): rsync --daemon --config-file=/etc/rsyncd.conf Craig
Re: An idea: rsyncfs, an rsync-based real-time replicated filesystem
Interesting ideas. > I envision the "VFS Change Logger" as a (hopefully very thin) middle-ware > that sits between the kernel's VFS interfaces and a real filesystem, like > ext3, reiser, etc. The "VFS Change Logger" will pass VFS calls to the > underlying filesystem driver, but it will make note of certain types of > calls... If I understand your description correctly, inotify does something close to this (although I'm not sure where it sits relative to VFS); see: http://www.kernel.org/pub/linux/kernel/people/rml/inotify/ It provides the information via a device node based on ioctl requests. I vaguely recall inotify doesn't report hardlinks created via link() (at least based on looking at the utility example). Inotify will drop events if the application doesn't read them fast enough. So a major part of the design is how to deal with that case, and of course the related problem of how to handle a cold start (maybe just run rsync, although on a live file system it is hard to know how much has changed since rsync checked/updated each file/directory). Perhaps you would have a background program that slowly reads (or re-writes the first byte of) each file, so that over time all the files get checked (although file deletions won't be mirrored). Craig
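The overflow case mentioned above can be sketched as a bounded event queue whose consumer must notice dropped events and fall back to a full rescan. This is a toy model of the design problem, not inotify's real API; the queue size and class names are illustrative.

```python
# Sketch of the overflow problem with an inotify-style event stream:
# if the consumer falls behind, events are dropped, so a mirroring
# daemon must detect that and fall back to a full rsync-style rescan.
# (Queue size and names are illustrative, not inotify's real API.)

from collections import deque

class ChangeLog:
    def __init__(self, maxlen=4):
        self.events = deque()
        self.maxlen = maxlen
        self.overflowed = False

    def record(self, path):
        if len(self.events) >= self.maxlen:
            self.overflowed = True      # the kernel would drop the event here
            return
        self.events.append(path)

    def drain(self):
        """Return paths to sync; None means 'state unknown, rescan everything'."""
        if self.overflowed:
            self.overflowed = False
            self.events.clear()
            return None
        out = list(self.events)
        self.events.clear()
        return out

log = ChangeLog()
for p in ["/a", "/b", "/c"]:
    log.record(p)
print(log.drain())        # ['/a', '/b', '/c'] -> incremental sync suffices

for p in ["/a", "/b", "/c", "/d", "/e"]:
    log.record(p)
print(log.drain())        # None -> overflow, run a full rsync pass
```

The "cold start" problem discussed above is the same code path: treat startup as a permanent overflow and begin with one full rsync run before trusting the event stream.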
Re: Rsyncing really large files
Lars Karlslund writes: > Also the numbers speak for themselves, as the --whole-file option is > *way* faster than the block-copy method on our setup. At the risk of jumping into the middle of this thread without remembering everything that was discussed... Remember that by default rsync writes a new file and then renames that file. So a single byte change to a file requires a complete read and write (plus the earlier read to generate the block checksums). The --inplace option is more efficient in terms of disk IO, but the drawback is that blocks earlier in the original file cannot be matched. I haven't looked at the code, but I'm guessing --inplace still does byte-by-byte matching. An additional optimization for --inplace would be to only try to match on multiples of the block size. Also, the matching only proceeds byte-by-byte when there is no match. Once a match is found then the entire block is skipped. So on a file with few changes, the byte-by-byte matching doesn't slow things down very much. Craig
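The skip-on-match behavior described above can be sketched in a few lines. This is a simplified model of the matching loop, not rsync's code: real rsync uses a rolling Adler-style weak sum plus an MD4 strong sum, while this sketch uses plain hashes to stay short.

```python
# Minimal sketch of rsync-style block matching: the old file's blocks are
# hashed; scanning the new file advances a full block on a match and only
# a single byte on a miss, so a file with few changes is mostly skipped.

BLOCK = 4

def block_sums(data):
    return {hash(data[i:i + BLOCK]): i
            for i in range(0, len(data) - BLOCK + 1, BLOCK)}

def delta(old, new):
    """Yield ('match', old_offset) or ('literal', one_byte) tokens."""
    sums = block_sums(old)
    out, i = [], 0
    while i + BLOCK <= len(new):
        off = sums.get(hash(new[i:i + BLOCK]))
        if off is not None and old[off:off + BLOCK] == new[i:i + BLOCK]:
            out.append(("match", off))
            i += BLOCK                 # matched: skip the whole block
        else:
            out.append(("literal", new[i:i + 1]))
            i += 1                     # miss: slide forward one byte
    out.extend(("literal", new[j:j + 1]) for j in range(i, len(new)))
    return out

old = b"aaaabbbbcccc"
new = b"aaaaXbbbbcccc"                 # one byte inserted near the front
tokens = delta(old, new)
print(sum(1 for t, _ in tokens if t == "match"))   # 3: all old blocks reused
```

Note how the single inserted byte costs one literal token and nothing more; this is also why --inplace matching restricted to block-multiple offsets (as suggested above) would trade some match opportunities for speed.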
Re: rsync 2.6.3 hang (was rsync 2.6.2 crash)
"jim" writes: > Thanks for the additional info. > > I actually have tried the --no-blocking-io option, but the sync > still hung. > > Since no one on Unix-like platforms are reporting an issue, do you > think it may be something in the Cygwin compatibility layer? Yes, I think so. When I tried to debug this some time ago with rsync over ssh using cygwin I found that data that was flushed by one end's rsync never arrived at the other end: the other end was still blocked on select. I presume that somewhere between rsync/ssh/cygwin, and cygwin/ssh/rsync on the other end, some buffer was not getting flushed properly. I don't know anything about cygwin internals, so I didn't look at this further. > Interestingly (to me anyway,) is that I have encountered the > problem with syncing across the network using ssh, and syncing > locally, but not over the network rsync to rsync. Maybe it's > just a matter of time I've never seen this with rsync/cygwin <--> rsync/cygwin, only with ssh. I haven't tested local rsync on cygwin that much. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: cwrsync and CPU usage
"Jose Luis Poza" writes: > I have a "problem" witch cwsrync and a questions. Does cwrsync process > (rsync.exe) use 100% (more or less) CPU in Windows 2000 server witch a high > level of kernel usage ? > I have syncronized 11 servers (unix and windos) witch all their unit´s > files, that proccess during approach 17 hours (the proccess is make every > day). Is this time normal?. (A client makes all the request and store in > local the files). Are you using the latest rsync (version 2.6.2)? It runs a lot faster on cygwin. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: cwRsync, Windows-2000, use of 'auth users': not working .... should it?
"GUZZI, ANTHONY" writes: > Without a 'auth users' entry for a module, the sync go fine. With an 'auth > users' entry, I'm getting the '@ERROR: auth failed on module ' error > message. Make sure your RSYNC-USERS.TXT file ends in a newline. Rsync prior to 2.6.2 ignores the last line in the file if it doesn't end in a newline. Next, try full paths for the secrets file. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: rsync to Mac OS X question
"Chris Heller" writes: > I ran into a problem today when I tested the system for the first time. > I am rsyncing from a remote Linux host using the following options to > rsync: -avv --rsh="" --exclude-from= file> --delete. > > The problem is when the files are moved over to the Mac OS X server > their owner/group ids change. > > For instance if I copy ~heller/ (uid: 500 gid: 500) to the Mac it > becomes uid: 504 gid: 504. > > This isn't too big a problem, but it messes up security when I go to > export the data via NFS. > > >From the rsync man page I was under the impression that -a will preserve > owner, group permissions. By default rsync maps uid/gid values by user/group name at each end of the transfer. Use --numeric-ids to just send the uid/gid without mapping. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: HP-UX 11i and largefiles on rsync 2.6.2
"Steve Bonds" writes: > This is what I would expect to see if the VXFS filesystem was not created > with the "largefiles" option-- but it was. (And I double-checked.) Other > utilities (e.g. "dd") can create large files just fine. > > I haven't seen anything obviously wrong with write_file or > flush_write_file in fileio.c (v. 1.15). > > Do you know what is meant by the "process' file size limit"? I don't know specifically about HP-UX, but must *nix systems have ulimit. See the man page. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: HP-UX 11i and largefiles on rsync 2.6.2
"Don Malloy" writes: > I just tried the build from the nightly tar file: > rsync-HEAD-20040720-1929GMT.tar.gz > > It failed at 2144075776 bytes each time I tried. I've attached the tail from > the tusc again. Here it the output of the rsync: I haven't been following this thread, so I might be way off base. Are you sure your destination file system supports large files, and that the destination file system has enough room? Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: rsyncP
"Paul Arch" writes: > does anyone know if File::RsyncP will operate under activeperl (windows?) > > This module is maintained by Craig Barratt, who I noticed is also on this > list :) I haven't tested it under activeperl, but it does work under perl + cygwin on WinXX. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: [PATCH] Batch-mode rewrite
Chris Shoemaker writes: > Do you see any reason to keep FIXED_CHECKSUM_SEED around? It doesn't > hurt anything, but I don't see a use for it. So long as the --checksum-seed=N option remains, I'm ok with getting rid of FIXED_CHECKSUM_SEED. Craig
Re: [Bug 1463] New: poor performance with large block size
Wally writes: > I apologize to Craig. Chris is correct. No problem. > I had been reading so many of Chris's highly intelligent e-mails... Same here. > But, the comment seems to have been right on. I have re-run the > experiment with block sizes as small as 3000 (yes it took a long > time to complete) all the way up to block sizes of 10 with it > working in reasonable times. But, when the block size approaches > 170,000 or so, the performance degrades exponentially. > > I understand that I am testing at the very fringes of what we should > expect rsync to do. File sizes of 25Gig and 55Gig are beyond what was > originally envisioned (based on 64k hash buckets and a sliding window > of 256k). Here's a patch to try. It basically ensures that the window is at least 16 times the block size. Before I'd endorse this patch for CVS we need to make sure there aren't cases where map_ptr is called with a much bigger length, making the 16x a bit excessive. Perhaps I would be tempted to repeat the previous check that the window start plus the window size doesn't exceed the file length, although it must be at least offset + len - window_start as in the original code. In any case, I'd be curious if this fixes the problem. Craig --- rsync-2.6.2/fileio.c Sun Jan 4 19:57:15 2004 +++ ../rsync-2.6.2/fileio.c Thu Jun 17 19:33:26 2004 @@ -193,8 +193,8 @@ if (window_start + window_size > map->file_size) { window_size = map->file_size - window_start; } - if (offset + len > window_start + window_size) { - window_size = (offset+len) - window_start; + if (offset + 16 * len > window_start + window_size) { + window_size = (offset + 16 * len) - window_start; } /* make sure we have allocated enough memory for the window */
Re: stalling during delta processing
"Wallace Matthews" writes: > I copy the 29 Gig full backup back into fedor//test/Kibbutz and issue > the command "time rsync -avv --rsh=rsh --stats --block-size=181272 > /test/Kibbutz/Kbup_1 fedor://test/Kibbutz" and it CRAWLS during delta > generation/transmittal at about 1 Megabyte per second. > > I have repeated the experiment 3 times; same result each time. > > The only thing that is different is --block-size= option. First, > time it isnt specified and I get a predictable answer. Second > time, I give it a block size that is about 1/2 of square root of > (29 Gig) and that is ok. But, explicitly give it something that > is approximately the square root of the 29 Gig and it CRAWLS. > > When I cancel the command, the real time is 86 minutes and the > user time is 84 minutes. This is similar to the issue I reported > on Friday that Chris suggested I remove the --write-batch= option > and that seemed to fix the CRAWL. If I understand the code correctly, map_ptr() in filio.c maintains a sliding window of data in memory. The window starts 64K prior to the desired offset, and the window length is 256K. So your block-size of 181272 occupies most of the balance of the window. Each time you hit the end of the window the data is memmoved and the balance needed is read. With such a large block size there will be a lot of memmoves and small reads. I doubt this issue explains the dramatic reduction in speed, but it might be a factor. Perhaps there is a bug with large block sizes? And, yes, your observation about the number of matching blocks needs to be explored. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: 2.6.2 not displaying permissions errors on client side
Wayne Davison writes: > On Sun, May 09, 2004 at 03:35:47AM -0700, Robert Helmer wrote: > > If there is an error writing to the remote file due to a "permission > > denied" error, rsync 2.6.1's client exits with an error code of 23, and > > an informative error message. > > ... and no error message logged in the server's log file. > > Rsync has historically been hesitant to return error messages from a > server to the client for fear of revealing too much information. The > 2.6.0 and 2.6.1 releases were returning error messages but failing to > log them in the server's log file. The 2.6.2 release reverts back to > the historical way this was handled. > > A better solution for the future would be to log all errors to the > server log and send some/most of them to the user as well. However, > that will be a complex change, and it has not been worked on yet. > > A simpler solution would be to duplicate ALL the messages (the lack of > selectivity makes this change easy). The appended patch should do this, > if you so desire to go that route. Thanks for the patch. I strongly vote for rsync errors being delivered to the client and would like to see this become the default again in the next version (perhaps with a command-line switch if necessary?). In my case, BackupPC emulates an rsync client and needs to see errors so it can log and count them, and for read errors it needs to remove the bad file, which is otherwise zero-filled (this happens with locked files on cygwin/WinXX). I saw your patch that returns a bad file checksum in the case of read errors. The drawback is that the bad file requires two passes, since it will fail on both, but retrying the file is probably worthwhile in case the failure was intermittent. Craig
Re: cwRync and Windows permissions
[EMAIL PROTECTED] writes: > Have a look at > > http://www.itefix.no/phpws/index.php?module=faq&FAQ_op=view&FAQ_id=12 > > In short: > > Right click My Computer. Go to Properties. Go to the Advanced tab. Click > Environment Variables. In the bottom section (System variables), add the > new entry: CYGWIN, with value nontsec. Restart the rsync service. Make sure > the folders you are uploading to have the permissions you want the files > to inherit. Doing this, I've found the uploaded files get the correct > permissions. It works for me too. Thanks. An alternative to the system-wide variable is to add --env CYGWIN=nontsec to the cygrunsrv command line when you install rsyncd as a service, e.g.: cygrunsrv -I rsyncd --env CYGWIN=nontsec -p c:/cygwin/bin/rsync.exe -a "--config=/etc/rsyncd.conf --daemon --no-detach" Craig
Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]
Alberto Accomazzi writes: > What I'm referring to are those options that a client passes to the > server which influence file selection, checksum and block generation. I > haven't looked at the rsync source code in quite a while, but off the > top of my head here are the issues to look at when considering caching a > filesystem scan: > > 1. Exclude/include patterns: > -C, --cvs-exclude auto ignore files in the same way CVS does > --exclude=PATTERN exclude files matching PATTERN > --exclude-from=FILE exclude patterns listed in FILE > --include=PATTERN don't exclude files matching PATTERN > --include-from=FILE don't exclude patterns listed in FILE > --files-from=FILE read FILE for list of source-file names > > These should be easy to deal with: I would simply have the cache creator > ignore any --exclude options passed by the client (but probably honor > the ones defined in a daemon config file). > > 2. Other file selection options: > -x, --one-file-system don't cross filesystem boundaries > -S, --sparse handle sparse files efficiently > -l, --links copy symlinks as symlinks > -L, --copy-links copy the referent of all symlinks > --copy-unsafe-links copy the referent of "unsafe" symlinks > --safe-links ignore "unsafe" symlinks > > It's possible that these can also be dealt with easily, but I'm not so > sure. Clearly -x influences what gets scanned, so how do you decide > what to cache? The other options are probably easier to deal with. > > 3. File checksums: > -c, --checksum always checksum > > Should the caching operation always checksum so that the checksums are > readily available when a client sets -c? This can lead to a lot of > computations and disk IO which may be unnecessary if the clients do not > use this option. > > 4. Block checksums: > -B, --block-size=SIZE checksum blocking size (default 700) > > It would be great if we could cache the rolling block checksums as they > are computed but this may be even harder (or impossible) to deal with. 
> And it looks like soon we'll have a new checksum-seed option which will > further complicate the issue (in fact I admit I have no idea about how > all of this works beyond versions 2.5.x; maybe somebody with more > knowledge on the subject will chime in). In fact, the checksum-seed option is critical to any scheme that caches the file list (with -C) or caches the block checksums. Without the checksum-seed option you will get a different checksum seed each time you run rsync more than a second apart (since the checksum seed defaults to time()). This means the whole-file and block checksums change every time. This is the reason the batch mode options force the checksum seed to a fixed value. Craig
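The seed-invalidates-the-cache point can be demonstrated directly. A sketch, with one stated substitution: rsync folds the seed into MD4 digests via sum_init(), but MD4 is often absent from modern hashlib builds, so MD5 stands in here; the seed handling, not the hash, is the point. The seed values are illustrative (32761 is the batch-mode constant mentioned above).

```python
# Sketch of why a time()-based checksum seed defeats checksum caching:
# the seed is mixed into every file/block digest, so a new seed per run
# invalidates every cached sum. MD5 stands in for rsync's MD4 here.

import hashlib
import struct

def file_digest(data, seed):
    h = hashlib.md5()
    h.update(struct.pack("<i", seed))   # seed mixed in first, as sum_init() does
    h.update(data)
    return h.hexdigest()

data = b"unchanged file contents"
run1 = file_digest(data, 1083450370)    # seed = time() on the first run
run2 = file_digest(data, 1083536770)    # a day later: a different seed
fixed1 = file_digest(data, 32761)       # --checksum-seed=32761, run 1
fixed2 = file_digest(data, 32761)       # --checksum-seed=32761, run 2

print(run1 == run2)      # False: cached digests are useless
print(fixed1 == fixed2)  # True: cached digests stay valid across runs
```

This is exactly why batch mode, and checksum caching in BackupPC, require pinning the seed to a constant.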
Re: Fwd: Re: setting checksum_seed
Wayne Davison writes: > On Sat, May 15, 2004 at 02:25:11PM -0700, Craig Barratt wrote: > > Any feedback on this patch and the possibility of getting it > > into CVS or the patches directory? > > The file checksum-seed.diff was put into the patches dir on the 2nd of > May. Strangely, I don't seem to have sent any email indicating this > (my apologies about that). > > I think that this patch is a good candidate to go into the next > release. Unfortunately the checksum-seed.diff patch breaks authentication in rsyncd. The problem is that when you specify --checksum-seed=N on the client when connecting to an rsyncd server, the authentication response is based on an MD4 digest computed by calling sum_init(), sum_update() and sum_end(), and sum_init() adds checksum_seed to the digest data. At that point the args have not yet been sent to the server (that happens after authentication), so the client has checksum_seed=N while the server still has checksum_seed=0, and authentication fails. Probably the best solution is to add a flag argument to sum_init() that says whether to add checksum_seed or not: authenticate.c would call sum_init(0) in two places, and match.c and receiver.c would call sum_init(1). The other alternatives, adding a second sum_init_nochecksumseed() function or saving/restoring checksum_seed in authenticate.c, seem ugly. If you agree with this fix I would be happy to submit a new patch in the next few days. Craig
Re: Fwd: Re: setting checksum_seed
Wayne Davison writes: > On Sat, May 15, 2004 at 02:25:11PM -0700, Craig Barratt wrote: > > Any feedback on this patch and the possibility of getting it > > into CVS or the patches directory? > > The file checksum-seed.diff was put into the patches dir on the 2nd of > May. Strangely, I don't seem to have sent any email indicating this > (my apologies about that). ...and my apologies for not checking CVS before I sent my email. > I think that this patch is a good candidate to go into the next > release. Thanks again! Craig
Fwd: Re: setting checksum_seed
Any feedback on this patch and the possibility of getting it into CVS or the patches directory? Thanks, Craig -- Forwarded message -- To: jw schultz <[EMAIL PROTECTED]> From: Craig Barratt <[EMAIL PROTECTED]> cc: [EMAIL PROTECTED] Date: Sat, 01 May 2004 17:06:10 -0700 Subject: Re: setting checksum_seed jw schultz writes: > > > There was some talk last year about adding a --fixed-checksum-seed > > > option, but no consensus was reached. It shouldn't hurt to make the > > > seed value constant for certain applications, though, so you can feel > > > free to proceed in that direction for what you're doing for your client. > > > > > > FYI, I just checked in some changes to the checksum_seed code that will > > > make it easier to have other options (besides the batch ones) specify > > > that a constant seed value is needed. > > > > I would really like a --fixed-csumseed option become a standard > > feature in rsync. Just using the batch value (32761) is fine. > > Can I contribute a patch? The reason I want this is the next > > release of BackupPC will support rsync checksum caching, so that > > backups don't need to recompute block or file checksums. This > > requires a fixed checksum seed on the remote rsync, hence the > > need for --fixed-csumseed. I've included this feature in a > > pre-built rsync for cygwin that I include on the SourceForge > > BackupPC downloads. > > 1. Yes, you may contribute a patch. I favor the idea of > being able to supply a checksum seed. > > 2. Lets get the option name down to a more reasonable > length. --checksum-seed should be sufficient. 
I submitted a patch in Feb 2004 to add a --fixedcsum-seed option (which only sets checksum_seed to 32761, the batch file value): http://lists.samba.org/archive/rsync/2004-February/008616.html Earlier, I submitted a patch (against 2.5.6pre1 in Jan 2003) for --checksum-seed=NUM: http://lists.samba.org/archive/rsync/2003-January/004845.html Since I posted both of these patches, there was an interesting thread started by Eran Tromer about potential block checksum collisions that could be exploited by someone to trigger first-pass failures. See: http://lists.samba.org/archive/rsync/2004-March/008821.html The consequence is just a performance penalty, since with very high probability the whole-file checksum fails, triggering the second pass with the full checksum size, which will succeed. Eran recommended that checksum_seed be more random than time(). BackupPC now supports rsync checksum caching, so I would really like an rsync command-line option to set the checksum_seed. Based on the thread started by Eran I am reverting to the --checksum-seed=NUM form, since this allows paranoid users to pick their own random value should they wish to avoid the issue raised by Eran, plus it also allows my BackupPC users to specify a fixed value so that caching is useful (subject to the same caveats raised by Eran). Here's a new patch against rsync-2.6.2. JW's earlier changes have simplified this patch. Could this be applied to CVS, or at a minimum added to the patches directory? Note: the patch does not allow the case of --checksum-seed=0, since the code in compat.c replaces the value 0 with time(0). I don't think it is necessary to support this case (which means disable adding the seed to the MD4 digests). If people feel strongly about this I can also support the case --checksum-seed=0, although it will make the code a little uglier (we'll need another global variable). 
Thanks, Craig --- options.c 2004-04-17 10:07:23.0 -0700 +++ options.c 2004-05-01 16:24:44.380672000 -0700 @@ -290,6 +290,7 @@ rprintf(F," --bwlimit=KBPS limit I/O bandwidth, KBytes per second\n"); rprintf(F," --write-batch=PREFIXwrite batch fileset starting with PREFIX\n"); rprintf(F," --read-batch=PREFIX read batch fileset starting with PREFIX\n"); + rprintf(F," --checksum-seed=NUM set block/file checksum seed\n"); rprintf(F," -h, --help show this help screen\n"); #ifdef INET6 rprintf(F," -4 prefer IPv4\n"); @@ -386,6 +387,7 @@ {"from0", '0', POPT_ARG_NONE, &eol_nulls, 0, 0, 0}, {"no-implied-dirs", 0, POPT_ARG_VAL,&implied_dirs, 0, 0, 0 }, {"protocol", 0, POPT_ARG_INT,&protocol_version, 0, 0, 0 }, + {"checksum-seed",0, POPT_ARG_INT,&checksum_seed, 0, 0, 0 }, #ifdef INET6 {0,'4', POPT_ARG_VAL,&default_af_hint, AF_INET, 0, 0 }, {0,'6', POPT_ARG_VAL,&default_af_hint, AF_INET6, 0,
Re: setting checksum_seed
jw schultz writes: > > > There was some talk last year about adding a --fixed-checksum-seed > > > option, but no consensus was reached. It shouldn't hurt to make the > > > seed value constant for certain applications, though, so you can feel > > > free to proceed in that direction for what you're doing for your client. > > > > > > FYI, I just checked in some changes to the checksum_seed code that will > > > make it easier to have other options (besides the batch ones) specify > > > that a constant seed value is needed. > > > > I would really like a --fixed-csumseed option become a standard > > feature in rsync. Just using the batch value (32761) is fine. > > Can I contribute a patch? The reason I want this is the next > > release of BackupPC will support rsync checksum caching, so that > > backups don't need to recompute block or file checksums. This > > requires a fixed checksum seed on the remote rsync, hence the > > need for --fixed-csumseed. I've included this feature in a > > pre-built rsync for cygwin that I include on the SourceForge > > BackupPC downloads. > > 1. Yes, you may contribute a patch. I favor the idea of > being able to supply a checksum seed. > > 2. Lets get the option name down to a more reasonable > length. --checksum-seed should be sufficient. I submitted a patch in Feb 2004 to add a --fixedcsum-seed option (which only sets checksum_seed to 32761, the batch file value): http://lists.samba.org/archive/rsync/2004-February/008616.html Earlier, I submitted a patch (against 2.5.6pre1 in Jan 2003) for --checksum-seed=NUM: http://lists.samba.org/archive/rsync/2003-January/004845.html Since I posted both of these patches, there was an interesting thread started by Eran Tromer about potential block checksum collisions that could be exploited by someone to trigger first-pass failures. 
See: http://lists.samba.org/archive/rsync/2004-March/008821.html The consequence is just a performance penalty, since with very high probability the whole-file checksum fails, triggering the second pass with the full checksum size, which will succeed. Eran recommended that checksum_seed be more random than time(). BackupPC now supports rsync checksum caching, so I would really like an rsync command-line option to set the checksum_seed. Based on the thread started by Eran I am reverting to the --checksum-seed=NUM form, since this allows paranoid users to pick their own random value should they wish to avoid the issue raised by Eran, plus it also allows my BackupPC users to specify a fixed value so that caching is useful (subject to the same caveats raised by Eran). Here's a new patch against rsync-2.6.2. JW's earlier changes have simplified this patch. Could this be applied to CVS, or at a minimum added to the patches directory? Note: the patch does not allow the case of --checksum-seed=0, since the code in compat.c replaces the value 0 with time(0). I don't think it is necessary to support this case (which means disable adding the seed to the MD4 digests). If people feel strongly about this I can also support the case --checksum-seed=0, although it will make the code a little uglier (we'll need another global variable). 
Thanks, Craig --- options.c 2004-04-17 10:07:23.0 -0700 +++ options.c 2004-05-01 16:24:44.380672000 -0700 @@ -290,6 +290,7 @@ rprintf(F," --bwlimit=KBPS limit I/O bandwidth, KBytes per second\n"); rprintf(F," --write-batch=PREFIXwrite batch fileset starting with PREFIX\n"); rprintf(F," --read-batch=PREFIX read batch fileset starting with PREFIX\n"); + rprintf(F," --checksum-seed=NUM set block/file checksum seed\n"); rprintf(F," -h, --help show this help screen\n"); #ifdef INET6 rprintf(F," -4 prefer IPv4\n"); @@ -386,6 +387,7 @@ {"from0", '0', POPT_ARG_NONE, &eol_nulls, 0, 0, 0}, {"no-implied-dirs", 0, POPT_ARG_VAL,&implied_dirs, 0, 0, 0 }, {"protocol", 0, POPT_ARG_INT,&protocol_version, 0, 0, 0 }, + {"checksum-seed",0, POPT_ARG_INT,&checksum_seed, 0, 0, 0 }, #ifdef INET6 {0,'4', POPT_ARG_VAL,&default_af_hint, AF_INET, 0, 0 }, {0,'6', POPT_ARG_VAL,&default_af_hint, AF_INET6, 0, 0 }, @@ -911,6 +913,11 @@ goto oom; args[ac++] = arg; } + if (checksum_seed) { + if (asprintf(&arg, "--checksum_seed=%d", checksum_seed) < 0) + goto oom; + args[ac++] = arg; + } if (keep_partial) args[ac++] = "--partial"; --- rsync.yo2004-04-30 11:02:43.0 -0700 +++ rsync.yo2004-05-01 16:59:48.546313600 -0700 @@ -348,6 +348,7 @@ --bwlimit=KBPS limit I/O bandwidth, KBytes per second --write-batch=PREFIXwrite batch fileset starting with PREFIX --read-batch=PREFIX read batch fileset starting with PRE
Re: Rsync hangs with XP machine used as sshd server
Agostino Russo writes: > I have a problem with rsync 2.6 protocol 27 (both client and server) > running over XP via Cygwin and sshd (on remote machine). It just hangs > almost randomly while transferring files after transferring a few > megabytes, not always on the same file. When the remote machine is a > Linux server I have no problems, bad news is I need also to rsync from > XP... I found the problem mentioned somewhere else doing a google > search, see for instance: > > http://www.cygwin.com/ml/cygwin/2003-08/msg01065.html > > Which seems to be exactly the same issue, see the link for more info > about the situation. Unfortunately I wasn't able to find a solution so > far on the web. I was hoping that somebody here knows a way around this > problem (other than dropping XP :-) I've never found rsync + ssh + cygwin to be reliable; your symptoms are the same as mine. I recommend using rsync in daemon mode with cygwin. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Failed rsync -- two different files considered up to date
Greger Cronquist writes: > I've used rsync successfully for several years, syncing between two > Windows 2000 servers using daemon mode, but today I stumbled across > something peculiar. I'm using cygwin with rsync 2.6.0 at both ends (the > latest available at this date) and I have a file that rsync considers up > to date even though both the md5 and a normal diff show differences. > I've tried calling rsync with several different options, most notably -c > for forcing checksum, but it fails to see a difference between the files. > > Are there any things I should try or information that I can include? All > -vvv gives me is "uptodate". How about -I (--ignore-times)? Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Speed up rsync ,cwRsync and replay changes against a file
> I recently installed and set up cwRsync on a Windows 2000 Server - > http://www.itefix.no/cwrsync/ -, and I was very impressed. I just > followed the instructions on the website and got it working. > > I am using it to mirror 30GB of mailboxes every night (only grabbing > the changes to each file), from a Windows 2000 box to a Linux box (RH9). > > The nightly replication takes approximately 8 hrs to complete, but the > actual size of the mailbox directory only increases by about 120MB a > day. There are 750 mailboxes and each mailbox is between 50 and 200MB in > size. > > I am using the following command line options: > > rsync -avz hostname::MailBoxes /mailboxreplica > > Can anyone recommend ways to speed this up - is there some extra > compression I can use, or a kind of "quick checksum" option that I could > use? If you are on a fast network, -z will probably slow you down. Rsync + cygwin is typically slow due to the system call overhead in cygwin. There is a performance patch (patches/craigb-perf.diff) included with the 2.5.6, 2.5.7 and 2.6.0 releases that makes a measurable improvement. This patch is now in CVS. So you should build rsync from the release sources after applying the patches/craigb-perf.diff patch (or build from CVS). Or you can try a pre-built executable with the patch, like the cygwin-rsyncd package at http://backuppc.sourceforge.net. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Not Again! (Was: Re: FAQ: Moving files between two machines using rsync)
Mauricio writes: > I can't believe this! I am having the very same problem I > had before. For those who do not remember, I was trying to rsync a > file from a Solaris 9 box(kushana) to a netbsd 1.6.1 (the rsync > server, katri) box, without much luck: > > [EMAIL PROTECTED]>rsync -vz \ > ? --password-file=/export/home/raub/nogo \ > ? /export/home/raub/sync-me \ > ? [EMAIL PROTECTED]::tmp > NetBSD 1.6.1 (GENERIC) #0: Tue Apr 8 21:00:42 UTC 2003 > > Welcome to NetBSD! > > @ERROR: auth failed on module tmp > rsync: connection unexpectedly closed (164 bytes read so far) > rsync error: error in rsync protocol data stream (code 12) at io.c(165) > [EMAIL PROTECTED]> One other thing to check is that the /etc/rsyncd.secrets file ends in a newline. The last entry will be ignored if that line doesn't end with a newline. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: [patch] Add `--link-by-hash' option (rev 2).
"Jason M. Felice" writes: > This patch adds the --link-by-hash=DIR option, which hard links received > files in a link farm arranged by MD4 file hash. The result is that the system > will only store one copy of the unique contents of each file, regardless of > the file's name. > > (rev 2) > * This revision is actually against CVS HEAD (I didn't realize I was working > from a stale rsync'd CVS). > * Apply permissions after linking (permissions were lost if we already had > a copy of the file in the link farm). I haven't studied your patch, but I have a couple of comments/questions: - If you update permissions, then all hardlinks will change too. Does that mean that all instances of an identical file will get the last mtime/permissions/ownership? Or does the link farm have unique entries for contents plus meta data (vs just contents)? - Some file systems have a hardlink limit of 32000. You will need to roll to a new file when that limit is exceeded (ie: link() fails). Also, empty files tend to be quite prevalent, so it is probably easier to just create those files and not link them (should be no difference in disk usage). - How does this patch interact with -H? Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: checksum_seed
jw schultz writes: > 1. Yes, you may contribute a patch. I favor the idea of > being able to supply a checksum seed. > > 2. Let's get the option name down to a more reasonable > length. --checksum-seed should be sufficient. I submitted a patch against 2.5.6pre1 last January for --checksum-seed=NUM: http://lists.samba.org/archive/rsync/2003-January/004845.html but in that thread Dave Dykstra correctly pointed out there wasn't much point in letting the user specify a particular value. Therefore, I switched to just a flag that forces the fixed value of 32761 (same as batch mode). I picked the option named --fixed-csumseed, which is long but hopefully informative. Here's a patch against CVS using --fixed-csumseed. I also added it to the usage and documentation, but it's not clear this option needs to be exposed to the user.

Craig

diff -bur rsync/options.c rsync-fixedcsum/options.c
--- rsync/options.c	Tue Feb 10 20:30:41 2004
+++ rsync-fixedcsum/options.c	Mon Feb 16 12:32:23 2004
@@ -89,6 +89,7 @@
 int modify_window = 0;
 int blocking_io = -1;
 int checksum_seed = 0;
+int fixed_csumseed = 0;
 
 unsigned int block_size = 0;
 
@@ -288,6 +289,7 @@
   rprintf(F,"     --bwlimit=KBPS          limit I/O bandwidth, KBytes per second\n");
   rprintf(F,"     --write-batch=PREFIX    write batch fileset starting with PREFIX\n");
   rprintf(F,"     --read-batch=PREFIX     read batch fileset starting with PREFIX\n");
+  rprintf(F,"     --fixed-csumseed        use fixed MD4 block/file checksum seed\n");
   rprintf(F," -h, --help                  show this help screen\n");
 #ifdef INET6
   rprintf(F," -4                          prefer IPv4\n");
@@ -303,7 +305,7 @@
 enum {OPT_VERSION = 1000, OPT_SENDER, OPT_EXCLUDE, OPT_EXCLUDE_FROM,
       OPT_DELETE_AFTER, OPT_DELETE_EXCLUDED, OPT_LINK_DEST,
       OPT_INCLUDE, OPT_INCLUDE_FROM, OPT_MODIFY_WINDOW,
-      OPT_READ_BATCH, OPT_WRITE_BATCH};
+      OPT_READ_BATCH, OPT_WRITE_BATCH, OPT_FIXED_CSUMSEED};
 
 static struct poptOption long_options[] = {
   /* longName, shortName, argInfo, argPtr, value, descrip, argDesc */
@@ -379,6 +381,7 @@
   {"hard-links",       'H', POPT_ARG_NONE,   &preserve_hard_links, 0, 0, 0 },
   {"read-batch",        0,  POPT_ARG_STRING, &batch_prefix, OPT_READ_BATCH, 0, 0 },
   {"write-batch",       0,  POPT_ARG_STRING, &batch_prefix, OPT_WRITE_BATCH, 0, 0 },
+  {"fixed-csumseed",    0,  POPT_ARG_NONE,   0, OPT_FIXED_CSUMSEED, 0, 0 },
   {"files-from",        0,  POPT_ARG_STRING, &files_from, 0, 0, 0 },
   {"from0",            '0', POPT_ARG_NONE,   &eol_nulls, 0, 0, 0},
   {"no-implied-dirs",   0,  POPT_ARG_VAL,    &implied_dirs, 0, 0, 0 },
@@ -564,6 +567,11 @@
 			checksum_seed = FIXED_CHECKSUM_SEED;
 			break;
 
+		case OPT_FIXED_CSUMSEED:
+			fixed_csumseed = 1;
+			checksum_seed = FIXED_CHECKSUM_SEED;
+			break;
+
 		case OPT_LINK_DEST:
 #if HAVE_LINK
 			compare_dest = (char *)poptGetOptArg(pc);
@@ -931,6 +939,10 @@
 			args[ac++] = "--files-from=-";
 			args[ac++] = "--from0";
 		}
+	}
+
+	if (fixed_csumseed) {
+		args[ac++] = "--fixed-csumseed";
 	}
 
 	*argc = ac;
diff -bur rsync/rsync.yo rsync-fixedcsum/rsync.yo
--- rsync/rsync.yo	Mon Feb  2 10:23:09 2004
+++ rsync-fixedcsum/rsync.yo	Mon Feb 16 12:36:08 2004
@@ -348,6 +348,7 @@
      --bwlimit=KBPS          limit I/O bandwidth, KBytes per second
      --write-batch=PREFIX    write batch fileset starting with PREFIX
      --read-batch=PREFIX     read batch fileset starting with PREFIX
+     --fixed-csumseed        use fixed MD4 block/file checksum seed
  -h, --help                  show this help screen
 
@@ -879,6 +880,15 @@
 dit(bf(--read-batch=PREFIX)) Apply a previously generated change batch,
 using the fileset whose filenames start with PREFIX. See the "BATCH
 MODE" section for details.
+
+dit(bf(--fixed-csumseed)) Set the MD4 checksum seed to the fixed
+value 32761. This 4 byte checksum seed is included in each block and
+file MD4 checksum calculation. By default the checksum seed is generated
+by the server and defaults to the current time(), or 32761 if
+bf(--write-batch) or bf(--read-batch) are specified. This default
+causes the MD4 block and file checksums to be different each time rsync
+is run. For applications that cache the block or file checksums the
+checksum seed needs to be fixed each time rsync runs using this option.
enddit() -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: checksum_seed
On Mon, Feb 09, 2004 at 09:14:06AM -0500, Jason M. Felice wrote: > I got the go-ahead from the client on my --link-by-hash proposal, and > the seed is making the hash unstable. I can't figure out why the seed > is there so I don't know whether to circumvent it in my particular case > or calculate a separate, stable hash. I believe the checksum seed is meant to reduce the chance that different data could repeatedly produce the same md4 digest over multiple runs. If a collision happens the hope is that a different checksum seed will break the collision. However, my guess is that it doesn't make any difference. Certainly adding the seed at the end of the block won't change a collision even if the seed changes over multiple runs. File MD4 checksums add the seed at the beginning, which might help breaking collisions, although I'm not sure. Wayne Davison writes: > There was some talk last year about adding a --fixed-checksum-seed > option, but no consensus was reached. It shouldn't hurt to make the > seed value constant for certain applications, though, so you can feel > free to proceed in that direction for what you're doing for your client. > > FYI, I just checked in some changes to the checksum_seed code that will > make it easier to have other options (besides the batch ones) specify > that a constant seed value is needed. I would really like a --fixed-csumseed option to become a standard feature in rsync. Just using the batch value (32761) is fine. Can I contribute a patch? The reason I want this is that the next release of BackupPC will support rsync checksum caching, so that backups don't need to recompute block or file checksums. This requires a fixed checksum seed on the remote rsync, hence the need for --fixed-csumseed. I've included this feature in a pre-built rsync for cygwin that I include on the SourceForge BackupPC downloads. 
Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
BackupPC 2.0.0beta0 released - now supports rsync
I just released version 2.0.0beta0 of BackupPC on SourceForge, see http://backuppc.sourceforge.net/ What is BackupPC? It is an enterprise-grade open-source package for backing up WinXX and *nix systems to disk. It supports transport via SMB, tar and now rsync over rsh/ssh and rsyncd. The backend features hard-linking of any identical files (not just files with the same name) and compression, giving a 6x to 10x reduction in disk storage. It also has a comprehensive web (CGI) interface. The rsync support in BackupPC is based on File::RsyncP, a perl rsync client; see http://perlrsync.sourceforge.net. A future version of BackupPC will also support block and file checksum caching for additional performance. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: rsync vs. rcp
> I wasn't aware that it had this. Was it there at the time of the > original discussion (Oct 2002)? The people involved in the discussion > then didn't seem to know this. I wasn't aware of it in Oct 2002 during that discussion. I saw it in the code a month or two after that. I haven't checked the history, but it is definitely there in 2.5.5. > However, it's not really adequate. A 16K block size only really works > for files up to about 500M. Still... that's a lot better than I thought > it was at the time. Agreed. Checksum length matters a lot more than block size, as you pointed out in your earlier analysis. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: rsync vs. rcp
> RSYNC DOES NOT WORK WITH 1GB+ FILES... unless you have a sufficiently > large block size. See the following; > > http://www.mail-archive.com/rsync@lists.samba.org/msg05219.html Let's be careful here. Rsync *does* work on 1GB+ files. What you probably meant to say was that rsync's first pass might fail on files that have 1GB+ of changes. But the second pass (which uses the full MD4 block checksum) will work correctly. So a more correct statement is that Rsync might work *slowly* on files with 1GB+ of changes because two passes are required. BTW, rsync already has an adaptive block size unless you set it on the command-line. The block size is roughly min(16384, max(700, file_size/1)) ie: file_size/1, but no less than 700 and no more than 16384. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: Fast Cygwin binaries ?
> I read in the archives that somebody has a faster binary version floating > around. How might I get ahold of it? (If you have it, would it be possible > to e-mail me a copy?) Fetch 2.5.6 and apply the patch in patches/craigb-perf.diff before you build it. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: rsync in cygwin as service
> Certainly, I tried --config
> Could you tell me which rsync version do you use?

rsync 2.5.5 and rsync 2.5.6 both work fine for me. Is it possible that rsync is already running as a service? It won't show up in cygwin's ps. For example, when rsync is running via cygrunsrv, if I type:

    rsync --daemon

it exits with no error, but ps shows no process. But rsync is indeed running, eg:

    tcsh 438% telnet localhost 873
    Trying 127.0.0.1...
    Connected to .
    Escape character is '^]'.
    @RSYNCD: 26
    quit
    @ERROR: protocol startup error
    Connection closed by foreign host.

You can also see rsync.exe in the windows task manager. You could also try a different port number to see if there is someone else on 873:

    craigslt 461% ps aux | egrep rsync
    craigslt 462% rsync --daemon --port=1234
    craigslt 463% ps aux | egrep rsync
        4020       1    4020    4020  ?  1005 23:29:08 /bin/rsync

Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: duplicated file removal: call for comment
> This problem may be discussed now, because in versions before > rsync-2.5.6, the algorithm for removing the so called "duplicated files" > was broken. > That's why we expect nobody used it anyway in earlier versions - but who > knows.. I agree it should be the last argument that wins, but as Wayne points out your code and 2.5.6 have unpredictable behavior since qsort() could return identical names in any order. Another concern I have about this fix in 2.5.6 is that there is risk the change is not backward compatible with earlier protocol versions. The file list is sent (unsorted and uncleaned) from the sender to the receiver, and each side then sorts and cleans the list. Since the duplicate removal changed in 2.5.6, but the protocol number didn't change, it is possible that with duplicates the file lists are no longer identical. Specifically, with three or more duplicates, 2.5.5 and earlier will remove the even ones, while 2.5.6 correctly removes all but the first. Remember that the files are referred to as an integer index into the sorted file list, and the receiver skips NULL (duplicate) files. I suspect (but haven't checked) that if a 2.5.5 receiver is talking to a 2.5.6 sender then 2.5.5 will send the index for the 3rd file, which will be null_file on 2.5.6. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: rsync in cygwin as service
> If I try to start rsync from command line it simply does nothing:
>
> $ rsync --daemon
>
> Administrator@dm-w2ks /usr/bin
> $ ps
>     PID  PPID  PGID  WINPID  TTY  UID    STIME COMMAND
>     480     1   480     480  con  500 04:15:03 /usr/bin/bash
>    1428   480  1428    1420  con  500 05:26:46 /usr/bin/ps
>
> Administrator@dm-w2ks /usr/bin
>
> So I'm trying to set it as a service:
>
> C:\cygwin\bin>cygrunsrv -I "RSYNC" -d "Rsync" -p /bin/rsync.exe -a
> "--daemon --no-detach"

I've found on cygwin that I need to explicitly tell it where the config file is, both on the command line and with cygrunsrv. I haven't investigated; perhaps the platform default is some other file. These commands work for me:

    rsync --config=/etc/rsyncd.conf --daemon

and

    cygrunsrv -I "RSYNC" -p /bin/rsync.exe -a '--config=/etc/rsyncd.conf --daemon --no-detach'

Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: rsync 1tb+ each day
> I am rsyncing 1tb of data each day. I am finding in my testing that > actually removing the target files each day then rsyncing is faster than > doing a compare of the source->target files then rsyncing over the delta > blocks. This is because we have a fast link between the two boxes, and > our disk is fairly slow. I am finding that the creation of the temp > file (the 'dot file') is actually the slowest part of the operation. > This has to be done for each file because the timestamp and at least a > couple blocks are guaranteed to have changed (oracle files). How big are the individual files? If they are bigger than 1-2GB then it is possible rsync is failing on the first pass and repeating the file. You should be able to see from the output of -vv (you will see a message like "redoing fileName (nnn)"). The reason for this is that the first-pass block checksum (32 bits Adler + 16 bits of MD4) is too small for large files. There was a long thread about this a few months ago. The first message was from Terry Reed around mid Oct 2002 ("Problem with checksum failing on large files"). In any case, as you already note, if the network is fast and the disk is slow then copying the files will be faster. Rsync on the receiving side reads each file 1-2 times and writes each file once, while copying just requires a write on the receiving side. Another comment: rsync doesn't buffer its writes, so each write is a block (as little as 700 bytes, or up to 16K for big files). Buffering the writes might help. There is an optional buffering patch (patches/craigb-perf.diff) included with rsync 2.5.6 that improves the write buffering, plus other I/O buffering. That might improve the write performance, although so far significant improvements have only been seen on cygwin. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: rsync in-place (was Re: rsync 1tb+ each day)
> > Is it possible to tell rsync to update the blocks of the target file=20 > > 'in-place' without creating the temp file (the 'dot file')? I can=20 > > guarantee that no other operations are being performed on the file at=20 > > the same time. The docs don't seem to indicate such an option. > > No, it's not possible, and making it possible would require a deep > and fundamental redesign and re-implementation of rsync; the result > wouldn't resemble the current program much. I disagree. An --inplace option wouldn't be too hard to implement. The trick is that when --inplace is specified the block matching algorithm (on the sender) would only match blocks at or after that block's location (on the receiver). No protocol change is required. The receiver can then operate in-place since no matching blocks are earlier in the file. This could be relaxed to allow a fixed number of earlier blocks, based on the knowledge the receiver will buffer reads. But that is more risky. Caveat user: if you specify --inplace and the source file has a single byte added to the beginning then the entire file will be sent as literal data. Of course, a major issue with --inplace is that the file will be in an intermediate state if rsync is killed mid-transfer. Rsync currently ensures that every file is either the original or new. Another independent optimization would be to do lazy writes. Currently, if you specify -I (--ignore-times) the output file is written (to a tmp file and then renamed) even if the contents are identical. Instead, creation of the tmp file could be delayed until the output file is known to be different. This is detected either by an out-of-sequence block number from the sender, or any literal data. If the file contains only in-sequence block numbers and no literal data, then there is no need to write anything. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: Proposal that we now create two branches - 2_5 and head
> I have several patches that I'm planning to check in soon (I'm waiting > to see if we have any post-release tweaking to and/or branching to do). > This list is off the top of my head, but I think it is complete: And I have several things I would like to work on and submit:

- Fix the MD4 block and file checksums to comply with the RFC (currently MD4 is wrong for blocks of size 64*n, or files longer than 512MB).

- Adaptive first pass checksum lengths: use 3 or more bytes of the MD4 block checksum for big files (instead of 2). This is to avoid almost certain first pass failures on very large files. (The block-size is already adaptive, increasing up to 16K for large files.)

- Resubmit my --fixed-checksum-seed patch for consideration for 2.6.x.

- Resubmit my buffering/performance patch for consideration for 2.6.x.

- For --hard-links it is only necessary to send the dev/inode information for files that have at least 2 links. Currently it is sent for every file when the file list is sent. In a typical *nix file system only a very small percentage of files have at least 2 links. Unfortunately all the bits in the flag byte are used, so another flag byte (to indicate whether the dev/inode information is present) would be necessary with --hard-links (unless someone has a better idea). This would save sending up to 7 bytes per file (or actually as many as 23 bytes per file for 64 bit dev/inode values).

Except for the last, all these items were discussed in this group over the last few months. The first two items and the last item require a bump in the protocol number, so I would like to include all of them together. But before I work on these I would like to make sure there is interest in including them. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: Incremental transfers: how to tell?
> James Kilton wrote:
> > To follow up on this... I found the --stats option and
> > here's what I'm getting:
> >
> > Number of files: 36
> > Number of files transferred: 36
> > Total file size: 10200816 bytes
> > Total transferred file size: 10200816 bytes
> > Literal data: 10200816 bytes
> > Matched data: 0 bytes
> > File list size: 576
> > Total bytes written: 10203996
> > Total bytes read: 596
> >
> > So, I don't know why no parts of the files are matching. The files are the same save for 1 or 2 values changing every 5 minutes. I don't know if anyone here is familiar with RRD files, but they're database files commonly used for SNMP data collection. All the fields are created initially so the file size never changes -- the fields are populated as time goes on.
> >
> > Is RSync unable to do incremental transfers of non-text files?
>
> No, it is perfectly capable of this.
>
> Is it possible that your files are changing in widely scattered places, such that every block that rsync examines has changed?

Since only 596 bytes were read, the receiving side clearly doesn't even see the old files and send the checksums (10MB of literal data should be around 10MB/700 * 6 bytes of checksums). So you appear to be rsync'ing to files on the receiving side that don't exist, or they cannot be read (permissions problem?). Please check your path names etc. What happens if you run the same command twice? Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: Cygwin issues: modify-window and hangs
> Has *anybody* been able to figure out a fix for this that really works? Why does the receiving child wait in a loop to get killed, rather than just exit()? I presume cygwin has some problem or race condition in the wait loop, kill and wait_process(). The pipe to the parent will read 0 bytes (EOF) on the parent side after the child exits. Although I haven't tried it, I would guess this should be the reliable solution on all platforms. But there must be some good reason the wait loop, kill and wait_process() contortions appeared in the code (maybe some race condition with the remote side?)... Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: Storage compression patch for Rsync (unfinished)
> Block checksums come from the receiver so cached block > checksums are only useful when sending to a server which had > better know it has block checksums cached. The first statement is true (block checksums come from the receiver), but the second doesn't follow. I need to cover the case where the client is the receiver and the client is caching the checksums. That needs a command-line switch, since the server would otherwise use time(NULL) as the checksum seed, which is then sent from the server to the client at protocol startup. I agree with your changes though: the command-line handling code can set checksum_seed if any of write-batch, read-batch, or fixed-checksum-seed are specified, avoiding the additional variable. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: Storage compression patch for Rsync (unfinished)
> Is there any reason why caching programs would need to set the > value, rather than it just being a fixed value? > I think it is hard to describe what this is for and what it should be > set to. Maybe a --fixed-checksum-seed option would make some sense, > or for a caching mechanism to be built in to rsync if it is shown to > be very valuable. A fixed value would be perfectly ok; the same magic value that batch mode uses (32761) would make sense. > I know people have proposed some caching mechanisms in the past and > they've been rejected for one reason or another. One difficulty is that additional files, or new file formats, are needed for storing the checksums, and that moves rsync further away from its core purpose. > I don't think I'll include the option in 2.5.6. If I submitted a new patch with --fixed-checksum-seed, would you be willing to at least add it to the patches directory for 2.5.6? I will be adding block and file checksum caching to BackupPC, and that needs --fixed-checksum-seed. This will save me from providing a customized rsync (or rsync patches) as part of BackupPC; I would much rather tell people to get a vanilla 2.5.6 rsync release and apply the specific patch that comes with the release. Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
possible typo/bug in receiver.c
The following code in receiver.c around line 421 (2.5.6pre1) contains some dead code:

	/* we initially set the perms without the
	   setuid/setgid bits to ensure that there is no race
	   condition. They are then correctly updated after
	   the lchown. Thanks to [EMAIL PROTECTED]
	   for pointing this out.  We also set it initially
	   without group access because of a similar race
	   condition. */
	fd2 = do_mkstemp(fnametmp, file->mode & INITACCESSPERMS);
	if (fd2 == -1) {
		rprintf(FERROR,"mkstemp %s failed: %s\n",fnametmp,strerror(errno));
		receive_data(f_in,buf,-1,NULL,file->length);
		if (buf) unmap_file(buf);
		if (fd1 != -1) close(fd1);
		continue;
	}

	/* in most cases parent directories will already exist
	   because their information should have been previously
	   transferred, but that may not be the case with -R */
	if (fd2 == -1 && relative_paths && errno == ENOENT &&
	    create_directory_path(fnametmp, orig_umask) == 0) {
		strlcpy(fnametmp, template, sizeof(fnametmp));
		fd2 = do_mkstemp(fnametmp, file->mode & INITACCESSPERMS);
	}
	if (fd2 == -1) {
		rprintf(FERROR,"cannot create %s : %s\n",fnametmp,strerror(errno));
		receive_data(f_in,buf,-1,NULL,file->length);
		if (buf) unmap_file(buf);
		if (fd1 != -1) close(fd1);
		continue;
	}

If mkstemp() fails (for various reasons, including the directory not existing) then fd2 == -1. So the first if () executes, which flushes the data and does a continue. So the next two if () statements will never execute. It might be an editing error (not sure how old it is). It looks like the first if () statement was meant to be replaced by the next two; ie: the first if () statement should be eliminated. I haven't backed out a command-level example that shows the difference, but it relates to receiving into a path whose last two or more directories don't exist. Is rsync meant to create deep directories that don't exist? Craig -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: Storage compression patch for Rsync (unfinished)
> While the idea of rsyncing with compression is mildly > attractive i can't say i care for the new compression > format. It would be better just to use the standard gzip or > other format. If you are going to create a new file type > you could at least discuss storing the blocksums in it so > that the receiver wouldn't have to generate them. Yes! Caching the block checksums and file checksums could yield a large improvement for the receiver. However, an integer checksum seed is used in each block and file MD4 checksum. The default value is unix time() on the server, sent to the client at startup. So currently you can't cache block and file checksums (technically it is possible for block checksums since the checksum seed is appended at the end of each block, so you could cache the MD4 state prior to the checksum seed being added; for files you can't since the checksum seed is at the start). Enter a new option, --checksum-seed=NUM, that allows the checksum seed to be fixed. I've attached a patch below against 2.5.6pre1. The motivation for this is that BackupPC (http://backuppc.sourceforge.net) will shortly release rsync support, and I plan to support caching block and file checksums (in addition to the existing compression, hardlinking among any identical files etc). So it would be really great if this patch, or something similar, could make it into 2.5.6 or at a minimum the contributed patch area in 2.5.6. [Also, this option is convenient for debugging because it makes the rsync traffic identical between runs, assuming the file states at each end are the same too.] 
Thanks,
Craig

diff -bur rsync-2.5.6pre1/checksum.c rsync-2.5.6pre1-csum/checksum.c
--- rsync-2.5.6pre1/checksum.c	Mon Apr  8 01:31:57 2002
+++ rsync-2.5.6pre1-csum/checksum.c	Thu Jan 16 23:38:47 2003
@@ -23,7 +23,7 @@
 
 #define CSUM_CHUNK 64
 
-int checksum_seed = 0;
+extern int checksum_seed;
 extern int remote_version;
 
 /*
diff -bur rsync-2.5.6pre1/compat.c rsync-2.5.6pre1-csum/compat.c
--- rsync-2.5.6pre1/compat.c	Sun Apr  7 20:50:13 2002
+++ rsync-2.5.6pre1-csum/compat.c	Fri Jan 17 21:18:35 2003
@@ -35,7 +35,7 @@
 extern int preserve_times;
 extern int always_checksum;
 extern int checksum_seed;
-
+extern int checksum_seed_set;
 extern int remote_version;
 extern int verbose;
 
@@ -64,11 +64,14 @@
 	if (remote_version >= 12) {
 		if (am_server) {
-			if (read_batch || write_batch) /* dw */
+			if (read_batch || write_batch) { /* dw */
+				if ( !checksum_seed_set )
 				checksum_seed = 32761;
-			else
+			} else {
+				if ( !checksum_seed_set )
 				checksum_seed = time(NULL);
 			write_int(f_out,checksum_seed);
+			}
 		} else {
 			checksum_seed = read_int(f_in);
 		}
diff -bur rsync-2.5.6pre1/options.c rsync-2.5.6pre1-csum/options.c
--- rsync-2.5.6pre1/options.c	Fri Jan 10 17:30:11 2003
+++ rsync-2.5.6pre1-csum/options.c	Thu Jan 16 23:39:17 2003
@@ -116,6 +116,8 @@
 char *backup_dir = NULL;
 int rsync_port = RSYNC_PORT;
 int link_dest = 0;
+int checksum_seed = 0;
+int checksum_seed_set;
 
 int verbose = 0;
 int quiet = 0;
@@ -274,6 +276,7 @@
   rprintf(F,"     --bwlimit=KBPS          limit I/O bandwidth, KBytes per second\n");
   rprintf(F,"     --write-batch=PREFIX    write batch fileset starting with PREFIX\n");
   rprintf(F,"     --read-batch=PREFIX     read batch fileset starting with PREFIX\n");
+  rprintf(F,"     --checksum-seed=NUM     set MD4 checksum seed\n");
   rprintf(F," -h, --help                  show this help screen\n");
 #ifdef INET6
   rprintf(F," -4                          prefer IPv4\n");
@@ -293,7 +296,7 @@
       OPT_COPY_UNSAFE_LINKS, OPT_SAFE_LINKS, OPT_COMPARE_DEST, OPT_LINK_DEST,
       OPT_LOG_FORMAT, OPT_PASSWORD_FILE, OPT_SIZE_ONLY, OPT_ADDRESS,
       OPT_DELETE_AFTER, OPT_EXISTING, OPT_MAX_DELETE, OPT_BACKUP_DIR,
-      OPT_IGNORE_ERRORS, OPT_BWLIMIT, OPT_BLOCKING_IO,
+      OPT_IGNORE_ERRORS, OPT_BWLIMIT, OPT_BLOCKING_IO, OPT_CHECKSUM_SEED,
       OPT_NO_BLOCKING_IO, OPT_WHOLE_FILE, OPT_NO_WHOLE_FILE, OPT_MODIFY_WINDOW,
       OPT_READ_BATCH, OPT_WRITE_BATCH, OPT_IGNORE_EXISTING};
@@ -306,6 +309,7 @@
   {"ignore-times",    'I', POPT_ARG_NONE, &ignore_times , 0, 0, 0 },
   {"size-only",        0,  POPT_ARG_NONE, &size_only , 0, 0, 0 },
   {"modify-window",    0,  POPT_ARG_INT,  &modify_window, OPT_MODIFY_WINDOW, 0, 0 },
+  {"checksum-seed",    0,  POPT_ARG_INT,  &checksum_seed, OPT_CHECKSUM_SEED, 0, 0 },
   {"one-file-system", 'x', POPT_ARG_NONE, &one_file_system , 0, 0, 0 },
   {"delete",           0,  POPT_ARG_NONE, &delete_mode , 0, 0, 0 },
   {"existing",         0,  POPT_ARG_NONE, &only_existing , 0, 0, 0 },
@@ -489,6 +493,13 @
Initial release of PerlRsync (perl rsync client)
I have just released the first version 0.10 of File::RsyncP to
SourceForge.  See:

    http://perlrsync.sourceforge.net

File::RsyncP is a perl implementation of an Rsync client.  It is
compatible with Rsync 2.5.5 (protocol version 26).  It can send or
receive files, either by running rsync on the remote machine, or
connecting to an rsyncd daemon on the remote machine.

What use is File::RsyncP?  The main purpose is that File::RsyncP
separates all file system I/O into a separate module, which can be
replaced by any module of your own design.  This allows rsync
interfaces to non-filesystem data types (eg: databases) to be
developed with relative ease.  File::RsyncP was initially written to
provide an Rsync interface for BackupPC,
http://backuppc.sourceforge.net.  See BackupPC for programming
examples.

File::RsyncP does not yet provide a command-line interface that mimics
native Rsync.  Instead it provides an API that makes it possible to
write simple scripts that talk to rsync or rsyncd.  The
File::RsyncP::FileIO module contains the default file system access
functions.  File::RsyncP::FileIO may be subclassed or replaced by a
custom module to provide access to non-filesystem data types.

If you are interested there are a couple of mailing lists
(perlrsync-announce and perlrsync-users) available on the SF project
page.

Merry Christmas, Happy Holidays, Happy Hanukkah etc.

Craig
Re: Statistics appearing in middle of file list -- no errors
> Has anybody seen this?  We want to seperate the statistics out from the
> file list, and were using tail to grab the end of the file.  the command
> we run is:
>
>     rsync -r -a -z --partial --suffix=".backup" --exclude="*.backup" \
>         --stats -v /. 10.1.1.60::cds101/ > /var/log/rsync.log 2>&1
>
> along with a number of excludes to skip the /tmp, /dev, /var and /proc
> directories.  The output in file /var/log/rsync.log is:
>
> building file list ... done
> dev/ttyp0
> etc/cups/certs/
> etc/cups/certs/0
> etc/mail/statistics
> root/.bash_history
> smb_shares/var/lib/dhcp/
> smb_shares/var/lib/dhcp/dhcpd.leases
> smb_shares/var/lib/dhcp/dhcpd.leases~
> smb_shares/var/log/debug
> smb_shares/var/log/mail
> smb_shares/var/log/messages
> smb_shares/var/log/rsync.log
> smb_shares/var/log/secure
> smb_shares/var/run/utmp
> smb_shares/var/spool/clientmqueue/
> smb_shares/var/spool/mail/
> smb_shares/var/spool/mail/root
> smb_shares/var/spool/mqueue/
> usr/local/samba/var/locks/
> usr/local/samba/var/locks/browse.dat
>
> Number of files: 169315
> Number of files transferred: 13
> Total file size: 1714847358 bytes
> Total transferred file size: 1013994 bytes
> Literal data: 30552 bytes
> Matched data: 983834 bytes
> File list size: 3438061
> Total bytes written: 3442643
> Total bytes read: 8794
>
> wrote 3442643 bytes  read 8794 bytes  9094.70 bytes/sec
> total size is 1714847358  speedup is 496.85
> dev/
> etc/cups/certs/
> etc/mail/
> root/
> smb_shares/var/lib/dhcp/
> smb_shares/var/log/
> smb_shares/var/run/
> smb_shares/var/spool/mail/
> usr/local/samba/var/locks/
>
> Any ideas?  Thanks!

The final output appears to be from the final directory permission
fixup.  The child process on the receiving side generates the stats,
then does deletes, hardlinks and a final fix of the directory mtimes.
In 2.5.5 this output should be disabled, see line 290 of generator.c:

	/* f_out is set to -1 when doing final directory
	   permission and modification time repair */
	if (set_perms(fname,file,NULL,0) && verbose && (f_out != -1))
		rprintf(FINFO,"%s/\n",fname);
	return;

Are you running 2.5.5?

Craig
Re: rsync to 2000/NT servers?
> Watch out for pagefile.sys (i think!)... it's won't copy. (let me know about
> any other's)

Most important files won't copy.  The registry files are locked and
can't be read by rsync/cygwin (nor are they served by smb).
Similarly, the outlook.pst file used by Outlook (which contains all
the email, attachments, calendar and address book info of an outlook
user) is locked whenever outlook is open (which is most of the time).
Exchange databases and SQL databases will be locked too.  Any file
open by a windows app is likely locked as well.  So you can get 99% of
the files, but the 1% you miss are the most critical.

> Now, can you think of a way to sync the win 2000 OS? (the WHOLE flippin'
> system) so that if it were to go down one could restore the full installation
> (bootstraps, bootloader, ect!!?) by means of the rsync'ed "backup".
> please?  thank you. ;-)

I wish this was possible, but I don't know how to do this.  Commercial
products use an OFM (open file manager) to allow locked files to be
accessed.  Products are sold by companies like St. Bernard or Columbia
Data Products.  Apparently Veritas and Legato bundle this product with
their commercial backup products.  See for example:

    http://www.stbernard.com/products/docs/ofm_whitepaperV8.pdf

What we need is an open source OFM that is compatible with rsync.
Then bare-metal WinXX recovery would be possible.

Craig
Rsync performance increase through buffering
I've been studying the read and write buffering in rsync and it turns
out most I/O is done just a couple of bytes at a time.  This means
there are lots of system calls, and also most network traffic
comprises lots of small packets.  The behavior is most extreme when
sending/receiving file deltas of identical files.

The main case where I/O is buffered is writes from the server (when io
multiplexing is on).  These are usually buffered in 4092 byte chunks
with a 4 byte header.  However, reading of these packets is usually
unbuffered, and writes from the client are generally not buffered.
For example: when receiving 1st phase checksums (6 bytes per block),
2 reads are done: one of 4 bytes and one of 2 bytes, meaning there are
4 system calls (select/read/select/read) per 6 bytes of checksum data.

One cost of this is some performance, but a significant issue is that
unbuffered writes generate very short (and very many) ethernet
packets, which means the overhead is quite large on slow network
connections.

The initial file_list writing is typically buffered, but reading it on
the client is not.

There are some other unneeded system calls:

 - One example is that show_progress() calls gettimeofday() even if
   do_progress is not set.  show_progress() is called on every block,
   so there is an extra system call per (700 byte) block.

 - Another example is that file_write writes each matching (700 byte)
   block without buffering, so that's another system call per block.

To study this behavior I used rsync-2.5.6cvs and had a benchmark area
comprising around 7800 files of total size 530MB.  Here are some
results doing sends and receives via rsyncd, all on the same machine,
with identical source and destination files.  In each case
--ignore-times (-I) is set, so that every file is processed:

 - Send test:

       strace -f rsync -Ir . localhost::test |& wc

   shows there are about 2,488,775 system calls.

 - Receive test:

       strace -f rsync -Ir localhost::test . |& wc

   shows there are about 1,615,931 system calls.
 - Rsyncd has a roughly similar number of system calls.

 - Send test from another machine (cygwin/WinXP laptop):

       tcpdump port 873 |& wc

   shows there are about 701,111 ethernet packets (many of them only
   have a 4 byte payload).

Since the source and dest files are the same, the send test only wrote
1,738,797 bytes and read 2,139,848 bytes.  These results are similar
to rsync 2.5.5.

Below is a patch to a few files that adds read and write buffering in
the places where the I/O was unbuffered, adds buffering to
write_file() and removes the unneeded gettimeofday() system call in
show_progress().  The results with the patch are:

 - Send test: 46,835 system calls, versus 2,488,775.

 - Receive test: 138,367 system calls, versus 1,615,931.

 - Send test from another machine: 5,255 ethernet packets, versus
   701,111.  If the tcp/ip/udp/802.3 per-packet overhead is around 60
   bytes, that means the base case transfers an extra 42MB of data,
   even though the useful data is only around 2MB.

The absolute running time on the local rsyncd test isn't much
different, probably because the test is really disk io limited and
system calls on an unloaded linux system are pretty fast.  However, on
a network test doing a send from cygwin/WinXP to rsyncd on rh-linux
the running time improves from about 700 seconds to 215 seconds (with
a cpu load of around 17% versus 58%, if you believe cygwin's cpu
stats).  This is probably an extreme case since the system call
penalty in cygwin is high.  But I would suspect a significant
improvement is possible with a slow network connection, since a lot
less data is being sent.  Note also that without -I rsync is already
very fast, since it skips (most) files based on attributes.

With or without this patch the test suite passes except for
daemon-gzip-upload.  One risk of buffering is the potential for a bug
caused by a missing io_flush: deadlock is possible, so try the patch
at your own risk...
Craig

diff -bur rsync/fileio.c rsync-craig/fileio.c
--- rsync/fileio.c	Fri Jan 25 15:07:34 2002
+++ rsync-craig/fileio.c	Sat Dec  7 22:21:10 2002
@@ -76,7 +76,35 @@
 	int ret = 0;
 
 	if (!sparse_files) {
-		return write(f,buf,len);
+		static char *writeBuf;
+		static size_t writeBufSize;
+		static size_t writeBufCnt;
+
+		if ( !writeBuf ) {
+			writeBufSize = MAX_MAP_SIZE;
+			writeBufCnt  = 0;
+			writeBuf = (char*)malloc(MAX_MAP_SIZE);
+			if (!writeBuf) out_of_memory("write_file");
+		}
+		ret = len;
+		do {
+			if ( buf && writeBufCnt < writeBufSize ) {
+				size_t copyLen = len;
+				if ( copyLen > writeBufSize - writeBu
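The buffering idea in the write_file() change can be sketched
independently of the C patch: accumulate small writes in a user-space
buffer and issue one large write() when it fills.  A hedged Python
sketch (the class and attribute names are made up for illustration):

```python
import os

class BufferedWriter:
    """Accumulate small writes in user space and issue one large
    write() when the buffer fills -- a sketch of the buffering idea,
    not the patch itself."""

    def __init__(self, fd, bufsize=4096):
        self.fd = fd
        self.bufsize = bufsize
        self.buf = bytearray()
        self.syscalls = 0          # number of real write() calls issued

    def write(self, data):
        self.buf += data
        if len(self.buf) >= self.bufsize:
            self.flush()

    def flush(self):
        """Must be called before waiting on a reply from the peer,
        or both sides can deadlock (the missing-io_flush risk noted
        above)."""
        if self.buf:
            os.write(self.fd, bytes(self.buf))   # one syscall for many writes
            self.syscalls += 1
            self.buf.clear()
```

With a 4096 byte buffer, 4096 two-byte writes collapse into two actual
write() system calls instead of 4096.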
Re: rsync as a backup tool and the case of rotated logs
> 1) have rsync "understand" that file names might have changed, maybe by
> comparing files through their md5 signature instead of by their name,
> that way rsync would see that /backup/syslog.198.gz is the same as
> /var/log/syslog.197.gz and not retransfer it,

The best choice is to rename the syslog files with a date, and don't
repeatedly rename them, eg: syslog.MMDD.gz (eg: syslog.20021128.gz).
Pruning old ones isn't too hard: simply reverse sort the names and
remove everything after the first 213.  It also makes it easier to
find a particular log file.

> 2) create hard-links to identical files in
> --backup-dir=/backup/incremental-2002-11-27 when is detects that
> /server/sylog.138.gz is the actually the same as
> /backup/current/syslog.137.gz,

BackupPC is one package that does this; see
http://backuppc.sourceforge.net.  (disclaimer: I'm the author).  I'm
in the process of adding rsync support.

Craig
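The reverse-sort-and-prune step in the first suggestion can be sketched
in a few lines of Python (file names and the retention count here are
illustrative, not prescribed by the message):

```python
def prune_logs(names, keep):
    """Keep the `keep` newest syslog.YYYYMMDD.gz files.  Because the
    date is zero padded, a plain reverse string sort puts the newest
    first -- no stat() calls or renumbering needed."""
    newest_first = sorted(names, reverse=True)
    return newest_first[:keep], newest_first[keep:]   # (kept, to_delete)
```

Since the names never change after creation, rsync sees each rotated log
exactly once and never retransfers it under a new name.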
Re: unexpected tag 90
> can anybody help? what does tag 90 mean?

It looks like the sender and receiver are getting out of sync while
the file list is being sent.

The data is sent in blocks.  Each block starts with an 8 bit tag and a
24 bit length.  The valid values of the tag are 7,8,9,10.  Any other
value (eg: 90) produces an error.  See read_unbuffered() in io.c.

Your strace shows:

    read(5, "sysa", 4)

This should be the tag and length, which is clearly wrong.  The 90 is
'a' - 7.

Beyond this, I don't know why this is happening.  One completely
random thought: what is the LANG setting in /etc/sysconfig/i18n on the
client machines?  If it is UTF-8 I would suggest trying it with
"en_US":

    LANG="en_US"

Other than that, I would suggest running gdb or adding debug
statements to see where it gets out of sync.

Craig
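The framing can be illustrated with a short Python sketch.  An
assumption here (consistent with the 'a' - 7 arithmetic above): the 4
bytes are read as a little-endian int with the tag in the high byte,
and 7 is the offset subtracted from the tag byte:

```python
import struct

TAG_OFFSET = 7   # subtracted from the tag byte (assumed, per 'a' - 7 above)

def parse_mplex_header(hdr4):
    """Decode a 4-byte multiplexed-io header: a 32-bit little-endian
    word whose high byte carries the tag and low 24 bits the length."""
    word = struct.unpack("<I", hdr4)[0]
    tag = (word >> 24) - TAG_OFFSET
    length = word & 0x00FFFFFF
    return tag, length
```

Feeding it the bytes from the strace shows where "tag 90" comes from:
the stream has slipped and ASCII text is being read as a header, so the
high byte is 'a' (97), and 97 - 7 = 90.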
Re: Speed problem
> > > You haven't really provided enough data to even guess what
> > > is limiting your performance.

How similar is the directory tree on the target (receiving) machine?
There are three general possibilities:

 - It's empty.

 - It's present, and substantially similar to the sending end.

 - It's present, but substantially different to the sending end.

In the first case rsync should be i/o limited (disk or network).  In
the second and third cases rsync could easily be cpu limited on the
sending end.  In the third case it could also be disk (specifically
seek) limited on the receiving end.  For example, you might dump a
large database to a binary file, whose content (records) are similar,
but the order might change dramatically.  This could take a huge
number of seeks on the receiving machine to rebuild the file, even
though only a small amount of data is transferred.

Unless I'm missing something, the behavior you observe could simply be
rsync hitting files (or directories) that are in the different
categories above.

I'd try adding the -v option and see if the "slowdown" always happens
on certain files.  Then try running rsync on just those files.  If it
is slow right away then maybe this explanation is correct.  If it
still goes fast, then slows down, then there is something else going
on.

As another test, run rsync to an empty target directory.  Rsync should
be i/o limited for the entire running time.

Craig
Re: Rsync help
> SUN box, 2gig ram, hard drive space to spare.  Rsync 2.5.5, solaris 5.7
> version 7.
> Half moon, I think it only seems to work on full moon nights.
>
> Here's the command I run as well .
> /usr/local/bin/rsync --delete --partial -P -p -z -e /usr/local/bin/ssh /dir1
> systemname:/storage
>
> [snip]
>
> > I get the following transering a large file use rsync over ssh.
> >
> > root@pbodb bin$ ./ausbk.sh
> > building file list ...
> > 10 files to consider
> > ERROR: out of memory in generate_sums
> > rsync: connection unexpectedly closed (8 bytes read so far)
> > rsync error: error in rsync protocol data stream (code 12) at io.c(150)

How big are the files you are trying to rsync?

It is probably failing here:

	if (verbose > 3)
		rprintf(FINFO,"count=%d rem=%d n=%d flength=%.0f\n",
			s->count,s->remainder,s->n,(double)s->flength);

	s->sums = (struct sum_buf *)malloc(sizeof(s->sums[0])*s->count);
	if (!s->sums) out_of_memory("generate_sums");

sizeof(s->sums[0]) is at least 32, and s->count is ceil() of the file
size divided by the block size (default is 700).  So this malloc
should be around 5% of the largest file size (eg: approx 500MB for a
10GB file).  If VM is tight on your machine (you say it is
intermittent) then this might fail.

You could try -vvvv and see what the previous rprintf() shows --
unfortunately inside the loop below it also prints every checksum when
verbose > 3, so you will get a huge amount of output; just tailing the
output should be enough.

A solution is to increase the block size (eg: --block-size=4096),
which reduces the malloc() needs proportionally.

Craig
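The arithmetic behind the "approx 500MB for a 10GB file" estimate is
easy to check (a sketch; 32 is the assumed per-entry size of struct
sum_buf, as noted above):

```python
import math

def generate_sums_malloc(file_size, block_size=700, sum_buf_size=32):
    """Bytes malloc'd by generate_sums() for the block-checksum array:
    one sum_buf per block of the file."""
    count = math.ceil(file_size / block_size)
    return count * sum_buf_size
```

For a 10GB file with the default 700-byte blocks this comes to roughly
457MB, consistent with the estimate above; with --block-size=4096 it
drops below 80MB, which is why a bigger block size works around the
intermittent malloc failure.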
Re: ERROR: buffer overflow in receive_file_entry
> has anyone seen this error:
>
> ns1: /acct/peter> rsync ns1.pad.com::acct
> overflow: flags=0xe8 l1=3 l2=20709376 lastname=.
> ERROR: buffer overflow in receive_file_entry
> rsync error: error allocating core memory buffers (code 22) at util.c(238)
> ns1: /acct/peter>

Either something is wrong with your setup or configuration or this is
a bug.

The packed file list data sent right at the start is not being decoded
correctly.  l1=3 means that 3 bytes of the full name should be kept,
but lastname = "." is just a single character long.  Also, l2=20709376
looks like ascii, not a small integer.  The flag value 0xe8 is maybe
ok: long file name, same mtime, same dir, same uid.

It would be great if you could debug this further.  I would first try
to find a small set of files on which you get the error, then add some
debug prints to writefd_unbuffered() to print what the sender is
sending, and to read_unbuffered() to print what the receiver is
reading.  Then look for 0xe8 03 76 93 70 20 in the output (byte
reversed from the error), and see what is a little before that.

Craig
Re: Problem with checksum failing on large files
> > Would you mind trying the following?  Build a new rsync (on both
> > sides, of course) with the initial csum_length set to, say 4,
> > instead of 2?  You will need to change it in two places in
> > checksum.c; an untested patch is below.  Note that this test
> > version is not compatible with standard rsync, so be sure to
> > remove the executables once you try them.
> >
> > Craig
>
> I changed csum_length=2 to csum_length=4 in checksum.c & this time rsync
> worked on the first pass for a 2.7 GB file.

Cool!

> I'm assuming that this change forced rsync to use a longer checksum length
> on the first pass, what checksum was actually used?

Yes.  It's now using adler32 + first 4 bytes of MD4 (64 bits total)
for each block in the first pass, instead of adler32 + first 2 bytes
of MD4 (48 bits total).

With just two more bytes, the chance of first pass failure for random
files of size 2.3GB with 700 byte blocks goes from more than 99% to
0.04%.  This is in addition to the earlier problem: the chance of two
different blocks of the old file having the same checksum goes from a
couple of percent to vanishingly small.

I agree with the earlier comments: checksum size is the key variable.
Block size is secondary.

Craig
Re: Problem with checksum failing on large files
> I tried "--block-size=4096" & "-c --block-size=4096" on 2 files (2.35 GB & > 2.71 GB) & still had the same problem - rsync still needed to do a second > pass to successfully complete. These tests were between Solaris client & AIX > server (both running rsync 2.5.5). Yes, for 2.35GB there is a 92% chance, on average, that it will fail with 4096 byte blocks. > As I mentioned in a previous note, a 900 MB file worked fine with just "-c" > (but required "-c" to work on the first pass). > > I'm willing to try the "fixed md4sum implementation", what do I need for > this? The "fixed md4sum" refers to some minor tweaks for block lengths of 64*n, plus files bigger than 512MB, to get correct md4 sums. But this shouldn't make a difference for you. Would you mind trying the following? Build a new rsync (on both sides, of course) with the initial csum_length set to, say 4, instead of 2? You will need to change it in two places in checksum.c; an untested patch is below. Note that this test version is not compatible with standard rsync, so be sure to remove the executables once you try them. Craig --- checksum.c 1999-10-25 15:04:09.0 -0700 +++ checksum.c.new 2002-10-14 09:40:34.0 -0700 @@ -19,7 +19,7 @@ #include "rsync.h" -int csum_length=2; /* initial value */ +int csum_length=4; /* initial value */ #define CSUM_CHUNK 64 @@ -120,7 +120,7 @@ void checksum_init(void) { if (remote_version >= 14) -csum_length = 2; /* adaptive */ +csum_length = 4; /* adaptive */ else csum_length = SUM_LENGTH; } -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: Problem with checksum failing on large files
craig> My theory is that this is expected behavior given the check sum size.

derek> Craig,
derek> Excellent analysis!

donovan> I was a bit concerned about his maths at first, but I did
donovan> it myself from scratch using a different aproach and got
donovan> the same figures...

Ok, so the chance that two (different) blocks have the same first-pass
48 bit checksum is small, but significant (at least 6% for a 4GB file
with 700 byte blocks).  This probably isn't enough to explain Terry's
problem.

But it just occurred to me that checksum collisions are only part of
the story.  Things are really a lot worse.

Let's assume that the block checksums are unique (so we don't get
tripped up by this first problem).  Let's also assume that the old
file is completely different to the new one, ie: no blocks at any
offset really match.  So rsync will compare the checksum at every byte
offset in the file looking for any match.  If there are nBlocks
blocks, each check has an nBlocks / 2^48 chance of a false match.
Since this test is repeated at every byte offset, the probability that
the file has no false matches is:

    p = (1 - nBlocks / (2^48)) ^ fSize

where fSize is the size of the file (more precisely the exponent
should be (fSize - 700)).

Now for some numbers for 700 byte blocks:

 - 100MB file (104857600 bytes).   nBlocks = 149797.   p = 0.945.
 - 500MB file (524288000 bytes).   nBlocks = 748983.   p = 0.248.
 - 1000MB file (1048576000 bytes). nBlocks = 1497966.  p = 0.003.

So, on average, if you have a random "new" 1GB file and a random "old"
1GB file, and you rsync them, the 1st phase will fail 99.7% of the
time.

Someone could test this theory: generate two random 500MB files and
rsync them.  Try it a few times.  I claim that on average the first
pass will fail around 75% of the time.

Things get a lot better when the files are very similar.  For each
block that matches, rsync skips the whole block (eg: 700 bytes) before
it starts looking for matching checksums.  So for a file that is
identical it only does nBlock checks, not fSize checks (700 times
fewer).

I recall from Terry's output that the number of bytes transferred
after the two attempts was roughly the same as the file size, so about
half the file is different.  In this case, about fSize/2 lookups will
be done:

    p = (1 - nBlocks / (2^48)) ^ (fSize/2)

which is about 0.06 (ie: a 94% chance the 1st pass fails).

For a 1GB file with 4096 byte blocks and about half the file changed,
the probability of the first pass working is about 62%, which is still
not great.  So just doing a single test with a 4096 block size might
not confirm or contradict my hypothesis.  The probability does go up
to about 97% with a 64K block size.

If my new hypothesis is correct we definitely need to increase the
size of the first-pass checksum for files bigger than maybe 50MB.

Craig
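The formula above is easy to evaluate numerically.  A small Python
sketch of it (the function name is made up; the block count uses floor
rather than ceil, which doesn't change the figures):

```python
def first_pass_ok(file_size, block_size=700, csum_bits=48):
    """Probability the first pass sees no false block match when the
    two files share no real content: roughly file_size byte offsets
    are each checked against nBlocks block checksums."""
    n_blocks = file_size // block_size
    return (1.0 - n_blocks / 2.0 ** csum_bits) ** file_size
```

This reproduces the 0.945 and 0.248 figures above (the 1000MB case
comes out well under 1%), and raising csum_bits to 64 shows why two
extra MD4 bytes fix things: for a ~2.3GB file the failure chance drops
to roughly 0.04%.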
Re: Problem with checksum failing on large files
terry> I'm having a problem with large files being rsync'd twice
terry> because of the checksum failing.
terry> Is there a different checksum mechanism used on the second
terry> pass (e.g., different length)?  If so, perhaps there is an
terry> issue with large files for what is used by default for the
terry> first pass?

The first pass block checksum is 48 bits: the 32 bit adler32 (rolling)
checksum, plus the first 2 bytes of the MD4 block checksum.  The
second pass is 160 bits: the same 32 bit adler32 (rolling) plus the
entire 128 bit MD4 block checksum.

donovan> I wonder if this is related to the rsync md4sum not producing
donovan> "correct" md4sums for files larger than 512M?
donovan>
donovan> The existing rsync md4sum implementation does not produce
donovan> md4sums the same as the RSA implementation for files larger
donovan> than 512M... but I thought it was consistant with itself so
donovan> this didn't affect anything.

I doubt this matters, for just the reason you mention: it is
consistent and statistically it is still well behaved, so it won't
matter.

My theory is that this is expected behavior given the check sum size.

Now, 48 bits sounds like a lot.  Let's start with an analogy.  If I
have 23 (randomly-selected) people in a room, what is the probability
that some pair of people have the same birthday?  You might guess it
is quite small, maybe 23/365.  But that's wrong.  It's actually more
than 50%.  The probability that 3 people have different birthdays is
364/365 * 363/365.  Similarly, the probability that 23 people all
have unique birthdays is 364/365 * 363/365 * ... * 343/365, which is
less than 0.5 (50%).

So, back to our first pass checksum.  A 4GB file has 2^32 / 700
blocks.  (The blocks are like the people, each birthday is the
checksum, and the 2^48 possible checksums are like the 365 days in
the year.)  Let's assume the 48 bit checksums are random.  What's the
chance that two blocks have the same checksum?  It sounds very
unlikely, but the chance is around 6.5%.  For an 8GB file it's 23%.

In reality, the block checksums are not completely random, so the
real probabilities of a collision will be higher.

If we increase the block size to 2048, the probabilities drop to 0.8%
for a 4GB file and 3% for an 8GB file.  For a block size of 4096 we
get 0.2% for a 4GB file and 0.8% for an 8GB file.

To test this theory, try a bigger --block-size (eg: 4096).  If you
still see a similar number of files needing a repeat then my theory
is wrong, and a bug could be the cause.  If the theory is supported
by your tests (ie: most/all files work on the first pass) then rsync
could use an adaptive-length first pass checksum: use one or two more
bytes from the MD4 block checksum (ie: 56 or 64 bits total) for files
bigger than, say, 256MB and 2GB.  Both sides know the file size.

Craig
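The birthday arithmetic can be reproduced in a few lines of Python
(a sketch using the standard exp(-n(n-1)/2d) approximation of the
product above; the function names are made up):

```python
import math

def collision_prob(n, d):
    """P(some pair of n uniform draws from d values collide), via the
    exponential approximation of the birthday product."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * d))

def block_collision(file_size, block_size=700, csum_bits=48):
    """Chance two different blocks of one file share the same
    first-pass block checksum."""
    return collision_prob(file_size // block_size, 2.0 ** csum_bits)
```

collision_prob(23, 365) comes out at almost exactly 50%, and
block_collision reproduces the ~6.5% (4GB) and ~23% (8GB) figures,
dropping below 1% for a 4GB file once the block size reaches 2048.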
Re: MD4 bug in rsync for lengths = 64 * n
> This is the first detailed description of the problem I've seen.  I've heard
> it mentioned several times before, and thought that the md4 code in librsync
> was the same as in rsync.  I've looked and tweaked the md4 code in librsync
> and could never see the bug so I thought it was a myth.  I also thought that
> samba used this code...  I wonder what variant it is using :-)

Samba looks right to me.  Anyhow, I looked at the archives and found
this message, so I have simply rediscovered the same bug as Tridge:

    http://www.mail-archive.com/rsync@lists.samba.org/msg03919.html

> > > The fix is easy: a couple of ">" checks should be ">=".  I can send
> > > diffs if you want.  But of course this can't be rolled in unless it
> > > is coupled with a bump in the protocol version.
> >
> > Another bump in the protocol version is no problem.  Please submit a patch.
>
> I can submit patches if required for the md4 code as tweaked/fixed for
> librsync.  The fixed code is faster as well as correct :-)

Sure, that would be great.  Otherwise, I would be happy to recreate
and test a patch.

> > > email about fixing MD4 to handle files >= 512MB (I presume this
> > > relates to the 64-bit bit count in the final block).  Perhaps this
> > > change can be made at the same time?
> >
> > Could you please post a reference to that email?  It isn't familiar to me
> > and I didn't find it through google.  There have been other problems we've
> > been seeing with the end of large files and zlib compression, though.
> > I wonder if it can somehow be related.
>
> It may not have been on the rsync list, but on the librsync list...  Please
> note that there are several variants of the md4 patch floating around.  I've
> been meaning to seperate the latest md4 patch from my bigger librsync "delta
> refactor patch" for some time.

I must be spacing.  I can't find the earlier post either.  And I also
can't find my original post in the archives...

Anyhow, the bug occurs in the file MD4 digest for file lengths >=
512MB.  Step 2 in the RFC for the MD4 algorithm specifies that the
lower 64 bits (not 32 bits) of the data's bit length is embedded in
the tail buffer; see:

    http://www.faqs.org/rfcs/rfc1186.html

Both librsync and rsync use a 32 bit unsigned int for counting the
number of bytes processed.  This is then multiplied by 8 (to get
bits) and this is embedded in the tail buffer when MD4 finishes up.
So for files of 2^32 bits (512MB) or bigger, the 32 bit bit count
overflows.

Again, a benign bug but a little disconcerting if you are using
another program to check MD4 digests of large files.

Craig
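The overflow is simple to demonstrate with just the length bookkeeping
(a sketch of the tail-buffer bit count only, not the full digest; the
function names are made up):

```python
def md4_tail_bits_buggy(n_bytes):
    """What the buggy code embeds: the byte count kept in a 32 bit
    unsigned int, multiplied by 8 with 32 bit wraparound."""
    return ((n_bytes & 0xFFFFFFFF) * 8) & 0xFFFFFFFF

def md4_tail_bits_correct(n_bytes):
    """Lower 64 bits of the bit length, as RFC 1186 step 2 requires."""
    return (n_bytes * 8) & 0xFFFFFFFFFFFFFFFF

HALF_GB = 512 * 1024 * 1024   # 2^29 bytes == 2^32 bits
```

Below 512MB the two values agree; at exactly 512MB the buggy count
wraps to zero, so the final digest differs from a conforming MD4 even
though rsync remains consistent with itself.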