Thanks so much for the info. It does appear that rsync scans the entire subdirectory before doing anything, which seems pretty inefficient; perhaps this will be improved in a future release. Then again, maybe it has to be this way so that the --delete options can work?
On Thu, Jul 26, 2012 at 4:42 PM, Lars Ellenberg <lars.ellenb...@linbit.com> wrote:

> On Thu, Jul 19, 2012 at 01:51:43PM -0400, Cary Lewis wrote:
> > I want to use rsync with a cloud based rsync provider to do off-site
> > backing up of a large (1TB) dataset which consists of 32 million+ files
> > spread out in 300 directories. So the amount of files in any one directory
> > can be quite large (upwards of 2 million).
>
> You realize that stat() is a costly operation,
> especially if the inodes are cache cold, even more so if something else
> stresses the IO and VM subsystems on the box.
>
> On a moderately loaded box, recursively stat()ing 3 million files
> occasionally took 90 minutes and more. Doing the same once the inodes
> are cache-hot takes the same box under the same overall stress 30 to 90
> *seconds*.
>
> Holding 3 million dentries and inodes cache-hot requires (on that box,
> anyways) ~ 5 Gigabyte of slab memory (of 128 G available...).
>
> So if you want to regularly recursively stat (and that's what rsync
> needs to do) 32 million files, you better add more ram, much more ram,
> to your box.
>
> Also, you mention Cygwin.
> IIRC, by default, that will still treat file names as case*in*sensitive,
> so you get really bad (maybe O(N^2)?) behaviour
> when walking large directories.
>
> There was some setting which I do not remember right now,
> to tell rsync and/or cygwin to treat this as case-sensitive,
> which can seriously improve behaviour with large directories.
>
> > Rsync doesn't seem to cope with this well - even doing local copies in a
> > directory with several thousands of files takes a long time to initiate any
> > transferring.
>
> I'm speculating here.
> But I thought the file list generation is still per sub-directory, so
> would need to scan the current subdir fully before starting to work on
> the resulting partial file list.
>
> > I thought that with version 3, rsync was supposed to start transferring
> > before fully testing all of the files in a directory?
> >
> > I am using version 3.0.9 under Cygwin.
> >
> > Is there a command line switch I am supposed to use to force rsync to start
> > transferring more quickly?
> >
> > Any insight / suggestions would be most appreciated.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> --
> Please use reply-all for most replies to avoid omitting the mailing list.
> To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
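
For anyone who wants to see the cold-versus-hot inode cache effect described in the quoted message on their own box, here is a rough sketch in Python. It is not what rsync does internally; it only walks a tree and calls lstat() on every entry, which approximates the per-file cost of building a file list. The name stat_walk is made up for this example.

```python
import os
import time


def stat_walk(root):
    """Recursively lstat() every entry under root.

    Returns (entry_count, elapsed_seconds). This is only an
    approximation of a file-list scan, not rsync's own code.
    """
    count = 0
    start = time.perf_counter()
    stack = [root]  # explicit stack avoids deep recursion on big trees
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Force a real stat call per entry, as a scanner must
                # do to compare sizes and modification times.
                os.lstat(entry.path)
                count += 1
                # Descend into real directories only, not symlinks.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return count, time.perf_counter() - start
```

Calling stat_walk() twice in a row on the same directory and comparing the two elapsed times gives a feel for the difference. The first pass is only truly cache-cold if nothing has touched those inodes recently, and how much faster the second pass is depends on available memory and the other load on the machine.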