Thanks so much for the info. It does appear as though rsync scans the
entire subdir before doing anything, which seems pretty inefficient.
Perhaps this will be improved in a future release. Then again, maybe it
has to work this way so that the --delete options can work?

On Thu, Jul 26, 2012 at 4:42 PM, Lars Ellenberg
<lars.ellenb...@linbit.com> wrote:

> On Thu, Jul 19, 2012 at 01:51:43PM -0400, Cary Lewis wrote:
> > I want to use rsync with a cloud-based rsync provider to do off-site
> > backups of a large (1TB) dataset which consists of 32 million+ files
> > spread out in 300 directories. So the number of files in any one
> > directory can be quite large (upwards of 2 million).
>
> You realize that stat() is a costly operation,
> especially if the inodes are cache cold, even more so if something else
> stresses the IO and VM subsystems on the box.
>
> On a moderately loaded box, recursively stat()ing 3 million files
> occasionally took 90 minutes and more.  Doing the same once the inodes
> are cache-hot takes the same box under the same overall stress 30 to 90
> *seconds*.
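
To see the cold versus hot difference on your own box, a minimal Python
sketch along these lines may help. It is not what rsync does internally;
the function name stat_tree is made up for illustration, and the first
pass is only "cold" if the inodes are not already cached.

```python
import os
import sys
import time


def stat_tree(root):
    """Recursively lstat() every entry under root; return (count, seconds)."""
    start = time.perf_counter()
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
                count += 1
            except OSError:
                # Entry vanished or is unreadable; skip it.
                pass
    return count, time.perf_counter() - start


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    # Two passes over the same tree: the second one usually benefits
    # from whatever the first pass pulled into the cache.
    for label in ("first pass", "second pass"):
        n, secs = stat_tree(root)
        print(f"{label}: {n} entries in {secs:.2f}s")
```

Run it against one of the large directories and compare the two timings.
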
>
> Holding 3 million dentries and inodes cache-hot requires (on that box,
> anyways) ~5 gigabytes of slab memory (of 128 G available...).
>
> So if you want to regularly recursively stat (and that's what rsync
> needs to do) 32 million files, you better add more RAM, much more RAM,
> to your box.
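
For a rough sense of scale, the figures above can be extrapolated as
below. This assumes slab usage grows linearly with the number of files,
which is only an assumption; the real per-inode cost depends on the
filesystem and kernel. The function name estimate_slab_gb is made up
for illustration.

```python
def estimate_slab_gb(files, ref_files=3_000_000, ref_gb=5.0):
    """Scale the quoted reference point (3 million files, ~5 GB) linearly."""
    return ref_gb * files / ref_files


if __name__ == "__main__":
    print(f"{estimate_slab_gb(32_000_000):.1f} GB")  # prints "53.3 GB"
```

So keeping all 32 million entries cache-hot would need somewhere around
50 GB under that assumption.
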
>
> Also, you mention Cygwin.
> IIRC, by default, that will still treat file names as case*in*sensitive,
> so you get really bad (maybe O(N^2)?) behaviour
> when walking large directories.
>
> There was some setting, which I do not remember right now,
> to tell rsync and/or Cygwin to treat file names as case-sensitive,
> which can seriously improve behaviour with large directories.
>
> > Rsync doesn't seem to cope with this well - even doing local copies in a
> > directory with several thousands of files takes a long time to initiate
> > any transferring.
>
> I'm speculating here.
> But I thought the file list generation is still per sub-directory, so
> rsync would need to scan the current subdir fully before starting to
> work on the resulting partial file list.
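
The behaviour described above can be sketched in Python like this. It
is only an illustration of "one partial file list per directory", not
rsync's actual implementation, and the name partial_file_lists is made
up for the example.

```python
import os


def partial_file_lists(root):
    """Yield (directory, sorted entry names) one directory at a time.

    Each directory is read completely before its list is yielded, but
    the consumer can start working before deeper directories are scanned.
    """
    pending = [root]
    while pending:
        directory = pending.pop()
        names = []
        with os.scandir(directory) as entries:
            for entry in entries:
                names.append(entry.name)
                # Queue subdirectories for a later, separate scan.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
        yield directory, sorted(names)


if __name__ == "__main__":
    for directory, names in partial_file_lists("."):
        print(directory, len(names))
```

With a scheme like this, a directory holding 2 million files still has
to be read in full before the first file from it can be handled.
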
>
> > I thought that with version 3, rsync was supposed to start transferring
> > before fully testing all of the files in a directory?
> >
> > I am using version 3.0.9 under Cygwin.
> >
> > Is there a command line switch I am supposed to use to force rsync to
> > start transferring more quickly?
> >
> > Any insight / suggestions would be most appreciated.
>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> --
> Please use reply-all for most replies to avoid omitting the mailing list.
> To unsubscribe or change options:
> https://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
>