Mark,

I reverted to rsync 2.3.2 after massive complaints from our users trying to
get their data from their machines to the target sites.  Since then, rsync
has been running very smoothly.

Just as background: I have been using rsync for about 3 years now and have
been using 2.3.2 since it came out.  I transfer updated data daily from 40
sources to 70 targets globally, including Australia, Europe, South America
and the U.S.  The size of the data is about 320 GB per target, and the total
update can be in the range of 50 GB per target site per day.  Our files are
CAD files, and the file names can be in excess of 100 characters long (which
used to be a problem in one of the releases of rsync, if I recall).

Regards / Mit freundlichen Gruessen
Sam Safi                        
EAI/Ford Motor Co.                      Alpha, A401     
Digital Buck & Visual Collaboration     *  <mailto:[EMAIL PROTECTED]>
Data Management & Security              *  313-39-01744




-----Original Message-----
From: Wilson, Mark - MST [mailto:[EMAIL PROTECTED]]
Sent: Sunday, May 13, 2001 11:12 PM
To: 'Dave Dykstra'
Cc: RSync List (E-mail)
Subject: RE: Problem with large include files 


Dave

A couple of points:

1. I think you are telling me that if I go back to 2.3.2 my problems should
go away. Is this correct?

2. I rather oversimplified how I am using rsync, so perhaps I had better
explain.  I am happy to do some testing for you, but there is quite a bit of
pressure on me to get my problem fixed, and that I must do first.

Environment:
I am shifting large applications from a Sun E10k domain backended with a
NetApp F840 Filer to another Sun E10k backended with a NetApp F840 Filer.
The source machine is in Auckland, New Zealand and the destination machine
is in Melbourne, Australia. The link is a 10Mbit pipe that is burstable to
30Mbit (actually I can run it at 30Mbit full time at the moment, though I'm
sure that won't last). The latency on the link is about 20ms one way.

Original Problem:
Because of the "slow" link and requirement to maintain all the files, links,
etc we were limited to a few mechanisms. Originally we were ftping tgz files
but finding sufficient scratch space was a problem. Also the compress time
made things slow. We also tried various versions of rsh tar gzip, etc.
Eventually I tried rsync because it did on the fly compression and correctly
handled permissions, files, links, etc.

Locally, especially across our 1Gbit links, rsync flew. Everyone was very
happy. However, it wasn't so fast across the Tasman (the sea between NZ and
Oz). In fact, with a bit of sniffing by the network guys we found that rsync
wasn't even using 1Mbit of the link. Hmmmm, latency methinks. So I wrote a
fancy perl script that took the required directories to be transferred and
split all the files into 20 (configurable) balanced streams. This meant 20
copies of rsync running at the same time. Of course the balancing was done
on the uncompressed file sizes, so it took no account of how "dense" the
files were, which is unfortunate as different streams take different amounts
of time. However, the results were quite spectacular! At best, on DB files,
we were getting disk-to-disk transfer rates of up to 1.5 Gbyte per minute.
Not bad over a 10Mbit link... It completely soaked the link and maxed out
the 12 processors. Fun having all that power at your command!
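
For the record, the script does roughly the following (this is only a
sketch, not the real script; the stream count, source path and daemon module
name below are placeholders, and the real balancing logic may differ from
the simple greedy assignment shown here):

    #!/usr/bin/perl
    # Sketch only: greedily assign each file to whichever stream currently
    # has the smallest byte total, then fork one rsync per stream with its
    # own --include-from list.
    use strict;
    use warnings;

    my $nstreams = 20;
    my @bytes = (0) x $nstreams;            # bytes assigned to each stream
    my @lists = map { [] } 1 .. $nstreams;  # file names for each stream

    while (my $file = <STDIN>) {            # one path per line on stdin
        chomp $file;
        my $size = -s $file || 0;
        my ($least) = sort { $bytes[$a] <=> $bytes[$b] } 0 .. $nstreams - 1;
        $bytes[$least] += $size;
        push @{ $lists[$least] }, $file;
    }

    for my $i (0 .. $nstreams - 1) {
        next unless @{ $lists[$i] };
        my $listfile = "/tmp/stream.$i";
        open my $fh, '>', $listfile or die "$listfile: $!";
        print {$fh} "$_\n" for @{ $lists[$i] };
        close $fh;

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                    # child: run one stream
            exec 'rsync', '-az', "--include-from=$listfile", '--exclude=*',
                '/apps/', 'melbourne::apps/';
            die "exec rsync: $!";
        }
    }
    1 while wait() != -1;                   # block until every stream exits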

The current problem:
However, the euphoria rapidly wore off when I tried to transfer 128 GB of
test data, some of it quite dense with many, many small files: 104,755 files
in total. Unfortunately, if there are too many files in an include file
(--include-from), the streams quit partway through. Very upsetting... I
tried turning up the debugging and re-running it, but didn't find out
anything except how many files it got through before it quit. Interestingly
enough, with heaps of debug output it processed more files... Weird.

Anyway, the purpose of all this verbosity is twofold. Firstly, given my
environment, you need to tell me how you want your testing done, and
secondly, I'd like to know if you have any ideas on how to fix my problem.
If we can't fix it we will have to fall back on backing up to tape and
sending it on a plane, which we really want to avoid.

As a thought, have you or any of the other developers thought of getting
rsync to operate over a number of streams or to use "sliding windows" to
overcome latency effects?

I look forward to your reply.

Cheers

Mark
-----Original Message-----
From: Dave Dykstra [mailto:[EMAIL PROTECTED]]
Sent: Saturday, 12 May 2001 01:43
To: Wilson, Mark - MST
Cc: RSync List (E-mail)
Subject: Re: Problem with large include files


On Fri, May 11, 2001 at 11:41:41AM +1200, Wilson, Mark - MST wrote:
> Hi there
>
> I recently tried to do a transfer of a directory tree with about 120,000
> files. I needed to select various files and used the --include-from option
> to select the files I needed to transfer. The include file had 103,555
> filenames in it. The problem I have is that the transfer quit after
> transferring some of the files. I am running the remote end in daemon
> mode. Interestingly the remote daemon spawned for this job was left behind
> and did not quit. I had to kill it when I noticed it many hours later.
> Unfortunately I didn't have any -v options so didn't get any information
> as to what caused it. I will be doing further tests to see if I can get
> more information.
>
> Are there any restrictions on the amount of files you can have in an
> include file?
>
> The two machines are Sun E10000 domains with 12 processors and 12288
> Megabytes of RAM.
>
> Any ideas on how to crack this would be appreciated.



Ah, perhaps we finally have somebody to perform the test I have been
requesting for months.

Some background: prior to rsync version 2.4.0 there was an optimization in
rsync, which I put in back when I was officially maintaining rsync, that
would kick in whenever there was a list of non-wildcard include patterns
followed by an exclude '*' (and when not using --delete).  The optimization
bypassed the normal recursive traversal of all the files and directly
opened the included files and sent the list over.  A side effect was that
it did not require that all parent directories of included files be
explicitly included, and Andrew didn't like the fact that it behaved
differently when the optimization was in or out, so he removed the
optimization in 2.4.0.  I tried to persuade him to leave it in, and he
asked me to prove that it made a significant performance difference.  I
tried with a list of files that was about as long as I thought I'd ever
need with my application, and I couldn't honestly say that it made a big
difference so it stayed out.
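
To make that concrete, the optimization applied to transfers shaped like the
following (the file names, paths and daemon module name here are only
illustrative):

    # filelist.txt: plain paths relative to the source directory, no wildcards
    /cad/partA.dwg
    /cad/assembly/partB.dwg

    # non-wildcard includes followed by exclude '*', and no --delete
    rsync -az --include-from=filelist.txt --exclude='*' /project/ \
        remotehost::module/

With the optimization the parent directories (/cad/, /cad/assembly/) did not
need include lines of their own; without it they do.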

Meanwhile, people on this list have been asking for rsync to get a new
option, --files-from, which would simply take a list of files to send.  Many
people want it for convenience and not just performance, but I also want to
know what the performance impact would be.  I offered to implement it in
essentially the same way that my include/exclude '*' optimization was
implemented, but only if somebody would measure the performance difference
in their environment and report the results.  Nobody has done that yet.
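
For illustration, an invocation with the proposed option would look
something like this (hypothetical, since the option is not implemented;
paths and module name are made up):

    rsync -az --files-from=filelist.txt /project/ remotehost::module/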

So what I'd like you to do is go back to rsync 2.3.2 and report timing
results with and without the optimization.  To turn off the optimization,
all you need to do is add a wildcard to one of the paths.  I'm pretty sure
rsync 2.3.2 only needs to be on the sending side, but to be safe it would be
better to run it on both sides.  Since you say it fails completely with such
a long list of files, you may have to cut the list down to a shorter one
that works without the optimization, so that the comparison is fair.
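
Concretely, the comparison I have in mind is something like this (paths and
the daemon module name are made up, and both machines should be running
2.3.2):

    # run 1: optimization on -- every pattern in filelist.txt is a plain path
    time rsync -az --include-from=filelist.txt --exclude='*' /project/ \
        remotehost::module/

    # run 2: identical, except one path inside filelist.txt has a wildcard
    # added (e.g. "partA.dwg" -> "partA.dw*"), which turns the optimization
    # off
    time rsync -az --include-from=filelist.txt --exclude='*' /project/ \
        remotehost::module/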

- Dave Dykstra

