Re: problems encountered in 2.4.6

Phil Howard Fri, 25 May 2001 13:18:05 -0700
Dave Dykstra wrote:

> > 2 =============================================================================
> > When synchronizing a very large number of files, all files in a large
> > partition, rsync frequently hangs.  It's about 50% of the time, but
> > seems to be a function of how much work there was to be done.  That
> > is, if I run it soon after it just ran, it tends to not hang, but if
> > I run it after quite some time (and lots of stuff to synchronize) it
> > tends to hang.  It appears to have completed all the files, but I
> > don't get any stats.  There are 3 rsync processes sitting idle with
> > no files open in the source or target trees.
> > 
> > At last count there were 368827 files and 8083 symlinks in 21749
> > directories.
> > 
> > df shows:
> > /dev/hda4             42188460  38303916   3884544  91% /home
> > /dev/hdb4             42188460  38301972   3886488  91% /mnt/hdb/home
> > 
> > df -i shows:
> > /dev/hda4            2662400  398419 2263981   15% /home
> > /dev/hdb4            2662400  398462 2263938   15% /mnt/hdb/home
> > 
> > The df numbers are not exact because change is constantly happening
> > on this active server.  Drives hda and hdb are identical and are
> > partitioned alike.
> > 
> > The command line is echoed from the script that runs it:
> > 
> > rsync -axv --stats --delete /home/. /mnt/hdb/home/. 1>'/home/root/backup-hda-to-hdb/home.log' 2>&1
> 
> 
> Use the -W option to disable the rsync algorithm.  We really ought to make
> that the default when both the source and destination are local.

I don't want to copy everything every time.  That's why I am using
rsync to do this in the first place.  I don't understand why this
would be what's hanging.

> > A deadly embrace?  It seems possible.
> 
> 
> No, the receiving side of an rsync transaction splits itself into two
> processes for the sake of pipelining: one to generate checksums and one to
> accept updates.  When you're sending and receiving to the same machine then
> you've got one sender and 2 receivers.

Right.  But what I was suggesting was a deadly embrace in that the
process killed was waiting for something, and the parent was waiting
for something.

I'm not using the "c" option, so why would checksums be generated?

> > I'm also curious why 26704 has no fd 1.
> 
> I don't know.  When I tried it all 3 processes had an fd 1.

Were you looking at it after it hung?  Or is it not hanging for you?
I am curious if the lack of fd 1 is related to the hang.  It is being
started with 1> and 2> redirected to a log file _and_ the whole thing
is being run via the "script" command for a "big picture" logfile.
It was set up this way with the intent to run it from cron, although
I haven't actually added it to crontab, yet, due to the problems.


> > 3 =============================================================================
> > @ERROR: max connections (16) reached - try again later
> > 
> > This occurs after just one connection is active.  It behaves as if
> > I had specified "max connections = 1".  On another server I set it
> > to 40, and it showed:
> > 
> > @ERROR: max connections (40) reached - try again later
> > 
> > so it obviously is parsing and keeping the value I configure, but it
> > isn't using it correctly.
> > 
> > Also, if I ^C the client, then I get this error every time until I
> > restart the daemon (running in standalone daemon mode, not inetd).
> > So it seems like it counts clients wrong.  But I can't get more
> > than 1 right after restarting the server, so it's a little more
> > than that somewhere.
> 
> I don't know, I never used max connections.  Could indeed be a bug.
> The code looks pretty tricky.  It's trying to lock pieces of the file
> /var/run/rsyncd.lock in order for independent processes to coordinate. 
> Are you running as root (the lsof above suggests you are)?  If not, you
> probably need to specify another file that your daemon has access to in the
> "lock file" option.  Otherwise it would probably help for you to run some
> straces.

I would have presumed since there was a daemon process running
(as opposed to running from inetd) that the daemon itself could
simply track the connection count.

One possibility here is that I do have /var/run symlinked to /ram/run
which is on a ramdisk.  So the lock file is there.  The file is there
but it is empty.  Should it have data in it?  BTW, it was in ramdisk
in 2.4.4 and this max connections problem did not exist, so if there
is a ramdisk sensitivity, it's new since 2.4.4.
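
For reference, the "lock pieces of the file" scheme described above can be
sketched as follows.  This is only an illustration in Python, not rsync's
actual code: the 4-byte slot size, the names claim_slot and claim_in_child,
and the use of a temporary file in place of /var/run/rsyncd.lock are all
assumptions made for the example.  Each connection tries to take a
non-blocking exclusive lock on its own byte range, and a connection that
finds every range taken is refused.

```python
import fcntl
import os
import tempfile

SLOT_SIZE = 4   # bytes per connection slot (assumed value for this sketch)
FULL = 255      # exit status the child uses for "max connections reached"

def claim_slot(fd, max_connections):
    """Try to lock one free slot; return its index, or None if all are taken."""
    for slot in range(max_connections):
        try:
            # Non-blocking exclusive lock on bytes [slot*SLOT_SIZE, +SLOT_SIZE).
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                        SLOT_SIZE, slot * SLOT_SIZE, os.SEEK_SET)
            return slot
        except OSError:
            continue  # this range is locked by another process
    return None

def claim_in_child(path, max_connections):
    """Fork a child (standing in for a per-connection process) that claims a slot."""
    pid = os.fork()
    if pid == 0:
        child_fd = os.open(path, os.O_RDWR)
        slot = claim_slot(child_fd, max_connections)
        os._exit(FULL if slot is None else slot)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

# Stand-in for the lock file; it is created empty and never written to.
fd, path = tempfile.mkstemp()
first = claim_slot(fd, 2)         # this process takes slot 0
second = claim_in_child(path, 2)  # child finds slot 0 busy and takes slot 1
full = claim_in_child(path, 1)    # with a limit of 1, the child is refused
size = os.path.getsize(path)      # the file is still empty
os.close(fd)
os.unlink(path)
```

A byte-range lock does not need any data in the file, so under a scheme
like this an empty lock file would not by itself indicate a problem.  Note
also that locks of this kind only conflict between different processes,
which is why the sketch forks a child for each additional claim.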

-- 
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/     |
-----------------------------------------------------------------