I'm experiencing a hang with rsync 2.4.6 on Solaris. The initiating and target
hosts are both Solaris 2.6. It looks like there might be some network
latency issues, but the parent rsh process has been blocking on the same
write() for several hours now, so I don't think latency alone explains it.
Something also looks properly hung, because the 15-minute timeout
(--timeout=900) never fires.
This is for an rsync push of a large directory tree. The command is:
/usr/local/bin/rsync \
-avzHlW \
--rsync-path=/usr/local/bin/rsync \
--timeout=900 \
--delete \
--exclude (some excludes here) \
/local/directory/name/* \
remotehost:/remote/directory/name
The TCP queue on the sending host looks like this:
Local Address        Remote Address       Swind Send-Q Rwind Recv-Q State
-------------------- -------------------- ----- ------ ----- ------ -------
thishost.1018        remotehost.shell      8760      0     0      0 ESTABLISHED
thishost.1017        remotehost.1022       8760      0  8760      0 ESTABLISHED
The TCP queue on the receiving host looks like this:
Local Address        Remote Address       Swind Send-Q Rwind Recv-Q State
-------------------- -------------------- ----- ------ ----- ------ -------
remotehost.shell     thishost.1018            1      0  8760      0 ESTABLISHED
remotehost.1022      thishost.1017         8760      0  8760      0 ESTABLISHED
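One way to read the two tables side by side is to flag any connection whose window has collapsed. The sketch below does that for the four rows quoted above. It assumes the column order shown in the headers, and it assumes Swind is the window advertised by the peer and Rwind is the local receive window. The names `TcpRow`, `parse_row` and `stalled_windows`, and the threshold of 1, are made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class TcpRow(NamedTuple):
    local: str
    remote: str
    swind: int
    send_q: int
    rwind: int
    recv_q: int
    state: str


def parse_row(line: str) -> TcpRow:
    """Split one whitespace-separated netstat row into typed fields."""
    local, remote, swind, send_q, rwind, recv_q, state = line.split()
    return TcpRow(local, remote, int(swind), int(send_q),
                  int(rwind), int(recv_q), state)


def stalled_windows(rows: List[TcpRow], threshold: int = 1) -> List[Tuple[str, str]]:
    """Return (local address, column) pairs whose window is <= threshold.

    The threshold is an arbitrary illustrative choice, not a TCP constant.
    """
    flagged = []
    for row in rows:
        if row.swind <= threshold:
            flagged.append((row.local, "Swind"))
        if row.rwind <= threshold:
            flagged.append((row.local, "Rwind"))
    return flagged


# Rows copied from the two tables above (sending host first).
ROWS = [parse_row(line) for line in (
    "thishost.1018 remotehost.shell 8760 0 0 0 ESTABLISHED",
    "thishost.1017 remotehost.1022 8760 0 8760 0 ESTABLISHED",
    "remotehost.shell thishost.1018 1 0 8760 0 ESTABLISHED",
    "remotehost.1022 thishost.1017 8760 0 8760 0 ESTABLISHED",
)]

if __name__ == "__main__":
    for local, column in stalled_windows(ROWS):
        print(f"{local}: {column} is at or near zero")
```

Under those assumptions, both hosts flag the same rsh connection (the one on the shell port), and only in the remotehost-to-thishost direction. That would fit a receiver on thishost that has stopped draining its socket.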
The "rsync -avzHlW" process on the sending host is looping on something like
this:
poll(0xEFFFD580, 0, 20) = 0
poll(0xEFFFD580, 0, 1) = 0
waitid(P_PID, 3019, 0xEFFFF588, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFD580, 0, 20) = 0
waitid(P_PID, 3019, 0xEFFFF588, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFD580, 0, 20) = 0
poll(0xEFFFD580, 0, 1) = 0
waitid(P_PID, 3019, 0xEFFFF588, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFD580, 0, 20) = 0
waitid(P_PID, 3019, 0xEFFFF588, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFD580, 0, 20) = 0
poll(0xEFFFD580, 0, 9) = 0
waitid(P_PID, 3019, 0xEFFFF588, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFD580, 0, 20) = 0
poll(0xEFFFD580, 0, 1) = 0
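That trace has the shape of a poll-and-sleep wait loop. A poll() with zero descriptors is effectively a short sleep, and waitid() with WNOHANG returns immediately whether or not process 3019 has exited. The following is a minimal sketch of that pattern, not rsync's actual code. `wait_for_child` and `demo` are hypothetical names, and the 20 ms interval is only taken from the poll() timeouts in the trace.

```python
import os
import time


def wait_for_child(pid: int, sleep_ms: int = 20):
    """Check for child exit without blocking, sleeping between checks.

    Returns (exit_code, number_of_sleeps). A child killed by a signal is
    reported as the negative signal number.
    """
    sleeps = 0
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid != 0:
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status), sleeps
            return -os.WTERMSIG(status), sleeps
        # Child still running: sleep briefly. A tracer would show a sleep
        # like this as a timed poll()/select() with no descriptors.
        time.sleep(sleep_ms / 1000.0)
        sleeps += 1


def demo():
    """Fork a child that lives for 0.1 s, then wait for it (Unix only)."""
    pid = os.fork()
    if pid == 0:
        time.sleep(0.1)
        os._exit(0)
    return wait_for_child(pid)


if __name__ == "__main__":
    code, sleeps = demo()
    print(f"child exited with {code} after {sleeps} sleeps")
```

A loop like this never exits if the child never exits, so the spinning on the sending side may be a symptom of the stuck rsh rather than a separate problem.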
The parent rsh process on the sending host is stuck in:
write(1, " p a r t o f a f i l e n a m e".., 285) (sleeping...)
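For reference, a write() that sleeps like this is normal when the descriptor is a pipe whose reader has stopped draining it. The sketch below assumes that fd 1 of the rsh process is such a pipe, which the trace does not confirm. It reproduces the effect with an ordinary pipe, using non-blocking mode so that the example reports the stall instead of hanging. `fill_pipe` and `demo` are hypothetical names.

```python
import os


def fill_pipe(write_fd: int, chunk: bytes = b"x" * 1024) -> int:
    """Write until the kernel refuses more data; return bytes accepted."""
    total = 0
    while True:
        try:
            total += os.write(write_fd, chunk)
        except BlockingIOError:
            # A blocking descriptor would sleep inside write() here.
            return total


def demo():
    """Fill a pipe nobody reads, drain it once, then fill it again."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)
    try:
        accepted = fill_pipe(write_fd)              # reader is idle
        drained = len(os.read(read_fd, accepted))   # reader wakes up once
        accepted_after_drain = fill_pipe(write_fd)  # writer can progress again
        return accepted, drained, accepted_after_drain
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    accepted, drained, again = demo()
    print(f"accepted {accepted} bytes, drained {drained}, then accepted {again} more")
```

If that assumption holds, the question becomes why the process on the other end of fd 1 is not reading.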
The child rsh process on the sending host is stuck in:
read(0, 0xEFFFF410, 1024) (sleeping...)
The "rsync --server" process on the receiving host is stuck in:
poll(0xEFFFC110, 1, 60000) (sleeping...)
The "csh -c /usr/local/bin/rsync" process on the receiving host is stuck in:
sigsuspend(0xEFFFF938) (sleeping...)
The "in.rshd" process on the receiving host is stuck in:
poll(0xEFFFD7F8, 2, -1) (sleeping...)
So, any ideas? Like I said, it looks like write() is blocking for no
particular reason, and that's causing us to sit and spin.
Thoughts?
Hal