Re: [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2018-05-16 Thread Dave Gordon via rsync
On 17/04/18 17:47, just subscribed for rsync-qa from bugzilla via rsync
wrote:
> https://bugzilla.samba.org/show_bug.cgi?id=5478
> 
> --- Comment #34 from David Nelson  ---
> FYI. Although when rsync ver. 3.0.9 is called in Ubuntu 12.04LTS to "push"
> files to the server, e.g.,
> 
> rsync / server::Backups/ 
> 
> it fails with the error:
> 
> rsync error: error in rsync protocol data stream (code 12) at io.c(441)
> [sender=3.0.9 [NB server rsync version is 3.1.0 running in Ubuntu 14.04LTS]
> 
> However, curiously, when issued as a "pull" from the server as:
> 
> rsync client::Sharetobackup//Backups/
> 
> it succeeds.
> 
> Furthermore, pushes from the client work when issued one-letter-as-a-time;
> e.g.,
> 
> for i in {A..Z}; do \
> rsync /${i}* server::Backups/ \
> done
> 
> I discovered this while trying to isolate which, if any, of the files were
> causing it to choke. Turns out none of them were, but identifying them all 
> with
> a wildcard * was.

Hi,

did you mean to add this comment to Bug 5478, which had been closed for
~5 years? It looks like it should belong to some other (more recently
active) bug - maybe 2957, 12732, or 13109?

.Dave.


-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2018-04-17 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #34 from David Nelson  ---
FYI. Although when rsync ver. 3.0.9 is called in Ubuntu 12.04LTS to "push"
files to the server, e.g.,

rsync / server::Backups/ 

it fails with the error:

rsync error: error in rsync protocol data stream (code 12) at io.c(441)
[sender=3.0.9 [NB server rsync version is 3.1.0 running in Ubuntu 14.04LTS]

However, curiously, when issued as a "pull" from the server as:

rsync client::Sharetobackup//Backups/

it succeeds.

Furthermore, pushes from the client work when issued one-letter-as-a-time;
e.g.,

for i in {A..Z}; do \
rsync /${i}* server::Backups/ \
done

I discovered this while trying to isolate which, if any, of the files were
causing it to choke. Turns out none of them were, but identifying them all with
a wildcard * was.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2013-07-10 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #33 from michaeljoya lo...@michaeljoya.com 2013-07-10 19:26:56 
UTC ---
I got this error while creating a backup copy between two drives on the same
machine. It seems that most people on this list are wrongly attributing the
problem to differing rsync versions on client and server, but it happens in the
context of a single machine.

In my own case, it turned out that the destination filesystem was simply full.

I had used an absolute path instead of a relative path in the destination, and
I was working in a systemrescue environment whose root fs was completely in
memory.
So, when the destination fs decides that it can't accept any more writes, it
will also produce this error. When I fixed the path to write to the correct
path with a greater capacity, the problem immediately went away.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2013-05-22 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #29 from Scott Wood woodystr...@hotmail.com 2013-05-22 06:05:43 
UTC ---
As a follow up to my second question, we added the firewall rules to allow
SYN,RST,ACK,FIN and ACK traffic form the server in question and it did not
solve the problem.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2013-05-22 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #30 from Loïc Gomez samba-b...@kyoshiro.org 2013-05-22 07:02:09 
UTC ---
Did you try with one of these options : --no-checksum --no-compress
--blocking-io ?

Just a check since you have a 4G limit : I suppose you're not rsync-ing to a
FAT32 filesystem ?

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2012-11-02 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

Sebastian s...@open-t.co.uk changed:

   What|Removed |Added

 CC||s...@open-t.co.uk

--- Comment #27 from Sebastian s...@open-t.co.uk 2012-11-02 09:06:27 UTC ---
I have a number of setups running backups using rsync to either usb memory
sticks or usb external hard drives. I have only encountered this problem on the
sites which use usb memory sticks. Would this be related to the sender not
being able to cope properly with the slower speed of the receiver - governed by
the slower write speed of usb memory sticks?

Anyway, after trying all sorts of things, what has cured it for me is using
--bwlimit. So far, with a value of 1000 to 2000 (KBytes/second) it has
stopped the errors completely. Of course, this slows down the backup
dramatically - but I've got all night :-) 

I haven't tested to see how far I can push the --bwlimit before the errors
show up again.

I use the following options --delete --recursive --timeout=60 --max-delete=200
--links --perms --owner --group

Hope the above helps somebody.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2012-11-01 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #26 from Justin Patrin papercr...@reversefold.com 2012-11-01 
19:40:37 UTC ---
Correction: this problem *mostly* went away by turning off compression. I still
get this error, 100% reproducable right around the same point in certain large
file transfers. Adding --inplace --partial allowed me to keep retrying the
rsync and have it eventually finish.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2012-10-30 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #25 from Justin Patrin papercr...@reversefold.com 2012-10-31 
01:32:36 UTC ---
FWIW, this problem went away for me by just removing -z and adding
--no-compress.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2012-10-23 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #24 from Loïc Gomez samba-b...@kyoshiro.org 2012-10-23 11:24:52 
UTC ---
I also had the same issue and it was(In reply to comment #11)
 I need to change my answer here. The files that cause it to choke are larger
 than most, so I would have to say that as a general observation the files that
 fail are large(r). Not huge, but large.
 
 I also need to correct my earlier statement/thinking. While the process did
 fail on the same file on consecutive runs, leading me to believe that the
 content of the file was causing the problem, I've determined that this is not
 always the case. Subsequent runs have transferred these files successfully, so
 it's not the content of the data that's a problem

I ran into a similar issue recently while transferring large files (40GB).
After a few tests, it seems - in my case at least - to be related to the
delta-xfer algorithm. The bug does not happen anymore with the -W option.

I don't know if this will resolve your issue, but you can also try looking into
these options : --no-checksum --no-compress --blocking-io. These were not the
source of my problems, but the functions they're related to might rise a
network timeout.

I hope it helps, anyways, good luck solving your issue.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2012-10-22 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #23 from Chip Schweiss c...@innovates.com 2012-10-22 16:23:10 UTC 
---
I came here when I was experiencing this problem.   I later found the problem.

If you are running on Redhat/Centos 5/6, there is a long standing bug in the
kernel distributed by Redhat that will cause these broken pipes when rsyncing
over a latent network.

There are ack packets that are getting rejected frequently, that should be
allow via connection tracking.  Insert a rule before the reject line on the
input table:

iptables -I {line} -p tcp --tcp-flags SYN,RST,ACK,FIN ACK -j ACCEPT

do an 'iptables -L -n -v --line-numbers' to figure out which line to insert at.

This affects all TCP connections not just rsync over ssh.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2012-10-01 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #22 from Gaute Hope e...@gaute.vetsj.com 2012-10-01 10:20:42 UTC 
---
(In reply to comment #21)
 After all the discussion of network matters that I saw from your thread I
 thought (as I'm working in my LAN) I could turn of openSUSE's firewall for a
 try. And: since that moment everything runs fine! And even fast. rsync doesn't
 stall and files arrive properly, independent from size (I also got some larger
 files).

I experienced a plethora of errors which all seemed to be related to this one.
The server is running on a router and when I turned off the firewall and
addressed it on its LAN interface (assuming the outside address also resulted
in the rsync connection, through ssh, passing through the firewall) it has been
working nicely. If I can address it on LAN and turn the firewall back on
everything should be fine.

I noticed that when the error happens client side rsync hangs for the timeout
and when checking the remote, before timeout, there is no rsync process
running. Perhaps the connection is severed by the firewall. I assume rsync has
no mechanism for 'retrying' the connection without restarting the entire
process?

- Gaute

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2012-07-13 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #21 from Martin Scherbaum sa...@scherbaum.org 2012-07-13 19:14:03 
UTC ---
Hello together,

although the last posting has been already done a while ago, out of respect for
the meticulous work of Eric (and Tim and Wayne) - which was a pleasure to read
for me - I'd like to add an observation I made for the very same strange rsync
behaviour.

I tried to rsync some 225GB of image data (mainly 8 to 12Mpix JPG files) from
my notebook (openSUSE 11.4) over a WLAN (ca. 120MBit raw) to a RaidSonic NAS
(connected to my router with 100Mbit cable, has officially a 1GBit interfacw)
which writes the data to a RAID1 configuration of 3.5 hard drives.
So, no real large files. But a not so fast NAS.
It hung up again and again with the above mentioned message rsync:
writefd_unbuffered failed to write 4092 bytes [sender... for different files.

After all the discussion of network matters that I saw from your thread I
thought (as I'm working in my LAN) I could turn of openSUSE's firewall for a
try. And: since that moment everything runs fine! And even fast. rsync doesn't
stall and files arrive properly, independent from size (I also got some larger
files).

Although I'm not able to make a deep analysis as the writers before, I hope
this observation helps one or the other who googles for the error message...

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-10-07 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #20 from Tim Taiwanese Liim tim.l...@alcatel-lucent.com 
2011-10-07 21:57:54 UTC ---
I agree with Wayne and Eric that Eric's issue is outside of rsync,
somewhere in the transport.

Eric,
Have you tried to check the TCP buffers of the ssh process on both
ends?  For example,
p=192.168.51.98:22
while true; do date; netstat -tn | grep $p; sleep 1; done
Proto Recv-Q Send-Q Local AddressForeign Address State
tcp0  19328 192.168.51.98:22 192.168.51.51:53010 ESTABLISHED
In this example, the sender has 19328 bytes in its TCP sending buffer.


You can also use tcpdump and wireshark to graph how well the tcp pipe
goes:
# catch 100 bytes of each packet on eth0, write to t.pcap.  We
# need only first 100 bytes because we care only about the TCP
# sequence number, but not the actual file content.
# need root access to sniff packets.
tcpdump -i eth0 -s 100 -w t.pcap
then feed the trace to wireshark:
wireshark t.pcap
# then select menu Statistics -- TCP Stream Graph -- 
# Time-Sequence Graph (Stevens)
With this you can visualize how the TCP flow goes (smooth or stalled
or fluctuates or excessive retries).  With tcpdump from both ends, you
can also check for lost packets.  (A few years ago I worked on a case
of stalled ssh; turned out the NIC firmware was defective, causing
excessive packet loss) Comparison between the two targets (good and
bad one) may show the difference.  Do your two targets run on the same
host machine?  Or host machines of the same configuration (same NIC
etc)?  Could one has bad NIC (eg. working but excessive packet loss in
bursts)?

As Wayne pointed out, there is pipe (or unix domain socket) between
rsync and ssh as well.  I don't know how to track the queue size in
pipe yet, so let's track the known ones (TCP) first.  

 Notice, the last successful read on the target side happened .06
 seconds before the last write on the source side, which is pretty
 much at the same time.
This is an important clue, although I don't know what to make out of
it yet; maybe a few lost tcp acks in a row?

How did your bwlimit=32 test go?


BTW, I am not rsync developer; I do use rsync a lot and tried to help
when possible, so please do not take my response as official.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-10-06 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #19 from Eric Shubert e...@shubes.net 2011-10-06 15:28:32 UTC ---
(In reply to comment #14)
 So, looking at your trace, it looks like the sender is waiting to write more
 data (its select lists the write and error FDs).  The receiving side's 2 pids
 are waiting to read more data.  So, something in between has either failed to
 deliver the data, or blocked, or something (it would appear). Are there other
 processes in between the 2 sides (e.g. ssh) that could have blocked?  Or
 perhaps a failed pipe (or unix domain socket)?

After further testing, I'm tending to agree that the problem lies outside of
rsync. Somehow, the select() is getting stuck (not receiving a signal from the
kernel?). I'm not yet certain which side gets stuck first, but I would guess
that it's the target, given that in one test there was about 2MB of data in
the pipe, that had been written by the source but not read yet by the target.

Working on the idea that the failure happens when the pipe is saturated, I
decided to try the bwlimit option to slow things down a bit. The upload speed
at the source is limited to about 100KBPS (DSL, shared with other users), so I
tried bwlimit=64. This worked noticeably better, although it also failed after
a few hours. There may have been usage spikes that maxed out the available bw
at the time, but I can't tell if that happened or not. So I reduced it to
bwlimit=32, and it's been running for 7 hours now without failure (hasn't
finished yet, but I'm optimistic).

So I'm inclined to think that this problem lies in the transport somewhere. It
also appears to come to light when the transport is saturated.

Does anyone have any ideas regarding how to go about tracking this down
further, or where to go for further help? While this bug is definitely
affecting rsync, I think the solution is going to end up being elsewhere, in
glibc or the kernel perhaps.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-29 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #17 from Eric Shubert e...@shubes.net 2011-09-29 17:16:38 UTC ---
(In reply to comment #16)
 
 So now I'm beginning to think that the problem is (indeed) outside of the 
 rsync
 code. I've examined the 2 target hosts (one that worked and one that doesn't),
 and tried to determine what's different?. These hosts are all VMs that are
 running under VMware Server2. The host that worked is running CentOS 5.6, 
 while
 the one that's failing is running CentOS 5.4 (450 days uptime). I'm going to
 upgrade the failing host OS to CentOS 5.7 and see if that changes anything. I
 know that some versions of glibc ( 2.5-34,  2.5.58) had mysterious problems
 with VMware Server 2 (guests terminating for no apparent reason). I'm 
 wondering
 if there's something else in glibc that's perhaps having a problem with VMware
 Server 2. (I realize that Server 2 is essentially dead and will be migrating 
 to
 KVM in the near future, although Server2 has been quite stable for me.)
 

I've upgrade the target host, and still no dice. Same failure. It seems odd to
me that the transmission fails at different points in the process, anywhere
from 3.5 minutes to 14.75 minutes.

I'm presently thinking that the problem is not with rsync itself, but with some
aspect of the network connection. I'm seeing no corresponding ethernet errors
on either end though, and the network seems to be fine otherwise.

I'd have to say at this point that the failures appear to be happening only
with large files.

I could sure use some pointers with identifying what the cause of this failure
is. I have a test file that fails regularly (if not consistently), and should
be able to tell easily when it's fixed.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-29 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #18 from Eric Shubert e...@shubes.net 2011-09-29 20:39:54 UTC ---
(In reply to comment #17)
 
 I'm presently thinking that the problem is not with rsync itself, but with 
 some
 aspect of the network connection. I'm seeing no corresponding ethernet errors
 on either end though, and the network seems to be fine otherwise.
 
 I'd have to say at this point that the failures appear to be happening only
 with large files.
 

I've (re)read through all 5 bug reports that Tim identified. A general
consensus is that this is happening with relatively large files, and relatively
slow writes (ie loaded targets).

Upon closer examination of a pair of straces, I see this on the target side:
6814  11:34:25.018774 read(0,
\3037sE\303{\264\16\377\333\357\375\7\274}1c\21\306\377\215\256\202\7\376\245\26\24\302{\16c...,
4092) = 4092
6814  11:34:25.020201 select(1, [0], [], NULL, {60, 0} unfinished ...
6814  11:35:25.009953 ... select resumed ) = 0 (Timeout)

I found the corresponding block of data in the strace on the source side:
940   11:33:57.494543 write(4,
\3037sE\303{\264\16\377\333\357\375\7\274}1c\21\306\377\215\256\202\7\376\245\26\24\302{\16c...,
4092 unfinished ...
940   11:33:57.494681 ... write resumed ) = 4092

I had previously set both hosts to use ntp with the same time server, so their
times should be very close.

In the source side strace, I counted ~510 (successful) writes of data between
the one listed above, and the point at which the source host timed out:
940   11:34:25.073394 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left
{60, 0})
940   11:34:25.073514 write(4, \374\17\0\7, 4) = 4
940   11:34:25.073620 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout)

Notice, the last successful read on the target side happened .06 seconds before
the last write on the source side, which is pretty much at the same time. At
this point, there were ~510 x 4K blocks of data (or ~2MB) in the pipe, that
had been written, but not read.

So I'm wondering, is it possible that the pipe (buffers) get filled up, and
when they do, that the select function is unable to handle it? I don't (yet)
know how these things work, so I could be totally off base. At the same time,
this would seem to explain things. 

I'll continue to poke at this to see what I can come up with. In the meantime,
any feedback from Wayne and Tim will be appreciated.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-28 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #16 from Eric Shubert e...@shubes.net 2011-09-28 20:40:15 UTC ---
I've done some further testing and would like to report my findings.

I tried the rsync to a different target host on a different IP address via a
different ISP, and lo and behold, it worked. I conclude from this that the
problem is outside of the source host, either in the network transport
somewhere or in the the target host.

I have a quickbooks backup (.qbb) file that fails pretty regularly, so I took a
closer look at the strace on the target host for this file.
On one failure, I get:
3060  19:43:19.737510 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
3060  19:43:19.737768 read(0,
\252\220\346\2\f\216\t3\\\256(\t\202\3\222\211\16\0246\310H\1\342\202\203\2\374\346\0273l\317...,
4092) = 4092
3060  19:43:19.738449 select(1, [0], [], NULL, {60, 0} unfinished ...
3059  19:43:33.847680 ... select resumed ) = 0 (Timeout)
3059  19:43:33.847918 select(4, [3], [], NULL, {60, 0} unfinished ...
3060  19:44:19.730067 ... select resumed ) = 0 (Timeout)

On a subsequent run a few minutes later, I also get a failure, but
approximately 3 seconds later. Curiously, I see this in the 2nd run at the
point of failure in the first run (notice the identical data):
3170  20:08:03.378981 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
3170  20:08:03.379237 read(0,
\252\220\346\2\f\216\t3\\\256(\t\202\3\222\211\16\0246\310H\1\342\202\203\2\374\346\0273l\317...,
4092) = 4092
3170  20:08:03.379744 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59,
64})
3170  20:08:03.728257 read(0, \374\17\0\7, 4) = 4

The 2nd run continues for 3 seconds, then times out like the previous run:
3170  20:08:07.624052 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
3170  20:08:07.624424 read(0,
\3,\234\37J\341\374\240\v\347\227W\30\277\220\342\377\222\v\347\27W\30\277\214\342\377\202\v\347\247\366...,
4092) = 4092
3170  20:08:07.625000 select(1, [0], [], NULL, {60, 0} unfinished ...
3169  20:08:48.205338 ... select resumed ) = 0 (Timeout)
3169  20:08:48.205547 select(4, [3], [], NULL, {60, 0} unfinished ...
3170  20:09:07.627769 ... select resumed ) = 0 (Timeout)

I thought perhaps the left {59, 64} was a clue given that it happened at
the exact same place as a previous failure, but I see the same sort of thing
after every 8 reads, so I don't expect it's indicative of a problem.


So now I'm beginning to think that the problem is (indeed) outside of the rsync
code. I've examined the 2 target hosts (one that worked and one that doesn't),
and tried to determine what's different?. These hosts are all VMs that are
running under VMware Server2. The host that worked is running CentOS 5.6, while
the one that's failing is running CentOS 5.4 (450 days uptime). I'm going to
upgrade the failing host OS to CentOS 5.7 and see if that changes anything. I
know that some versions of glibc ( 2.5-34,  2.5.58) had mysterious problems
with VMware Server 2 (guests terminating for no apparent reason). I'm wondering
if there's something else in glibc that's perhaps having a problem with VMware
Server 2. (I realize that Server 2 is essentially dead and will be migrating to
KVM in the near future, although Server2 has been quite stable for me.)

I'll report my findings when the upgrade is complete.
Any pointers toward how to troubleshoot this thing would be more than welcome.
TIA.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-23 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #15 from Eric Shubert e...@shubes.net 2011-09-23 15:55:27 UTC ---
(In reply to comment #14)
 It's a bit unclear what pid corresponds to what function, especially with both
 sides having 2 pids in the trace output.  What is the command-line used?

  strace -tt -f -o /tmp/rsync-$$.out rsync \
--verbose \
--exclude stor/hosts \
--exclude var/run \
--exclude var/spool \
--rsh=ssh -p $backoff_port \
--rsync-path=sudo /usr/local/bin/rsync-debug \
--recursive \
--links \
--perms \
--times \
--owner \
--group \
--numeric-ids \
--xattrs \
--delete \
--compress \
--human-readable \
--stats \
--exclude='udat/backup*/*' \
--exclude='udat/migrate/*' \
/mnt/stor \
$userid@$backoff_url:$destdir 21

As you can see, it's pushing from the local host via ssh to the remote host.

 So, looking at your trace, it looks like the sender is waiting to write more
 data (its select lists the write and error FDs).  The receiving side's 2 pids
 are waiting to read more data.  So, something in between has either failed to
 deliver the data, or blocked, or something (it would appear).  Are there other
 processes in between the 2 sides (e.g. ssh) that could have blocked?  Or
 perhaps a failed pipe (or unix domain socket)?

Here is another run, with processes. Both sides are rsync-3.0.8-1.el5.rfx.

First the sending side processes:
root  4921  4914  5 07:38 tty1 00:00:13 strace -tt -f -o
/tmp/rsync-4914.out rsync --verbose --exclude stor/hosts --exclude var/run
--exclude var/spool --rsh=ssh -p 17113 --rsync-path=sudo
/usr/local/bin/rsync-debug --recursive --links --perms --times --owner --group
--numeric-ids --xattrs --delete --compress --human-readable --stats
--exclude=udat/backup*/* --exclude=udat/migrate/* /mnt/stor
r...@backoff.shubes.net:/home/r3i/backup/tacs-udat
root  4923  4921  7 07:38 tty1 00:00:17 rsync --verbose --exclude
stor/hosts --exclude var/run --exclude var/spool --rsh=ssh -p 17113
--rsync-path=sudo /usr/local/bin/rsync-debug --recursive --links --perms
--times --owner --group --numeric-ids --xattrs --delete --compress
--human-readable --stats --exclude=udat/backup*/* --exclude=udat/migrate/*
/mnt/stor r...@backoff.shubes.net:/home/r3i/backup/tacs-udat
root  4924  4923  0 07:38 tty1 00:00:01 ssh -p 17113 -l r3i
backoff.shubes.net sudo /usr/local/bin/rsync-debug --server -vlogtpXrze.isf
--delete --numeric-ids . /home/r3i/backup/tacs-udat

The sending side strace:
4923  07:47:21.380396 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left
{60, 0})
4923  07:47:21.380537 write(4,
4\25\t\20\342Q\353\177A\0039rC\272\301\372\335\364\17?\313g\241\215\313\370%\222s\367F|...,
4092) = 4092
4923  07:47:21.380683 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left
{60, 0})
4923  07:47:21.380806 write(4, \374\17\0\7, 4) = 4
4923  07:47:21.380914 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left
{60, 0})
4923  07:47:21.381034 write(4,
\351\4n\317\346;f\321Pn~[\333\337\200\214#\33\233\365\3\5\322P\1\221\364\255\230\16...,
4092) = 4092
4923  07:47:21.382583 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left
{60, 0})
4923  07:47:21.382777 write(4, \374\17\0\7, 4) = 4
4923  07:47:21.382897 select(5, NULL, [4], [4], {60, 0} unfinished ...
4924  07:47:21.499524 ... select resumed ) = 1 (out [3])
4924  07:47:21.499685 write(3,
D\3143?I\322+\362S\231\311Xe\302\264\1\235|R\231Wyw\214\302\211\206\365^\337;\24...,
131552) = 66240
4924  07:47:21.500711 select(7, [3], [3], NULL, NULL) = 1 (out [3])
4924  07:47:22.518796 write(3,
\350l\250\255\361YB\241`__\302oM\333\346\247\223\tdcM\203O\217aA\260\t\377gm...,
98128) = 54720
4924  07:47:22.519874 select(7, [3], [3], NULL, NULL unfinished ...
4923  07:48:21.377865 ... select resumed ) = 0 (Timeout)
4923  07:48:21.377989 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout)
4923  07:49:21.379500 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout)
4923  07:50:21.378390 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout)


The receiving side processes:
root 16837 16836  0 07:39 ?00:00:00 /bin/sh
/usr/local/bin/rsync-debug --server -vlogtpXrze.i --delete --numeric-ids .
/home/r3i/backup/tacs-udat
root 16848 16837 18 07:39 ?00:01:03 strace -tt -f -o
/tmp/rsync-16837.out rsync --server -vlogtpXrze.i --delete --numeric-ids .
/home/r3i/backup/tacs-udat
root 16849 16848 15 07:39 ?00:00:51 rsync --server -vlogtpXrze.i
--delete --numeric-ids . /home/r3i/backup/tacs-udat
root 16850 16849  1 07:39 ?00:00:04 rsync --server -vlogtpXrze.i
--delete --numeric-ids . /home/r3i/backup/tacs-udat

The receiving side strace:
16850 07:47:22.332291 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
16850 07:47:22.332556 read(0, \374\17\0\7, 4) = 4
16850 07:47:22.332808 select(1, [0], [], NULL, {60, 0}) = 1 (in 

[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-22 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #11 from Eric Shubert e...@shubes.net 2011-09-22 16:54:14 UTC ---
(In reply to comment #9)
 (In reply to comment #8)
  By any chance do you consider these files big files?  I am asking
  because I saw similar timeout when sending big files (bug7757,
  bug7195).
 
 Not really. Most problem files appear to be 20-30MB, but a few are 10M and
 there's one which is only 25K. I'll keep this in mind though.
 

I need to change my answer here. The files that cause it to choke are larger
than most, so I would have to say that as a general observation the files that
fail are large(r). Not huge, but large.

I also need to correct my earlier statement/thinking. While the process did
fail on the same file on consecutive runs, leading me to believe that the
content of the file was causing the problem, I've determined that this is not
always the case. Subsequent runs have transferred these files successfully, so
it's not the content of the data that's a problem.

I've been making many (10s of) runs with strace on the target, and have noticed
a pattern, which I'll attempt to describe (not having looked at the source
code).

When transmitting these large files, the maximum size read at once is 4092
bytes. In most cases of failure, there are 7 of these 4092 byte chunks,
followed by 2 smaller chunks, which vary in size. In the sample below, they are
3699, then 393. Other value pairs I've seen are 3697 and 395, 2514 and 1578,
2758 and 1334, 1039 and 3053, 1085 and 3007, 4087 and 5, etc (all pair sums are
4092). These are followed by another 7 4092 byte chunks, then another 3699 and
393, and so on. When the process hangs, it always times out waiting for the
first of the two smaller chunks (after 7 reads of 4092 bytes).

On one failure, all reads were 4092 bytes, with no smaller chunks. In this
case, the failure occurred after 7 reads of 4092 bytes which followed a write.

I hope that this gives a clue as to where the problem might be. Please let me
know if there's something else I can provide. I'll continue to investigate.

Thanks.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-22 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #12 from Eric Shubert e...@shubes.net 2011-09-22 18:35:06 UTC ---
(In reply to comment #11)
 When the process hangs, it always times out waiting for the
 first of the two smaller chunks (after 7 reads of 4092 bytes).
 
Correction. It times out on the select function following the successful read
of the first of the two smaller chunks:
13018 10:38:47.708267 read(0, \374\17\0\7, 4) = 4
13018 10:38:47.708522 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
13018 10:38:47.708757 read(0,
\6B#\377a\357\223\360\237\251J\2477q\216+\252s\\\363X\223\264\244pC\241y\350\321\240R...,
4092) = 4092
13018 10:38:47.709034 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
13018 10:38:47.709264 read(0, \374\17\0\7, 4) = 4
13018 10:38:47.709619 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
13018 10:38:47.709862 read(0,
`\337\24f\220\0379\223\y\340\10\247\240\3702\333\360\311e/\362$\333\215\216r\24sw\361...,
4092) = 3699
13018 10:38:47.710137 select(1, [0], [], NULL, {60, 0} unfinished ...
13017 10:39:22.695271 ... select resumed ) = 0 (Timeout)
13017 10:39:22.695484 select(4, [3], [], NULL, {60, 0} unfinished ...
13018 10:39:47.701076 ... select resumed ) = 0 (Timeout)
13018 10:39:47.701266 select(1, [0], [], NULL, {60, 0} unfinished ...
13017 10:40:22.696550 ... select resumed ) = 0 (Timeout)

This is what happens in the normal flow:
13018 10:38:47.498236 read(0, \374\17\0\7, 4) = 4
13018 10:38:47.498460 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
13018 10:38:47.498710 read(0,
\266\222\6\301\257q\224\0)\302\363R\273\rJ\264g\235\0\230\311sq\277\243\260\277\322\32\313\213...,
4092) = 4092
13018 10:38:47.499589 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
13018 10:38:47.499852 read(0, \374\17\0\7, 4) = 4
13018 10:38:47.500054 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
13018 10:38:47.500278 read(0,
z\33\0\276\304\177o\246\274\225gH\30\271|\317\365(h\234\314)\223\306\260?\370\274\355\272@\21...,
4092) = 3699
13018 10:38:47.500599 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59,
81})
13018 10:38:47.698206 read(0,
\274x\365\1\247\252Y\177\7e\17\227\325\207BL\177{\354_\317\2655\352~\20\303\240\226\303...,
393) = 393
1

I'm presently attempting to capture a corresponding strace on the source side
to see what's going on there.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-22 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #13 from Eric Shubert e...@shubes.net 2011-09-22 19:25:19 UTC ---
(In reply to comment #12)
 I'm presently attempting to capture a corresponding strace on the source side
 to see what's going on there.

Here are matching straces from the sending and receiving sides.

First the receiving (time on this host is slightly ahead):
13265 12:01:56.418051 read(0, \374\17\0\7, 4) = 4
13265 12:01:56.418251 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
13265 12:01:56.418544 read(0,
\341\2373}P\226ZL\316n\273\220\344\246\r\n\215r\245I\241Z\351\225\273t\224\302p\375\375\316...,
4092) = 4092
13265 12:01:56.418855 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
13265 12:01:56.419119 read(0, \374\17\0\7, 4) = 4
13265 12:01:56.419319 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
13265 12:01:56.419564 read(0,
vmb/\216L\360\227\343\31\240x\306[T\376Dx-\354\360v0\263\f0\266#\305b\363...,
4092) = 4092
13265 12:01:56.421406 select(1, [0], [], NULL, {60, 0} unfinished ...
13264 12:02:55.255790 ... select resumed ) = 0 (Timeout)
13264 12:02:55.256043 select(4, [3], [], NULL, {60, 0} unfinished ...
13265 12:02:56.421901 ... select resumed ) = 0 (Timeout)
13265 12:02:56.422114 select(1, [0], [], NULL, {60, 0} unfinished ...
13264 12:03:55.257239 ... select resumed ) = 0 (Timeout)
13264 12:03:55.257426 select(4, [3], [], NULL, {60, 0} unfinished ...
13265 12:03:56.420390 ... select resumed ) = 0 (Timeout)

Now the sending:
2611  12:01:55.348752 write(4, \374\17\0\7, 4) = 4
2611  12:01:55.348862 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left
{60, 0})
2611  12:01:55.348983 write(4,
\17m\304\365\313M\344:\26\253Ogu\311\200Q\2\303\254\373\233T;\221\351\375\207\264\305\365z...,
4092) = 4092
2611  12:01:55.349115 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left
{60, 0})
2611  12:01:55.349236 write(4, \374\17\0\7, 4) = 4
2611  12:01:55.349344 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left
{60, 0})
2611  12:01:55.349465 write(4,
\10\2074\tC\36\336r\364\233\0_\336]\311\17\N\352\22\241O\2450\250\271\305\204Q\202\316\344...,
4092) = 4092
2611  12:01:55.353308 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left
{60, 0})
2611  12:01:55.353471 write(4, \374\17\0\7, 4) = 4
2611  12:01:55.353586 select(5, NULL, [4], [4], {60, 0} unfinished ...
2612  12:01:56.089897 ... select resumed ) = 1 (out [3])
2612  12:01:56.090021 write(3,
\326\n\324q\327dR\6\215\201?\227\234\217K:\221c\336\204|@\273\2431f9.\17KTr...,
158192) = 97920
2612  12:01:56.091620 select(7, [3], [3], NULL, NULL unfinished ...
2611  12:02:55.349271 ... select resumed ) = 0 (Timeout)
2611  12:02:55.349381 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout)
2611  12:03:55.347803 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout)
2611  12:04:55.346412 select(5, NULL, [4], [4], {60, 0} unfinished ...
2612  12:05:06.185778 ... select resumed ) = ? ERESTARTNOHAND (To be
restarted)
2611  12:05:06.185871 ... select resumed ) = ? ERESTARTNOHAND (To be
restarted)
2612  12:05:06.185926 --- SIGINT (Interrupt) @ 0 (0) ---
2611  12:05:06.186867 --- SIGINT (Interrupt) @ 0 (0) ---
2612  12:05:06.187011 sigreturn( unfinished ...

I don't know what's supposed to be happening here, so I'll be grateful for any
pointers. I suspect at this point that the problem is with the sending side,
but that's not saying much.

Thanks.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-22 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #14 from Wayne Davison way...@samba.org 2011-09-23 02:19:04 UTC 
---
It's a bit unclear what pid corresponds to what function, especially with both
sides having 2 pids in the trace output.  What is the command-line used?

So, looking at your trace, it looks like the sender is waiting to write more
data (its select lists the write and error FDs).  The receiving side's 2 pids
are waiting to read more data.  So, something in between has either failed to
deliver the data, or blocked, or something (it would appear).  Are there other
processes in between the 2 sides (e.g. ssh) that could have blocked?  Or
perhaps a failed pipe (or unix domain socket)?

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-20 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #9 from Eric Shubert e...@shubes.net 2011-09-20 14:42:37 UTC ---
(In reply to comment #8)
 By any chance do you consider these files big files?  I am asking
 because I saw similar timeout when sending big files (bug7757,
 bug7195).

Not really. Most problem files appear to be 20-30MB, but a few are 10M and
there's one which is only 25K. I'll keep this in mind though.

 Also in your next strace output, could you use -tt to
 show timestamps?  Thanks.

Here's one with timestamps:
1636  07:15:20.138388 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
1636  07:15:20.138646 read(0, \374\17\0\7, 4) = 4
1636  07:15:20.138883 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
1636  07:15:20.139121 read(0,
\211\313Wq\361\315ogBR\371\346\351\327\335\r@\20\373\276\303\372F\373\222\333\340\353\272\261\3...,
4092) = 4092
1636  07:15:20.139466 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
1636  07:15:20.139714 read(0, \374\17\0\7, 4) = 4
1636  07:15:20.139956 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
1636  07:15:20.140195 read(0,
\307\371e\245(K\17\351\201O\357\2139\221\322%\345)\201.\252G\364\221\326\256\306u(\2353\346...,
4092) = 4092
1636  07:15:20.140475 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
1636  07:15:20.140716 read(0, \374\17\0\7, 4) = 4
1636  07:15:20.140943 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
1636  07:15:20.141180 read(0,
\221\367\307\26yg0\31\t\314\250Gr\366e\301\231]\211~\253F-\320\240T\177T.\201\206\341...,
4092) = 1223
1636  07:15:20.141502 select(1, [0], [], NULL, {60, 0} unfinished ...
1635  07:15:51.190750 ... select resumed ) = 0 (Timeout)
1635  07:15:51.190975 select(4, [3], [], NULL, {60, 0} unfinished ...
1636  07:16:20.139240 ... select resumed ) = 0 (Timeout)
1636  07:16:20.139459 select(1, [0], [], NULL, {60, 0} unfinished ...
1635  07:16:51.194224 ... select resumed ) = 0 (Timeout)
1635  07:16:51.194447 select(4, [3], [], NULL, {60, 0} unfinished ...

Thanks for having a look at this Tim.

I'm going to put a problem/test file on the target machine's hdd and continue
testing with a local2local transfer. I should have mentioned that the target in
this case is a pair of USB2 HDDs in a raid-1 configuration. I seem to recall
another posting of this error involved writing to a USB drive as well. I expect
at this point that USB might be a contributing factor. I'm stumped though as to
why it has worked fine for over year before coming across this problem.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-20 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #10 from Eric Shubert e...@shubes.net 2011-09-20 15:39:39 UTC ---
(In reply to comment #9)
 
 I'm going to put a problem/test file on the target machine's hdd and continue
 testing with a local2local transfer.

I'm holding off on the local test for the moment. After closer look at the
strace, the problem appears to be with the transport/ssh mechanism (input side
on the target).

Early on in the strace, after a series (10x) of 4092-byte reads, I see this:
1636  07:12:20.682841 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
1636  07:12:20.683211 read(0, \374\17\0\7, 4) = 4
1636  07:12:20.683964 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
1636  07:12:20.684226 read(0,
\222E\23\272R\224\256\0I`\365\177\247\204\207Jf\370\277=\\q\24q\357\202\z\307\207H...,
4092) = 4092
1636  07:12:20.684653 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
1636  07:12:20.685065 read(0, \374\17\0\7, 4) = 4
1636  07:12:20.685291 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60,
0})
1636  07:12:20.685627 read(0,
v5z\260\341\346\260\335w\22\311P\37|%:2N\2177\304\0264\215#\344\274P^\307u\313...,
4092) = 1223
1636  07:12:20.685984 select(1, [0], [], NULL, {60, 0} unfinished ...
1635  07:12:20.686201 ... getdents64 resumed /* 2 entries */, 32768) = 48
1635  07:12:20.687710 getdents64(0, /* 0 entries */, 32768) = 0
1635  07:12:20.687953 close(0)  = 0

This looks like the same situation as the timeout, except that there is
getdents64 resumed instead of select resumed after the select statement.
Don't know if it's related or not.

I'll keep looking to see what I can tell from the strace.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-19 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

Eric Shubert e...@shubes.net changed:

   What|Removed |Added

 CC||e...@shubes.net

--- Comment #7 from Eric Shubert e...@shubes.net 2011-09-20 01:16:05 UTC ---
I've been running 3.0.7 setup for months with no problem, until June 2011 when
this bug showed up. I subsequently upgraded to 3.0.8, but the error persists.

Seems to be triggered by certain file content. The error happened most
frequently with files that were individual email messages (Maildir format)
containing attached pictures. When I remove the offending source file and
restart, more files are processed ok until another offending file is reached.
If I restart w/out removing the offending file, it always gets stuck on the
same file until it's removed from the list to be processed.

Then it happened with a quickbooks backup (.qbb) file as well. My guess is that
there's something in the data stream that's causing something i/o wise to seize
up.

I have kept examples of files that cause this. I set up the rsync-debug script
on the target server (not running daemon mode, just ssh), and the strace around
the error at the end is such:
31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})
31031 read(0,
H\10\26}\23565\6\300RV\331\23\262M\357Q\210ngTQ\373\3523\326\6\36
2\246\\1\375..., 4092) = 4092
31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})
31031 read(0, \374\17\0\7, 4) = 4
31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})
31031 read(0,
+R~\344\374I\204\344_\311\235\307\340iT|q\32z=\373(\243\320\332\
32527x\336b..., 4092) = 4092
31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})
31031 read(0, \374\17\0\7, 4) = 4
31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})
31031 read(0,
\206/\227-W\370+\337\367\251\\h\247\346\330\233\356s\353\254\276\356\365a\206%wS\357\234:\377...,
4092) = 4092
31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})
31031 read(0, \374\17\0\7, 4) = 4
31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})
31031 read(0,
\262\231\222\225%\370\374\262.\262\21A\222V\300\336\0226\262\246\325=*\314\234\314\204Z[(\351w...,
4092) = 4092
31031 select(1, [0], [], NULL, {60, 0} unfinished ...
31030 ... select resumed )= 0 (Timeout)
31030 select(4, [3], [], NULL, {60, 0} unfinished ...
31031 ... select resumed )= 0 (Timeout)
31031 select(1, [0], [], NULL, {60, 0} unfinished ...
31030 ... select resumed )= 0 (Timeout)
31030 select(4, [3], [], NULL, {60, 0} unfinished ...
31031 ... select resumed )= 0 (Timeout)
31031 select(1, [0], [], NULL, {60, 0} unfinished ...
31030 ... select resumed )= 0 (Timeout)
31030 select(4, [3], [], NULL, {60, 0} unfinished ...
31031 ... select resumed )= 0 (Timeout)
etc.

If there's anything I can do to help further to get this bug fixed, please let
me know.

Thanks.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-09-19 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #8 from Tim Taiwanese Liim tim.l...@alcatel-lucent.com 2011-09-20 
02:12:09 UTC ---
Re: Comment #7
 The error happened most frequently with files ...  containing
 attached pictures.
Eric,
By any chance do you consider these files big files?  I am asking
because I saw similar timeout when sending big files (bug7757,
bug7195).  Also in your next strace output, could you use -tt to
show timestamps?  Thanks.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2010-10-26 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478


tim.l...@alcatel-lucent.com changed:

   What|Removed |Added

 CC||tim.l...@alcatel-lucent.com




--- Comment #6 from tim.l...@alcatel-lucent.com  2010-10-27 00:28 CST 
---
These bugs seem to be related (having similar error messages:
connection unexpectedly closed, broken pipe, timeout).

bug7757
with big file, rsync times out out when it should not; the 
sender is still responsive

bug2783
Random high loads during syncs (server side) / client stream errors
rsync: connection unexpectedly closed (2761332 bytes received so far)
[generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(365)

bug5478
rsync: writefd_unbuffered failed to write 4092 bytes [sender]: 
Broken pipe (32)

rsync: writefd_unbuffered failed to write 4092 bytes [sender]: 
Broken pipe (32)
io timeout after 30 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(239) 
[sender=3.0.2]

bug5695
improve keep-alive code to handle long-running directory scans

./io.c:void maybe_send_keepalive(void)

bug6175
write last transfer status when timeout or other error happens
rsync: writefd_unbuffered failed to write 4 bytes [sender]: 
   Broken pipe (32)
rsync: connection unexpectedly closed (99113 bytes received so 
   far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.5]

bug7195
timeout reached while sending checksums for very large files


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2008-07-28 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478





--- Comment #5 from [EMAIL PROTECTED]  2008-07-28 20:43 CST ---
Any more info on this?  You could try the latest 3.1.0dev version and try using
the --msgs2stderr option to see if it reveals a remote error that was getting
lost when things closed down.


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2008-05-21 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478


[EMAIL PROTECTED] changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|INVALID |




--- Comment #2 from [EMAIL PROTECTED]  2008-05-21 09:24 CST ---
(In reply to comment #1)
 Bug 1959 has nothing whatsoever to do with an io timeout.  That is a value
 specified either by the client or by the daemon.  Since the --timeout=0 value
 on the client didn't turn off timeouts, I have to imagine that you're using a
 daemon,

No, I'm not using a daemon. I'm using a foreground rsync command, where both
the source and the destination are directories in the local filesystem.

 and the daemon has a 30-second timeout specified (which the client is
 not allowed to override).  If so, you must specify the same (or lower, but not
 0) timeout to the client for the rsync client to be able to help out with
 keep-alive activites.  A timeout of 30 seconds is also rather short, so I'd
 suggest increasing it in the daemon's config.
 


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2008-05-21 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478





--- Comment #3 from [EMAIL PROTECTED]  2008-05-21 09:30 CST ---
My intuition says that the bug more lies in io.c (
http://git.samba.org/?p=rsync.git;a=blob;f=io.c;hb=73cb6738b33846130c21f2903b2200fa3f1903ab
):


#define SELECT_TIMEOUT 30

...

void set_io_timeout(int secs)
{
io_timeout = secs;

if (!io_timeout || io_timeout  SELECT_TIMEOUT)
select_timeout = SELECT_TIMEOUT;
else
select_timeout = io_timeout;

allowed_lull = read_batch ? 0 : (io_timeout + 1) / 2;
}

secs has only an effect if (secs!=0)(secs=30)

Thus, supplying a timeout greater 30 or a timeout of 0 actually does not have
any effect.


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2008-05-21 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478





--- Comment #4 from [EMAIL PROTECTED]  2008-05-21 09:55 CST ---
Interesting.  The 30-second timeout is therefore the one turned on in the error
handling to try to get a message back from the receiving side.  Usually the
receiving side vanishes before the end of the 30 seconds, so it is unusual to
see that error in this context.

You will need to diagnose why one of the receiving processes (presumably the
receiver, not the generator) is vanishing.  I'd suggest attaching strace
(strace -o FILENAME -p PID) and/or enabling core dumps (ulimit -c unlimited).


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html