[Bug 15163] rsync timeout non-effective and incorrect behaviour

2022-12-13 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=15163

--- Comment #6 from roland  ---
nobody has a clue?

i think proper rsync timeout handling is important.

i have had whole nightly backup procedures hung for the whole night because
rsync got stuck and didn't get timeout, i.e. machines did not get proper backup
because of this. 

i would like to help making this more reliable. i think i have provided some
useful input for it.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 15163] rsync timeout non-effective and incorrect behaviour

2022-09-11 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=15163

--- Comment #5 from roland  ---
apparently, this is causing the problem:

if (am_receiver) {
return;
}



if i comment out the return statement, things work again.


@wayne, what is the reason that timeout checking is being skipped when
"am_receiver" = true ? 

I don't understand the code well enough 

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 15163] rsync timeout non-effective and incorrect behaviour

2022-09-11 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=15163

--- Comment #4 from roland  ---
here with debug=all


working correctly
# /root/rsync/rsync-3.2.5/rsync -avi --timeout=5 --exclude='/proc'
--exclude='/dev/' --exclude='/sys' --debug=all --msgs2stderr  
root@172.20.37.189:/iscsipool /zfspool/backup/172.20.37.189/backup
opening connection using: ssh -l root 172.20.37.189 rsync --server --sender
-vlogDtpre.iLsfxCIvu --msgs2stderr --timeout=5 . /iscsipool  (12 args)
(Client) Protocol versions: remote=31, negotiated=31
Client negotiated checksum: md5
receiving incremental file list
[Receiver] send_msg(0, 36)
[Receiver] io timeout after 6 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(197)
[Receiver=3.2.5]
[Receiver] _exit_cleanup(code=30, file=io.c, line=197): about to call exit(30)



hang forever
# /root/rsync/rsync-3.2.5/rsync -avi --timeout=5 --exclude='/proc'
--exclude='/dev/' --exclude='/sys' --debug=all --msgs2stderr  
root@172.20.37.189:/ /zfspool/backup/172.20.37.189/backup 2>&1 |head -n50
opening connection using: ssh -l root 172.20.37.189 rsync --server --sender
-vlogDtpre.iLsfxCIvu --msgs2stderr --timeout=5 . /  (12 args)
(Client) Protocol versions: remote=31, negotiated=31
Client negotiated checksum: md5
receiving incremental file list
[Receiver] send_msg(0, 36)
[Receiver] got msg=0, len=0
[Receiver] got msg=0, len=30051
get_local_name count=25 /zfspool/backup/172.20.37.189/backup
[Receiver] change_dir(/zfspool/backup/172.20.37.189/backup)
generator starting pid=9452
delta-transmission enabled
recv_generator(.,0)
recv_files(25) starting
recv_generator(.,1)



recv_generator(etc/ssl/certs/f39fc864.0,1770)
recv_generator(etc/ssl/certs/f51bb24c.0,1771)
recv_generator(etc/ssl/certs/fc5a8f99.0,1772)
recv_generator(etc/ssl/certs/fe8a2cd8.0,1773)
recv_generator(etc/ssl/certs/ff34af3f.0,1774)

[receiver] send_msg(0, 0)
[generator] send_msg(0, 0)
[generator] got msg=0, len=0


[generator] send_msg(0, 0)
[generator] send_msg(0, 0)
[generator] send_msg(0, 0)
[generator] send_msg(0, 0)
[generator] send_msg(0, 0)
[generator] send_msg(0, 0)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 15163] rsync timeout non-effective and incorrect behaviour

2022-09-11 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=15163

--- Comment #3 from roland  ---
ah, getting a clue

in io.c

static void check_timeout(BOOL allow_keepalive, int keepalive_flags)
{
time_t t, chk;

/* On the receiving side, the generator is now the one that decides
 * when a timeout has occurred.  When it is sifting through a lot of
 * files looking for work, it will be sending keep-alive messages to
 * the sender, and even though the receiver won't be sending/receiving
 * anything (not even keep-alive messages), the successful writes to
 * the sender will keep things going.  If the receiver is actively
 * receiving data, it will ensure that the generator knows that it is
 * not idle by sending the generator keep-alive messages (since the
 * generator might be blocked trying to send checksums, it needs to
 * know that the receiver is active).  Thus, as long as one or the
 * other is successfully doing work, the generator will not timeout. */
if (!io_timeout)
return;

t = time(NULL);

if (allow_keepalive) {
/* This may put data into iobuf.msg w/o flushing. */
maybe_send_keepalive(t, keepalive_flags);
}

if (!last_io_in)
last_io_in = t;

if (am_receiver)
return;

chk = MAX(last_io_out, last_io_in);
if (t - chk >= io_timeout) {
if (am_server)
msgs2stderr = 1;
rprintf(FERROR, "[%s] io timeout after %d seconds --
exiting\n",
who_am_i(), (int)(t-chk));
exit_cleanup(RERR_TIMEOUT);
}
}


so , apparently when syncing "/" instead of "/iscsipool", keepalive messages
come into the play

i see keepalive messages getting sent, but the are NOT replied by the remote
rsync machine, as strace shows. there is only sshd process actively processing
data.


 | 0  00 00 00 07    |
[pid 131555] 09:57:47.454101 select(14, [4 5 11 13], [], NULL, NULL) = 1 (in
[4])
[pid 131555] 09:57:50.455771 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 131555] 09:57:50.455887 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 131555] 09:57:50.455963 read(4, "\0\0\0
\v\10\372\371(\256V\16$\17\375Z5\252Km-\357\311~A\352\202\232\365\234\33>"...,
16384) = 44
[pid 131555] 09:57:50.456064 select(14, [4 5 11 13], [10], NULL, NULL) = 1 (out
[10])
[pid 131555] 09:57:50.456225 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 131555] 09:57:50.456745 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 131555] 09:57:50.456950 write(10, "\0\0\0\7", 4) = 4
 | 0  00 00 00 07    |
[pid 131555] 09:57:50.457163 select(14, [4 5 11 13], [], NULL, NULL) = 1 (in
[4])
[pid 131555] 09:57:53.459934 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 131555] 09:57:53.460391 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 131555] 09:57:53.460602 read(4, "\0\0\0
\303M\204S\275D\23r\225a\210\247Q\314\256\373]\\C\27\246\241\265\212\235\210\312o"...,
16384) = 44
[pid 131555] 09:57:53.460817 select(14, [4 5 11 13], [10], NULL, NULL) = 1 (out
[10])
[pid 131555] 09:57:53.461023 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 131555] 09:57:53.461221 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 131555] 09:57:53.461314 write(10, "\0\0\0\7", 4) = 4
 | 0  00 00 00 07    |
[pid 131555] 09:57:53.461526 select(14, [4 5 11 13], [], NULL, NULL) = 1 (in
[4])
[pid 131555] 09:57:56.463410 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 131555] 09:57:56.463858 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 131555] 09:57:56.464211 read(4, "\0\0\0
\367\270\356\21\304-u\200cQ\16\350UAo\264*\231B\303\275\250\221Nzo\33\231"...,
16384) = 44
[pid 131555] 09:57:56.464336 select(14, [4 5 11 13], [10], NULL, NULL) = 1 (out
[10])
[pid 131555] 09:57:56.464684 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 131555] 09:57:56.464915 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 131555] 09:57:56.465239 write(10, "\0\0\0\7", 4) = 4
 | 0  00 00 00 07    |
[pid 131555] 09:57:56.465354 select(14, [4 5 11 13], [], NULL, NULL^Z
[1]+  Angehalten  strace -f -p 131431 -tt -e write=all


root@debian:~# ps -ef|grep 131555
root  131555  131431  0 09:57 ?00:00:00 sshd: root@notty
root  131562  131555  3 09:57 ?00:00:02 rsync --server --sender
-vlogDtpre.iLsfxCIvu --timeout=5 . /
root  131565  131540  0 09:58 pts/100:00:00 grep 131555

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: 

[Bug 15163] rsync timeout non-effective and incorrect behaviour

2022-09-11 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=15163

--- Comment #2 from roland  ---
here is another bugreport, where timeout is not effective/working

https://bugzilla.redhat.com/show_bug.cgi?id=1944132

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html