Re: [Dovecot] dsync timeout?
Sean Kamath writes:

> On Jan 30, 2013, at 3:46 PM, micah anderson wrote:
>> Seems that only the above process was still around and no other dsync
>> processes. I have three machines that all have this happening it seems.
>>
>> I wonder if there is a ssh configuration option I could set to make
>> these die off.
>
> If the ssh process isn't sending anything, and just waiting for read()s, and
> keepalives are turned off, the SSH session might never know the remote side
> is long gone. . .

This time I managed to capture a process that was stuck and look at it from
both the server side and the client side.

On the server:

2000 19470 0.0 0.0 7512 3816 ? Ss Feb05 0:01 /usr/bin/dsync dsync-server -E -u foo

# strace -s 1024 -F -p 19470
Process 19470 attached - interrupt to quit
write(2, "dsync-remote(foo): Error: mdbox /srv/maildirbackups/foo/daily.1/storage: Duplicate GUID 96860517f68aa94f8b5197f19f0b in m.41:682501 and m.37:653225\n", 167

On the client:

root 19001 0.0 0.0 41308 1600 ? S Feb05 0:00 ssh -i /root/.ssh/backmaildir_id_rsa backmaildir@hoopoe-pn /usr/bin/dsync -u foo server

# strace -s 1024 -F -p 19001
Process 19001 attached - interrupt to quit
select(8, [4], [], NULL, NULL

Interestingly, now that I've been watching this more, the same users keep
getting wedged. When I attempt to do a dsync of that user by hand, I get this:

dsync-local(foo): Error: Unexpected reply from server: 13 d2a100118c45d24f760f97f19f0b3561128 \Recent 1353980259

I tried one of the other users that was stuck, and it gave me:

dsync-remote(bar): Error: Corrupted dbox file /srv/maildirbackups/bar/daily.1/storage/m.130 (around offset=22532): msg header has bad magic value

This looks like something is corrupted in the dbox for the user on the
client side. Is there something I can do to repair those?

> If any data were transmitted, it would discover the remote side is turned off.
One thing I am doing is using an ssh ControlMaster socket, and if I kill the
process on the client's side, the server-side process also dies.

micah
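Because these backups go through an ssh ControlMaster socket, a wedged master can also be checked or shut down from a script with ssh's -O check and -O exit control commands. The Python below is a minimal sketch under stated assumptions: the socket path /root/.ssh/ctl-backmaildir is hypothetical (it does not appear in the thread), and the code only assembles and runs the standard "ssh -S <socket> -O <command> <host>" invocation. It has not been tried against this setup.

```python
import subprocess

# Hypothetical control-socket path; use whatever ControlPath your setup defines.
CONTROL_PATH = "/root/.ssh/ctl-backmaildir"


def control_argv(command: str, host: str, control_path: str = CONTROL_PATH) -> list[str]:
    """Build `ssh -S <socket> -O <command> <host>` for an existing ControlMaster."""
    # "check" asks whether the master is running; "exit" asks it to shut down.
    if command not in ("check", "exit"):
        raise ValueError(f"unsupported control command: {command}")
    return ["ssh", "-S", control_path, "-O", command, host]


def master_is_alive(host: str, control_path: str = CONTROL_PATH) -> bool:
    """True if `ssh -O check` exits 0, i.e. a master answers on the socket."""
    result = subprocess.run(
        control_argv("check", host, control_path),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def stop_master(host: str, control_path: str = CONTROL_PATH) -> int:
    """Ask the master to exit; sessions multiplexed over it end along with it."""
    return subprocess.run(control_argv("exit", host, control_path)).returncode
```

For example, master_is_alive("backmaildir@hoopoe-pn") returns False when nothing is listening on that socket. If ssh itself is not on PATH, subprocess.run raises FileNotFoundError instead of returning False.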
Re: [Dovecot] dsync timeout?
On Feb 1, 2013, at 8:09 AM, micah anderson wrote:

> Sean Kamath writes:
>
>> On Jan 30, 2013, at 3:46 PM, micah anderson wrote:
>>> Seems that only the above process was still around and no other dsync
>>> processes. I have three machines that all have this happening it seems.
>>>
>>> I wonder if there is a ssh configuration option I could set to make
>>> these die off.
>>
>> If the ssh process isn't sending anything, and just waiting for read()s, and
>> keepalives are turned off, the SSH session might never know the remote side
>> is long gone. . .
>>
>> If any data were transmitted, it would discover the remote side is turned
>> off.
>>
>> See man ssh_config and the option TCPKeepAlive.
>>
>> BTW: Since it's not on the command line, it's likely in /etc/ssh_config or
>> /etc/ssh/ssh_config. Or ~/.ssh/config.
>
> In /etc/ssh/sshd_config on the server I'm sending to, TCPKeepAlive yes
> is set.

Did you check ~/.ssh/config for the user running the dsync?

> The default on this system, according to the man page, seems to be to
> have TCPKeepAlive set.
>
> Perhaps I should set ServerAliveInterval?

Perhaps. That sets how long ssh waits without hearing from the server before
it sends a keepalive message to request a response. There are many settings
that can affect this, including these in ssh_config:

    ServerAliveCountMax
    ServerAliveInterval
    TCPKeepAlive

There are also the sshd_config settings:

    ClientAliveCountMax
    ClientAliveInterval
    TCPKeepAlive

At this point, I think you need to see what's happening on both sides of the
SSH connection. I don't recall what system you're on, but on Linux you can use
netstat -anp (as root) to find out which process is connected to which port,
and on Linux and other systems you can use lsof to find out what is connected
to ports. Maybe the TCP port is open and valid and there's just no data coming
through?
This can happen if, for example, you have any port forwarding or X session
forwarding through SSH (i.e., if ssh -X is the default) and something
accidentally is holding that port open. (This can happen in your regular shell
if, for example, you have something open an X application and you forget about
it because you backgrounded it -- your logout of the server will hang until the
X applications are closed. Note that it isn't always a visible client that will
do this. :-()

Sean
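Taken together, Sean's suggestions amount to turning on ssh's own keepalives and switching off forwarding for the backup connection. The Python below is only a sketch. It assumes the backup script controls the ssh command line (the one visible in the ps output earlier in the thread) rather than relying on root's ~/.ssh/config, and the values 15 and 3 are illustrative, not settings anyone in the thread tested. ssh_config(5) describes the resulting behavior: with ServerAliveInterval=15 and ServerAliveCountMax=3, ssh disconnects roughly 45 seconds after the server stops responding. TCPKeepAlive alone depends on the kernel's TCP keepalive timers, which on Linux default to two hours of idle time before the first probe.

```python
def ssh_argv(
    host: str,
    remote_cmd: list[str],
    identity: str,
    alive_interval: int = 15,
    alive_count_max: int = 3,
) -> list[str]:
    """Build an ssh command line with server keepalives on and forwarding off."""
    if alive_interval <= 0:
        # ServerAliveInterval=0 is ssh's default and means "never send them".
        raise ValueError("alive_interval must be positive to enable keepalives")
    return [
        "ssh",
        "-i", identity,
        "-x",  # disable X11 forwarding, overriding any ForwardX11 in config files
        "-o", "ClearAllForwardings=yes",  # drop any configured port forwards
        "-o", f"ServerAliveInterval={alive_interval}",
        "-o", f"ServerAliveCountMax={alive_count_max}",
        host,
        *remote_cmd,
    ]


def approx_dead_peer_timeout(alive_interval: int, alive_count_max: int) -> int:
    """Approximate seconds of server silence before ssh disconnects."""
    return alive_interval * alive_count_max
```

For instance, ssh_argv("backmaildir@hoopoe-pn", ["/usr/bin/dsync", "-u", "foo", "server"], identity="/root/.ssh/backmaildir_id_rsa") produces the command from the thread with the extra -x and -o options inserted before the host. The same options could instead go in a Host block of root's ~/.ssh/config.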
Re: [Dovecot] dsync timeout?
Sean Kamath writes:

> On Jan 30, 2013, at 3:46 PM, micah anderson wrote:
>> Seems that only the above process was still around and no other dsync
>> processes. I have three machines that all have this happening it seems.
>>
>> I wonder if there is a ssh configuration option I could set to make
>> these die off.
>
> If the ssh process isn't sending anything, and just waiting for read()s, and
> keepalives are turned off, the SSH session might never know the remote side
> is long gone. . .
>
> If any data were transmitted, it would discover the remote side is turned off.
>
> See man ssh_config and the option TCPKeepAlive.
>
> BTW: Since it's not on the command line, it's likely in /etc/ssh_config or
> /etc/ssh/ssh_config. Or ~/.ssh/config.

In /etc/ssh/sshd_config on the server I'm sending to, TCPKeepAlive yes
is set. The default on this system, according to the man page, seems to be to
have TCPKeepAlive set.

Perhaps I should set ServerAliveInterval?

micah
Re: [Dovecot] dsync timeout?
On Jan 30, 2013, at 3:46 PM, micah anderson wrote:

> Seems that only the above process was still around and no other dsync
> processes. I have three machines that all have this happening it seems.
>
> I wonder if there is a ssh configuration option I could set to make
> these die off.

If the ssh process isn't sending anything, and just waiting for read()s, and
keepalives are turned off, the SSH session might never know the remote side
is long gone. . .

If any data were transmitted, it would discover the remote side is turned off.

See man ssh_config and the option TCPKeepAlive.

BTW: Since it's not on the command line, it's likely in /etc/ssh_config or
/etc/ssh/ssh_config. Or ~/.ssh/config.

Sean
Re: [Dovecot] dsync timeout?
Timo Sirainen writes:

> On 31.1.2013, at 0.06, Micah Anderson wrote:
>
>> I'm using dsync for a regular backup. The backup system flocks so that
>> two cannot run at the same time, which is generally a good thing. The
>> problem is that it seems like dsync sometimes goes off into the weeds
>> and never comes back, leaving a process running and doing nothing
>> forever, hogging the lock and causing my backups never to run again. I
>> just finally figured out that what was causing the backups not to
>> run on this system was this process:
>>
>> root 17836 0.0 0.0 40888 1600 ? S 2012 0:00 ssh -i
>> /root/.ssh/backmaildir_id_rsa backmaildir@arg /usr/bin/dsync -u foobar server
>>
>> yeah, that has been running since 2012 :(
>
> So that's the ssh process. What about the dsync process that started it?
> Does/did it exist?

Seems that only the above process was still around and no other dsync
processes. I have three machines that all have this happening it seems.

I wonder if there is a ssh configuration option I could set to make
these die off.

>> There doesn't seem to be a timeout in dsync, but perhaps there should
>> be? At this point my only option is to write a cronjob that will look
>> for dsync processes that are over a certain amount of time old and then
>> kill them, after I do that I will need to take a shower because that is
>> a very dirty solution :P
>
> There is a 15 minute timeout in dsync after which it stops itself. Normally
> the child process should also die. v2.2 now will make sure that the child
> process dies: http://hg.dovecot.org/dovecot-2.2/rev/070ca24e5846

Interesting... I wonder why the child is not dying off properly; maybe the
wrong signal is sent? Looking forward to using 2.2!

micah
Re: [Dovecot] dsync timeout?
On 31.1.2013, at 0.06, Micah Anderson wrote:

> I'm using dsync for a regular backup. The backup system flocks so that
> two cannot run at the same time, which is generally a good thing. The
> problem is that it seems like dsync sometimes goes off into the weeds
> and never comes back, leaving a process running and doing nothing
> forever, hogging the lock and causing my backups never to run again. I
> just finally figured out that what was causing the backups not to
> run on this system was this process:
>
> root 17836 0.0 0.0 40888 1600 ? S 2012 0:00 ssh -i
> /root/.ssh/backmaildir_id_rsa backmaildir@arg /usr/bin/dsync -u foobar server
>
> yeah, that has been running since 2012 :(

So that's the ssh process. What about the dsync process that started it?
Does/did it exist?

> There doesn't seem to be a timeout in dsync, but perhaps there should
> be? At this point my only option is to write a cronjob that will look
> for dsync processes that are over a certain amount of time old and then
> kill them, after I do that I will need to take a shower because that is
> a very dirty solution :P

There is a 15 minute timeout in dsync after which it stops itself. Normally
the child process should also die. v2.2 now will make sure that the child
process dies: http://hg.dovecot.org/dovecot-2.2/rev/070ca24e5846
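Until a release with that fix is deployed, the backup wrapper could enforce its own deadline so that a wedged run cannot hold the lock forever. The Python below is a minimal sketch of that idea. It does not describe how dovecot's 15-minute timeout or the v2.2 change works. It takes a non-blocking exclusive flock (mirroring "two cannot run at the same time") and starts the job in a new session. If the caller-chosen timeout expires, it sends SIGTERM and then SIGKILL to the job's whole process group, which also reaches an ssh child left behind by dsync. Processes that move themselves into another process group or session escape this.

```python
import fcntl
import os
import signal
import subprocess


def run_locked_with_deadline(
    argv: list[str], lock_path: str, timeout_s: float, grace_s: float = 10.0
) -> int:
    """Run argv while holding an exclusive flock; kill its process group on timeout.

    Returns the command's exit status, or -1 if the lock was busy or the
    deadline expired. The lock is released when the with-block closes the file.
    """
    with open(lock_path, "a") as lock_file:
        try:
            # Non-blocking: if another run holds the lock, skip this one.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return -1
        # New session => the child leads its own process group (pgid == pid),
        # and its descendants inherit that group unless they change it.
        proc = subprocess.Popen(argv, start_new_session=True)
        try:
            return proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            pgid = proc.pid
            _signal_group(pgid, signal.SIGTERM)
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                pass
            # Whatever is left in the group after the grace period gets SIGKILL.
            _signal_group(pgid, signal.SIGKILL)
            proc.wait()
            return -1


def _signal_group(pgid: int, sig: int) -> None:
    """Send sig to every process in the group, ignoring an already-empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # every process in the group has already exited
```

For example, run_locked_with_deadline(["/path/to/backup-job"], "/var/lock/maildir-backup.lock", 3600) would give a run one hour. Both the path and the limit are placeholders.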
[Dovecot] dsync timeout?
I'm using dsync for a regular backup. The backup system flocks so that two
cannot run at the same time, which is generally a good thing. The problem is
that it seems like dsync sometimes goes off into the weeds and never comes
back, leaving a process running and doing nothing forever, hogging the lock
and causing my backups never to run again. I just finally figured out that
what was causing the backups not to run on this system was this process:

root 17836 0.0 0.0 40888 1600 ? S 2012 0:00 ssh -i /root/.ssh/backmaildir_id_rsa backmaildir@arg /usr/bin/dsync -u foobar server

yeah, that has been running since 2012 :(

root:/tmp# strace -p 17836
Process 17836 attached - interrupt to quit
select(8, [4], [], NULL, NULL

very exciting...

There doesn't seem to be a timeout in dsync, but perhaps there should be? At
this point my only option is to write a cronjob that will look for dsync
processes that are over a certain amount of time old and then kill them.
After I do that I will need to take a shower, because that is a very dirty
solution :P

thanks for any ideas, or help!

micah
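If the cronjob route is taken anyway, it can at least be narrow. This Python sketch assumes a ps that understands the etimes column (elapsed seconds since the process started), as procps-ng's ps does. The match string and the age limit are hypothetical choices. Matching on /usr/bin/dsync catches both a dsync process and an ssh client whose remote command is /usr/bin/dsync, like PID 17836 above. Parsing is kept separate from killing, so the selection logic can be checked without signalling anything.

```python
import os
import signal
import subprocess


def parse_ps(output: str) -> list[tuple[int, int, str]]:
    """Parse `ps -eo pid=,etimes=,args=` output into (pid, elapsed_s, args)."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # blank or malformed line
        pid, elapsed, args = parts
        rows.append((int(pid), int(elapsed), args))
    return rows


def stale_pids(
    rows: list[tuple[int, int, str]],
    max_age_s: int,
    needles: tuple[str, ...] = ("/usr/bin/dsync",),
) -> list[int]:
    """PIDs running longer than max_age_s whose command line contains a needle."""
    return [
        pid
        for pid, elapsed, args in rows
        if elapsed > max_age_s and any(n in args for n in needles)
    ]


def kill_stale(max_age_s: int) -> list[int]:
    """SIGTERM every stale dsync-related process; return the PIDs signalled."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,etimes=,args="],
        capture_output=True, text=True, check=True,
    ).stdout
    me = os.getpid()
    victims = [pid for pid in stale_pids(parse_ps(out), max_age_s) if pid != me]
    signalled = []
    for pid in victims:
        try:
            os.kill(pid, signal.SIGTERM)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # exited between the ps snapshot and now
    return signalled
```

A cron entry could then call kill_stale(6 * 3600) to send SIGTERM to anything dsync-related older than six hours. Pick a limit comfortably above the longest legitimate backup run.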