Re: [mosh-devel] Mosh connections didn't come back after ca. 18000 sec over 2x NAT
Axel: FYI, we were able to reproduce your bug (it was also triggered by a long one-way MIT network outage today). The issue comes down to a bad interaction where the client and server both think the other guy is trying to DoS them and start ignoring each other's packets.

Working on a fix, but just wanted to let you know we figured this one out. Thanks again for the report.

Cheers,
Keith

On Mon, Dec 31, 2012 at 11:50 AM, Keith Winstein kei...@mit.edu wrote:
> On Mon, 31 Dec 2012, Axel Beckert wrote:
>> Hi,
>>
>> On Mon, Dec 31, 2012 at 01:36:56AM -0500, Keith Winstein wrote:
>>> On Mon, Dec 31, 2012 at 12:47 AM, Axel Beckert a...@deuxchevaux.org wrote:
>>>> It looks like the data sent from the client is not (correctly) received on the server side.
>>>
>>> Do you think you could reproduce this? It would be wonderful to capture what the server thinks is happening (i.e. the debugging output of mosh-server new -v).
>>
>> I've now started two mosh sessions with --server='mosh-server new -v' to the host where three confused mosh sessions are already running, in the hope that I am most likely to reproduce it there. (I can't promise anything, though.)
>
> It just goes to standard error, so you probably want something like
>
> mosh hostname --server='mosh-server new -v 2>/tmp/moshlog.txt'
>
> Thanks, hope it happens again!
>
> -Keith

___
mosh-devel mailing list
mosh-devel@mit.edu
http://mailman.mit.edu/mailman/listinfo/mosh-devel
Re: [mosh-devel] Mosh connections didn't come back after ca. 18000 sec over 2x NAT
On Mon, 31 Dec 2012, Axel Beckert wrote:
> Hi,
>
> On Mon, Dec 31, 2012 at 01:36:56AM -0500, Keith Winstein wrote:
>> On Mon, Dec 31, 2012 at 12:47 AM, Axel Beckert a...@deuxchevaux.org wrote:
>>> It looks like the data sent from the client is not (correctly) received on the server side.
>>
>> Do you think you could reproduce this? It would be wonderful to capture what the server thinks is happening (i.e. the debugging output of mosh-server new -v).
>
> I've now started two mosh sessions with --server='mosh-server new -v' to the host where three confused mosh sessions are already running, in the hope that I am most likely to reproduce it there. (I can't promise anything, though.)

It just goes to standard error, so you probably want something like

mosh hostname --server='mosh-server new -v 2>/tmp/moshlog.txt'

Thanks, hope it happens again!

-Keith
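A note on quoting, since the whole server command has to reach mosh as a single --server argument: the following is a minimal Python sketch that builds such a command line with the standard shlex module. It assumes the redirect in the command above is the usual `2>` stderr redirection; the helper name mosh_debug_command is hypothetical and not part of mosh.

```python
import shlex


def mosh_debug_command(host: str, log_path: str) -> str:
    """Build a shell command line that starts mosh with a verbose server.

    Hypothetical helper for illustration only. The server command is
    passed as ONE argument so the local shell does not split it; the
    '2>' redirect is then interpreted on the server side.
    """
    server_cmd = f"mosh-server new -v 2>{shlex.quote(log_path)}"
    # shlex.join quotes every argument that contains shell metacharacters.
    return shlex.join(["mosh", host, f"--server={server_cmd}"])


if __name__ == "__main__":
    print(mosh_debug_command("hostname", "/tmp/moshlog.txt"))
```

The printed line quotes the entire --server=... word rather than only the value, which a POSIX shell treats the same way as the form shown in the mail.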
Re: [mosh-devel] Mosh connections didn't come back after ca. 18000 sec over 2x NAT
Hi Quentin,

On Sun, Dec 30, 2012 at 11:22:42AM -0500, Quentin Smith wrote:
> On Sun, 30 Dec 2012, Axel Beckert wrote:
>> Any idea what could have caused such a bad lockup in a mosh connection? IIRC as of now, mosh does DNS lookups only once at start, so it couldn't be a cached bad DNS reply or such.
>
> Just to check the obvious first - to your knowledge, the server resolved to the same IP address before and after you restarted the AP? (That is, the server didn't appear to move for any reason?)

That would have meant a compromise of DNS servers in at least two domains. :-)

Nevertheless, I checked all the machines where this happened: for two of them I know the IP addresses by heart, and they didn't change. The other three at least look familiar.

> What version of mosh are you using? Mosh 1.2.3 adds a new behavior where it will try opening a new connection on a new port if it hasn't heard from the server in a while (I think 10 seconds?). This is to work around some braindead NAT devices that have behavior similar to what you're describing.

It was always 1.2.3 (from the official Debian Wheezy package) on the client side, and 2x 1.2.3 (same package) and 3x 1.2.2 (from Debian Backports) on the server side.

I still kept one of the non-responding mosh sessions open (now at 21521 seconds), so I can possibly debug that session.

> If you capture a tcpdump on both the client and server for ~30 seconds or so, that should show conclusively if your network is eating the packets, or the client or server is confused.

Looks like the latter.
I captured this on the client side:

# tcpdump -i wlan0 host 78.46.73.201
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wlan0, link-type EN10MB (Ethernet), capture size 65535 bytes
18:08:38.183852 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 116
18:08:38.295591 IP c-crosser.local.55013 > xenlink.noone.org.60001: UDP, length 71
18:08:39.143872 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 133
18:08:39.253217 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 82
18:08:40.064163 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 132
18:08:40.172792 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 78
18:08:40.865203 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 133
18:08:40.97 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 77
18:08:41.785033 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 124
18:08:41.894189 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 69
18:08:41.963862 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 114
18:08:42.074045 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 81
18:08:42.725320 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 121
18:08:42.837176 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 79
18:08:44.046276 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 124
18:08:44.162170 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 81
18:08:44.665467 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 114
18:08:44.774038 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 83
18:08:45.983868 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 126
18:08:46.094821 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 81
18:08:46.602694 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 127
18:08:46.716342 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 75
18:08:47.544721 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 130
18:08:47.651795 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 76
18:08:49.003503 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 120
18:08:49.110928 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 78
18:08:49.746999 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 129
18:08:49.855992 IP c-crosser.local.45626 > xenlink.noone.org.60001: UDP, length 69

Nevertheless, the top line of that mosh session (and I only have one to that host) still says

mosh: Last reply 26184 seconds ago. [To quit: Ctrl-^ .]

Kind regards, Axel
--
/~\  Plain Text Ribbon Campaign                   | Axel Beckert
\ /  Say No to HTML in E-Mail and News            | a...@deuxchevaux.org  (Mail)
 X   See http://www.asciiribbon.org/              | a...@noone.org (Mail+Jabber)
/ \  I love long mails: http://email.is-not-s.ms/ | http://noone.org/abe/ (Web)
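For readers who want to check a capture like the one above mechanically, here is a minimal Python sketch that parses tcpdump's one-line UDP summaries and lists the client source ports in order of first use. It is only an illustration: the names Packet, parse_capture and client_ports are made up here, the server port 60001 is taken from the capture, and the regular expression assumes the default `src > dst: UDP, length N` line format.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Matches lines such as:
# 18:08:38.295591 IP c-crosser.local.55013 > xenlink.noone.org.60001: UDP, length 71
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): "
    r"UDP, length (?P<length>\d+)$"
)


class Packet(NamedTuple):
    time: str
    src: str
    sport: int
    dst: str
    dport: int
    length: int


def parse_capture(text: str) -> Iterator[Packet]:
    """Yield one Packet per UDP summary line; other lines are skipped."""
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            yield Packet(m["time"], m["src"], int(m["sport"]),
                         m["dst"], int(m["dport"]), int(m["length"]))


def client_ports(packets: Iterable[Packet], server_port: int = 60001) -> List[int]:
    """Source ports of packets sent to server_port, in order of first use."""
    ports: List[int] = []
    for p in packets:
        if p.dport == server_port and p.sport not in ports:
            ports.append(p.sport)
    return ports


# Four lines taken from the client-side capture.
SAMPLE = """\
18:08:38.183852 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 116
18:08:38.295591 IP c-crosser.local.55013 > xenlink.noone.org.60001: UDP, length 71
18:08:39.253217 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 82
18:08:49.855992 IP c-crosser.local.45626 > xenlink.noone.org.60001: UDP, length 69
"""

if __name__ == "__main__":
    print(client_ports(parse_capture(SAMPLE)))  # -> [55013, 53665, 45626]
```

On these sample lines the client uses three different source ports within about twelve seconds, which would fit the 1.2.3 port-hopping behavior Quentin describes, although the capture alone does not prove that this is the cause.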
Re: [mosh-devel] Mosh connections didn't come back after ca. 18000 sec over 2x NAT
Something is awry here, but I am a little confused, especially about why packets are being sent so often. I wonder if two sessions somehow got slotted into the same spot on the intermediate NATted link. But really I don't quite understand what is going on.

Is the display (visible on the client) updating the screen often or at all? Do you know what is running on the server (is it continually changing the screen state)?

We may need to figure out a way for you to break in to the server with gdb and turn on verbose debugging. Is the server running the wheezy 1.2.3 package on x86, amd64, or another architecture?

-Keith

On Sun, 30 Dec 2012, Axel Beckert wrote:
> Hi Keith,
>
> On Sun, Dec 30, 2012 at 12:31:46PM -0500, Keith Winstein wrote:
>> Thanks for the detailed report. "Last reply" means that the _server_ is not getting (or at least not acknowledging) packets from the _client_. (If the client were not getting packets at all, it would say "Last contact".)
>
> I see. So the client-side tcpdump is somewhat as expected.
>
>> Are you able to send a similar tcpdump from the server side? I hope that might help resolve the mystery.
>
> Yeah, I can do that via SSH. :-)
>
> # tcpdump -i eth0 host 212.23.103.125 and not port 22 and not host 78.46.73.207 and not host 178.63.92.236
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
> 19:20:26.118258 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 114
> 19:20:26.648257 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 67
> 19:20:27.117409 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 120
> 19:20:28.120633 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 114
> 19:20:28.383553 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 79
> 19:20:28.923537 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 73
> 19:20:29.119180 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 124
> 19:20:30.117066 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 140
> 19:20:30.162439 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 77
> 19:20:31.121057 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 134
> 19:20:32.118384 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 133
> 19:20:32.165360 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 73
> 19:20:33.120720 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 123
> 19:20:33.745353 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 82
> 19:20:34.118490 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 122
> 19:20:34.165574 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 68
> 19:20:34.866453 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 71
> 19:20:34.963754 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 76
> 19:20:35.118092 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 126
> 19:20:36.121004 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 116
> 19:20:36.423953 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 68
> 19:20:37.117481 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 129
> 19:20:37.783474 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 75
> 19:20:38.120108 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 123
> 19:20:38.241532 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 69
> 19:20:39.119904 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 126
> 19:20:39.284710 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 72
> 19:20:39.882758 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 81
> 19:20:40.118030 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 126
> 19:20:41.119489 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 121
> 19:20:41.343944 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 78
> 19:20:41.905053 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 81
> 19:20:42.117682 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 125
> 19:20:43.121191 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 124
> 19:20:43.342843 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 74
> 19:20:44.121087 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 118
> 19:20:44.903687 IP 212.23.103.125.58270 > xenlink.noone.org.60001: UDP, length 71
> 19:20:45.120348 IP xenlink.noone.org.60001 > 212.23.103.125.58270: UDP, length 127
> 19:20:45.765318 IP 212.23.103.125.58270 > xenlink.noone.org.60001: UDP, length 74
> 19:20:46.122775 IP xenlink.noone.org.60001 > 212.23.103.125.58270: UDP, length 132
> 19:20:46.708392 IP 212.23.103.125.58270 > xenlink.noone.org.60001: UDP, length 76
> 19:20:46.710505 IP 212.23.103.125.58270 > xenlink.noone.org.60001: UDP, length 70
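To see how often the client switches source ports in the server-side capture, the timestamps can be compared directly. The following Python sketch is only an illustration under stated assumptions: port_hops and hop_intervals are hypothetical names, the server port 60001 comes from the capture, and the input is expected without mail quote prefixes.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Client-to-server lines only: anything sent to UDP port 60001.
CLIENT_TO_SERVER = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d+) IP \S+\.(\d+) > \S+\.60001: UDP, length \d+$"
)


def port_hops(text: str) -> List[Tuple[datetime, int]]:
    """Return (timestamp, port) for each point where the client source port changes.

    The first entry is simply the first client packet in the capture,
    not a real hop.
    """
    hops: List[Tuple[datetime, int]] = []
    last_port = None
    for line in text.splitlines():
        m = CLIENT_TO_SERVER.match(line.strip())
        if not m:
            continue
        port = int(m.group(2))
        if port != last_port:
            hops.append((datetime.strptime(m.group(1), "%H:%M:%S.%f"), port))
            last_port = port
    return hops


def hop_intervals(hops: List[Tuple[datetime, int]]) -> List[float]:
    """Seconds between consecutive entries returned by port_hops."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(hops, hops[1:])]


# A few lines taken from the server-side capture.
SAMPLE = """\
19:20:26.118258 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 114
19:20:26.648257 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 67
19:20:34.165574 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 68
19:20:34.866453 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 71
19:20:43.342843 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 74
19:20:44.903687 IP 212.23.103.125.58270 > xenlink.noone.org.60001: UDP, length 71
"""

if __name__ == "__main__":
    hops = port_hops(SAMPLE)
    print([port for _, port in hops])  # -> [52834, 48736, 58270]
    print(hop_intervals(hops))
```

The first interval only measures from the start of the capture, so it says little. The interval between the switch to port 48736 and the switch to port 58270 is roughly 10 seconds, which would be consistent with the "(I think 10 seconds?)" port-hop timeout mentioned earlier in the thread.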