Re: [mosh-devel] Mosh connections didn't come back after ca. 18000 sec over 2x NAT

2013-01-13 Thread Keith Winstein
Axel: FYI, we were able to reproduce your bug (it was also triggered
by a long one-way MIT network outage today). The issue comes down to a
bad interaction where the client and server both think the other guy
is trying to DOS them and start ignoring each other's packets. Working
on a fix, but just wanted to let you know we figured this one out.
Thanks again for the report.

Cheers,
Keith

On Mon, Dec 31, 2012 at 11:50 AM, Keith Winstein kei...@mit.edu wrote:
 On Mon, 31 Dec 2012, Axel Beckert wrote:

 Hi,

 On Mon, Dec 31, 2012 at 01:36:56AM -0500, Keith Winstein wrote:
 On Mon, Dec 31, 2012 at 12:47 AM, Axel Beckert a...@deuxchevaux.org wrote:
 It looks like the data sent from the client is not (correctly)
 received on the server side.

 Do you think you could reproduce this? It would be wonderful to
 capture what the server thinks is happening (i.e. the debugging output
 of mosh-server new -v).

 I've now started two mosh sessions with --server="mosh-server new -v"
 to the host where three confused mosh sessions are already running, in
 the hope that I am most likely to reproduce it there. (I can't
 promise anything, though.)

 It just goes to standard error, so you probably want something like

 mosh hostname --server='mosh-server new -v 2>/tmp/moshlog.txt'

 Thanks, hope it happens again!

 -Keith
___
mosh-devel mailing list
mosh-devel@mit.edu
http://mailman.mit.edu/mailman/listinfo/mosh-devel


Re: [mosh-devel] Mosh connections didn't come back after ca. 18000 sec over 2x NAT

2012-12-31 Thread Keith Winstein
On Mon, 31 Dec 2012, Axel Beckert wrote:

 Hi,

 On Mon, Dec 31, 2012 at 01:36:56AM -0500, Keith Winstein wrote:
 On Mon, Dec 31, 2012 at 12:47 AM, Axel Beckert a...@deuxchevaux.org wrote:
 It looks like the data sent from the client is not (correctly)
 received on the server side.

 Do you think you could reproduce this? It would be wonderful to
 capture what the server thinks is happening (i.e. the debugging output
 of mosh-server new -v).

 I've now started two mosh sessions with --server="mosh-server new -v"
 to the host where three confused mosh sessions are already running, in
 the hope that I am most likely to reproduce it there. (I can't
 promise anything, though.)

It just goes to standard error, so you probably want something like

mosh hostname --server='mosh-server new -v 2>/tmp/moshlog.txt'

Thanks, hope it happens again!

-Keith


Re: [mosh-devel] Mosh connections didn't come back after ca. 18000 sec over 2x NAT

2012-12-30 Thread Axel Beckert
Hi Quentin,

On Sun, Dec 30, 2012 at 11:22:42AM -0500, Quentin Smith wrote:
 On Sun, 30 Dec 2012, Axel Beckert wrote:
  Any idea what could have caused such a bad lockup in a mosh connection?
  IIRC as of now, mosh does DNS lookups only once at start, so it
  couldn't be a cached bad DNS reply or such.
 
 Just to check the obvious first - to your knowledge, the server
 resolved to the same IP address before and after you restarted the
 AP? (That is, the server didn't appear to move for any reason?)

That would have meant a compromise of DNS servers in at least two
domains. :-)

Nevertheless, I checked all the machines where this happened. For
two of them I know the IP addresses by heart, and they didn't change.
The other three at least look familiar.

 What version of mosh are you using? Mosh 1.2.3 adds a new behavior
 where it will try opening a new connection on a new port if it
 hasn't heard from the server in a while (I think 10 seconds?). This
 is to work around some braindead NAT devices that have behavior
 similar to what you're describing.

It was always 1.2.3 (from the official Debian Wheezy package) on the
client side and 2x 1.2.3 (same package) on the server side and 3x
1.2.2 from Debian Backports on the server side.

  I still kept one of the non-responding mosh sessions open (now at
  21521 seconds), so I can possibly debug that session.

 If you capture a tcpdump on both the client and server for ~30
 seconds or so, that should show conclusively if your network is
 eating the packets, or the client or server is confused.

Looks like the latter. I captured this on the client side:

# tcpdump -i wlan0 host 78.46.73.201
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wlan0, link-type EN10MB (Ethernet), capture size 65535 bytes
18:08:38.183852 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 116
18:08:38.295591 IP c-crosser.local.55013 > xenlink.noone.org.60001: UDP, length 71
18:08:39.143872 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 133
18:08:39.253217 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 82
18:08:40.064163 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 132
18:08:40.172792 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 78
18:08:40.865203 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 133
18:08:40.97 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 77
18:08:41.785033 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 124
18:08:41.894189 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 69
18:08:41.963862 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 114
18:08:42.074045 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 81
18:08:42.725320 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 121
18:08:42.837176 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 79
18:08:44.046276 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 124
18:08:44.162170 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 81
18:08:44.665467 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 114
18:08:44.774038 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 83
18:08:45.983868 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 126
18:08:46.094821 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 81
18:08:46.602694 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 127
18:08:46.716342 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 75
18:08:47.544721 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 130
18:08:47.651795 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 76
18:08:49.003503 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 120
18:08:49.110928 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 78
18:08:49.746999 IP xenlink.noone.org.60001 > c-crosser.local.53665: UDP, length 129
18:08:49.855992 IP c-crosser.local.45626 > xenlink.noone.org.60001: UDP, length 69

Nevertheless, the top line of that mosh session (and I only have one
to that host) still says "mosh: Last reply 26184 seconds ago. [To
quit: Ctrl-^ .]"
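
For anyone who wants to check a capture like the one above for source-port changes (the port hopping Quentin describes for 1.2.3), here is a rough Python sketch. It is not part of mosh. The helper names (port_changes, seconds) are made up for illustration, and it assumes the usual one-line tcpdump summary format "TIME IP src.port > dst.port: UDP, length N", with each packet on a single line. The sample lines are copied from the client-side capture.

```python
import re

# One-line tcpdump UDP summary, e.g.
# "18:08:39.253217 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 82"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\S+)\.(?P<sport>\d+) > "
    r"(?P<dst>\S+)\.(?P<dport>\d+): UDP, length (?P<len>\d+)$"
)


def seconds(ts):
    """Convert an HH:MM:SS.ffffff timestamp to seconds since midnight."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


def port_changes(lines, sender):
    """Return (timestamp, old_port, new_port) for each source-port switch by `sender`."""
    changes = []
    last_port = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("src") != sender:
            continue  # unparsable line or a packet from the other side
        port = int(m.group("sport"))
        if last_port is not None and port != last_port:
            changes.append((m.group("ts"), last_port, port))
        last_port = port
    return changes


# A few lines taken from the client-side capture in this mail.
SAMPLE = [
    "18:08:38.183852 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 116",
    "18:08:38.295591 IP c-crosser.local.55013 > xenlink.noone.org.60001: UDP, length 71",
    "18:08:39.143872 IP xenlink.noone.org.60001 > c-crosser.local.55013: UDP, length 133",
    "18:08:39.253217 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 82",
    "18:08:49.110928 IP c-crosser.local.53665 > xenlink.noone.org.60001: UDP, length 78",
    "18:08:49.855992 IP c-crosser.local.45626 > xenlink.noone.org.60001: UDP, length 69",
]

if __name__ == "__main__":
    previous = None
    for ts, old, new in port_changes(SAMPLE, "c-crosser.local"):
        gap = "" if previous is None else f" ({seconds(ts) - previous:.1f} s after last switch)"
        print(f"{ts}: source port {old} -> {new}{gap}")
        previous = seconds(ts)
```

On these sample lines the client switches source port twice, roughly 10.6 seconds apart, which fits the "about 10 seconds" Quentin mentioned. Running the same function over the server-side capture would show whether the server sees the same switches.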

Kind regards, Axel
-- 
/~\  Plain Text Ribbon Campaign   | Axel Beckert
\ /  Say No to HTML in E-Mail and News| a...@deuxchevaux.org  (Mail)
 X   See http://www.asciiribbon.org/  | a...@noone.org (Mail+Jabber)
/ \  I love long mails: http://email.is-not-s.ms/ | http://noone.org/abe/ (Web)


Re: [mosh-devel] Mosh connections didn't come back after ca. 18000 sec over 2x NAT

2012-12-30 Thread Keith Winstein
Something is awry here, but I am a little confused, especially about why
packets are being sent so often. I wonder if two sessions somehow got
slotted into the same spot on the intermediate NATted link. But
really I don't quite understand what is going on.

Is the display (visible on the client) updating the screen often or at 
all? Do you know what is running on the server (is it continually changing 
the screen state)?

We may need to figure out a way for you to break in to the server with gdb 
and turn on verbose debugging. Is the server running the wheezy 1.2.3 
package on x86, amd64, or another architecture?

-Keith

On Sun, 30 Dec 2012, Axel Beckert wrote:

 Hi Keith,

 On Sun, Dec 30, 2012 at 12:31:46PM -0500, Keith Winstein wrote:
 Thanks for the detailed report. "Last reply" means that the _server_
 is not getting (or at least not acknowledging) packets from the
 _client_. (If the client were not getting packets at all, it would
 say "Last contact.")

 I see.

 So the client-side tcpdump is somewhat as expected. Are you able to
 send a similar tcpdump from the server side? I hope that might help
 resolve the mystery.

 Yeah, I can do that via SSH. :-)

 # tcpdump -i eth0 host 212.23.103.125 and not port 22 and not host 78.46.73.207 and not host 178.63.92.236
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
 19:20:26.118258 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 114
 19:20:26.648257 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 67
 19:20:27.117409 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 120
 19:20:28.120633 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 114
 19:20:28.383553 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 79
 19:20:28.923537 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 73
 19:20:29.119180 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 124
 19:20:30.117066 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 140
 19:20:30.162439 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 77
 19:20:31.121057 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 134
 19:20:32.118384 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 133
 19:20:32.165360 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 73
 19:20:33.120720 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 123
 19:20:33.745353 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 82
 19:20:34.118490 IP xenlink.noone.org.60001 > 212.23.103.125.52834: UDP, length 122
 19:20:34.165574 IP 212.23.103.125.52834 > xenlink.noone.org.60001: UDP, length 68
 19:20:34.866453 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 71
 19:20:34.963754 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 76
 19:20:35.118092 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 126
 19:20:36.121004 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 116
 19:20:36.423953 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 68
 19:20:37.117481 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 129
 19:20:37.783474 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 75
 19:20:38.120108 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 123
 19:20:38.241532 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 69
 19:20:39.119904 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 126
 19:20:39.284710 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 72
 19:20:39.882758 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 81
 19:20:40.118030 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 126
 19:20:41.119489 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 121
 19:20:41.343944 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 78
 19:20:41.905053 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 81
 19:20:42.117682 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 125
 19:20:43.121191 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 124
 19:20:43.342843 IP 212.23.103.125.48736 > xenlink.noone.org.60001: UDP, length 74
 19:20:44.121087 IP xenlink.noone.org.60001 > 212.23.103.125.48736: UDP, length 118
 19:20:44.903687 IP 212.23.103.125.58270 > xenlink.noone.org.60001: UDP, length 71
 19:20:45.120348 IP xenlink.noone.org.60001 > 212.23.103.125.58270: UDP, length 127
 19:20:45.765318 IP 212.23.103.125.58270 > xenlink.noone.org.60001: UDP, length 74
 19:20:46.122775 IP xenlink.noone.org.60001 > 212.23.103.125.58270: UDP, length 132
 19:20:46.708392 IP 212.23.103.125.58270 > xenlink.noone.org.60001: UDP, length 76
 19:20:46.710505 IP 212.23.103.125.58270 > xenlink.noone.org.60001: UDP, length 70