Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Thu, Oct 01 2015 14:29:32 +0200, Richard PALO wrote:
> >> Actually I still notice some problems. This morning in the direction
> >> OI => omnios things seemed okay.
> >> Now, omnios => OI I just now experienced the hang again, and it is
> >> repeatable.
> >>
> >> Could it be that your workaround is only useful for outbound connections
> >> (relative to OI)?
> >
> > Yeah, it's possible. Whoever sends the SYN expresses their capability to
> > timestamp by including the tsopt, and you can disable that with the ndd
> > options. I assumed that the ndd options would affect SYNACK as well, but
> > I didn't actually read the code; I guess that's not the case after all,
> > so inbound connections still get timestamping negotiated. I don't have a
> > workaround for this, sorry.
>
> Too bad. Naturally it isn't feasible to turn things off via ndd on omnios
> for just one target.
> Is there any way to do that differently? That is, for only one target (and
> primarily ssh)?

Not that I know of.

--
Lauri Tirkkonen | lotheac @ IRCnet
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 01/10/15 14:23, Lauri Tirkkonen wrote:
> On Thu, Oct 01 2015 13:49:03 +0200, Richard PALO wrote:
>> On 01/10/15 11:58, Lauri Tirkkonen wrote:
>>> On Thu, Oct 01 2015 11:50:03 +0200, Richard PALO wrote:
>>>>>> In that case, wouldn't setting tcp_tstamp_always on OI to '1' be
>>>>>> better in this case (or would OI not honour that setting correctly)?
>>>>>
>>>>> It wouldn't work. From what I can tell, those ndd settings only affect
>>>>> the SYN segments (ie. timestamp negotiation); pre-5850 illumos will
>>>>> always stop timestamping mid-connection if it receives a
>>>>> non-timestamped segment.
>>>>
>>>> Okay, I set tcp_tstamp_if_wscale to 0 and it does seem to work fine.
>>>
>>> Thanks, that pretty much confirms the issue is what I suspected it is.
>>>
>>>> (Hoping there isn't any fallout from doing this now...)
>>>
>>> As long as that middlebox has been mucking with your traffic in the way
>>> it is, timestamps have been getting turned off mid-connection for your
>>> pre-5850 box. I recommend that you upgrade to post-5850 if you can, or
>>> scream loudly at whoever is modifying your traffic :)
>>
>> Actually I still notice some problems. This morning in the direction
>> OI => omnios things seemed okay.
>> Now, omnios => OI I just now experienced the hang again, and it is
>> repeatable.
>>
>> Could it be that your workaround is only useful for outbound connections
>> (relative to OI)?
>
> Yeah, it's possible. Whoever sends the SYN expresses their capability to
> timestamp by including the tsopt, and you can disable that with the ndd
> options. I assumed that the ndd options would affect SYNACK as well, but
> I didn't actually read the code; I guess that's not the case after all,
> so inbound connections still get timestamping negotiated. I don't have a
> workaround for this, sorry.

Too bad. Naturally it isn't feasible to turn things off via ndd on omnios
for just one target.
Is there any way to do that differently? That is, for only one target (and
primarily ssh)?

Unfortunately, it also seems my inquiry to the OI list went unheard, even
after subscribing (again). There must not be any moderators any longer...
oh bother.

The easiest fix would be to have 5850 integrated into OI.

--
Richard PALO
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Thu, Oct 01 2015 13:49:03 +0200, Richard PALO wrote:
> On 01/10/15 11:58, Lauri Tirkkonen wrote:
> > On Thu, Oct 01 2015 11:50:03 +0200, Richard PALO wrote:
> >>>> In that case, wouldn't setting tcp_tstamp_always on OI to '1' be
> >>>> better in this case (or would OI not honour that setting correctly)?
> >>>
> >>> It wouldn't work. From what I can tell, those ndd settings only affect
> >>> the SYN segments (ie. timestamp negotiation); pre-5850 illumos will
> >>> always stop timestamping mid-connection if it receives a
> >>> non-timestamped segment.
> >>
> >> Okay, I set tcp_tstamp_if_wscale to 0 and it does seem to work fine.
> >
> > Thanks, that pretty much confirms the issue is what I suspected it is.
> >
> >> (Hoping there isn't any fallout from doing this now...)
> >
> > As long as that middlebox has been mucking with your traffic in the way
> > it is, timestamps have been getting turned off mid-connection for your
> > pre-5850 box. I recommend that you upgrade to post-5850 if you can, or
> > scream loudly at whoever is modifying your traffic :)
>
> Actually I still notice some problems. This morning in the direction
> OI => omnios things seemed okay.
> Now, omnios => OI I just now experienced the hang again, and it is
> repeatable.
>
> Could it be that your workaround is only useful for outbound connections
> (relative to OI)?

Yeah, it's possible. Whoever sends the SYN expresses their capability to
timestamp by including the tsopt, and you can disable that with the ndd
options. I assumed that the ndd options would affect SYNACK as well, but
I didn't actually read the code; I guess that's not the case after all,
so inbound connections still get timestamping negotiated. I don't have a
workaround for this, sorry.

--
Lauri Tirkkonen | lotheac @ IRCnet
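Editor's note: the asymmetry described in the message above can be sketched as a toy model. It is written in Python purely for illustration, and the function names are hypothetical. It encodes the guess made in the thread (which the author says was not checked against the illumos source): the ndd tunables only stop the OI box from offering the timestamp option in its own SYN, while it still echoes the option in a SYN-ACK when the peer offers it.

```python
def timestamps_negotiated(syn_offers_tsopt: bool, synack_echoes_tsopt: bool) -> bool:
    """Timestamps are used only if the SYN offers TSopt and the SYN-ACK echoes it."""
    return syn_offers_tsopt and synack_echoes_tsopt


def oi_connection_uses_timestamps(oi_is_initiator: bool, oi_offer_disabled: bool) -> bool:
    """Toy model of the OI box talking to a peer that always supports timestamps.

    Assumption (from the thread, unverified): setting the ndd tunables to 0
    only suppresses TSopt in SYNs that OI sends; OI's SYN-ACK still echoes
    TSopt when the remote SYN carried it.
    """
    if oi_is_initiator:
        # Outbound from OI: OI's SYN decides whether TSopt is offered at all.
        return timestamps_negotiated(not oi_offer_disabled, True)
    # Inbound to OI: the peer's SYN offers TSopt and OI echoes it regardless.
    return timestamps_negotiated(True, True)


if __name__ == "__main__":
    print("OI => omnios, tunables set to 0:", oi_connection_uses_timestamps(True, True))
    print("omnios => OI, tunables set to 0:", oi_connection_uses_timestamps(False, True))
```

Under that assumption the workaround helps only connections opened from OI, which matches the observation that omnios => OI still hangs.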
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 01/10/15 11:58, Lauri Tirkkonen wrote:
> On Thu, Oct 01 2015 11:50:03 +0200, Richard PALO wrote:
>>>> In that case, wouldn't setting tcp_tstamp_always on OI to '1' be
>>>> better in this case (or would OI not honour that setting correctly)?
>>>
>>> It wouldn't work. From what I can tell, those ndd settings only affect
>>> the SYN segments (ie. timestamp negotiation); pre-5850 illumos will
>>> always stop timestamping mid-connection if it receives a
>>> non-timestamped segment.
>>
>> Okay, I set tcp_tstamp_if_wscale to 0 and it does seem to work fine.
>
> Thanks, that pretty much confirms the issue is what I suspected it is.
>
>> (Hoping there isn't any fallout from doing this now...)
>
> As long as that middlebox has been mucking with your traffic in the way
> it is, timestamps have been getting turned off mid-connection for your
> pre-5850 box. I recommend that you upgrade to post-5850 if you can, or
> scream loudly at whoever is modifying your traffic :)

Actually I still notice some problems. This morning in the direction
OI => omnios things seemed okay.
Now, omnios => OI I just now experienced the hang again, and it is
repeatable.

Could it be that your workaround is only useful for outbound connections
(relative to OI)?

--
Richard PALO
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Thu, Oct 01 2015 11:50:03 +0200, Richard PALO wrote:
> >>In that case, wouldn't setting tcp_tstamp_always on OI to '1' be better in
> >>this case (or would OI not honour that setting correctly)?
> >
> >It wouldn't work. From what I can tell, those ndd settings only affect
> >the SYN segments (ie. timestamp negotiation); pre-5850 illumos will
> >always stop timestamping mid-connection if it receives a non-timestamped
> >segment.
>
> Okay, I set tcp_tstamp_if_wscale to 0 and it does seem to work fine.

Thanks, that pretty much confirms the issue is what I suspected it is.

> (Hoping there isn't any fallout from doing this now...)

As long as that middlebox has been mucking with your traffic in the way
it is, timestamps have been getting turned off mid-connection for your
pre-5850 box. I recommend that you upgrade to post-5850 if you can, or
scream loudly at whoever is modifying your traffic :)

--
Lauri Tirkkonen | lotheac @ IRCnet
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 30/09/15 10:02, Lauri Tirkkonen wrote:
> On Wed, Sep 30 2015 09:56:47 +0200, Richard PALO wrote:
>>> To be clear, it's not implementing RFC 1323 (and not even *not*
>>> implementing 7323) that causes the issue. 1323 actually didn't specify
>>> what to do with non-timestamped segments on a timestamp-negotiated
>>> connection, and illumos pre-5850 did something very surprising which I
>>> doubt anybody else did (stop generating timestamps on all future
>>> segments), so I don't think you will be able to reproduce the hang with
>>> other operating systems, but you'll likely be able to see the unexpected
>>> non-timestamped segments in connections between other OSes as well (but
>>> I still can't be sure because I don't know what middlebox is injecting
>>> them or why :)
>>
>> In that case, wouldn't setting tcp_tstamp_always on OI to '1' be better in
>> this case (or would OI not honour that setting correctly)?
>
> It wouldn't work. From what I can tell, those ndd settings only affect
> the SYN segments (ie. timestamp negotiation); pre-5850 illumos will
> always stop timestamping mid-connection if it receives a non-timestamped
> segment.

Okay, I set tcp_tstamp_if_wscale to 0 and it does seem to work fine.
(Hoping there isn't any fallout from doing this now...)

Thanks
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Wed, Sep 30 2015 09:56:47 +0200, Richard PALO wrote:
> >To be clear, it's not implementing RFC 1323 (and not even *not*
> >implementing 7323) that causes the issue. 1323 actually didn't specify
> >what to do with non-timestamped segments on a timestamp-negotiated
> >connection, and illumos pre-5850 did something very surprising which I
> >doubt anybody else did (stop generating timestamps on all future
> >segments), so I don't think you will be able to reproduce the hang with
> >other operating systems, but you'll likely be able to see the unexpected
> >non-timestamped segments in connections between other OSes as well (but
> >I still can't be sure because I don't know what middlebox is injecting
> >them or why :)
>
> In that case, wouldn't setting tcp_tstamp_always on OI to '1' be better in
> this case (or would OI not honour that setting correctly)?

It wouldn't work. From what I can tell, those ndd settings only affect
the SYN segments (ie. timestamp negotiation); pre-5850 illumos will
always stop timestamping mid-connection if it receives a non-timestamped
segment.

--
Lauri Tirkkonen | lotheac @ IRCnet
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 29/09/15 12:35, Lauri Tirkkonen wrote:
> On Tue, Sep 29 2015 12:19:09 +0200, Richard PALO wrote:
>> Since I'm not having any issues with netbsd (6.1), which seemingly is
>> still at rfc1323
>> >richard@omnis:/home/richard$ ssh netbsd.org /sbin/sysctl net.inet.tcp.rfc1323
>> >net.inet.tcp.rfc1323 = 1
>>
>> I'd like to do some additional tests involving a non-illumos host as well
>> just to make sure.
>
> To be clear, it's not implementing RFC 1323 (and not even *not*
> implementing 7323) that causes the issue. 1323 actually didn't specify
> what to do with non-timestamped segments on a timestamp-negotiated
> connection, and illumos pre-5850 did something very surprising which I
> doubt anybody else did (stop generating timestamps on all future
> segments), so I don't think you will be able to reproduce the hang with
> other operating systems, but you'll likely be able to see the unexpected
> non-timestamped segments in connections between other OSes as well (but
> I still can't be sure because I don't know what middlebox is injecting
> them or why :)

In that case, wouldn't setting tcp_tstamp_always on OI to '1' be better in
this case (or would OI not honour that setting correctly)?
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Tue, Sep 29 2015 12:19:09 +0200, Richard PALO wrote:
> Since I'm not having any issues with netbsd (6.1), which seemingly is still
> at rfc1323
> >richard@omnis:/home/richard$ ssh netbsd.org /sbin/sysctl net.inet.tcp.rfc1323
> >net.inet.tcp.rfc1323 = 1
>
> I'd like to do some additional tests involving a non-illumos host as well
> just to make sure.

To be clear, it's not implementing RFC 1323 (and not even *not*
implementing 7323) that causes the issue. 1323 actually didn't specify
what to do with non-timestamped segments on a timestamp-negotiated
connection, and illumos pre-5850 did something very surprising which I
doubt anybody else did (stop generating timestamps on all future
segments), so I don't think you will be able to reproduce the hang with
other operating systems, but you'll likely be able to see the unexpected
non-timestamped segments in connections between other OSes as well (but
I still can't be sure because I don't know what middlebox is injecting
them or why :)

--
Lauri Tirkkonen | lotheac @ IRCnet
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 28/09/15 17:40, Lauri Tirkkonen wrote:
> It just occurred to me that if timestamp options don't get negotiated at
> all on the connection, both peers should be fine with this injection and
> continue to function. So as a workaround you could try disabling
> timestamps on the oi_151a9 box. I see the following ndd options:
>
> % ndd -get tcp ?|grep tstamp
> tcp_tstamp_always             (read and write)
> tcp_tstamp_if_wscale          (read and write)
>
> You could try setting those to 0 and see if that works around the hang
> (untested, so beware). This obviously turns off TCP timestamps, but how
> useful are they on the pre-5850 box anyway if your middlebox has been
> defeating their use all this time? :)

On OI (actually on both):

richard@smicro:~$ ndd -get tcp tcp_tstamp_always
0
richard@smicro:~$ ndd -get tcp tcp_tstamp_if_wscale
1

So if I understand correctly, setting tcp_tstamp_if_wscale to 0 on OI will
turn off timestamps, avoiding the issue with 5850 on OmniOS.

I'll give it a try.

Since I'm not having any issues with netbsd (6.1), which seemingly is still
at rfc1323

>richard@omnis:/home/richard$ ssh netbsd.org /sbin/sysctl net.inet.tcp.rfc1323
>net.inet.tcp.rfc1323 = 1

I'd like to do some additional tests involving a non-illumos host as well
just to make sure.

Regards,
risto3
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Mon, Sep 28 2015 16:20:03 +0200, Richard PALO wrote:
> On 28/09/15 15:46, Lauri Tirkkonen wrote:
> > On Mon, Sep 28 2015 08:21:46 -0400, Dan McDonald wrote:
> >>
> >>> On Sep 28, 2015, at 8:15 AM, Dan McDonald wrote:
> >>>
> >>> If 5850 is indeed the problem, you need to report this to the
> >>> illumos developers list, including a deterministic way of
> >>> reproducing it.
> >>
> >> I see you filed bug 6264, which is a good first step. Please make
> >> sure you summarize the how-to-reproduce in it.
> >>
> >> I also wonder if you patch your oi_151a9 box with 5850, AND keep 5850
> >> on your OmniOS machine, whether or not this problem ALSO goes away.
> >> After all, this fix specifically targets machines that drop
> >> timestamps...
> >
> > If my analysis is correct (see the mail I sent to this thread
> > previously), then applying 5850 to the oi_151a9 box will cause the issue
> > to disappear -- both peers will then ignore the injected window change
> > segment because it has no timestamps. Of course, it's possible that the
> > middlebox won't like being ignored and might cause other failures (it
> > could still inject RSTs, for example, since those are not required to
> > have timestamps).
>
> If I experienced the issue, chances are great that anybody else with
> oi_151a9 in France has it as well, as the OI machine is connected to an
> Orange (previously known as France Télécom) Business Services SDSL router
> and the OmniOS box to a Freebox (Free Télécom).

It just occurred to me that if timestamp options don't get negotiated at
all on the connection, both peers should be fine with this injection and
continue to function. So as a workaround you could try disabling
timestamps on the oi_151a9 box. I see the following ndd options:

% ndd -get tcp ?|grep tstamp
tcp_tstamp_always             (read and write)
tcp_tstamp_if_wscale          (read and write)

You could try setting those to 0 and see if that works around the hang
(untested, so beware).

This obviously turns off TCP timestamps, but how useful are they on the
pre-5850 box anyway if your middlebox has been defeating their use all
this time? :)

--
Lauri Tirkkonen | lotheac @ IRCnet
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Mon, Sep 28 2015 16:20:03 +0200, Richard PALO wrote:
> On 28/09/15 15:46, Lauri Tirkkonen wrote:
> > On Mon, Sep 28 2015 08:21:46 -0400, Dan McDonald wrote:
> >>
> >>> On Sep 28, 2015, at 8:15 AM, Dan McDonald wrote:
> >>>
> >>> If 5850 is indeed the problem, you need to report this to the
> >>> illumos developers list, including a deterministic way of
> >>> reproducing it.
> >>
> >> I see you filed bug 6264, which is a good first step. Please make
> >> sure you summarize the how-to-reproduce in it.
> >>
> >> I also wonder if you patch your oi_151a9 box with 5850, AND keep 5850
> >> on your OmniOS machine, whether or not this problem ALSO goes away.
> >> After all, this fix specifically targets machines that drop
> >> timestamps...
> >
> > If my analysis is correct (see the mail I sent to this thread
> > previously), then applying 5850 to the oi_151a9 box will cause the issue
> > to disappear -- both peers will then ignore the injected window change
> > segment because it has no timestamps. Of course, it's possible that the
> > middlebox won't like being ignored and might cause other failures (it
> > could still inject RSTs, for example, since those are not required to
> > have timestamps).
>
> If I experienced the issue, chances are great that anybody else with
> oi_151a9 in France has it as well, as the OI machine is connected to an
> Orange (previously known as France Télécom) Business Services SDSL router
> and the OmniOS box to a Freebox (Free Télécom).
>
> Any hint on how to determine which box is doing it (or both)?
> If not, if I can ssh into someplace that is able to check...
> perhaps even an ftp session?

Well, seeing how we only know that neither peer is actually sending the
non-timestamped segment, it could be any box along the path - I'd start
with examining your routers.

It's hard to say what exactly will trigger a repro without knowing what
the middlebox is trying to accomplish by injecting this segment, but it
might be beneficial to try to get a repro with a simple echo server or
something like that, and then try to isolate the issue by trying
different connection paths. You could also talk to your providers.

It's unfortunate that this manifests in a regression like this, but it's
a product of the previous incorrect behavior, an obnoxious middlebox
doing unsanitary things, and us (illumos-gate) trying to do the right
thing by following the RFC.

--
Lauri Tirkkonen | lotheac @ IRCnet
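Editor's note: as one possible starting point for the "simple echo server" suggested above, here is a minimal sketch using only the Python standard library. Python, the helper names and the default port are assumptions made for illustration; nothing like this was posted in the thread. The idea would be to run the server on one box, connect from the other over different paths, and watch whether the echo stalls.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Send every received chunk straight back to the client."""

    def handle(self) -> None:
        while True:
            data = self.request.recv(4096)
            if not data:          # client closed the connection
                break
            self.request.sendall(data)


def start_echo_server(host: str = "127.0.0.1", port: int = 0):
    """Start the echo server in a background thread (port 0 = any free port)."""
    server = socketserver.ThreadingTCPServer((host, port), EchoHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def echo_once(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    """Send payload and read the echo back; a stalled path raises a timeout."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        received = b""
        while len(received) < len(payload):
            chunk = conn.recv(4096)
            if not chunk:
                break
            received += chunk
        return received


if __name__ == "__main__":
    server = start_echo_server()
    host, port = server.server_address
    print(echo_once(host, port, b"hello"))
    server.shutdown()
    server.server_close()
```

A timeout raised by echo_once after the first few exchanges would be the analogue of the ssh hang; capturing the same session with snoop -o on both ends would then show whether a non-timestamped segment arrived first.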
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 28/09/15 15:46, Lauri Tirkkonen wrote:
> On Mon, Sep 28 2015 08:21:46 -0400, Dan McDonald wrote:
>>
>>> On Sep 28, 2015, at 8:15 AM, Dan McDonald wrote:
>>>
>>> If 5850 is indeed the problem, you need to report this to the
>>> illumos developers list, including a deterministic way of
>>> reproducing it.
>>
>> I see you filed bug 6264, which is a good first step. Please make
>> sure you summarize the how-to-reproduce in it.
>>
>> I also wonder if you patch your oi_151a9 box with 5850, AND keep 5850
>> on your OmniOS machine, whether or not this problem ALSO goes away.
>> After all, this fix specifically targets machines that drop
>> timestamps...
>
> If my analysis is correct (see the mail I sent to this thread
> previously), then applying 5850 to the oi_151a9 box will cause the issue
> to disappear -- both peers will then ignore the injected window change
> segment because it has no timestamps. Of course, it's possible that the
> middlebox won't like being ignored and might cause other failures (it
> could still inject RSTs, for example, since those are not required to
> have timestamps).

If I experienced the issue, chances are great that anybody else with
oi_151a9 in France has it as well, as the OI machine is connected to an
Orange (previously known as France Télécom) Business Services SDSL router
and the OmniOS box to a Freebox (Free Télécom).

Any hint on how to determine which box is doing it (or both)?
If not, if I can ssh into someplace that is able to check...
perhaps even an ftp session?

cheers

--
Richard PALO
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Mon, Sep 28 2015 08:21:46 -0400, Dan McDonald wrote:
>
> > On Sep 28, 2015, at 8:15 AM, Dan McDonald wrote:
> >
> > If 5850 is indeed the problem, you need to report this to the
> > illumos developers list, including a deterministic way of
> > reproducing it.
>
> I see you filed bug 6264, which is a good first step. Please make
> sure you summarize the how-to-reproduce in it.
>
> I also wonder if you patch your oi_151a9 box with 5850, AND keep 5850
> on your OmniOS machine, whether or not this problem ALSO goes away.
> After all, this fix specifically targets machines that drop
> timestamps...

If my analysis is correct (see the mail I sent to this thread
previously), then applying 5850 to the oi_151a9 box will cause the issue
to disappear -- both peers will then ignore the injected window change
segment because it has no timestamps. Of course, it's possible that the
middlebox won't like being ignored and might cause other failures (it
could still inject RSTs, for example, since those are not required to
have timestamps).

--
Lauri Tirkkonen | lotheac @ IRCnet
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 28/09/15 14:21, Dan McDonald wrote:
>
>> On Sep 28, 2015, at 8:15 AM, Dan McDonald wrote:
>>
>> If 5850 is indeed the problem, you need to report this to the illumos
>> developers list, including a deterministic way of reproducing it.
>
> I see you filed bug 6264, which is a good first step. Please make sure you
> summarize the how-to-reproduce in it.
>
> I also wonder if you patch your oi_151a9 box with 5850, AND keep 5850 on your
> OmniOS machine, whether or not this problem ALSO goes away. After all, this
> fix specifically targets machines that drop timestamps...
>
> Dan

Unfortunately, this being an OI machine in production, I'd need the patched
kit to be available in http://pkg.openindiana.org/dev/ which is currently at
illumos 52e13e00ba, with the last update being 2014-12-10 16:08:49.

I'm not sure anybody deals with non-hipster OI anymore, unfortunately.

--
Richard PALO
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
> On Sep 28, 2015, at 8:15 AM, Dan McDonald wrote:
>
> If 5850 is indeed the problem, you need to report this to the illumos
> developers list, including a deterministic way of reproducing it.

I see you filed bug 6264, which is a good first step. Please make sure you
summarize the how-to-reproduce in it.

I also wonder if you patch your oi_151a9 box with 5850, AND keep 5850 on your
OmniOS machine, whether or not this problem ALSO goes away. After all, this
fix specifically targets machines that drop timestamps...

Dan
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
If 5850 is indeed the problem, you need to report this to the illumos
developers list, including a deterministic way of reproducing it.

Funny though, the fix was brought forth because of specific middlebox
behavior. It is POSSIBLE your middlebox is behaving differently than the
bug-filer's middlebox. Please keep that in mind.

Dan

> On Sep 26, 2015, at 11:33 AM, Richard PALO wrote:
>
> On 08/09/15 06:32, Richard PALO wrote:
>> Thought I would try snoop with port 22.
>>
>> From omnios, in one window I issued:
>>> pfexec snoop -rv -d e1000g0 port 22 |& tee snoop.out
>>
>> From another I connected to the OI machine and did nothing further (as it
>> hangs in that direction too):
>>> ssh xx.xx.xxx.xx
>>
>> In the attached snoop.output, I edited snoop.out to put in a comment after
>> the initial connection
>> (search for "pause after connection")
>> before the traffic seemingly when things go sour... I notice a Window
>> changed to 1024??
>>
>> At the moment I'm running with the gate @
>> 2ed96329a073f74bd33f766ab982be14f3205bc9
>
> is it possible that the following has something to do with it (it is in about
> the right timeframe)?
>> commit 1f183ba0b0be3e10202501aa3740753df6512804
>> Author: Lauri Tirkkonen
>> AuthorDate: Wed Apr 15 16:30:46 2015 +0300
>> Commit: Robert Mustacchi
>> CommitDate: Thu Jul 30 08:33:51 2015 -0700
>>
>>     5850 tcp timestamping behavior changed mid-connection
>
> If so, would it be safe to revert for a test build to try?
> --
> Richard PALO
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Thu, Sep 10 2015 11:28:12 +0200, Richard PALO wrote:
> On 08/09/15 14:12, Dan McDonald wrote:
> >
> >>On Sep 8, 2015, at 12:32 AM, Richard PALO wrote:
> >>
> >>before the traffic seemingly when things go sour... I notice a Window
> >>changed to 1024??
> >
> >Which side is advertising the window change again? And which side is
> >running -gate from 2ed96329a073f74bd33f766ab982be14f3205bc9 ?
> >
> >This thread has been paged out, so to speak, for long enough. Can
> >you give me the context of which machine is running what to explain
> >the context of the snoop file?
> >
> >Thanks,
> >Dan
>
> Just for completeness, same story from the OI side, snoop and ssh
>
> here, 192.168.1.2 is smicro (oi_151a9)
> >e1000g0 192.168.1.1 255.255.255.255      00:12:ef:21:9c:f8
> >e1000g0 192.168.1.2 255.255.255.255 SPLA 00:30:48:f4:33:f0
> and 192.168.1.1 is an Orange Business Services SDSL router.

Are these captures both from the same connection? If so, there is
obviously a middle box modifying the traffic. On *both* ends, it looks
like the other end is sending an empty ACK requesting the window change
to 1024 (packet 41 in snoop.output, with dst 192.168.0.6, and packet 41
in snoop-OI.output, with dst 192.168.1.2). Both of these TCP segments are
missing the required timestamp options.

With the fix for 5850, illumos should never send a segment without
timestamps on a connection which has negotiated timestamps (this one has,
since they are present on previous segments). In addition, as part of
5850, we follow the RFC recommendation to drop any arriving segments
*without* timestamps on a timestamp-negotiated connection [0]. This is
likely the reason why your use case worked before; the older behavior was
to stop generating timestamps altogether on a connection where any
received segment omits them, but that's the wrong thing to do.

There is a new dtrace probe 'tcp:::droppedtimestamp' which should fire
whenever a segment is dropped by this behavior. You could use that to
verify my speculation, eg.

    # dtrace -n 'tcp:::droppedtimestamp { trace(probefunc); }'

should generate output when the connection hangs (and more information
about the connection is available in (tcp_t*)arg0).

Based on the data you have made available I believe this is an issue with
a middlebox injecting erroneous traffic into the TCP stream for both
peers. This injected segment is ignored by the box with the fix for 5850
applied, but it causes the older illumos box to stop generation of
timestamps, after which all segments it sends are rejected by the newer
box.

Oh, and in the future, please post snoop capture files (from snoop -o);
it's much easier to find the desired information in those :)

[0]: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/inet/tcp/tcp_input.c#2878

--
Lauri Tirkkonen | lotheac @ IRCnet
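Editor's note: the hang mechanism described in the message above can be walked through with a small state model. This is a hedged sketch in Python, not illumos code. The class and function names are hypothetical, and the model ignores RSTs (which, per the thread, are not required to carry timestamps) along with everything else about TCP apart from the timestamp option.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """One end of a connection on which timestamps were negotiated (toy model)."""

    name: str
    has_5850: bool                 # True: post-5850 behaviour, False: pre-5850
    sends_timestamps: bool = True

    def receive(self, segment_has_timestamp: bool) -> bool:
        """Return True if the segment is accepted, False if it is dropped."""
        if segment_has_timestamp:
            return True
        if self.has_5850:
            # Post-5850: drop non-timestamped segments on this connection.
            return False
        # Pre-5850: accept the segment and stop timestamping from now on.
        self.sends_timestamps = False
        return True

    def send_to(self, other: "Peer") -> bool:
        """Send one data segment; True if the other side accepts it."""
        return other.receive(self.sends_timestamps)


def oi_data_gets_through(oi_has_5850: bool) -> bool:
    """Simulate the middlebox injection, then one data segment from OI to OmniOS."""
    omnios = Peer("omnios", has_5850=True)
    oi = Peer("oi_151a9", has_5850=oi_has_5850)
    # The middlebox injects a non-timestamped window update towards both peers.
    omnios.receive(False)
    oi.receive(False)
    return oi.send_to(omnios)


if __name__ == "__main__":
    print("OI pre-5850: ", oi_data_gets_through(False))   # segments rejected: hang
    print("OI post-5850:", oi_data_gets_through(True))    # injection ignored by both
```

In this model the pre-5850 peer stops timestamping after the injection and the post-5850 peer drops everything it sends afterwards, which is the hang; with 5850 on both ends the injected segment is simply ignored.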
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 26/09/15 17:33, Richard PALO wrote:
> On 08/09/15 06:32, Richard PALO wrote:
>> Thought I would try snoop with port 22.
>>
>> From omnios, in one window I issued:
>>> pfexec snoop -rv -d e1000g0 port 22 |& tee snoop.out
>>
>> From another I connected to the OI machine and did nothing further (as it
>> hangs in that direction too):
>>> ssh xx.xx.xxx.xx
>>
>> In the attached snoop.output, I edited snoop.out to put in a comment after
>> the initial connection
>> (search for "pause after connection")
>> before the traffic seemingly when things go sour... I notice a Window
>> changed to 1024??
>>
>> At the moment I'm running with the gate @
>> 2ed96329a073f74bd33f766ab982be14f3205bc9
>
> is it possible that the following has something to do with it (it is in about
> the right timeframe)?
>> commit 1f183ba0b0be3e10202501aa3740753df6512804
>> Author: Lauri Tirkkonen
>> AuthorDate: Wed Apr 15 16:30:46 2015 +0300
>> Commit: Robert Mustacchi
>> CommitDate: Thu Jul 30 08:33:51 2015 -0700
>>
>>     5850 tcp timestamping behavior changed mid-connection
>
> If so, would it be safe to revert for a test build to try?

Stroke of luck: I tried a recent build with this reverted and have been
able to work for over an hour without problems, with a couple of sessions
in parallel doing things that used to hang after a few moments.

I'll file an issue; this should probably be reverted until things are
worked out.

--
Richard PALO
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 08/09/15 06:32, Richard PALO wrote:
> Thought I would try snoop with port 22.
>
> From omnios, in one window I issued:
>> pfexec snoop -rv -d e1000g0 port 22 |& tee snoop.out
>
> From another I connected to the OI machine and did nothing further (as it
> hangs in that direction too):
>> ssh xx.xx.xxx.xx
>
> In the attached snoop.output, I edited snoop.out to put in a comment after
> the initial connection
> (search for "pause after connection")
> before the traffic seemingly when things go sour... I notice a Window changed
> to 1024??
>
> At the moment I'm running with the gate @
> 2ed96329a073f74bd33f766ab982be14f3205bc9

Is it possible that the following has something to do with it (it is in about
the right timeframe)?
> commit 1f183ba0b0be3e10202501aa3740753df6512804
> Author: Lauri Tirkkonen
> AuthorDate: Wed Apr 15 16:30:46 2015 +0300
> Commit: Robert Mustacchi
> CommitDate: Thu Jul 30 08:33:51 2015 -0700
>
>     5850 tcp timestamping behavior changed mid-connection

If so, would it be safe to revert for a test build to try?

--
Richard PALO
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 08/09/15 14:12, Dan McDonald wrote:
>> On Sep 8, 2015, at 12:32 AM, Richard PALO wrote:
>>
>> before the traffic seemingly when things go sour... I notice a Window changed
>> to 1024??
>
> Which side is advertising the window change again? And which side is running
> -gate from 2ed96329a073f74bd33f766ab982be14f3205bc9 ?
>
> This thread has been paged out, so to speak, for long enough. Can you give me
> the context of which machine is running what to explain the context of the
> snoop file?
>
> Thanks,
> Dan

Just for completeness, here is the same story from the OI side, with snoop and ssh run here. 192.168.1.2 is smicro (oi_151a9):

e1000g0 192.168.1.1 255.255.255.255      00:12:ef:21:9c:f8
e1000g0 192.168.1.2 255.255.255.255 SPLA 00:30:48:f4:33:f0

and 192.168.1.1 is an Orange Business Services SDSL router.

snoop-OI.output.gz
Description: application/gzip
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 08/09/15 14:12, Dan McDonald wrote:
>> On Sep 8, 2015, at 12:32 AM, Richard PALO wrote:
>>
>> before the traffic seemingly when things go sour... I notice a Window changed
>> to 1024??
>
> Which side is advertising the window change again? And which side is running
> -gate from 2ed96329a073f74bd33f766ab982be14f3205bc9 ?
>
> This thread has been paged out, so to speak, for long enough. Can you give me
> the context of which machine is running what to explain the context of the
> snoop file?
>
> Thanks,
> Dan

snoop is running on the omnios machine (omnis) with [near]latest gate *and* is the initiator of the ssh session (having address 192.168.0.6) to the OI target (xx.xx.xxx.xx)

on the LAN, looking at 'arp -an':
> e1000g0 192.168.0.1 255.255.255.255      00:24:d4:78:eb:ac
> e1000g0 192.168.0.6 255.255.255.255 SPLA 00:25:90:f3:5c:8c

omnis is 192.168.0.6 and 192.168.0.1 is my freebox adsl router.

From what I can gather, the window change request is in a packet "arriving", so it's probably not requested by the local omnis machine.
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
> On Sep 8, 2015, at 12:32 AM, Richard PALO wrote:
>
> before the traffic seemingly when things go sour... I notice a Window changed
> to 1024??

Which side is advertising the window change again? And which side is running -gate from 2ed96329a073f74bd33f766ab982be14f3205bc9 ?

This thread has been paged out, so to speak, for long enough. Can you give me the context of which machine is running what to explain the context of the snoop file?

Thanks,
Dan
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
Thought I would try snoop with port 22.

From omnios, in one window I issued:
> pfexec snoop -rv -d e1000g0 port 22 |& tee snoop.out

From another I connected to the OI machine and did nothing further (as it hangs in that direction too):
> ssh xx.xx.xxx.xx

In the attached snoop.output, I edited snoop.out to put in a comment after the initial connection (search for "pause after connection") before the traffic seemingly when things go sour... I notice a Window changed to 1024??

At the moment I'm running with the gate @ 2ed96329a073f74bd33f766ab982be14f3205bc9

--
Richard PALO

snoop.output.gz
Description: application/gzip
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 24/08/15 19:14, Eric Sproul wrote:
> On Mon, Aug 24, 2015 at 12:04 PM, Richard PALO wrote:
>> notice inbound invalids and nomatches both ways... are they a concern?
>
> I have no idea. I might try adding an unconditional pass rule for the
> OmniOS system to ensure it's not matching any other ipfilter rules, or
> if possible, disable ipfilter during the testing.

I've been noticing some talk of issues with ipv6/ipv4 lately... My freebox (on the omnios side) has ipv6 enabled, but the OI side, with an OBS router, does not. I'll try turning that off to see if things settle down. With some luck some fixes are on the way. (I seem to remember Dan already fixed an issue last year in this area that I had with the hottail fox.)

--
Richard PALO
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Mon, Aug 24, 2015 at 12:04 PM, Richard PALO wrote:
> notice inbound invalids and nomatches both ways... are they a concern?

I have no idea. I might try adding an unconditional pass rule for the OmniOS system to ensure it's not matching any other ipfilter rules, or if possible, disable ipfilter during the testing.
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 24/08/15 18:05, Eric Sproul wrote:
> What you describe sounds network-related, perhaps just a coincidence
> that it happened "recently". However, it also sounds like the
> behavior changes depending on whether you use an older BE or a newer
> one, so that makes it seem *less* likely that it is an issue with the
> network. I might still try to packet capture both working and
> non-working ssh sessions and compare them. I would also double-check
> that your omnios BEs don't have something like ipfilter enabled or
> perhaps some kernel tunable that you changed but might have forgotten.
>
> Eric

I do find the following from the OI machine interesting:
> richard@smicro:~$ pfexec kstat -m ipf
> module: ipf                             instance: 0
> name:   inbound                         class:    net
>         acct                            0
>         bad frag state alloc            0
>         bad ip pkt                      0
>         bad pkt state alloc             0
>         block                           0
>         block, logged                   0
>         cachehit                        57425203
>         crtime                          154,516657078
>         dropped:pps ceiling             0
>         ip upd. fail                    0
>         ipv6 pkt                        0
>         logged                          0
>         new frag state compl. pkt       0
>         new frag state kept             0
>         new pkt kept state              0
>         nomatch                         92080544
>         nomatch, logged                 0
>         pass                            95757622
>         pass, logged                    3676918
>         pullup nok                      0
>         pullup ok                       254596
>         return sent                     0
>         short                           0
>         skip                            57
>         snaptime                        154,516657078
>         src != route                    0
>         tcp cksum bad                   0
>         ttl invalid                     1099124
>
> module: ipf                             instance: 0
> name:   outbound                        class:    net
>         acct                            0
>         bad frag state alloc            0
>         bad ip pkt                      0
>         bad pkt state alloc             0
>         block                           14
>         block, logged                   0
>         cachehit                        0
>         crtime                          154,516663632
>         dropped:pps ceiling             0
>         ip upd. fail                    0
>         ipv6 pkt                        0
>         logged                          0
>         new frag state compl. pkt       0
>         new frag state kept             0
>         new pkt kept state              0
>         nomatch                         123524975
>         nomatch, logged                 0
>         pass                            123524967
>         pass, logged                    0
>         pullup nok                      0
>         pullup ok                       252835
>         return sent                     0
>         short                           0
>         skip                            0
>         snaptime                        154,516663632
>         src != route                    0
>         tcp cksum bad                   0
>         ttl invalid                     0

notice inbound invalids and nomatches both ways... are they a concern?

--
Richard PALO
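For comparing ipf counters between a working and a non-working boot environment, it may help to turn the kstat text into numbers rather than eyeballing it. What follows is a minimal sketch, assuming the usual "name, whitespace, value" layout of `kstat -m ipf` output; the helper names and the shortened sample are made up for illustration, and values that are not plain integers (such as crtime and snaptime in this locale) are simply skipped.

```python
def parse_ipf_kstat(text: str) -> dict:
    """Parse 'kstat -m ipf' style text into {section_name: {counter: int}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("module:"):
            continue
        if line.startswith("name:"):
            # e.g. "name:   inbound    class:    net" -> section "inbound"
            current = line.split()[1]
            sections[current] = {}
            continue
        if current is None:
            continue
        # Counter names may contain spaces, so split off only the last field.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts
        try:
            sections[current][key] = int(value)
        except ValueError:
            pass  # skip non-integer values such as crtime/snaptime
    return sections


def nonzero(counters: dict, names) -> dict:
    """Return the subset of the named counters whose value is not zero."""
    return {n: counters[n] for n in names if counters.get(n, 0) != 0}


# Shortened sample in the assumed layout, using figures quoted in the message above.
sample = """\
module: ipf                             instance: 0
name:   inbound                         class:    net
        block                           0
        nomatch                         92080544
        pass                            95757622
        ttl invalid                     1099124
        crtime                          154,516657078
"""
stats = parse_ipf_kstat(sample)
print(nonzero(stats["inbound"], ["block", "nomatch", "ttl invalid"]))
```

Taking two snapshots a few minutes apart and subtracting them would show whether counters such as "ttl invalid" are still climbing while an ssh session hangs, or are just old totals.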
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Mon, Aug 24, 2015 at 10:35 AM, Richard PALO wrote:
> The machines do not belong to the same subnet. They are physically remote
> and the omnios machine is behind a router with port forwarding.
> The OI machine *is* multihomed, though.
> What strikes me most is that previous versions of omnios (and the gate)
> worked fine, it is only now I just happened to come across this PITA-ful
> issue)
>
> What could cause this difference in treatment, and is OI at fault or recent
> gate?

What you describe sounds network-related, perhaps just a coincidence that it happened "recently". However, it also sounds like the behavior changes depending on whether you use an older BE or a newer one, so that makes it seem *less* likely that it is an issue with the network. I might still try to packet capture both working and non-working ssh sessions and compare them. I would also double-check that your omnios BEs don't have something like ipfilter enabled or perhaps some kernel tunable that you changed but might have forgotten.

Eric
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 24/08/15 16:26, Eric Sproul wrote:
> On Sun, Aug 23, 2015 at 9:19 PM, Dan McDonald wrote:
>> I'm not seeing it. Do you maybe have PathMTU issues, or are these
>> same-subnet machines?
>
> This sounds a lot like a split-path routing issue, so knowing whether
> these machines are on the same subnet would be key.
>
> I've had this happen in the past with multi-homed machines, where the
> initiating TCP client's packets traverse a router/firewall with
> stateful filtering and "hair-pin" back onto the local subnet (perhaps
> due to DNAT or other fancy tricks). Since the destination knows it is
> directly connected to the network of the source IP, it responds
> directly back to the client, bypassing the router. Thus the stateful
> firewall sees only one half of the connection, and misses any
> negotiated changes, e.g. TCP window-scaling that might occur. As soon
> as the client sends a scaled-window packet, the firewall drops it as
> invalid, and the client experiences a hang in connectivity. The
> solution of course is, Don't Do That(tm). Multi-homing should be
> employed along with split-horizon DNS or firewall routing rules that
> NAT the source to ensure responses come back through the router.
>
> But maybe all this is moot if you're not multi-homing or doing
> anything similarly "fancy" between your OI and OmniOS hosts.
>
> Eric

Hi Eric,

The machines do not belong to the same subnet. They are physically remote and the omnios machine is behind a router with port forwarding. The OI machine *is* multihomed, though.

What strikes me most is that previous versions of omnios (and the gate) worked fine; it is only now that I happened to come across this PITA-ful issue.

What could cause this difference in treatment, and is OI at fault or recent gate?

--
Richard PALO
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On Sun, Aug 23, 2015 at 9:19 PM, Dan McDonald wrote:
> I'm not seeing it. Do you maybe have PathMTU issues, or are these
> same-subnet machines?

This sounds a lot like a split-path routing issue, so knowing whether these machines are on the same subnet would be key.

I've had this happen in the past with multi-homed machines, where the initiating TCP client's packets traverse a router/firewall with stateful filtering and "hair-pin" back onto the local subnet (perhaps due to DNAT or other fancy tricks). Since the destination knows it is directly connected to the network of the source IP, it responds directly back to the client, bypassing the router. Thus the stateful firewall sees only one half of the connection, and misses any negotiated changes, e.g. TCP window-scaling that might occur. As soon as the client sends a scaled-window packet, the firewall drops it as invalid, and the client experiences a hang in connectivity. The solution of course is, Don't Do That(tm). Multi-homing should be employed along with split-horizon DNS or firewall routing rules that NAT the source to ensure responses come back through the router.

But maybe all this is moot if you're not multi-homing or doing anything similarly "fancy" between your OI and OmniOS hosts.

Eric
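To make the window-scaling part of that explanation concrete, here is a small arithmetic sketch. It is illustrative only: the function names are invented, and the shift of 7 and the 1460-byte payload are assumed numbers, not values taken from any capture in this thread. The rule itself (the 16-bit window field is shifted left by the scale factor negotiated in the SYN exchange, at most 14) comes from RFC 7323.

```python
def effective_window(advertised: int, shift: int) -> int:
    """Receive window in bytes for a 16-bit window field and a negotiated shift."""
    if not 0 <= advertised <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    if not 0 <= shift <= 14:
        raise ValueError("RFC 7323 limits the shift count to 14")
    return advertised << shift


def in_window(seq_offset: int, payload_len: int, advertised: int, shift: int) -> bool:
    """True if a segment starting seq_offset bytes past the last ACK fits the window."""
    return seq_offset + payload_len <= effective_window(advertised, shift)


# The endpoints saw the handshake and agreed on a shift of 7 (assumed value).
print(effective_window(1024, 7))
print(in_window(0, 1460, 1024, 7))

# A firewall that saw only half the handshake falls back to no scaling (shift 0),
# so the same 1460-byte segment now looks like it overruns the window.
print(in_window(0, 1460, 1024, 0))
```

The two endpoints and the middlebox are looking at the same window field but disagree about what it means, which is how a perfectly valid segment can end up dropped as invalid.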
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
I just tried this reproduction on my OmniOS box:

1.) ssh to OI 151a9.
2.) ssh from 151a9 to bloody
3.) cat $illumos-gate/usr/src/uts/common/inet/ip/ip.c (a large file)

No breakage.

4.) "git log -p ip.c" -- it uses less -M by default, so that stopped waiting for input.
5.) "git log -p ip.c | cat" -- it spewed output.
6.) exec bash (I use tcsh)
7.) repeat #5
8.) Login to bloody with a user account with SHELL=bash.
9.) repeat #5

I'm not seeing it. Do you maybe have PathMTU issues, or are these same-subnet machines?

Dan
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
On 21/08/15 23:12, Dan McDonald wrote:
>> On Aug 21, 2015, at 4:37 PM, Richard PALO wrote:
>>
>> There seems to be a recent regression somewhere as when I ssh in from
>> an OI machine to my bloody dev machine running recent vanilla bits, my
>> session hangs relatively soon.
>> I'm using bash as my login shell
>
> I used tcsh as mine, and my bloody box hasn't been updated to the very latest
> OmniOS bits (which include that ld fix). Did you have this problem with the
> current bloody repo?
>
> Dan

Well, unfortunately I don't keep around all my boot environments, but I was able to determine that a build from 20150725 works fine but the builds I have from 20150818 and later don't. Given I didn't necessarily update from upstream for each build, I'd say something between 15/07 and 18/08 busted something.

The test procedure I used is relatively simple:

1. boot the test BE
2. in a virtual terminal (keeping the console to check things out), ssh into OI and then back to omnios
3. cd src/illumos-gate and do a git diff with something needing a few pages of output
4. switch back to the console for a few moments and, for example with ptree, find the pid of the less invoked by git and pstack it
5. switch back to the vt, which should now be hung (if on a broken BE).

Perhaps someone has a means to reproduce this and identify with more precision when/why things went awry.

--
Richard PALO
Re: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
> On Aug 21, 2015, at 4:37 PM, Richard PALO wrote:
>
> There seems to be a recent regression somewhere as when I ssh in from
> an OI machine to my bloody dev machine running recent vanilla bits, my
> session hangs relatively soon.
> I'm using bash as my login shell

I used tcsh as mine, and my bloody box hasn't been updated to the very latest OmniOS bits (which include that ld fix). Did you have this problem with the current bloody repo?

Dan