Re: Issues with TCP Timestamps allocation

Vitalij Satanivskij Wed, 17 Jul 2019 04:55:49 -0700

MT> > MT> In the meantime you can deal with the buggy hosts by disabling the timestamps
MT> > MT> or dropping extensions on SYN retransmits.
MT> > 
MT> > You mean by some code changes?
MT> No.
MT> 
MT> Two options:
MT> 
MT> Option 1: Drop the TCP timestamp option on the third retransmission
MT> To enable this, you configure on the client
MT> sudo sysctl -w net.inet.tcp.rexmit_drop_options=1
MT> or put
MT> net.inet.tcp.rexmit_drop_options=1
MT> in /etc/sysctl.conf
MT> and reboot
MT> In case of the broken host, the first SYN retransmission will happen 1 second after the
MT> initial SYN segment, the second retransmission will happen 1.2 seconds after the first. On the
MT> third retransmission, which happens again 1.2 seconds later, the TCP timestamp option is
MT> dropped and the connection setup will succeed. This gives you a total delay of 3.4 seconds
MT> on connection setup instead of the longer timeout.
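The 3.4-second figure is the sum of the three retransmission gaps quoted above. A minimal C sketch of that arithmetic, assuming the 1 s / 1.2 s / 1.2 s intervals given in the message (the helper name `syn_setup_delay_ms` is hypothetical, and it does not read the kernel's real backoff schedule):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: add up the gaps (in milliseconds) between the
 * initial SYN and each retransmission. The result is the time at which
 * the last retransmission is sent. The gap values come from the caller;
 * this does not model the kernel's actual backoff schedule.
 */
static int
syn_setup_delay_ms(const int *gap_ms, size_t n)
{
	int total = 0;

	assert(gap_ms != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += gap_ms[i];
	return (total);
}
```

With the gaps `{ 1000, 1200, 1200 }` (timestamp option dropped on the third retransmission) this yields 3400 ms, i.e. the 3.4 seconds mentioned above.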


The first option is not working. We still see the same behaviour.


MT> 
MT> Option 2: Disable the TCP timestamps (and window scaling)
MT> To enable this, you configure on the client
MT> sudo sysctl -w net.inet.tcp.rfc1323=0
MT> or put
MT> net.inet.tcp.rfc1323=0
MT> in /etc/sysctl.conf
MT> and reboot.
MT> This disables the timestamp option and window scaling completely. This allows you to
MT> set up the connections without any delay. However, you don't have the benefits of the
MT> extensions.
MT> 
MT> Both options don't require any code changes.
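For context on what Option 2 removes: RFC 7323 defines the Timestamps option (kind 8, length 10, carrying TSval and TSecr) and the Window Scale option (kind 3, length 3). The C sketch below writes both into a SYN's option area, assuming one common NOP-padded layout; the function and macro names are hypothetical and this is not the kernel's option-building code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* TCP option kinds from RFC 7323 (plus NOP for padding). */
#define OPT_NOP		1
#define OPT_WSCALE	3	/* Window Scale, length 3 */
#define OPT_TSTAMP	8	/* Timestamps, length 10 */

/* Store a 32-bit value in network (big-endian) byte order. */
static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Append the RFC 7323 options using the layout NOP, WS; NOP, NOP, TS.
 * Returns the number of bytes written, or 0 when the extensions are
 * disabled (as with net.inet.tcp.rfc1323=0) or the buffer is too small.
 */
static size_t
put_rfc7323_opts(uint8_t *buf, size_t len, bool enabled, uint8_t wscale,
    uint32_t tsval, uint32_t tsecr)
{
	assert(buf != NULL);
	if (!enabled || len < 16)
		return (0);
	buf[0] = OPT_NOP;
	buf[1] = OPT_WSCALE;
	buf[2] = 3;
	buf[3] = wscale;
	buf[4] = OPT_NOP;
	buf[5] = OPT_NOP;
	buf[6] = OPT_TSTAMP;
	buf[7] = 10;
	put_be32(&buf[8], tsval);
	put_be32(&buf[12], tsecr);	/* normally 0 in an initial SYN */
	return (16);
}
```

With the extensions disabled, the SYN carries neither option, so there is no TSval for a peer's filter to inspect, but also no window scaling.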

This option was tested some time ago. Yes, it helps. But the overall TCP networking
performance is ... let's say, poor :(




MT> Best regards
MT> Michael
MT> 
MT> 
MT> > 
MT> > 
MT> > MT> 
MT> > MT> Best regards
MT> > MT> Michael
MT> > MT> > 
MT> > MT> > 
MT> > MT> > 
MT> > MT> > Michael Tuexen wrote:
MT> > MT> > MT> 
MT> > MT> > MT> 
MT> > MT> > MT> > On 9. Jul 2019, at 14:58, Paul <de...@ukr.net> wrote:
MT> > MT> > MT> > 
MT> > MT> > MT> > Hi Michael,
MT> > MT> > MT> > 
MT> > MT> > MT> > 9 July 2019, 15:34:29, by "Michael Tuexen" <tue...@freebsd.org>:
MT> > MT> > MT> > 
MT> > MT> > MT> >> 
MT> > MT> > MT> >> 
MT> > MT> > MT> >>> On 8. Jul 2019, at 17:22, Paul <de...@ukr.net> wrote:
MT> > MT> > MT> >>> 
MT> > MT> > MT> >>> 
MT> > MT> > MT> >>> 
MT> > MT> > MT> >>> 8 July 2019, 17:12:21, by "Michael Tuexen" <tue...@freebsd.org>:
MT> > MT> > MT> >>> 
MT> > MT> > MT> >>>>> On 8. Jul 2019, at 15:24, Paul <de...@ukr.net> wrote:
MT> > MT> > MT> >>>>> 
MT> > MT> > MT> >>>>> Hi Michael,
MT> > MT> > MT> >>>>> 
MT> > MT> > MT> >>>>> 8 July 2019, 15:53:15, by "Michael Tuexen" <tue...@freebsd.org>:
MT> > MT> > MT> >>>>> 
MT> > MT> > MT> >>>>>>> On 8. Jul 2019, at 12:37, Paul <de...@ukr.net> wrote:
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> Hi team,
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> Recently we had an upgrade to 12 Stable. Immediately afterwards, we started
MT> > MT> > MT> >>>>>>> seeing some strange connection establishment timeouts to some fixed number
MT> > MT> > MT> >>>>>>> of external (world) hosts. The issue was persistent and easy to reproduce.
MT> > MT> > MT> >>>>>>> Thanks to the patience and dedication of our system engineer we have tracked
MT> > MT> > MT> >>>>>>> this issue down to a specific commit:
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=338053
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> This patch was also back-ported into 11 Stable:
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=348435
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> Among other things this patch changes the timestamp allocation strategy,
MT> > MT> > MT> >>>>>>> by introducing a deterministic randomness via a hash function that takes
MT> > MT> > MT> >>>>>>> into account a random key as well as source address, source port, dest
MT> > MT> > MT> >>>>>>> address and dest port. As a result, timestamp offsets of different
MT> > MT> > MT> >>>>>>> tuples (SA,SP,DA,DP) will be wildly different and will jump from small
MT> > MT> > MT> >>>>>>> to large numbers and back whenever something in the tuple changes.
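To illustrate the effect described above, here is a minimal C sketch in which the timestamp offset is a keyed hash over a secret key and all four tuple fields, and TSval is the local clock plus that offset. The struct, the function names, and the FNV-1a hash are stand-ins chosen for illustration only: FNV-1a is not a secure keyed hash and is only a placeholder for the kernel's `tcp_keyed_hash()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple; not FreeBSD's struct in_conninfo. */
struct tuple4 {
	uint32_t saddr, daddr;	/* IPv4 addresses */
	uint16_t sport, dport;	/* TCP ports */
};

/*
 * Placeholder keyed hash: 32-bit FNV-1a over the key bytes, then each
 * tuple field (most significant byte first). Every field, including
 * both ports, influences the result.
 */
static uint32_t
toy_keyed_hash(const struct tuple4 *t, const uint8_t *key, size_t keylen)
{
	assert(t != NULL && key != NULL);
	const uint32_t field[4] = { t->saddr, t->sport, t->daddr, t->dport };
	uint32_t h = 2166136261u;		/* FNV offset basis */

	for (size_t i = 0; i < keylen; i++)
		h = (h ^ key[i]) * 16777619u;	/* FNV prime */
	for (size_t f = 0; f < 4; f++)
		for (int shift = 24; shift >= 0; shift -= 8)
			h = (h ^ (uint8_t)(field[f] >> shift)) * 16777619u;
	return (h);
}

/* TSval on the wire: local clock plus the per-tuple offset (mod 2^32). */
static uint32_t
toy_tsval(uint32_t clock_ticks, const struct tuple4 *t, const uint8_t *key,
    size_t keylen)
{
	return (clock_ticks + toy_keyed_hash(t, key, keylen));
}
```

Two connections that differ only in the source port therefore start from unrelated TSval values, even when they are opened at the same instant.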
MT> > MT> > MT> >>>>>> Hi Paul,
MT> > MT> > MT> >>>>>> 
MT> > MT> > MT> >>>>>> this is correct.
MT> > MT> > MT> >>>>>> 
MT> > MT> > MT> >>>>>> Please note that the same happens with the old method, if two hosts with
MT> > MT> > MT> >>>>>> different uptimes are behind a consumer grade NAT.
MT> > MT> > MT> >>>>> 
MT> > MT> > MT> >>>>> If NAT does not replace timestamps then yes, it should be the case.
MT> > MT> > MT> >>>>> 
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> After performing various tests of hosts that produce the above mentioned
MT> > MT> > MT> >>>>>>> issue we came to the conclusion that there are some interesting implementations
MT> > MT> > MT> >>>>>>> that drop SYN packets with timestamps smaller than the largest timestamp
MT> > MT> > MT> >>>>>>> value from streams of all recent or current connections from a specific
MT> > MT> > MT> >>>>>>> address. This looks like some kind of SYN flood protection.
MT> > MT> > MT> >>>>>> This also breaks multiple hosts with different uptimes behind a consumer
MT> > MT> > MT> >>>>>> level NAT talking to such a server.
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> To ensure that each external host is not going to see wild jumps of
MT> > MT> > MT> >>>>>>> timestamp values I propose a patch that removes ports from the equation
MT> > MT> > MT> >>>>>>> altogether when calculating the timestamp offset:
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> Index: sys/netinet/tcp_subr.c
MT> > MT> > MT> >>>>>>> ===================================================================
MT> > MT> > MT> >>>>>>> --- sys/netinet/tcp_subr.c      (revision 348435)
MT> > MT> > MT> >>>>>>> +++ sys/netinet/tcp_subr.c      (working copy)
MT> > MT> > MT> >>>>>>> @@ -2224,7 +2224,22 @@
MT> > MT> > MT> >>>>>>> uint32_t
MT> > MT> > MT> >>>>>>> tcp_new_ts_offset(struct in_conninfo *inc)
MT> > MT> > MT> >>>>>>> {
MT> > MT> > MT> >>>>>>> -       return (tcp_keyed_hash(inc, V_ts_offset_secret));
MT> > MT> > MT> >>>>>>> +        /* 
MT> > MT> > MT> >>>>>>> +         * Some implementations show a strange behaviour when wildly random
MT> > MT> > MT> >>>>>>> +         * timestamps are allocated for different streams. It seems that only the
MT> > MT> > MT> >>>>>>> +         * SYN packets are affected. Observed implementations drop SYN packets
MT> > MT> > MT> >>>>>>> +         * with timestamps smaller than the largest timestamp value of all
MT> > MT> > MT> >>>>>>> +         * recent or current connections from a specific address. To mitigate
MT> > MT> > MT> >>>>>>> +         * this we are going to ensure that each host will always observe
MT> > MT> > MT> >>>>>>> +         * timestamps as increasing no matter the stream: by dropping ports
MT> > MT> > MT> >>>>>>> +         * from the equation.
MT> > MT> > MT> >>>>>>> +         */ 
MT> > MT> > MT> >>>>>>> +        struct in_conninfo inc_copy = *inc;
MT> > MT> > MT> >>>>>>> +
MT> > MT> > MT> >>>>>>> +        inc_copy.inc_fport = 0;
MT> > MT> > MT> >>>>>>> +        inc_copy.inc_lport = 0;
MT> > MT> > MT> >>>>>>> +
MT> > MT> > MT> >>>>>>> +       return (tcp_keyed_hash(&inc_copy, V_ts_offset_secret));
MT> > MT> > MT> >>>>>>> }
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> /*
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> In any case, the fix for the uptime leak, implemented in r338053, is
MT> > MT> > MT> >>>>>>> not going to suffer, because a supposed attacker is currently able to use
MT> > MT> > MT> >>>>>>> any fixed values of SP and DP, albeit not 0, anyway, to take them out
MT> > MT> > MT> >>>>>>> of the equation.
MT> > MT> > MT> >>>>>> Can you describe how a peer can compute the uptime from two observed timestamps?
MT> > MT> > MT> >>>>>> I don't see how you can do that...
MT> > MT> > MT> >>>>> 
MT> > MT> > MT> >>>>> A supposed attacker could run a script that continuously monitors timestamps,
MT> > MT> > MT> >>>>> for example via a periodic TCP connection from a fixed local port (e.g. 12345)
MT> > MT> > MT> >>>>> and a fixed local address to the fixed victim's address and port (e.g. 80).
MT> > MT> > MT> >>>>> Whenever a large discrepancy is observed, the attacker can assume that a reboot has
MT> > MT> > MT> >>>>> happened (due to V_ts_offset_secret re-generation), hence the received
MT> > MT> > MT> >>>>> timestamp is considered an approximate point of reboot from which the uptime
MT> > MT> > MT> >>>>> can be calculated, until the next reboot and so on.
MT> > MT> > MT> >>>> Ahh, I see. The patch we are talking about is not intended to protect against
MT> > MT> > MT> >>>> continuous monitoring, which is something you can always do. You could even
MT> > MT> > MT> >>>> watch for service availability and detect reboots. A change of the local key
MT> > MT> > MT> >>>> would also look similar to a reboot without a temporary loss of connectivity.
MT> > MT> > MT> >>>> 
MT> > MT> > MT> >>>> Thanks for the clarification.
MT> > MT> > MT> >>>>> 
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> Here is a list of example hosts with which we were able to reproduce the
MT> > MT> > MT> >>>>>>> issue:
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> curl -v http://88.99.60.171:80
MT> > MT> > MT> >>>>>>> curl -v http://163.172.71.252:80
MT> > MT> > MT> >>>>>>> curl -v http://5.9.242.150:80
MT> > MT> > MT> >>>>>>> curl -v https://185.134.205.105:443
MT> > MT> > MT> >>>>>>> curl -v https://136.243.1.231:443
MT> > MT> > MT> >>>>>>> curl -v https://144.76.196.4:443
MT> > MT> > MT> >>>>>>> curl -v http://94.127.191.194:80
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> To reproduce, call curl repeatedly with the same URL some number of times.
MT> > MT> > MT> >>>>>>> You are going to see some of the requests stuck in
MT> > MT> > MT> >>>>>>> `*    Trying XXX.XXX.XXX.XXX...`
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> For some reason, the easiest way to reproduce the issue is with nc:
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> $ echo "foooooo" | nc -v 88.99.60.171 80
MT> > MT> > MT> >>>>>>> 
MT> > MT> > MT> >>>>>>> Only a few such calls are required until one of them is stuck on connect():
MT> > MT> > MT> >>>>>>> issuing SYN packets with an exponential backoff.
MT> > MT> > MT> >>>>>> Thanks for providing an end-point to test with. I'll take a look.
MT> > MT> > MT> >>>>>> Just to be clear: You are running a FreeBSD client against one of the above
MT> > MT> > MT> >>>>>> servers and experience the problem with the new timestamp computations.
MT> > MT> > MT> >>>>>> 
MT> > MT> > MT> >>>>>> You are not running arbitrary clients against a FreeBSD server...
MT> > MT> > MT> >>>>> 
MT> > MT> > MT> >>>>> We are talking about FreeBSD being the client. Peers that yield this unwanted
MT> > MT> > MT> >>>>> behaviour are unknown. A little bit of tinkering showed that some of them run
MT> > MT> > MT> >>>>> Debian:
MT> > MT> > MT> >>>>> 
MT> > MT> > MT> >>>>> telnet 88.99.60.171 22
MT> > MT> > MT> >>>>> Trying 88.99.60.171...
MT> > MT> > MT> >>>>> Connected to 88.99.60.171.
MT> > MT> > MT> >>>>> Escape character is '^]'.
MT> > MT> > MT> >>>>> SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
MT> > MT> > MT> >>>> Also some are hosted by Hetzner, but not all. I will look into
MT> > MT> > MT> >>>> this tomorrow, since I'm on a deadline today (well, it is 2am tomorrow
MT> > MT> > MT> >>>> morning, to be precise)...
MT> > MT> > MT> >>> 
MT> > MT> > MT> >>> Thanks a lot, I would appreciate that.
MT> > MT> > MT> >> Hi Paul,
MT> > MT> > MT> >> 
MT> > MT> > MT> >> I have looked into this.
MT> > MT> > MT> >> 
MT> > MT> > MT> >> * The FreeBSD behaviour is the one which is specified in the last bullet item
MT> > MT> > MT> >>  in https://tools.ietf.org/html/rfc7323#section-5.4
MT> > MT> > MT> >>  It is also the one, which is RECOMMENDED in
MT> > MT> > MT> >>  https://tools.ietf.org/html/rfc7323#section-7.1 
MT> > MT> > MT> >> 
MT> > MT> > MT> >> * My NAT box (a popular one in Germany) does NOT rewrite TCP timestamps.
MT> > MT> > MT> >> 
MT> > MT> > MT> >> This means that the hosts you are referring to have some sort of protection,
MT> > MT> > MT> >> which makes incorrect assumptions. It will also break multiple hosts behind
MT> > MT> > MT> >> a NAT.
MT> > MT> > MT> >> 
MT> > MT> > MT> >> I can run
MT> > MT> > MT> >> curl -v http://88.99.60.171:80
MT> > MT> > MT> >> in a loop without any problems from a FreeBSD head system. I tested 1000
MT> > MT> > MT> >> iterations or so. The TS.val is jumping up and down as expected.
MT> > MT> > MT> >> I'm wondering why you are observing errors in this case, too.
MT> > MT> > MT> >> 
MT> > MT> > MT> >> However, doing something like
MT> > MT> > MT> >> echo "foooooo" | nc -v 88.99.60.171 80
MT> > MT> > MT> >> triggers the problem.
MT> > MT> > MT> >> 
MT> > MT> > MT> >> So I think there is some functionality (in a middlebox or running on the host),
MT> > MT> > MT> >> which incorrectly assumes monotonic timestamps between multiple TCP connections
MT> > MT> > MT> >> coming from the same IP address, but only in case of errors at the application layer.
MT> > MT> > MT> > 
MT> > MT> > MT> > Yeah, exactly, some hosts seem to enable this only in case of an error in HTTP
MT> > MT> > MT> > communication (some smart proxy?). However, there are some that behave this way
MT> > MT> > MT> > regardless of errors, for example these:
MT> > MT> > MT> > 
MT> > MT> > MT> > curl -v https://185.134.205.105:443
MT> > MT> > MT> > curl -v https://136.243.1.231:443
MT> > MT> > MT> Wireshark sees an Encrypted Alert in both cases. So I guess this is another indication
MT> > MT> > MT> of "error at the application layer".
MT> > MT> > MT> > 
MT> > MT> > MT> >> 
MT> > MT> > MT> >> Do you have any insights whether the hosts you listed share something in
MT> > MT> > MT> >> common? Some of them are hosted by Hetzner, but not all.
MT> > MT> > MT> > 
MT> > MT> > MT> > Nope. The whole set of endpoints that we have detected so far is pretty diverse,
MT> > MT> > MT> > containing a lot of different locations geographically, as well as different
MT> > MT> > MT> > hosters.
MT> > MT> > MT> OK. Thanks for the clarification.
MT> > MT> > MT> > 
MT> > MT> > MT> >> 
MT> > MT> > MT> >> I think in general, it is the correct thing to include the port numbers in
MT> > MT> > MT> >> the offset computation. We might add a sysctl variable to control the inclusion.
MT> > MT> > MT> >> This would allow interworking with broken middleboxes.
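A minimal userland sketch of such a switch: when the hypothetical knob `ts_offset_include_ports` is cleared, both ports are zeroed before hashing, as in the patch quoted earlier in the thread, so every connection between the same pair of addresses gets the same offset. The knob name, the struct, and the FNV-1a placeholder hash are assumptions for illustration; none of them are FreeBSD's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple; not FreeBSD's struct in_conninfo. */
struct tuple4 {
	uint32_t saddr, daddr;	/* IPv4 addresses */
	uint16_t sport, dport;	/* TCP ports */
};

/* Hypothetical knob: true keeps the per-connection (4-tuple) offset. */
static bool ts_offset_include_ports = true;

/* Placeholder keyed hash (32-bit FNV-1a); not cryptographically secure. */
static uint32_t
toy_keyed_hash(const struct tuple4 *t, const uint8_t *key, size_t keylen)
{
	assert(t != NULL && key != NULL);
	const uint32_t field[4] = { t->saddr, t->sport, t->daddr, t->dport };
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < keylen; i++)
		h = (h ^ key[i]) * 16777619u;
	for (size_t f = 0; f < 4; f++)
		for (int shift = 24; shift >= 0; shift -= 8)
			h = (h ^ (uint8_t)(field[f] >> shift)) * 16777619u;
	return (h);
}

/* Offset computation with the optional port exclusion. */
static uint32_t
new_ts_offset(const struct tuple4 *t, const uint8_t *key, size_t keylen)
{
	struct tuple4 copy = *t;	/* never modify the caller's tuple */

	if (!ts_offset_include_ports) {
		copy.sport = 0;
		copy.dport = 0;
	}
	return (toy_keyed_hash(&copy, key, keylen));
}
```

As noted further down in the thread, excluding the ports only helps a single client: several clients behind one NAT still present unrelated offsets from the same source address.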
MT> > MT> > MT> > 
MT> > MT> > MT> > Yeah, I completely agree that these rare cases should not dictate the implementation.
MT> > MT> > MT> > But an ability to enable a work-around via sysctl would be greatly appreciated.
MT> > MT> > MT> > Currently we are unable to roll out the upgrade across all servers because of this
MT> > MT> > MT> > issue: even though it happens not so often, a lot of requests from our users
MT> > MT> > MT> > get stuck or fail altogether. For example, the host 185.134.205.105 is a kind of
MT> > MT> > MT> > social network that our proxy servers connect to in order to securely access content,
MT> > MT> > MT> > such as images, on behalf of our users.
MT> > MT> > MT> > 
MT> > MT> > MT> >> 
MT> > MT> > MT> >> Please note, this does not fix the case of multiple clients behind a NAT.
MT> > MT> > MT> > 
MT> > MT> > MT> > Yeah, that's true. Fortunately we don't use NAT.
MT> > MT> > MT> > 
MT> > MT> > MT> >> 
MT> > MT> > MT> >> I'm also trying to figure out how and why Linux and Windows are handling this.
MT> > MT> > MT> > 
MT> > MT> > MT> > Thanks for bothering!
MT> > MT> > MT> Will let you know what I figure out.
MT> > MT> > MT> 
MT> > MT> > MT> Best regards
MT> > MT> > MT> Michael
MT> > MT> > MT> > 
MT> > MT> > MT> >> 
MT> > MT> > MT> >> Best regards
MT> > MT> > MT> >> Michael
MT> > MT> > MT> >> 
MT> > MT> > MT> >>> 
MT> > MT> > MT> >>>> 
MT> > MT> > MT> >>>> Best regards
MT> > MT> > MT> >>>> Michael 
MT> > MT> > MT> >>>>> 
MT> > MT> > MT> >>>>> 
MT> > MT> > MT> >>>>>> 
MT> > MT> > MT> >>>>>> Best regards
MT> > MT> > MT> >>>>>> Michael
MT> > MT> > MT> >>>>>> 
MT> > MT> > MT> >>>>>> 
MT> > MT> > MT> >>>> 
MT> > MT> > MT> >>>> 
MT> > MT> > MT> >> 
MT> > MT> > MT> >> 
MT> > MT> > MT> 
MT> > MT> > MT> _______________________________________________
MT> > MT> > MT> freebsd-net@freebsd.org mailing list
MT> > MT> > MT> https://lists.freebsd.org/mailman/listinfo/freebsd-net
MT> > MT> > MT> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
MT> > MT> 
MT> 
