Re: statement regarding keepalives

2018-08-17 Thread Tom Herbert
On Fri, Aug 17, 2018 at 4:06 PM, Joe Touch  wrote:
>
>
>
> On 2018-08-17 14:13, Tom Herbert wrote:
>
> On Fri, Aug 17, 2018 at 1:31 PM, Joe Touch  wrote:
>
>
>
> If you KNOW that the app keepalive will cause the TCP transmission, sure -
> but how do you KNOW that? You don't and can't. Even if you write to the TCP
> socket, all you know when the socket returns is that the data was copied to
> the kernel. You don't know for sure that you've triggered a TCP packet.
>
> Actually, you do know that information. Application keepalives are
> request/response messages sent in TCP data. When a response is
> received to a keepalive request over the TCP connection, that is proof
> that the keepalive was sent.
>
>
> Yes, but the keepalive itself is a guarantee of nothing. It is the keepalive
> ACK that matters.
>
>
>
> If the application keepalive was sent on
> the socket, and no response is received before the application timer
> expires, then the application declares the connection dead and
> most likely will just close the socket and try to reconnect. The fact
> that an application keepalive request, or its response, might be stuck
> in a TCP send buffer (e.g. peer rcv window is zero) versus the peer
> host completely disappeared is irrelevant. To the application it's all
> the same, a connection to a peer application has failed and action
> needs to be taken.
>
>
> In that case it never mattered whether TCP had a keepalive or whether the
> app action interacted with that TCP keepalive.
>
> What mattered was that the app-app communication was maintained.
>
> Again, this reiterates my point - run keepalives at the layer that matter to
> you. Ignore how they affect other layers; they will (and should) take care
> of themselves.

Joe,

I agree to the extent that keepalives are run at only one layer, with
one keepalive control loop. If they are done simultaneously at
multiple layers without coordination, that can be problematic. It's
also probably a good general recommendation to do keepalives only at
the application (highest) layer if at all possible. Most modern
application protocols support them (HTTP, RPC protocols, etc.). There
are still protocols, like telnet and ssh, that would need TCP
keepalives. TCP keepalives are a weaker signal and have become
increasingly prone to false positives. For instance, if a connection
goes through a transparent proxy, a TCP keepalive would only verify
the liveness of the connection to the proxy, not TCP liveness all the
way to the peer, unbeknownst to the user.

Tom

>
> Joe
>
>
>
>



Re: statement regarding keepalives

2018-08-17 Thread Joe Touch
On 2018-08-17 14:13, Tom Herbert wrote:

> On Fri, Aug 17, 2018 at 1:31 PM, Joe Touch  wrote: 
> 
>> If you KNOW that the app keepalive will cause the TCP transmission, sure -
>> but how do you KNOW that? You don't and can't. Even if you write to the TCP
>> socket, all you know when the socket returns is that the data was copied to
>> the kernel. You don't know for sure that you've triggered a TCP packet.
> Actually, you do know that information. Application keepalives are
> request/response messages sent in TCP data. When a response is
> received to a keepalive request over the TCP connection, that is proof
> that the keepalive was sent.

Yes, but the keepalive itself is a guarantee of nothing. It is the
keepalive ACK that matters. 

> If the application keepalive was sent on
> the socket, and no response is received before the application timer
> expires, then the application declares the connection dead and
> most likely will just close the socket and try to reconnect. The fact
> that an application keepalive request, or its response, might be stuck
> in a TCP send buffer (e.g. peer rcv window is zero) versus the peer
> host completely disappeared is irrelevant. To the application it's all
> the same, a connection to a peer application has failed and action
> needs to be taken.

In that case it never mattered whether TCP had a keepalive or whether
the app action interacted with that TCP keepalive. 

What mattered was that the app-app communication was maintained. 

Again, this reiterates my point - run keepalives at the layer that
matter to you. Ignore how they affect other layers; they will (and
should) take care of themselves. 

Joe

Re: statement regarding keepalives

2018-08-17 Thread Tom Herbert
On Fri, Aug 17, 2018 at 1:31 PM, Joe Touch  wrote:
>
>
>
>
> On 2018-08-17 11:43, Tom Herbert wrote:
>
> The purpose of an application keep alive is not to do favors for TCP,
>
> it's to verify the end to end liveness between application end points.
> This is at a much higher layer, verifying liveness of the TCP
> connection is a side effect.
>
>
> Sure - that's fine and not what I'm concerned about.
>
> I don't want the text to say that higher level protocols or apps should try
> to do favors to keepalive lower level protocols - because it doesn't
> necessarily work.
>
>
>
>
> However, if that 1GB goes out in 10 seconds, then TCP would have sent its
> own keepalives just fine. It didn't need the app's help.
>
> So the app didn't help at all; at best, it does nothing and at worst it
> hurts.
>
>
> Consider that someone sets an application keepalive to 35 second
> interval and the TCP keepalive timer is 30 seconds. When the
> connection goes idle TCP keepalive will fire at thirty seconds, and
> five seconds later the application keepalive fires. So every
> thirty-five seconds two keepalives are done at two layers. This is not
> good as it wastes network resources and power.
>
>
> Agreed.
>
>
> In this case, the
> application keepalive is sufficient
>
>
> In this *implementation* it *might* be sufficient, in others, it might not.
> There's simply no way for the layers to know.
>
>
> and the TCP keepalive shouldn't be
> used.
>
>
> If you KNOW that the app keepalive will cause the TCP transmission, sure -
> but how do you KNOW that? You don't and can't. Even if you write to the TCP
> socket, all you know when the socket returns is that the data was copied to
> the kernel. You don't know for sure that you've triggered a TCP packet.
>
Actually, you do know that information. Application keepalives are
request/response messages sent in TCP data. When a response is
received to a keepalive request over the TCP connection, that is proof
that the keepalive was sent. If the application keepalive was sent on
the socket, and no response is received before the application timer
expires, then the application declares the connection dead and
most likely will just close the socket and try to reconnect. The fact
that an application keepalive request, or its response, might be stuck
in a TCP send buffer (e.g. peer rcv window is zero) versus the peer
host completely disappeared is irrelevant. To the application it's all
the same, a connection to a peer application has failed and action
needs to be taken.

Tom
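The request/response logic described here can be sketched in a few lines of Python. This is a hypothetical illustration, not any real protocol: the PING/PONG framing and the timer values are made up for the example.

```python
import socket

def probe_liveness(sock: socket.socket, timeout: float = 5.0) -> bool:
    """Send an application keepalive request and wait for the response.

    Returns True if the peer answered within the timer; on timeout the
    application would declare the connection dead and reconnect."""
    sock.settimeout(timeout)
    try:
        sock.sendall(b"PING\n")
        return sock.recv(16) == b"PONG\n"
    except (socket.timeout, OSError):
        return False

# Demo over a local socket pair: one peer answers, the other never does.
a, b = socket.socketpair()
b.sendall(b"PONG\n")                    # live peer has queued its response
alive = probe_liveness(a)               # response arrives in time

c, d = socket.socketpair()
dead = probe_liveness(c, timeout=0.2)   # no response before the timer expires
print(alive, dead)
```

On timeout the application typically closes the socket and tries to reconnect, exactly as described above; whether the request was stuck in a send buffer or the peer vanished makes no difference to it.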

> Besides, your "keepalives" might end up causing TCP to send packets it never
> needed to send in the first place - even IF you think you're doing it a
> favor.
>
>
>
> This is an example of the problems in running two control loops
> at different layers with overlapping functionality,
>
>
>
> The problem is trying to infer overlap in functionality. If you realize that
> these are independent control loops *and leave them alone* you're fine.
>
Independence of control loops does not mean they can't conflict.
Multiple layers performing keepalives is just one example, and
probably one with less insidious behavior. Look at
http://qurinet.ucdavis.edu/pubs/conf/wicon2008.pdf for a good example
of how link-layer retransmissions can conflict with TCP algorithms to
produce really bad results.

Tom

> It's only in trying to optimize them as overlapping that a problem is
> created.
>
>
>
>
> if the
> ramifications of doing so aren't understood, it can lead to undesirable
> interactions and behavior.
>
>
> Agreed - so don't. Admit that there are inefficiencies *regardless of how
> hard you try to do otherwise* and leave them alone, IMO.
>
>
> If the app needs an app-level keepalive, do it.
>
> If the app wants TCP to be kept alive, let IT do it and leave it alone.
>
> Don't try to couple the two because you can't, and whatever you think you
> might gain you could easily lose. Leaving the two alone and separate is
> sufficient and robust.
>
> Joe



Re: statement regarding keepalives

2018-08-17 Thread Joe Touch
On 2018-08-17 11:43, Tom Herbert wrote:

> The purpose of an application keep alive is not to do favors for TCP,

> it's to verify the end to end liveness between application end points.
> This is at a much higher layer, verifying liveness of the TCP
> connection is a side effect.

Sure - that's fine and not what I'm concerned about. 

I don't want the text to say that higher level protocols or apps should
try to do favors to keepalive lower level protocols - because it doesn't
necessarily work. 

>> However, if that 1GB goes out in 10 seconds, then TCP would have sent its
>> own keepalives just fine. It didn't need the app's help.
>> 
>> So the app didn't help at all; at best, it does nothing and at worst it
>> hurts.
> 
> Consider that someone sets an application keepalive to 35 second
> interval and the TCP keepalive timer is 30 seconds. When the
> connection goes idle TCP keepalive will fire at thirty seconds, and
> five seconds later the application keepalive fires. So every
> thirty-five seconds two keepalives are done at two layers. This is not
> good as it wastes network resources and power.

Agreed. 

> In this case, the
> application keepalive is sufficient

In this *implementation* it *might* be sufficient, in others, it might
not. There's simply no way for the layers to know. 

> and the TCP keepalive shouldn't be
> used.

If you KNOW that the app keepalive will cause the TCP transmission, sure
- but how do you KNOW that? You don't and can't. Even if you write to
the TCP socket, all you know when the socket returns is that the data
was copied to the kernel. You don't know for sure that you've triggered
a TCP packet. 

Besides, your "keepalives" might end up causing TCP to send packets it
never needed to send in the first place - even IF you think you're doing
it a favor. 

> This is an example of the problems in running two control loops
> at different layers with overlapping functionality,

The problem is trying to infer overlap in functionality. If you realize
that these are independent control loops *and leave them alone* you're
fine. 

It's only in trying to optimize them as overlapping that a problem is
created. 

> if the
> ramifications of doing so aren't understood, it can lead to undesirable
> interactions and behavior.

Agreed - so don't. Admit that there are inefficiencies *regardless of
how hard you try to do otherwise* and leave them alone, IMO. 

If the app needs an app-level keepalive, do it. 

If the app wants TCP to be kept alive, let IT do it and leave it alone. 

Don't try to couple the two because you can't, and whatever you think
you might gain you could easily lose. Leaving the two alone and separate
is sufficient and robust. 

Joe

Re: statement regarding keepalives

2018-08-17 Thread Tom Herbert
On Fri, Aug 17, 2018 at 10:27 AM, Joe Touch  wrote:
>
>
>
>
> On 2018-08-17 09:05, Tom Herbert wrote:
>
> On Fri, Aug 17, 2018 at 7:40 AM, Joe Touch  wrote:
>
>
> ...
> It's not subtle. There's no way to know whether keepalives at a higher level
> have any desired effect at the lower level at all - except using Wireshark
> to trace the packets sent.
>
> I don't think that's necessarily true. RFC1122 states:
>
> "Keep-alive packets MUST only be sent when no data or acknowledgement
> packets have been received for the connection within an interval."
>
>
>
> That's Sec 4.2.3.6. and it's talking about what TCP does inside TCP.
>
> It's not talking about actions by layers above TCP. For all TCP knows, a
> user might have tried to send data that's been hung up in the OS. There's
> simply no specific way to know that anything above TCP causes TCP to do
> anything per se; even if an upper layer protocol does a TCP_SEND() directly,
> TCP might stall that data because of other things going on.
>
>
> So if an application is performing keepalives by sending and receiving
> keepalive messages over the connection then that is enough to suppress
> TCP keepalives.
>
>
>
> That may or may not be true, but it's for TCP to decide for itself. If the
> data isn't getting down to TCP in a way that causes TCP to send data before
> a TCP keepalive timer expires, TCP will - and should - send a keepalive. If
> the data does cause that timer to be reset, then that's for TCP to know.
>
>
> For instance, if the period of application sending
> keepalives on a connection is less than the one for TCP keepalives,
> then there should be no TCP keepalives ever sent on the connection (if
> Wireshark is showing otherwise then that might be a bug in the
> implementation).
>
>
> Consider an app that writes 1GB to TCP every day. If TCP sends that out
> slowly (for whatever reason), it's possible no TCP keepalives will ever be
> sent. An app that thinks it's doing TCP a favor by sending an app keepalive
> every 1.9 hrs (just under the 2 hour default config) would simply be causing
> TCP to do unnecessary work.
>
The purpose of an application keep alive is not to do favors for TCP,
it's to verify the end to end liveness between application end points.
This is at a much higher layer, verifying liveness of the TCP
connection is a side effect.

> However, if that 1GB goes out in 10 seconds, then TCP would have sent its
> own keepalives just fine. It didn't need the app's help.
>
> So the app didn't help at all; at best, it does nothing and at worst it
> hurts.

Consider that someone sets an application keepalive to 35 second
interval and the TCP keepalive timer is 30 seconds. When the
connection goes idle TCP keepalive will fire at thirty seconds, and
five seconds later the application keepalive fires. So every
thirty-five seconds two keepalives are done at two layers. This is not
good as it wastes network resources and power. In this case, the
application keepalive is sufficient and the TCP keepalive shouldn't be
used. This is an example of the problems in running two control loops
at different layers with overlapping functionality; if the
ramifications of doing so aren't understood, it can lead to undesirable
interactions and behavior.

Tom
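Tom's arithmetic can be checked with a toy model. This is a sketch only: it assumes an otherwise-idle connection and that any segment on the connection, including a keepalive probe, resets TCP's idle timer.

```python
def count_probes(app_period: int, tcp_idle: int, duration: int):
    """Count keepalives sent per layer on an otherwise idle connection."""
    app = tcp = 0
    last_traffic = 0
    for t in range(1, duration + 1):
        if t % app_period == 0:
            app += 1
            last_traffic = t      # application data resets TCP's idle timer
        elif t - last_traffic >= tcp_idle:
            tcp += 1
            last_traffic = t      # the probe (and its ACK) is traffic too
    return app, tcp

app, tcp = count_probes(app_period=35, tcp_idle=30, duration=350)
print(app, tcp)   # both layers end up probing roughly every 35 seconds
```

Run with an application period shorter than the TCP idle time (say 25 s against 30 s), the same model produces zero TCP keepalives, which is the suppression case discussed earlier in the thread.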

>
> Joe
>
>



Re: statement regarding keepalives

2018-08-17 Thread Joe Touch
On 2018-08-17 09:05, Tom Herbert wrote:

> On Fri, Aug 17, 2018 at 7:40 AM, Joe Touch  wrote: 
> 
>> ...
>> It's not subtle. There's no way to know whether keepalives at a higher level 
>> have any desired effect at the lower level at all - except using Wireshark 
>> to trace the packets sent.
> I don't think that's necessarily true. RFC1122 states:
> 
> "Keep-alive packets MUST only be sent when no data or acknowledgement
> packets have been received for the connection within an interval."

That's Sec 4.2.3.6. and it's talking about what TCP does inside TCP. 

It's not talking about actions by layers above TCP. For all TCP knows, a
user might have tried to send data that's been hung up in the OS.
There's simply no specific way to know that anything above TCP causes
TCP to do anything per se; even if an upper layer protocol does a
TCP_SEND() directly, TCP might stall that data because of other things
going on. 

> So if an application is performing keepalives by sending and receiving
> keepalive messages over the connection then that is enough to suppress
> TCP keepalives.

That may or may not be true, but it's for TCP to decide for itself. If
the data isn't getting down to TCP in a way that causes TCP to send data
before a TCP keepalive timer expires, TCP will - and should - send a
keepalive. If the data does cause that timer to be reset, then that's
for TCP to know. 

> For instance, if the period of application sending
> keepalives on a connection is less than the one for TCP keepalives,
> then there should be no TCP keepalives ever sent on the connection (if
> Wireshark is showing otherwise then that might be a bug in the
> implementation).

Consider an app that writes 1GB to TCP every day. If TCP sends that out
slowly (for whatever reason), it's possible no TCP keepalives will ever
be sent. An app that thinks it's doing TCP a favor by sending an app
keepalive every 1.9 hrs (just under the 2 hour default config) would
simply be causing TCP to do unnecessary work. 

However, if that 1GB goes out in 10 seconds, then TCP would have sent
its own keepalives just fine. It didn't need the app's help. 

So the app didn't help at all; at best, it does nothing and at worst it
hurts. 

Joe

Re: statement regarding keepalives

2018-08-17 Thread Tom Herbert
On Fri, Aug 17, 2018 at 7:40 AM, Joe Touch  wrote:
>
>
>> On Aug 16, 2018, at 3:57 PM, Benjamin Kaduk  wrote:
>>
>> On Thu, Aug 16, 2018 at 03:52:54PM -0700, Joe Touch wrote
>
>>>
>>> On Aug 16, 2018, at 3:10 PM, Benjamin Kaduk  wrote:
>>>
> Keepalives at a layer SHOULD NOT be interpreted as implying state at
> any other layer.

 What's going on here in the last sentence is probably a bit subtle -- a
 keepalive both does not indicate "real" protocol activity but also can
 serve to exercise the lower protocol layers (and, even, per the previous
 sentence, suppresses their keepalives).
>>>
>>> That may be intended but is never actually known. Lower layers can 
>>> compress, cache, merge, and otherwise change the effect a transmission at 
>>> one layer has on any other.
>>
>> Right, that's why it's subtle :)
>
> It’s not subtle. There’s no way to know whether keepalives at a higher level 
have any desired effect at the lower level at all - except using Wireshark to 
> trace the packets sent.
>
I don't think that's necessarily true. RFC1122 states:

"Keep-alive packets MUST only be sent when no data or acknowledgement
packets have been received for the connection within an interval."

So if an application is performing keepalives by sending and receiving
keepalive messages over the connection, then that is enough to suppress
TCP keepalives. For instance, if the period at which the application
sends keepalives on a connection is less than the TCP keepalive
interval, then there should be no TCP keepalives ever sent on the
connection (if Wireshark is showing otherwise then that might be a bug
in the implementation).

Tom

> That’s why users SHOULD NOT try to affect lower level keepalives using higher 
> level ones.  (it’s not MUST NOT because there’s no strict harm, except that 
> it simply can’t be known whether it achieved its desired effect).
>
> Joe



Re: statement regarding keepalives

2018-08-17 Thread Joe Touch



> On Aug 16, 2018, at 3:57 PM, Benjamin Kaduk  wrote:
> 
> On Thu, Aug 16, 2018 at 03:52:54PM -0700, Joe Touch wrote

>> 
>> On Aug 16, 2018, at 3:10 PM, Benjamin Kaduk  wrote:
>> 
 Keepalives at a layer SHOULD NOT be interpreted as implying state at
 any other layer.
>>> 
>>> What's going on here in the last sentence is probably a bit subtle -- a
>>> keepalive both does not indicate "real" protocol activity but also can
>>> serve to exercise the lower protocol layers (and, even, per the previous
>>> sentence, suppresses their keepalives).
>> 
>> That may be intended but is never actually known. Lower layers can compress, 
>> cache, merge, and otherwise change the effect a transmission at one layer 
>> has on any other. 
> 
> Right, that's why it's subtle :)

It’s not subtle. There’s no way to know whether keepalives at a higher level 
have any desired effect at the lower level at all - except using Wireshark to 
trace the packets sent.

That’s why users SHOULD NOT try to affect lower level keepalives using higher 
level ones.  (it’s not MUST NOT because there’s no strict harm, except that it 
simply can’t be known whether it achieved its desired effect).

Joe


Re: statement regarding keepalives

2018-08-16 Thread Benjamin Kaduk
On Thu, Aug 16, 2018 at 03:52:54PM -0700, Joe Touch wrote:
> 
> 
> On Aug 16, 2018, at 3:10 PM, Benjamin Kaduk  wrote:
> 
> >> Keepalives at a layer SHOULD NOT be interpreted as implying state at
> >> any other layer.
> > 
> > What's going on here in the last sentence is probably a bit subtle -- a
> > keepalive both does not indicate "real" protocol activity but also can
> > serve to exercise the lower protocol layers (and, even, per the previous
> > sentence, suppresses their keepalives).
> 
> That may be intended but is never actually known. Lower layers can compress, 
> cache, merge, and otherwise change the effect a transmission at one layer has 
> on any other. 

Right, that's why it's subtle :)

-Benjamin

> Protocols should avoid trying to do this. 
> 
> Joe



Re: statement regarding keepalives

2018-08-16 Thread Joe Touch



On Aug 16, 2018, at 3:10 PM, Benjamin Kaduk  wrote:

>> Keepalives at a layer SHOULD NOT be interpreted as implying state at
>> any other layer.
> 
> What's going on here in the last sentence is probably a bit subtle -- a
> keepalive both does not indicate "real" protocol activity but also can
> serve to exercise the lower protocol layers (and, even, per the previous
> sentence, suppresses their keepalives).

That may be intended but is never actually known. Lower layers can compress, 
cache, merge, and otherwise change the effect a transmission at one layer has 
on any other. 

Protocols should avoid trying to do this. 

Joe


Re: statement regarding keepalives

2018-08-16 Thread Benjamin Kaduk
It's great to see the good discussion happening here.  I'll note a couple
things inline, and also that given the discussion of the tradeoffs that go
into making these decisions, Tom is probably right that shoving this into
an I-D would be helpful.

On Wed, Aug 15, 2018 at 05:35:28PM +, Kent Watsen wrote:
> 
> Below is an updated version of some text that we might roll into
> a statement or an I-D of some sort.  Kindly review and provide 
> suggestions for improvement, or support for the text as is, if
> that is the case.  ;)
> 
> This update accommodates comments from:
>   - Wesley Eddy & David Black
>  - removed "layers of functionality" verbiage
>  - moved footnote into the body of the document (this had
>a cascading effect, and why it looks so different now)
>   - Joe Touch
>  - keepalives should occur at *all* layers that benefit
>  - keepalives at a layer should be suppressed in the 
>presence of sufficient traffic from higher layers
>  - keepalives at a layer should not be interpreted as
>implying state at any other layer
> 
> This update does not accommodate comments from:
>   - Michael Abrahamsson & Tom Herbert
>  - no statement added to promote TCP keepalives
> * note: I believe this to be unnecessary because 
>   the current text doesn't ever say to not use TCP.
>  - no statement added for tuning params (e.g., timeouts).
> * note: we could add this, but it will increase the
>   scope of the document - do we want to do this?
> 
> Cheers!
> Kent
> 
> 
> = START =
> 
> # Connection Strategies for Long-lived Connections
> 
> A networked device may have an ongoing need to interact with a remote
> device. Sometimes the need arises from wanting to push data to the
> remote device, and sometimes the need arises from wanting to check if
> there is any data the remote device may have pending to deliver to
> it.
> 
> There are two fundamental network connection strategies that can be
> used to accomplish this goal: 1) a single long-lived connection and
> 2) a sequence of short-lived connections.
> 
> A single long-lived connection is most common, as it is
> straightforward to implement and directly answers the question of 
> whether the "connection" is established. However, long-lived connections
> require more system resources, which may affect scalability, and
> require the initiator of the connection to periodically test the
> aliveness of the remote device, discussed further in the next 
> section.
> 
> A sequence of short-lived connections is less common, as there is an
> additional implementation effort, as well as concerns such as: 1) the
> delay of the remote device needing to wait until the connection is
> reestablished in order to deliver pending data, and 2) the additional
> latency incurred from starting new connections, especially when
> cryptology is involved. However, short-lived connections do not

(nit: "cryptography" is probably better than "cryptology")

> require keepalives and are arguably more secure, as each device is
> forced to re-authenticate the other and reload all related
> access-control policies on each connection.
> 
> For networking sessions that are primarily quiet, and the use case
> can cope with the additional latency of waiting for and starting new
> connections, it is RECOMMENDED to use a sequence of short-lived
> connections, instead of maintaining a single long-lived connection
> using aliveness checks.
> 
> 
> # Keepalives for Persistent Connections
> 
> When the initiator of a networking session needs to maintain a
> long-lived connection, it is necessary for it to periodically test
> the aliveness of the remote device. In such cases, it is RECOMMENDED
> that the aliveness check happens at the highest protocol layer
> possible that is meaningful to the application, in order to maximize
> the depth of the aliveness check.
> 
> For example, for an HTTPS connection to a simple webserver,
> HTTP-level keepalives would test more layers of functionality than
> TLS-level keepalives. However, for a webserver that is accessed via a
> load-balancer that terminates TLS connections, TLS-level aliveness
> checks may be the most meaningful check that can be performed.
> 
> More generally, it is RECOMMENDED that applications be able to
> perform the aliveness checks at all protocol levels that benefit, but
> suppress the aliveness checks at lower protocol layers from occurring
> when there is sufficient activity at higher protocol layers.
> Keepalives at a layer SHOULD NOT be interpreted as implying state at
> any other layer.

What's going on here in the last sentence is probably a bit subtle -- a
keepalive both does not indicate "real" protocol activity but also can
serve to exercise the lower protocol layers (and, even, per the previous
sentence, suppresses their keepalives).  Though I'm not sure it's correct
to change this to just "at higher layers", since (as was pointed out

Re: statement regarding keepalives

2018-08-16 Thread Tom Herbert
On Thu, Aug 16, 2018 at 8:01 AM, Mikael Abrahamsson  wrote:
> On Thu, 16 Aug 2018, Tom Herbert wrote:
>
>> They are already on, TCP has a default keepalive for 2 hrs. The issue
>
>
> http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html says:
>
> "Remember that keepalive support, even if configured in the kernel, is not
> the default behavior in Linux."
>
> Which OSes default to this on?
>
Linux, but I misread the code. The default keepalive timeout is 2
hrs., but it needs to be enabled per socket.

Tom
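For reference, the per-socket enabling Tom mentions looks like this on Linux. A sketch: the TCP_KEEP* constants are Linux-specific, and the values shown are illustrative overrides, not the kernel defaults.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# SO_KEEPALIVE turns keepalives on for this connection; without it the
# kernel's 2-hour default idle timer never applies.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Linux also allows per-socket overrides of the kernel defaults.
if hasattr(socket, "TCP_KEEPIDLE"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 600)  # idle secs before first probe
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)  # secs between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # unanswered probes before reset

enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(enabled != 0)
```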

>
> --
> Mikael Abrahamsson    email: swm...@swm.pp.se



Re: statement regarding keepalives

2018-08-16 Thread Mikael Abrahamsson

On Thu, 16 Aug 2018, Tom Herbert wrote:


They are already on, TCP has a default keepalive for 2 hrs. The issue


http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html says:

"Remember that keepalive support, even if configured in the kernel, is not 
the default behavior in Linux."


Which OSes default to this on?

--
Mikael Abrahamsson    email: swm...@swm.pp.se



Re: statement regarding keepalives

2018-08-16 Thread Joe Touch


> On Aug 16, 2018, at 7:38 AM, Tom Herbert  wrote:
> 
> They are already on, TCP has a default keepalive for 2 hrs. 

RFC1122 says that keepalives are optional and MUST default to off.

This has already been included in RFC793bis.

Joe

Re: statement regarding keepalives

2018-08-16 Thread Tom Herbert
On Thu, Aug 16, 2018 at 12:44 AM, Olle E. Johansson  wrote:
>
>
> On 16 Aug 2018, at 09:28, Mikael Abrahamsson  wrote:
>
> On Wed, 15 Aug 2018, Kent Watsen wrote:
>
> You bring up an interesting point, it goes to the motivation for wanting to
> do keepalives in the first place.  The text doesn't yet mention maintaining
> flow state as a motivation.
>
>
> It's not only to maintain flow state, it's also to close the connection when
> the network goes down and doesn't work anymore, and "give up" on connections
> that don't work anymore (for some definition of "anymore").
>
> I have operationally been in the situation where a server/client application
> was implemented so that the server could only handle 256 connections (some
> file descriptor limit). Every time the firewall was rebooted and lost state,
> the connections hung around forever. So the server administrators had to go in
> and restart the process to clear these connections, otherwise there were 256
> hung connections and no new connections could be established.
>
> Sometimes the other endpoint goes down, and doesn't come back. We will for
> instance deploy home gateways probably keeping netconf-call-home sessions to
> an NMS, and we want them to be around forever, as long as they work. TCP
> level keepalives would solve this, as if the customer just powers off the
> device, after a while the session will be cleared. Using TCP keepalives here
> means you get this kind of behaviour even if the upper-layer application
> doesn't support it (netconf might have been a bad example here). It's a
> single socket option to set, so it's very easy to do.
>
> From knowing approximately what settings people have in their NAT44 and
> firewalls etc., I'd say the recommendation should be that keepalives are set
> to around a 60-300 second interval, and that the connection be killed if no
> traffic has passed in 3-5 of these intervals. Otherwise TCP will have backed
> off so far anyway that it's probably faster to just re-try the connection
> instead of waiting for TCP to re-send the packet.
>
> I have seen so many times in my 20 years working in networking how a lack of
> keepalives has caused all kinds of problems. I wish everybody would turn them
> on and keep them on.
>
Olle,

They are already on, TCP has a default keepalive for 2 hrs. The issue
that is inevitably raised is that 2 hrs. is much too long a period for
maintaining NAT state (NAT timeouts are usually far less). But,
as I pointed out already, sending keepalives at a higher frequency is
not devoid of cost nor problems.

Tom

>
> As more and more connections flow over mobile networks, it seems more and
> more important, even for flows you did not expect. I have to send keepalives
> over IPv6 connections - not for NAT as on IPv4, but for middlebox devices
> that have an interesting approach and attitude towards connection management.
> ;-)
>
> The SIP Outbound RFC has a lot of reasoning behind keep-alives for
> connection failover and may be good input here.
>
> https://tools.ietf.org/html/rfc5626
>
> /O



Re: statement regarding keepalives

2018-08-16 Thread Gorry Fairhurst
Adding some comments here. I'm playing catch-up, so I may have comments 
on some things that have been fixed, and missed others.


On 16/08/2018, 08:28, Mikael Abrahamsson wrote:

On Wed, 15 Aug 2018, Kent Watsen wrote:

You bring up an interesting point, it goes to the motivation for 
wanting to do keepalives in the first place.  The text doesn't yet 
mention maintaining flow state as a motivation.


It's not only to maintain flow state, it's also to close the 
connection when the network goes down and doesn't work anymore, and 
"give up" on connections that doesn't work anymore (for some 
definition of "anymore").


I have operationally been in the situation where a server/client 
application was implemented so that the server could only handle 256 
connections (some filedescriptor limit). Every time the firewall was 
rebooted, lost state, the connection hung around forever. So the 
server administrators had to go in and restart the process to clear 
these connections, otherwise there were 256 hung connections and no 
new connections could be established.


Sometimes the other endpoint goes down, and doesn't come back. We will 
for instance deploy home gateways, probably keeping netconf-call-home 
sessions to an NMS, and we want them to be around forever, as long as 
they work. TCP-level keepalives would solve this: if the customer 
just powers off the device, after a while the session will be cleared. 
Using TCP keepalives here means you get this kind of behaviour even if 
the upper-layer application doesn't support it (netconf might have 
been a bad example here). It's a single socket option to set, so it's 
very easy to do.


Agree. I think that, looking at the transport layer, allowing a flow to 
continue to use existing "network" state (in various forms) is an 
important aspect - there are NATs, firewalls, QoS classifiers, etc., as 
well as load balancers and layer 2/3 devices that take resource decisions 
at the flow level. Normally all of these do the correct thing when there 
is a continuous flow of packets.


Somewhere in the thread I also saw a statement that suggested that 
associations should be short-lived - if that advice is carried to the 
transport layer, I would expect it to have a serious impact on the 
performance of some paths! (There are important trade-offs here, and we 
should not make sweeping assumptions.)

From knowing approximately what settings people have in their NAT44 and 
firewalls etc., I'd say the recommendation should be that keepalives are 
set to around a 60-300 second interval, and then kill the connection if 
no traffic has passed in 3-5 of these intervals. Otherwise TCP will have 
backed off so far anyway that it's probably faster to just retry the 
connection instead of waiting for TCP to re-send the packet.


I have seen so many times in my 20 years working in networking where 
lack of keepalives has caused all kinds of problems. I wish everybody 
would turn it on and keep it on.


I agree.  I have the feeling that this is not at all easy advice to get 
correct in a general way (and this thread isn't quite there yet). E.g., 
RFC 5245 set lower limits for timers - because that was thought important.


I don't agree that protocol stacks with a secure transport protocol 
layer (e.g., TLS, SSH, DTLS) that sits on top of a cleartext protocol 
layer (e.g., TCP, UDP) should be advised to do the aliveness check only 
within the protection envelope afforded by the secure transport protocol 
layer - to me that seems entirely wrong - it has the same "issue" as 
above: it depends on the function of the aliveness check and the way 
this is used by the layer's protocol machine. In many cases it is 
absolutely desirable to do this within the layer that needs this 
information. Passing the detailed state down between layers can be most 
awkward. Higher layers can make their own decisions - and suppress 
keep-alives or reaffirm state.


Guidance from the transport perspective on timers is in RFC 8085, 3.1.1; 
there is also more advice in the "behave" RFCs and a summary of the 
mechanisms in RFC 8085, 3.5 (noted by Lars).  The vulnerabilities are 
also noted in RFC 8085, and I think we should be clear to differentiate 
between on-path versus off-path knowledge when understanding this.


Gorry



Re: statement regarding keepalives

2018-08-16 Thread Olle E. Johansson


> On 16 Aug 2018, at 09:28, Mikael Abrahamsson  wrote:
> 
> On Wed, 15 Aug 2018, Kent Watsen wrote:
> 
>> You bring up an interesting point; it goes to the motivation for wanting to 
>> do keepalives in the first place.  The text doesn't yet mention maintaining 
>> flow state as a motivation.
> 
> It's not only to maintain flow state, it's also to close the connection when 
> the network goes down and doesn't work anymore, and "give up" on connections 
> that don't work anymore (for some definition of "anymore").
> 
> I have operationally been in the situation where a server/client application 
> was implemented so that the server could only handle 256 connections (some 
> file-descriptor limit). Every time the firewall was rebooted and lost state, 
> the connections hung around forever. So the server administrators had to go 
> in and restart the process to clear these connections; otherwise there were 
> 256 hung connections and no new connections could be established.
> 
> Sometimes the other endpoint goes down, and doesn't come back. We will for 
> instance deploy home gateways, probably keeping netconf-call-home sessions 
> to an NMS, and we want them to be around forever, as long as they work. 
> TCP-level keepalives would solve this: if the customer just powers off the 
> device, after a while the session will be cleared. Using TCP keepalives here 
> means you get this kind of behaviour even if the upper-layer application 
> doesn't support it (netconf might have been a bad example here). It's a 
> single socket option to set, so it's very easy to do.
> 
> From knowing approximately what settings people have in their NAT44 and 
> firewalls etc., I'd say the recommendation should be that keepalives are set 
> to around a 60-300 second interval, and then kill the connection if no 
> traffic has passed in 3-5 of these intervals. Otherwise TCP will have backed 
> off so far anyway that it's probably faster to just retry the connection 
> instead of waiting for TCP to re-send the packet.
> 
> I have seen so many times in my 20 years working in networking where lack of 
> keepalives has caused all kinds of problems. I wish everybody would turn it 
> on and keep it on.

As more and more connections flow over mobile networks, it seems more and more 
important, even for flows you did not expect. I have to send keepalives over 
IPv6 connections - not for NAT as on IPv4, but for middlebox devices that have 
an interesting approach and attitude towards connection management. ;-)

The SIP Outbound RFC has a lot of reasoning behind keep-alives for connection 
failover and may be good input here.

https://tools.ietf.org/html/rfc5626 

/O

Re: statement regarding keepalives

2018-08-16 Thread Mikael Abrahamsson

On Wed, 15 Aug 2018, Kent Watsen wrote:

You bring up an interesting point; it goes to the motivation for wanting 
to do keepalives in the first place.  The text doesn't yet mention 
maintaining flow state as a motivation.


It's not only to maintain flow state, it's also to close the connection 
when the network goes down and doesn't work anymore, and "give up" on 
connections that don't work anymore (for some definition of "anymore").


I have operationally been in the situation where a server/client 
application was implemented so that the server could only handle 256 
connections (some file-descriptor limit). Every time the firewall was 
rebooted and lost state, the connections hung around forever. So the server 
administrators had to go in and restart the process to clear these 
connections; otherwise there were 256 hung connections and no new 
connections could be established.


Sometimes the other endpoint goes down, and doesn't come back. We will for 
instance deploy home gateways, probably keeping netconf-call-home sessions 
to an NMS, and we want them to be around forever, as long as they work. 
TCP-level keepalives would solve this: if the customer just powers off 
the device, after a while the session will be cleared. Using TCP 
keepalives here means you get this kind of behaviour even if the 
upper-layer application doesn't support it (netconf might have been a bad 
example here). It's a single socket option to set, so it's very easy to 
do.


From knowing approximately what settings people have in their NAT44 and 
firewalls etc., I'd say the recommendation should be that keepalives are 
set to around a 60-300 second interval, and then kill the connection if 
no traffic has passed in 3-5 of these intervals. Otherwise TCP will have 
backed off so far anyway that it's probably faster to just retry the 
connection instead of waiting for TCP to re-send the packet.
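The give-up heuristic above (probe every 60-300 seconds, declare the peer dead after 3-5 silent intervals) amounts to a small piece of timer bookkeeping. An illustrative sketch, not from any real stack:

```python
import time

class KeepaliveMonitor:
    """Declare a connection dead if nothing is heard for N keepalive intervals.

    Defaults follow the 60-300 s interval / 3-5 intervals heuristic above."""

    def __init__(self, interval=120.0, max_missed=4, now=time.monotonic):
        self.interval = interval
        self.max_missed = max_missed
        self.now = now                  # injectable clock, eases testing
        self.last_heard = now()

    def heard_from_peer(self):
        # Any inbound traffic counts, not just keepalive responses.
        self.last_heard = self.now()

    def due_for_probe(self):
        return self.now() - self.last_heard >= self.interval

    def is_dead(self):
        # Faster to reconnect than to wait out TCP's backed-off retransmits.
        return self.now() - self.last_heard >= self.interval * self.max_missed
```

The caller would send a probe whenever `due_for_probe()` is true and tear down and reconnect once `is_dead()` fires.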


I have seen so many times in my 20 years working in networking where lack 
of keepalives has caused all kinds of problems. I wish everybody would 
turn it on and keep it on.


--
Mikael Abrahamsson    email: swm...@swm.pp.se



Re: statement regarding keepalives

2018-08-15 Thread Tom Herbert
On Wed, Aug 15, 2018 at 2:24 PM, Kent Watsen  wrote:
> Hi Tom,
>
>
>
>> Kent, I'm not sure what the context of formal text is. Is this write up
>> going
>
>> to be in an I-D, or is it intended to be published by some other
>> mechanism?
>
>
>
> That is a good question.  At first, we were thinking that it might be an
> AD-level statement, but I think Spencer last suggested it being put into
> an I-D that might be published by TSVWG, if the chairs were to support
> that idea.
>
Kent,

There's little cost for individuals to create and post
Internet-Drafts, and you can always ask later for adoption as a
working group item. I think this will be a lot easier to work with
once it's in I-D form. Also, that exercise should help to clarify
exactly what the intent of this work is. To me, it seems like the
intent is to make recommendations on protocol design concerning
keepalives, so I believe it is reasonable to target an informational
RFC or possibly even a BCP.

Tom

>
>
> For now, I'm just pecking away at the text.  I figure that how to publish it
>
> will make more sense the more we know what "it" is.  Does that work for
>
> you?
>
Thanks. I think this will be easier to work with as a draft. I suspect
there's also some additional background material on keepalives
>
>
> Kent



Re: statement regarding keepalives

2018-08-15 Thread Kent Watsen
Hi Tom,

> Kent, I'm not sure what the context of formal text is. Is this write up going
> to be in an I-D, or is it intended to be published by some other mechanism?

That is a good question.  At first, we were thinking that it might be an 
AD-level statement, but I think Spencer last suggested it being put into 
an I-D that might be published by TSVWG, if the chairs were to support 
that idea.

For now, I'm just pecking away at the text.  I figure that how to publish it
will make more sense the more we know what "it" is.  Does that work for
you?

Kent


Re: statement regarding keepalives

2018-08-15 Thread Tom Herbert
On Wed, Aug 15, 2018, 1:56 PM Kent Watsen  wrote:

>
> Hi Tom,
>
> I recall your mentioning NAT before.  It fell into a crack and I
> lost sight of it.
>
> You bring up an interesting point; it goes to the motivation for
> wanting to do keepalives in the first place.  The text doesn't
> yet mention maintaining flow state as a motivation.
>
> The first paragraph of the "keepalives" section says:
>
>   When the initiator of a networking session needs to maintain a
>   long-lived connection, it is necessary for it to periodically test
>   the aliveness of the remote device.
>
> Would it make sense to adjust it to say the following?
>
>   When the initiator of a networking session needs to maintain a
>   long-lived connection, it is necessary for it to periodically
>   ensure network accessibility to and test the aliveness of the
>   remote device.  For instance, without keepalives, an intermediate
>   NAT or firewall may evict the flow state for quiet connections
>   due to a timeout or least-recently-used policy.  Similarly, the
>   remote application process, while accessible, may be hung, thus
>   accounting for the reason why the connection is quiet.
>
>
>
> Regarding your other comment, that the discussion should "include
> considerations on the frequency of keepalives and their cost", it
> seems that you almost wrote the paragraph below.  Would you be
> willing to proffer some formal text we could paste in, maybe to
> the end of the "keepalives" section or another section?  If not,
> I can try to take a stab at it.
>

Kent, I'm not sure what the context of formal text is. Is this write up
going to be in an I-D, or is it intended to be published by some other
mechanism?

Tom


>
> Thanks,
> Kent
>
>
>
> = original message =
>
> I think the statement is missing a primary purpose of keepalives,
> maybe the most important one, which is to maintain flow state in NAT and
> firewalls and prevent eviction by timeout or LRU.
>
> Also, any meaningful discussion or statement about keepalives should
> include considerations on the frequency of keepalives and their cost.
>
> Keepalives themselves carry no meaningful end user data, they are
> purely management overhead. The higher the frequency of keepalives,
> the higher the overhead and hence the more network resources they
> consume. At some point they can become a source of congestion,
> especially when keepalive timers become synchronized across a network
> as I previously pointed out. Unfortunately, there is no standard for
> how NAT state eviction is done and no standard NAT timeout, so the
> frequency of keepalives to prevent NAT state eviction is probably
> higher than it should be (hence more network overhead).
>
> In terms of cost, consider the effects of waking up the transmitter on
> a smart phone periodically just for the purpose of keeping connections
> up. With a high enough frequency this will drain the battery quickly.
> In fact, one of the touted benefits of IPv6 was supposed to be that
> NAT isn't present so there is no need for periodic keepalives to
> maintain NAT state and hence this would conserve power on mobile
> devices. Use of keepalives in power constrained devices is a real
> issue.
>
> Tom
>
> >
>
>
>


Re: statement regarding keepalives

2018-08-15 Thread Kent Watsen

Hi Tom,

I recall your mentioning NAT before.  It fell into a crack and I
lost sight of it.

You bring up an interesting point; it goes to the motivation for
wanting to do keepalives in the first place.  The text doesn't
yet mention maintaining flow state as a motivation.

The first paragraph of the "keepalives" section says:

  When the initiator of a networking session needs to maintain a
  long-lived connection, it is necessary for it to periodically test
  the aliveness of the remote device.

Would it make sense to adjust it to say the following?

  When the initiator of a networking session needs to maintain a
  long-lived connection, it is necessary for it to periodically 
  ensure network accessibility to and test the aliveness of the
  remote device.  For instance, without keepalives, an intermediate
  NAT or firewall may evict the flow state for quiet connections
  due to a timeout or least-recently-used policy.  Similarly, the
  remote application process, while accessible, may be hung, thus
  accounting for the reason why the connection is quiet.



Regarding your other comment, that the discussion should "include
considerations on the frequency of keepalives and their cost", it
seems that you almost wrote the paragraph below.  Would you be 
willing to proffer some formal text we could paste in, maybe to
the end of the "keepalives" section or another section?  If not,
I can try to take a stab at it.


Thanks,
Kent



= original message =

I think the statement is missing a primary purpose of keepalives,
maybe the most important one, which is to maintain flow state in NAT and
firewalls and prevent eviction by timeout or LRU.

Also, any meaningful discussion or statement about keepalives should
include considerations on the frequency of keepalives and their cost.

Keepalives themselves carry no meaningful end user data, they are
purely management overhead. The higher the frequency of keepalives,
the higher the overhead and hence the more network resources they
consume. At some point they can become a source of congestion,
especially when keepalive timers become synchronized across a network
as I previously pointed out. Unfortunately, there is no standard for
how NAT state eviction is done and no standard NAT timeout, so the
frequency of keepalives to prevent NAT state eviction is probably
higher than it should be (hence more network overhead).

In terms of cost, consider the effects of waking up the transmitter on
a smart phone periodically just for the purpose of keeping connections
up. With a high enough frequency this will drain the battery quickly.
In fact, one of the touted benefits of IPv6 was supposed to be that
NAT isn't present so there is no need for periodic keepalives to
maintain NAT state and hence this would conserve power on mobile
devices. Use of keepalives in power constrained devices is a real
issue.

Tom

>




Re: statement regarding keepalives

2018-07-20 Thread Kent Watsen

> ...but still don't put off people turning on TCP keepalives "because
> the IETF doesn't recommend that", and thus they do nothing at all and
> the problem just persists.

No disagreement with what you and others have written, but note that 
the proposed statement only recommends not using TCP keepalives in
the presence of a crypto layer on top of the TCP-layer.

Perhaps the statement could be refined, something along the lines 
of: in cases where there is a crypto layer, recommend not using,
or at least not relying on, TCP keepalives, *unless* higher-level
keepalives have stopped working.

To be clear, the statement as written, though not stated explicitly,
recommends TCP keepalives, in cases where they make sense.

Kent




Re: statement regarding keepalives

2018-07-20 Thread Joe Touch


> On Jul 20, 2018, at 4:47 AM, Mikael Abrahamsson  wrote:
> 
> So I'd like to see in the text that we recommend to do it as "high up" in the 
> stack as possible, but still don't put off people turning on TCP keepalives 
> "because the IETF doesn't recommend that", and thus they do nothing at all 
> and the problem just persists.

Agreed. Further, I don’t see the problem with having keepalives at all levels 
of the stack - at lower levels, they can be suppressed as long as higher levels 
are performing that function, but it’s still always useful to have them on at 
lower levels as a “backup”.

So I would hesitate to say “do this at the highest level”. My advice would be 
“do this at ALL levels that benefit, and be sure to suppress independent 
keepalives at a given level if there is activity at higher levels that 
suffices”.

Joe

Re: statement regarding keepalives

2018-07-20 Thread Tom Herbert
On Fri, Jul 20, 2018, 7:40 AM Spencer Dawkins at IETF <
spencerdawkins.i...@gmail.com> wrote:

> Hi, Mikael,
>
> On Fri, Jul 20, 2018 at 6:48 AM Mikael Abrahamsson 
> wrote:
>
>>
>> Hi,
>>
>> While I agree with the sentiment here, I have personally been in
>> positions
>> where application programmers were unable to (in a timely manner) modify
>> whatever was running, to implement a keepalive protocol. In that case,
>> turning on TCP keepalives was a very easy thing to do that immediately
>> would yield operational benefits.
>>
>> So I'd like to see in the text that we recommend to do it as "high up" in
>> the stack as possible, but still don't put off people turning on TCP
>> keepalives "because the IETF doesn't recommend that", and thus they do
>> nothing at all and the problem just persists.
>>
>> Also, should we talk about recommendations for what these timers should
>> be? In my experience, it's typically in tens of seconds up to 5-10
>> minutes
>> that makes sense for Internet use. Shorter than that might interrupt the
>> connection prematurely, longer than that causes things to take too long
>> to
>> detect a problem. Of course it's up to the application/environment to
>> choose the best value for each use-case, but some text on this might be
>> worthwhile to have there?
>>
>
> This is exactly the kind of feedback I'd like to reflect in whatever
> guidance we give, in whatever form. Thank you for that.
>
> Other thoughts?
>

Spencer,

When I saw the subject I was wondering if this was going to be the proposal
to deprecate keepalives! With vast numbers of connections performing them,
they at some point start to create congestion. This is especially
problematic if keepalive timers somehow become synchronized. IIRC Google
Calendar was brought down at one point a while back because of a keepalive
avalanche. I would assume that keepalives are primarily needed to maintain
NAT state, so if NAT were deprecated (as conceptually promised by IPv6)
then maybe keepalives could go away.
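A common mitigation for the avalanche problem Tom describes is to jitter each keepalive interval so that timers which start out aligned drift apart rather than firing in lockstep. A minimal sketch (the function name and jitter fraction are made up for illustration):

```python
import random

def next_keepalive_delay(base_interval, jitter_fraction=0.2, rng=random.random):
    """Return a randomized delay in [base*(1-j), base*(1+j)].

    Drawing each delay independently keeps a fleet of clients from
    probing in synchrony even if they all (re)connected at once."""
    spread = base_interval * jitter_fraction
    return base_interval - spread + 2 * spread * rng()
```

Each client sleeps for `next_keepalive_delay(120.0)` seconds between probes instead of a fixed 120, so aggregate keepalive load stays smeared over time.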

In any case, I think the text here probably would be good in a draft that
examines the current state of keepalives and makes recommendations.

Tom


> Spencer
>
> (And, yes, there's a tension between "why would I tear down a perfectly
> good idle connection because it might not work the next time I try to use
> it? for real" and "OMG, I need to send a message to the other side, but my
> idle connection is now timing out, so I have to set up a new connection,
> secure it, and do whatever else I need to do with the other side before I
> can send a message!!!". That's a good thing to include in our advice)
>


Re: statement regarding keepalives

2018-07-20 Thread Spencer Dawkins at IETF
Hi, Mikael,

On Fri, Jul 20, 2018 at 6:48 AM Mikael Abrahamsson  wrote:

>
> Hi,
>
> While I agree with the sentiment here, I have personally been in positions
> where application programmers were unable to (in a timely manner) modify
> whatever was running, to implement a keepalive protocol. In that case,
> turning on TCP keepalives was a very easy thing to do that immediately
> would yield operational benefits.
>
> So I'd like to see in the text that we recommend to do it as "high up" in
> the stack as possible, but still don't put off people turning on TCP
> keepalives "because the IETF doesn't recommend that", and thus they do
> nothing at all and the problem just persists.
>
> Also, should we talk about recommendations for what these timers should
> be? In my experience, it's typically in tens of seconds up to 5-10 minutes
> that makes sense for Internet use. Shorter than that might interrupt the
> connection prematurely, longer than that causes things to take too long to
> detect a problem. Of course it's up to the application/environment to
> choose the best value for each use-case, but some text on this might be
> worthwhile to have there?
>

This is exactly the kind of feedback I'd like to reflect in whatever
guidance we give, in whatever form. Thank you for that.

Other thoughts?

Spencer

(And, yes, there's a tension between "why would I tear down a perfectly
good idle connection because it might not work the next time I try to use
it? for real" and "OMG, I need to send a message to the other side, but my
idle connection is now timing out, so I have to set up a new connection,
secure it, and do whatever else I need to do with the other side before I
can send a message!!!". That's a good thing to include in our advice)


Re: statement regarding keepalives

2018-07-20 Thread Mikael Abrahamsson



Hi,

While I agree with the sentiment here, I have personally been in positions 
where application programmers were unable to (in a timely manner) modify 
whatever was running, to implement a keepalive protocol. In that case, 
turning on TCP keepalives was a very easy thing to do that immediately 
would yield operational benefits.


So I'd like to see in the text that we recommend to do it as "high up" in 
the stack as possible, but still don't put off people turning on TCP 
keepalives "because the IETF doesn't recommend that", and thus they do 
nothing at all and the problem just persists.


Also, should we talk about recommendations for what these timers should 
be? In my experience, it's typically in tens of seconds up to 5-10 minutes 
that makes sense for Internet use. Shorter than that might interrupt the 
connection prematurely, longer than that causes things to take too long to 
detect a problem. Of course it's up to the application/environment to 
choose the best value for each use-case, but some text on this might be 
worthwhile to have there?


On Fri, 13 Jul 2018, Kent Watsen wrote:



Dear TSVAREA,

The folks working with the BBF asked the NETMOD WG to consider modifying 
draft-ietf-netconf-netconf-client-server to support TCP keepalives [1].  However, it 
is unclear what IETF's position is on the use of keepalives, especially with regards 
to keepalives provided in protocol stacks (e.g.,  over HTTP over TLS 
over TCP).

After some discussion with the Transport ADs (Spencer and Mirja) and the TLS ADs 
(Eric and Ben), the following draft statement has been crafted.  Spencer and 
Mirja have requested that TSVAREA critique it before, perhaps, developing a 
consensus document around it in TSVWG.

It would be greatly appreciated if folks here could review and provide comments 
on the draft statement below.  The scope of the statement can be increased or 
reduced as deemed appropriate.

[1] https://mailarchive.ietf.org/arch/msg/netconf/MOzcZKp2rSxPVMTGdmmrVInwx2M

Thanks,
Kent (and Mahesh) // NETCONF chairs


= STATEMENT =

When the initiator of a networking session needs to maintain a persistent 
connection [1], it is necessary for it to periodically test the aliveness of 
the remote peer.  In such cases, it is RECOMMENDED that the aliveness check 
happens at the highest protocol layer possible that is most meaningful to the 
application, to maximize the depth of the aliveness check.

E.g., for an HTTPS connection to a simple webserver, HTTP-level keepalives 
would test more aliveness than TLS-level keepalives.  However, for a webserver 
that is accessed via a load-balancer that terminates TLS connections, TLS-level 
aliveness checks may be the most meaningful check that could be performed.

In order to ensure aliveness checks can always occur at the highest protocol layer, it is 
RECOMMENDED that protocol designers always include an aliveness check mechanism in the 
protocol and, for client/server protocols, that the aliveness check can be initiated from 
either peer, as sometimes the "server" is the initiator of the underlying 
networking connection (e.g., RFC 8071).

Some protocol stacks have a secure transport protocol layer (e.g., TLS, SSH, 
DTLS) that sits on top of a cleartext protocol layer (e.g., TCP, UDP).  In such 
cases, it is RECOMMENDED that the aliveness check occur within the protection 
envelope afforded by the secure transport protocol layer.  In such cases, the 
aliveness checks SHOULD NOT occur via the cleartext protocol layer, as an 
adversary can block aliveness check messages in either direction and send fake 
aliveness check messages in either direction.

[1] Reasons may vary for why the initiator of a networking session feels 
compelled to maintain a persistent connection.  If the session is primarily 
quiet, and the use case can cope with the additional latency of starting a new 
connection, it is RECOMMENDED to use short-lived connections, instead of 
maintaining a long-lived persistent connection using aliveness checks.
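To illustrate the "inside the protection envelope" recommendation above: an application-level aliveness probe sent through an established secure session, rather than as a cleartext TCP-level probe beneath it. The PING/PONG framing here is hypothetical, not taken from any real protocol:

```python
import ssl

def check_aliveness(sock, timeout=10.0):
    """Probe the peer in-band; `sock` would be an ssl.SSLSocket in real use,
    so the probe travels inside the TLS protection envelope.  An on-path
    attacker can still drop the probe, but cannot forge the authenticated
    response the way it could spoof a cleartext TCP-level keepalive."""
    sock.settimeout(timeout)
    try:
        sock.sendall(b"PING\n")          # hypothetical application framing
        return sock.recv(16) == b"PONG\n"
    except (OSError, ssl.SSLError):
        return False                     # peer gone, path broken, or timed out
```

Both peers would of course need to agree on the framing; the point is only that the probe rides above the secure layer, not below it.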





--
Mikael Abrahamsson    email: swm...@swm.pp.se



Re: statement regarding keepalives

2018-07-19 Thread Spencer Dawkins at IETF
Dear All,

Top-posting ... thank yous to David and Wes for your feedback.

I'd like to report back to the SEC ADs about our discussion tomorrow when
the IESG and IAB meet (immediately after the final meeting session).

If you have any other comments, they will be appreciated at any point in
time, but would be most useful if they arrive before then.

Thanks,

Spencer

On Fri, Jul 13, 2018 at 9:38 AM Black, David  wrote:

> +1 on Wes's comments, especially that "layers of functionality" is better
> than "aliveness" ;-).
>
> Thanks, --David
>
> > -Original Message-
> > From: tsv-area [mailto:tsv-area-boun...@ietf.org] On Behalf Of Wesley
> > Eddy
> > Sent: Thursday, July 12, 2018 9:06 PM
> > To: tsv-area@ietf.org
> > Subject: Re: statement regarding keepalives
> >
> > Hi Kent, I agree with the spirit of the statement / guidance you've
> drafted.
> >
> > You might want to tweak some of the wording, e.g. "test more aliveness"
> > could be "test more layers of functionality" or something like that, but
> > this is just a nit.
> >
> > I think the footnote recommending short-lived connections should be more
> > clear about why that's the recommendation.  What is the risk/danger/etc.
> > of longer-lived connections?  That recommendation seems a bit naked as
> > currently described, and actually should probably be more than just a
> > footnote.
> >
> >
> >
> > On 7/12/2018 8:37 PM, Kent Watsen wrote:
> > > Dear TSVAREA,
> > >
> > > The folks working with the BBF asked the NETMOD WG to consider
> > modifying draft-ietf-netconf-netconf-client-server to support TCP
> keepalives
> > [1].  However, it is unclear what IETF's position is on the use of
> keepalives,
> > especially with regards to keepalives provided in protocol stacks (e.g.,
> >  over HTTP over TLS over TCP).
> > >
> > > After some discussion with Transport ADs (Spencer and Mirja) and the
> TLS
> > ADs (Eric and Ben), the following draft statement has been crafted.
> Spencer
> > and Mirja have requested TSVAREA critique it before, perhaps, developing
> a
> > consensus document around it in TSVWG.
> > >
> > > It would be greatly appreciated if folks here could review and provide
> > comments on the draft statement below.  The scope of the statement can
> > be increased or reduced as deemed appropriate.
> > >
> > > [1]
> > https://mailarchive.ietf.org/arch/msg/netconf/MOzcZKp2rSxPVMTGdmmrVI
> > nwx2M
> > >
> > > Thanks,
> > > Kent (and Mahesh) // NETCONF chairs
> > >
> > >
> > > = STATEMENT =
> > >
> > > When the initiator of a networking session needs to maintain a
> persistent
> > connection [1], it is necessary for it to periodically test the
> aliveness of the
> > remote peer.  In such cases, it is RECOMMENDED that the aliveness check
> > happens at the highest protocol layer possible that is most meaningful
> to the
> > application, to maximize the depth of the aliveness check.
> > >
> > > E.g., for an HTTPS connection to a simple webserver, HTTP-level
> keepalives
> > would test more aliveness than TLS-level keepalives.  However, for a
> > webserver that is accessed via a load-balancer that terminates TLS
> > connections, TLS-level aliveness checks may be the most meaningful check
> > that could be performed.
> > >
> > > In order to ensure aliveness checks can always occur at the highest
> protocol
> > layer, it is RECOMMENDED that protocol designers always include an
> > aliveness check mechanism in the protocol and, for client/server
> protocols,
> > that the aliveness check can be initiated from either peer, as sometimes
> the
> > "server" is the initiator of the underlying networking connection (e.g.,
> RFC
> > 8071).
> > >
> > > Some protocol stacks have a secure transport protocol layer (e.g.,
> TLS, SSH,
> > DTLS) that sits on top of a cleartext protocol layer (e.g., TCP, UDP).
> In such
> > cases, it is RECOMMENDED that the aliveness check occurs within
> protection
> > envelope afforded by the secure transport protocol layer.  In such
> cases, the
> > aliveness checks SHOULD NOT occur via the cleartext protocol layer, as an
> > adversary can block aliveness check messages in either direction and send
> > fake aliveness check messages in either direction.
> > >
> > > [1] While reasons may vary for why the initiator of a networking
> session
> > feels compelled to maintain a persistent connection.  If the session is
> > primarily quiet, and the use case can cope with the additional latency of
> > starting a new connection, it is RECOMMENDED to use short-lived
> > connections, instead of maintaining a long-lived persistent connection
> using
> > aliveness checks.
> > >
> > >
>
>


Re: statement regarding keepalives

2018-07-12 Thread Wesley Eddy

Hi Kent, I agree with the spirit of the statement / guidance you've drafted.

You might want to tweak some of the wording, e.g. "test more aliveness" 
could be "test more layers of functionality" or something like that, but 
this is just a nit.


I think the footnote recommending short-lived connections should be more 
clear about why that's the recommendation.  What is the risk/danger/etc. 
of longer-lived connections?  That recommendation seems a bit naked as 
currently described, and actually should probably be more than just a 
footnote.




On 7/12/2018 8:37 PM, Kent Watsen wrote:

Dear TSVAREA,

The folks working with the BBF asked the NETMOD WG to consider modifying 
draft-ietf-netconf-netconf-client-server to support TCP keepalives [1].  However, it 
is unclear what the IETF's position is on the use of keepalives, especially with regard 
to keepalives provided in protocol stacks (e.g.,  over HTTP over TLS 
over TCP).

After some discussion with the Transport ADs (Spencer and Mirja) and the TLS ADs 
(Eric and Ben), the following draft statement has been crafted.  Spencer and 
Mirja have requested that TSVAREA critique it before, perhaps, developing a 
consensus document around it in TSVWG.

It would be greatly appreciated if folks here could review and provide comments 
on the draft statement below.  The scope of the statement can be increased or 
reduced as deemed appropriate.

[1] https://mailarchive.ietf.org/arch/msg/netconf/MOzcZKp2rSxPVMTGdmmrVInwx2M

Thanks,
Kent (and Mahesh) // NETCONF chairs


= STATEMENT =

When the initiator of a networking session needs to maintain a persistent 
connection [1], it is necessary for it to periodically test the aliveness of 
the remote peer.  In such cases, it is RECOMMENDED that the aliveness check 
happen at the highest protocol layer that is meaningful to the application, 
so as to maximize the depth of the aliveness check.

For example, for an HTTPS connection to a simple webserver, HTTP-level keepalives 
would test more aliveness than TLS-level keepalives.  However, for a webserver 
that is accessed via a load-balancer that terminates TLS connections, TLS-level 
aliveness checks may be the most meaningful check that could be performed.

In order to ensure aliveness checks can always occur at the highest protocol layer, it is 
RECOMMENDED that protocol designers always include an aliveness check mechanism in the 
protocol and, for client/server protocols, that the aliveness check can be initiated from 
either peer, as sometimes the "server" is the initiator of the underlying 
networking connection (e.g., RFC 8071).

Some protocol stacks have a secure transport protocol layer (e.g., TLS, SSH, 
DTLS) that sits on top of a cleartext protocol layer (e.g., TCP, UDP).  In such 
cases, it is RECOMMENDED that the aliveness check occur within the protection 
envelope afforded by the secure transport protocol layer.  In such cases, the 
aliveness checks SHOULD NOT occur via the cleartext protocol layer, as an 
adversary can block aliveness check messages in either direction and send fake 
aliveness check messages in either direction.

[1] Reasons may vary for why the initiator of a networking session feels 
compelled to maintain a persistent connection.  If the session is primarily 
quiet, and the use case can cope with the additional latency of starting a new 
connection, it is RECOMMENDED to use short-lived connections, instead of 
maintaining a long-lived persistent connection using aliveness checks.






statement regarding keepalives

2018-07-12 Thread Kent Watsen

Dear TSVAREA,

The folks working with the BBF asked the NETMOD WG to consider modifying 
draft-ietf-netconf-netconf-client-server to support TCP keepalives [1].  
However, it is unclear what the IETF's position is on the use of keepalives, 
especially with regard to keepalives provided in protocol stacks (e.g., 
 over HTTP over TLS over TCP).
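
For context, "TCP keepalives" here means the per-socket kernel mechanism, which an 
application merely switches on; roughly, in a Python sketch (the idle/interval/count 
socket options are a Linux-specific assumption, guarded accordingly):

```python
import socket

def enable_tcp_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on kernel TCP keepalives for an already-created socket.

    The idle/interval/count knobs (TCP_KEEPIDLE etc.) are Linux-specific
    and are applied only where the platform exposes them."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        # seconds of idle time before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        # seconds between successive probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        # unanswered probes before the connection is declared dead
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock

sock = enable_tcp_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
```

Note that this is purely a layer-4 probe: it tests the remote TCP stack, not 
whether the application above it is still able to respond.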

After some discussion with the Transport ADs (Spencer and Mirja) and the TLS ADs 
(Eric and Ben), the following draft statement has been crafted.  Spencer and 
Mirja have requested that TSVAREA critique it before, perhaps, developing a 
consensus document around it in TSVWG.

It would be greatly appreciated if folks here could review and provide comments 
on the draft statement below.  The scope of the statement can be increased or 
reduced as deemed appropriate. 

[1] https://mailarchive.ietf.org/arch/msg/netconf/MOzcZKp2rSxPVMTGdmmrVInwx2M 

Thanks,
Kent (and Mahesh) // NETCONF chairs


= STATEMENT =

When the initiator of a networking session needs to maintain a persistent 
connection [1], it is necessary for it to periodically test the aliveness of 
the remote peer.  In such cases, it is RECOMMENDED that the aliveness check 
happen at the highest protocol layer that is meaningful to the application, 
so as to maximize the depth of the aliveness check.

For example, for an HTTPS connection to a simple webserver, HTTP-level keepalives 
would test more aliveness than TLS-level keepalives.  However, for a webserver 
that is accessed via a load-balancer that terminates TLS connections, TLS-level 
aliveness checks may be the most meaningful check that could be performed.

In order to ensure aliveness checks can always occur at the highest protocol 
layer, it is RECOMMENDED that protocol designers always include an aliveness 
check mechanism in the protocol and, for client/server protocols, that the 
aliveness check can be initiated from either peer, as sometimes the "server" is 
the initiator of the underlying networking connection (e.g., RFC 8071).

Some protocol stacks have a secure transport protocol layer (e.g., TLS, SSH, 
DTLS) that sits on top of a cleartext protocol layer (e.g., TCP, UDP).  In such 
cases, it is RECOMMENDED that the aliveness check occur within the protection 
envelope afforded by the secure transport protocol layer.  In such cases, the 
aliveness checks SHOULD NOT occur via the cleartext protocol layer, as an 
adversary can block aliveness check messages in either direction and send fake 
aliveness check messages in either direction.

[1] Reasons may vary for why the initiator of a networking session feels 
compelled to maintain a persistent connection.  If the session is primarily 
quiet, and the use case can cope with the additional latency of starting a new 
connection, it is RECOMMENDED to use short-lived connections, instead of 
maintaining a long-lived persistent connection using aliveness checks.
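
By contrast, an aliveness check "at the highest protocol layer" rides inside the 
application's own request/response exchange.  A minimal sketch, assuming a 
hypothetical PING/PONG message format that is not part of any protocol discussed 
above:

```python
import socket
import threading

def alive(sock, probe=b"PING\n", expect=b"PONG\n", timeout=5.0):
    """Send an application-level probe and require the peer's answer
    within `timeout`; anything else is treated as a dead connection."""
    sock.settimeout(timeout)
    try:
        sock.sendall(probe)                     # travels as ordinary app data
        return sock.recv(len(expect)) == expect
    except (socket.timeout, OSError):
        return False                            # no timely answer: declare dead

# Demo against a trivial in-process responder on the loopback interface.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

def responder():
    conn, _ = srv.accept()
    if conn.recv(5) == b"PING\n":
        conn.sendall(b"PONG\n")
    conn.close()

threading.Thread(target=responder, daemon=True).start()
cli = socket.create_connection(srv.getsockname())
result = alive(cli)     # True when the peer answers in time
```

A successful reply proves everything up to and including the peer application is 
working, which is the sense in which such a check "tests more aliveness" than a 
TCP-level probe.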