Re: [DNSOP] Mirja Kühlewind's Discuss on draft-ietf-dnsop-session-signal-12: (with DISCUSS and COMMENT)

Mirja Kuehlewind (IETF) Fri, 26 Oct 2018 06:36:02 -0700

Hi Ted,

Please see below.


> On 23.10.2018 at 21:51, Ted Lemon <mel...@fugue.com> wrote:
> 
> On Mon, Oct 15, 2018 at 10:02 AM Mirja Kuehlewind (IETF) 
> <i...@kuehlewind.net> wrote:
> Sorry for the delay; as you made a number of changes, it took me 
> a while to re-review. I believe I’m unfortunately not fully ready to release 
> my discuss at this point, but close.
> 
> No worries—it's a busy time.
>  
> Regarding my first discuss point (delayed ACKs and so on), I think the text 
> has improved, and I would like to see my minor wording question (comment 2) 
> below addressed before I finally release the discuss here. However, I still 
> think the extensive discussion now provided in section 9.5 does not 
> necessarily belong in this document. I would have preferred to see this text 
> moved to a real appendix, or removed completely and perhaps documented in a 
> separate informational RFC (in tcpm).
> 
> Regarding my second discuss point (keep-alives), the text still does not 
> seem quite right, or I’m really confused. Please also see further below 
> (comment 3).
> 
> Anyway here are my comments on the edited/new text in the order they appear 
> in the draft:
> 
> 1) I think the following text in section 3 is not fully correct:
> 
> "Fast Open message: A TCP SYN packet that begins a DSO connection and
>    contains early data ([RFC8446] section 2.3).  Fast Open is only
>    permitted when using TLS encapsulation: a TCP SYN message that does
>    not use TLS encapsulation but contains early data is not permitted."
> If TLS 0-RTT is used, this data will not be carried in the TCP SYN; it will 
> "just" be sent at the same time as the TLS handshake is performed (but after 
> the TCP handshake). Only if TCP Fast Open (TFO) (see RFC7413) is used can data 
> also be sent in the TCP SYN. I guess you mainly need to fix the reference 
> here, or maybe name both mechanisms separately.
> 
> If you look at the table on p. 18 of RFC8447, it shows early data being sent 
> in the first packet.  What am I missing here?

I guess you mean RFC8446 :-)

The table on p. 18 there shows only the TLS handshake; a TCP handshake takes 
place before that.
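
As a rough illustration of that distinction, here is a minimal sketch that counts how many round trips pass before the client can send its first DSO data. It is a simplified model only: it assumes TLS 1.3 over TCP with no loss, a valid TFO cookie when TFO is used, and a resumable session when early data is used. The function name is hypothetical.

```python
def round_trips_before_first_data(tfo: bool, tls_early_data: bool) -> int:
    """Round trips the client waits before it can send application data.

    Simplified model (assumption): TLS 1.3 over TCP, no packet loss.
    """
    # Without TFO the SYN carries no data, so the TCP handshake costs one RTT.
    tcp_wait = 0 if tfo else 1
    # Without TLS early data the client waits for the server's handshake flight.
    tls_wait = 0 if tls_early_data else 1
    return tcp_wait + tls_wait


if __name__ == "__main__":
    for tfo in (False, True):
        for early in (False, True):
            rtts = round_trips_before_first_data(tfo, early)
            print(f"TFO={tfo!s:5} early_data={early!s:5} -> {rtts} RTT")
```

In this model, early data reaches the first packet (the SYN) only when both mechanisms are combined; TLS 0-RTT alone still waits for the TCP handshake.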

>  
> 2) In section 5.5.1:
>    "With a DSO request message, the TCP implementation waits for the
>    application-layer client software to generate the corresponding DSO
>    response message, which enables the TCP implementation to send a
>    single combined IP packet containing the TCP acknowledgement, the TCP
>    window update, and the application-generated DSO response message.
>    This is more efficient than sending three separate IP packets."
> 
> The phrasing here is a bit confusing, to me at least. It sounds a bit like 
> there is a special TCP for DSO… maybe the following is a bit better:
>    "With a DSO request message, the TCP delayed acknowledgment timer will
>    usually make the implementation wait for the
>    application-layer client software to generate the corresponding DSO
>    response message before it sends out a TCP acknowledgment.
>    This will generate a
>    single combined IP packet containing the TCP acknowledgement, the TCP
>    window update, and the application-generated DSO response message, and
>    is more efficient than sending three separate IP packets."
> 
> (Note that the delayed ack timer can be configured to a very small value as 
> well, and as such it depends on the processing time and the value of the 
> timer whether a TCP implementation will wait or not.)
> 
> I think using the passive voice here makes the text harder to follow, but I 
> see what you are saying.   How about this:
> 
> With a bidirectional exchange over TCP, as for example with a DSO request
> message, the operating system TCP implementation waits for the

"the operating system’s TCP implementation will usually wait for"

or even better

"the delayed acknowledgment timer in TCP will usually wait for"

> application-layer client software to generate the corresponding DSO
> response message.   It can then send a
> single combined packet containing the TCP acknowledgement, the
> TCP window update, and the application-generated DSO response message.
> This is more efficient than sending three separate packets, as would occur if
> the TCP packet containing the DSO request were acknowledged immediately.
> 
> 3) Section 6.5.2
> "For example, a (hypothetical and unrealistic)
>    keepalive interval value of 100 ms would result in a continuous
>    stream of ten messages per second or more, in both directions, to
>    keep the DSO Session alive.  And, in this extreme example, a single
>    packet loss and retransmission over a long path could introduce a
>    momentary pause in the stream of messages of over 200 ms, long enough
>    to cause the server to overzealously abort the connection."
> I think this example is still not correct (and the changes might have made it 
> worse: how can there be more than 10 messages?) 
> 
> So the point here is that there is a dependency on the RTT. Only if the RTT 
> is smaller than 200ms this can happen, otherwise the connection is closed 
> anyway after two keep-alives. However, if the RTT is much smaller than 100ms 
> and e.g. TLP is used, it would still work even if one packet is lost.
> 
> Remember that keepalives are not synchronous.   That is, if we send a 
> keepalive, we don't wait for the response.   So it's perfectly possible for 
> there to be several keepalives in flight in this situation, if the RTT is 
> >200ms.

Yes, I misunderstood that initially. By the way, why did you decide to design 
it that way?
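
To make the "several keepalives in flight" point concrete, here is a minimal sketch that estimates the peak number of unanswered keepalives when the sender does not wait for responses. It assumes a steady sending interval, a constant RTT, and no loss; the function name is hypothetical.

```python
def peak_keepalives_in_flight(rtt_ms: int, interval_ms: int) -> int:
    """Peak count of keepalives sent but not yet answered.

    Assumes (illustration only) one keepalive every interval_ms, a constant
    round-trip time of rtt_ms, and no packet loss.
    """
    if rtt_ms <= 0 or interval_ms <= 0:
        raise ValueError("rtt_ms and interval_ms must be positive")
    # Integer ceiling division: every keepalive sent within the last RTT
    # is still waiting for its response.
    return -(-rtt_ms // interval_ms)


if __name__ == "__main__":
    print(peak_keepalives_in_flight(250, 100))
    print(peak_keepalives_in_flight(50, 100))
```

With the 100 ms interval from the draft's example, an RTT above 200 ms means more than two keepalives are outstanding at the peak, whereas an RTT well below the interval keeps it at one.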
>  
> In any case, I don’t think this example is actually very helpful. The point 
> is that the keepalive interval should always be much larger than the RTT to 
> make this work appropriately. However, the point about keeping the network 
> load low is rather independent of the question of when the mechanism actually 
> breaks. I would recommend simply removing this example and just saying that the 
> interval MUST NOT be smaller than 10 sec to keep the network load reasonably 
> low.
> 
> However, having read this and the previous section again, I think your 
> implementation of the keep-alive mechanism could also be improved. Usually, 
> there should be two intervals: one that defines how long the connection can be 
> idle before a keep-alive is sent, and one that defines when a keep-alive 
> should be retransmitted if it is deemed to be lost, where the first one should 
> usually be larger than the second one (and both timers should always be 
> larger than the RTT). That would enable faster failure detection if the 
> connection is actually lost. 
> 
> A possible point of confusion is that these are not TCP keepalive packets.   
> These are DSO messages being sent over the TCP transport.   So it's not 
> possible for a keepalive to be lost.   If we don't get a response to a 
> keepalive during the keepalive interval, this means that the TCP connection 
> has stalled, or that the remote end is no longer reachable.   There is no 
> retransmission.   Is that where the confusion lies, or am I misunderstanding?

Right, I actually got confused here. So you send data frequently, and if 
something goes wrong the connection will be closed at the sender side. Hm... it 
seems that if you want to test transport liveness with this (and not 
application liveness), you might rather want to use the existing keep-alive 
mechanism in TCP. Why don’t you just recommend using that?
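
For reference, a minimal sketch of what enabling the existing TCP keep-alive mechanism looks like from an application. The timer values are illustrative assumptions, not recommendations from the draft, and the helper name is hypothetical. The three tuning options are platform-specific (they exist on Linux), so the sketch applies them only where the platform exposes them.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Turn on kernel TCP keep-alive and tune its timers where available.

    idle_s:     idle time before the first probe is sent
    interval_s: time between probes that go unanswered
    probes:     unanswered probes before the connection is dropped
    (All three default values are assumptions chosen for illustration.)
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)  # missing on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_tcp_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Note that this mechanism has the two timers mentioned earlier in the thread: an idle time before the first probe and a shorter interval for repeating a probe that goes unanswered.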

>  
> 4) Section 6.6.2.2. (Reconnecting After an Unexplained Connection Drop)
>   "It is also possible for a server to forcibly terminate the
>    connection; in this case the client doesn't know whether the
>    termination was the result of a protocol error or a network outage.
>    The client could determine which of the two is occurring by noticing
>    if a connection is repeatedly dropped by the server; if so, the
>    client can mark the server as not supporting DSO."
> How often should the client try and in which interval?
> 
> I've added the following text to address this question:
> 
> ### Misbehaving Clients
> 
> A server may determine that a client is not following the protocol correctly.
> There may be no way for the server to recover the session, in which case the
> server forcibly terminates the connection.  Since the client doesn't know why
> the connection dropped, it may reconnect immediately.  If the server has
> determined that a client is not following the protocol correctly, it may
> terminate the DSO session as soon as it is established, specifying a long
> retry-delay to prevent the client from immediately reconnecting.
> 
> and
> 
> #### Reconnecting After an Unexplained Connection Drop {#dropreconnect}
> 
> It is also possible for a server to forcibly terminate the connection; in
> this case the client doesn't know whether the termination was the result
> of a protocol error or a network outage.   When the client notices that
> the connection has been dropped, it can attempt to reconnect immediately.
> However, if the connection is dropped again without the client being
> able to successfully do whatever it is trying to do, it should mark the
> server as not supporting DSO.
> 
> These two bits of advice, in combination with the surrounding text, should 
> address the problem you're pointing to.

Yes, I think this is fine now.
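
The client-side rule in the new text could be sketched roughly as follows. This is only one reading of the quoted paragraph: one immediate reconnect is allowed after an unexplained drop, and a second drop without any successful operation in between marks the server as not supporting DSO. The class and method names are hypothetical and do not come from the draft.

```python
class DsoReconnectPolicy:
    """Tracks unexplained connection drops for a single server."""

    def __init__(self):
        self.unexplained_drops = 0
        self.dso_supported = True

    def on_success(self):
        """The client completed what it was trying to do; reset the count."""
        self.unexplained_drops = 0

    def on_unexplained_drop(self):
        """Record a drop; return True if an immediate DSO reconnect is allowed."""
        self.unexplained_drops += 1
        if self.unexplained_drops >= 2:
            # Dropped again without success: stop treating this server
            # as supporting DSO.
            self.dso_supported = False
        return self.dso_supported


if __name__ == "__main__":
    policy = DsoReconnectPolicy()
    print(policy.on_unexplained_drop())  # first drop: reconnect immediately
    print(policy.on_unexplained_drop())  # second drop in a row: give up on DSO
```

A server-supplied retry-delay, as described under "Misbehaving Clients", would take precedence over this immediate reconnect; the sketch covers only the unexplained-drop case.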

>  
> 5) Section 9.2:
>    "In principle, anycast servers could maintain sufficient state that  
>     they can both handle packets in the same TCP connection."
> Really? I mean in theory yes but has this ever been done in practice? I would 
> think that sharing TCP state is even harder than sharing DSO state.
> 
> I've just deleted this paragraph—I think we were trying to address a 
> hypothetical scenario here and got a little carried away. :)

Good. Thanks!
>  
> Please clarify that TLS 0-RTT can be used without TFO (or TFO can be used 
> without TLS); I would also recommend discussing the respective issues 
> separately.
> 
> As Benjamin said, I took out support for TCP Fast Open without TLS 1.3 
> because I didn't think it was practical to address the potential issues with 
> it.

I’m not sure why you think that, or which issues you mean. RFC 7413 actually 
discusses all kinds of issues extensively.

>   However, in looking back at what I wrote, it's easy to see why this was 
> confusing.   I've substantially tightened up the text about this: all cases 
> where terms like "TCP Fast Open" and "0-RTT" are used now refer to "early 
> data."   The changes are relatively small, but sprinkled over a whole 
> section, so I don't think it's practical to enumerate them here, but they 
> should show up nicely in the diffs.   I believe these changes address your 
> concern, but please let me know if they do not.
> 
> https://www.ietf.org/rfcdiff?url2=draft-ietf-dnsop-session-signal-17

Okay, I believe this is fine now. I guess you could further clarify somewhere 
that "early data" is always 0-RTT TLS, with or _WITHOUT_ TFO.

Thanks!
Mirja

 

> 
> Thanks!

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
