On Thu, Aug 26, 2010 at 12:40:04PM +1000, James A. Donald wrote:
> On 2010-08-25 11:04 PM, Richard Salz wrote:
> >>Also, note that HSTS is presently specific to HTTP. One could imagine
> >>expressing a more generic "STS" policy for an entire site
> >
> >A really knowledgeable net-head told me the other day that the problem
> >with SSL/TLS is that it has too many round-trips. In fact, the RTT costs
> >are now more prohibitive than the crypto costs. I was quite surprised to
> >hear this; he was stunned to find it out.

It'd help amortize the cost of round-trips if we used HTTP/1.1
pipelining more. Just as we could amortize the cost of public key
crypto by making more use of TLS session resumption, including session
resumption without server-side state [RFC4507].

And if only end-to-end IPsec with connection latching [RFC5660] had been
deployed years ago we could further amortize crypto context setup.

We need solutions, but abandoning security isn't really a good solution.

> This is inherent in the layering approach - inherent in our current
> crypto architecture.

The second part is a correct description of the current state of
affairs. I don't buy the first part (see below).

> To avoid inordinate round trips, crypto has to be compiled into the
> application, has to be a source code library and application level
> protocol, rather than layers.

Authentication and key exchange are generally going to require 1.5 round
trips at least, which is to say, really, 2. Yes, Kerberos AP exchanges
happen in 1 round trip, but at the cost of requiring a persistent replay
cache (and also there's the non-trivial TGS exchanges as well). Replay
caches historically have killed performance, though they don't have
to[0], but still, there's the need for either a persistent replay cache
backing store or a trade-off w.r.t. startup time and clients with slow
clocks[0], and even then you need to worry about large (>1s) clock
adjustments.

So, really, as a rule of thumb, budget 2 round trips for all crypto
setup. That leaves us with amortization and piggy-backing as ways to
make up for that hefty up-front cost.

> Every time you layer one communication protocol on top of another,
> you get another round trip.
>
> When you layer application protocol on ssl on tcp on ip, you get
> round trips to set up tcp, and *then* round trips to set up ssl,
> *then* round trips to set up the application protocol.

See draft-williams-tls-app-sasl-opt-04.txt [1], a variant of false
start, which alleviates the latter.
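
To put rough numbers on the stacking described in the quoted text, here
is a minimal Python sketch. The per-layer costs are illustrative
assumptions, not measurements: 1 round trip for TCP, the 2-round-trip
rule of thumb for crypto setup, and 1 round trip for a hypothetical
application-level exchange. The 1.5 figure is the upper bound this post
gives for the saving from [1]. The names LAYER_SETUP_RTTS,
setup_round_trips and setup_latency_ms are made up for the example.

```python
# Setup cost, in round trips, that each layer adds before the first
# application request can be answered. All three figures are illustrative
# assumptions, not measurements.
LAYER_SETUP_RTTS = {
    "tcp": 1.0,  # SYN / SYN-ACK; assumes the final ACK can carry data
    "tls": 2.0,  # the rule-of-thumb budget for crypto setup
    "app": 1.0,  # hypothetical application-level authentication exchange
}


def setup_round_trips(layers, saved=0.0):
    """Total setup round trips when layers run strictly one after another,
    minus any round trips saved by piggy-backing across layers."""
    return max(sum(layers.values()) - saved, 0.0)


def setup_latency_ms(round_trips, rtt_ms):
    """Wall-clock setup delay for a given network round-trip time."""
    return round_trips * rtt_ms


strict = setup_round_trips(LAYER_SETUP_RTTS)
# 1.5 is an upper bound on the saving; real savings may be smaller.
pierced = setup_round_trips(LAYER_SETUP_RTTS, saved=1.5)

for rtt_ms in (20, 100, 300):
    print(f"RTT {rtt_ms} ms: "
          f"strict {setup_latency_ms(strict, rtt_ms):.0f} ms, "
          f"piggy-backed {setup_latency_ms(pierced, rtt_ms):.0f} ms")
```

Under these assumptions the setup delay scales linearly with the RTT, so
on a long path the round-trip count dominates whatever the crypto itself
costs.
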
See also draft-bmoeller-tls-falsestart-00.txt [2].

Back to layering...

If abstractions are leaky, maybe we should consider purposeful
abstraction leaking/piercing. There's no reason that we couldn't
piggy-back one layer's initial message (and in some cases more) on a
lower layer connection setup message exchange -- provided much care is
taken in doing so.

That's what PROT_READY in the GSS-API is for, that's one use for GSS-API
channel binding (see SASL/GS2 [RFC5801] for one example). It's what TLS
"false start" proposals are about... draft-williams-tls-app-sasl-opt-04
gets an up to 1.5 round-trip optimization for applications over TLS.

We could apply the same principle to TCP... (Shades of the old, failed?
transaction TCP [RFC1644] proposal from the mid '90s, I know. Shades
also of TCP-AO and other more recent proposals perhaps as well.)

But there is a gotcha: the upper layer must be aware of the early
message send/delivery semantics. For example, early messages may not
have been protected by the lower layer, with protection not confirmed
till the lower layer succeeds, which means... for example, that the
upper layer must not commit much in the way of resources until the lower
layer completes (e.g., so as to avoid DoS attacks).

I'm not saying that piercing layers is to be done cavalierly. Rather,
that we should consider this approach, carefully. I don't really see
better solutions (amortization won't always help).

Nico

[0] Turns out that there is a way to optimize replay caches greatly, so
that an fsync(2) is not needed on every transaction, or even most. This
is an optimization that turned out to be quite simple to implement (with
much commentary), but took a long time to think through. Writing a test
program and then using it to test the implementation's correctness was
the lion's share of the implementation work.
You can see it here:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/gss_mechs/mech_krb5/krb5/rcache/rc_file.c

Diffs:

http://src.opensolaris.org/source/diff/onnv/onnv-gate/usr/src/lib/gss_mechs/mech_krb5/krb5/rcache/rc_file.c?r2=%252Fonnv%252Fonnv-gate%252Fusr%252Fsrc%252Flib%252Fgss_mechs%252Fmech_krb5%252Fkrb5%252Frcache%252Frc_file.c%4012192%3Ab9153e7686cf&r1=%252Fonnv%252Fonnv-gate%252Fusr%252Fsrc%252Flib%252Fgss_mechs%252Fmech_krb5%252Fkrb5%252Frcache%252Frc_file.c%407934%3A6aeeafc994de

RFE (though IIRC the description is wrong/out of date):

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794523

[1] http://tools.ietf.org/html/draft-williams-tls-app-sasl-opt-04
[2] https://tools.ietf.org/html/draft-bmoeller-tls-falsestart-00

---------------------------------------------------------------------
The Cryptography Mailing List