Re: [TLS] Next steps for key share prediction
On Thu, Mar 7, 2024 at 6:34 PM Watson Ladd wrote:
> On Thu, Mar 7, 2024 at 2:56 PM David Benjamin wrote:
> >
> > Hi all,
> >
> > With the excitement about, sometime in the far future, possibly
> > transitioning from a hybrid, or to a to-be-developed better PQ
> > algorithm, I thought it would be a good time to remind folks that,
> > right now, we have no way to effectively transition between PQ-sized
> > KEMs at all.
> >
> > At IETF 118, we discussed draft-davidben-tls-key-share-prediction,
> > which aims to address this. For a refresher, here are some links:
> > https://davidben.github.io/tls-key-share-prediction/draft-davidben-tls-key-share-prediction.html
> > https://datatracker.ietf.org/meeting/118/materials/slides-118-tls-key-share-prediction-00
> > (Apologies, I forgot to cut a draft-01 with some of the outstanding
> > changes in the GitHub, so the link above is probably better than
> > draft-00.)
> >
> > If I recall, the outcome from IETF 118 was two-fold:
> >
> > First, we'd clarify in rfc8446bis that the "key_share first"
> > selection algorithm is not quite what you want. This was done in
> > https://github.com/tlswg/tls13-spec/pull/1331
> >
> > Second, there was some discussion over whether what's in the draft is
> > the best way to resolve a hypothetical future transition, or if there
> > was another formulation. I followed up with folks briefly offline
> > afterwards, but an alternative never came to fruition.
> >
> > Since we don't have another solution yet, I'd suggest we move forward
> > with what's in the draft as a starting point. (Or if this email
> > inspires folks to come up with a better solution, even better! :-D)
> > In particular, whatever the rfc8446bis guidance is, there are still
> > TLS implementations out there with the problematic selection
> > algorithm. Concretely, OpenSSL's selection algorithm is incompatible
> > with this kind of transition. See
> > https://github.com/openssl/openssl/issues/22203
>
> Is that asking whether or not we want adoption? I want adoption.
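[To make the "key_share first" problem discussed above concrete, here is an illustrative sketch. This is not any implementation's actual code; the function and group names are made up for the example. The point is that a server picking the first client key_share it supports can never steer a client toward a newer, more preferred PQ group the client merely listed in supported_groups.]

```python
def select_group_key_share_first(client_key_shares, server_supported):
    """The problematic algorithm: take the first client key_share the
    server supports, ignoring the server's own preference order."""
    for group in client_key_shares:
        if group in server_supported:
            return group
    return None  # no usable share at all


def select_group_by_server_preference(client_supported, client_key_shares,
                                      server_preference):
    """Preference-based selection: walk the server's preference list,
    pick the first group the client supports at all, and send a
    HelloRetryRequest if the client did not predict a share for it."""
    for group in server_preference:
        if group in client_supported:
            needs_hrr = group not in client_key_shares
            return group, needs_hrr
    return None, False


# A client that supports a hybrid PQ group but only predicted x25519:
client_supported = ["X25519MLKEM768", "x25519", "secp256r1"]
client_key_shares = ["x25519"]
server_preference = ["X25519MLKEM768", "x25519"]

# key_share-first locks in x25519 and never upgrades to the PQ group:
print(select_group_key_share_first(client_key_shares, server_preference))
# preference-based selects the hybrid, at the cost of one HRR round trip:
print(select_group_by_server_preference(client_supported, client_key_shares,
                                        server_preference))
```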
I suppose that would be the next step. :-) I think, last meeting, we were a little unclear what we wanted the document to be, so I was trying to take stock first.

Though MT prompted me to ponder this a bit more in https://github.com/davidben/tls-key-share-prediction/issues/5, and now I'm coming around to the idea that we don't need to do anything special to account for the "wrong" server behavior. Since RFC 8446 already explicitly said that clients are allowed to not predict their most preferred groups, we can already reasonably infer that such servers actively believe that all their groups are comparable in security. OpenSSL, at least, seems to be taking that position. I... question whether taking that position is wise, given the ongoing post-quantum transition, but so it goes. Hopefully your TLS server software, if it advertises pluggable cryptography with a PQ use case, and yet opted for PQ-incompatible selection criteria, has clearly documented this so it isn't a surprise to you. ;-)

Between all that, we probably can reasonably say that's the server operator's responsibility? I'm going to take some time to draft a hopefully simpler version of the draft that only defines the DNS hint, and just includes some rough text warning about the implications. Maybe also some SHOULD-level text to call out that servers should be sure their policy is what they want. Hopefully, in drafting that, it'll be clearer what the options are. If nothing else, I'm sure writing it will help me crystallize my own preferences!

> > Given that, I don't see a clear way to avoid some way to separate
> > the old behavior (which impacts the existing groups) from the new
> > behavior. The draft proposes to do it by keying on the codepoint,
> > and doing our future selves a favor by ensuring that the current
> > generation of PQ codepoints are ready for this. That's still the
> > best solution I see right now for this situation.
> >
> > Thoughts?
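[A hedged sketch of how a client might consume the DNS hint mentioned above. This is my own illustrative construction, not text from the draft; the function and group names are assumptions. The key property it tries to show is the one the thread turns on: since DNS is unauthenticated, the hint may only influence which shares the client predicts (a performance decision), never which groups it is willing to negotiate (a security decision).]

```python
def choose_predicted_groups(local_preference, dns_hint, max_shares=2):
    """Use the (unauthenticated) DNS hint only to reorder predictions
    among groups the client already considers acceptable. An attacker
    who tampers with DNS can at worst cost us a HelloRetryRequest round
    trip; they can never push us outside local_preference."""
    # Keep only hinted groups we accept, in our own preference order:
    hinted = [g for g in local_preference if g in dns_hint]
    if hinted:
        return hinted[:max_shares]
    # No usable hint: fall back to the client's own top choices.
    return local_preference[:max_shares]


prefs = ["X25519MLKEM768", "x25519", "secp256r1"]
print(choose_predicted_groups(prefs, dns_hint=["x25519"]))     # hint honored
print(choose_predicted_groups(prefs, dns_hint=["ffdhe2048"]))  # hint ignored
```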
> I think letting the DNS signal also be an indicator the server
> implements the correct behavior would be a good idea.

I'm afraid DNS is typically unauthenticated. In most TLS deployments, we have to assume that the attacker has influence over DNS, which makes it unsuitable for such a signal. Of course, if we end up settling on not needing a signal, this is moot.

David

___
TLS mailing list
TLS@ietf.org
https://www.ietf.org/mailman/listinfo/tls
Re: [TLS] Time to first byte vs time to last byte
Hi Martin,

I think we are generally in agreement, but I want to push back on the argument that the PQ slowdown for a page transferring 72KB is going to be the problem. I will try to quantify this below (look for [72KBExample]). Btw, if you have any stats on Web content size distribution, I am interested. Other than averages, I could not find any data on what Web content sizes look like today.

Note that our paper is not bashing TTFB as a metric; we are just saying TTFB is more relevant for use cases that send little data, which is not the case for most applications today. Snippet from the Conclusion of the paper:

> Connections that transfer <10-20KB of data will probably be more
> impacted by the new data-heavy handshakes

This study picked data sizes based on public data on Web sizes (HTTP Archive) and other data for other cloud uses. Of course, if we reached a world where most use cases (Web connections, IoT sensor measurement conns, cloud conns) were typically sending <50KB, then the TTFB would become more relevant. I am not sure we are there or ever will be. Even the page you referenced (thx, I did not know of it) argues for "~100KiB of HTML/CSS/fonts and ~300-350KiB of JS", from 2021.

[72KBExample] I think your 20-25% for a 72KB example page probably came from reading Fig 4b, which includes an extra RTT due to initcwnd=10. Given that, especially for the Web, CDNs use much higher initcwnds, let's focus on Figure 10. Based on Fig 10, for 50-100KB of data over a PQ connection, the TTLB would be 10-15% slower at 1Mbps and 200ms RTT. At higher speeds, this percentage is much lower (1-1.5% based on Fig 9b), but let's focus on the slow link. If we consider the same case for the handshake, the PQ handshake slowdown is 30-35%, which definitely looks like a very impactful slowdown. A 10-15% slowdown for the TTLB is much less, but someone could argue that even that is significant. Note we are still on a slow link, so even the classical conn transferring 72KB is probably suffering.
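[The percentages above can be sanity-checked with a couple of lines of arithmetic. The absolute baselines are the ones Panos quantifies later in this same message: ~1.25s classical TTLB and 436ms/576ms classical/PQ handshake times at 1Mbps and 200ms RTT.]

```python
# Sanity check of the slowdown figures quoted in this thread
# (1 Mbps link, 200 ms RTT, 0% loss; baselines from the experiments).

classical_ttlb_s = 1.25               # classical TTLB, ~72KB page
pq_ttlb_s = classical_ttlb_s * 1.15   # 15% TTLB slowdown (Fig 10)

classical_hs_s = 0.436                # classical handshake
pq_hs_s = 0.576                       # PQ handshake

hs_slowdown_pct = (pq_hs_s - classical_hs_s) / classical_hs_s * 100

print(f"PQ TTLB: {pq_ttlb_s:.2f}s")                    # 1.44s
print(f"handshake slowdown: {hs_slowdown_pct:.0f}%")   # ~32%, in the 30-35% band
```

So the 140ms of extra handshake time is a large relative slowdown (~32%) but a small absolute addition to a 1.25s page load on this link.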
To quantify that, I looked at my data from these experiments. A classical connection TTLB for 50-100KB of data at 1Mbps, 200ms RTT, and 0% loss was ~1.25s. This is not shown in the paper because I only included text about the 10% loss case. So: 1.25s for a 72KB page to start getting rendered on a browser over a classical conn vs 1.25*1.15=1.44s for a PQ one. I am not sure any user willing to wait 1.25s will close the browser at 1.44s. Btw, the Google PageSpeed Insights TTFB metric, which includes DNS lookup, redirects, and more, considers 0.8s-1.8s as "Needs improvement".

In our experiments, the handshake time at 1Mbps and 200ms RTT amounted to 436ms and 576ms for the classical and PQ handshakes respectively. I am not sure the extra 140ms (30-35% slowdown) for the PQ handshake would even push the Google PageSpeed Insights TTFB metric into the "Needs improvement" category.

-----Original Message-----
From: Martin Thomson
Sent: Thursday, March 7, 2024 10:26 PM
To: Kampanakis, Panos; David Benjamin; Deirdre Connolly; Rob Sayre
Cc: TLS@ietf.org; Childs-Klein, Will
Subject: RE: [EXTERNAL] [TLS] Time to first byte vs time to last byte

Hi Panos,

I realize that TTLB might correlate well for some types of web content, but it's important to recognize that lots of web content is badly bloated (if you can tolerate the invective, this is a pretty good look at the situation, with numbers: https://infrequently.org/series/performance-inequality/).

I don't want to call out your employer's properties in particular, but at over 3M and with relatively few connections, handshakes really don't play much into page load performance. That might be typical, but just being typical doesn't mean that it's a case we should be optimizing for. The 72K page I linked above looks very different. There, your paper shows a 20-25% hit on TTLB.
TTFB is likely more affected due to the way congestion controllers work and the fact that you never leave slow start.

Cheers,
Martin

On Fri, Mar 8, 2024, at 13:56, Kampanakis, Panos wrote:
> Thx Deirdre for bringing it up.
>
> David,
>
> ACK. I think the overall point of our paper is that application
> performance is more closely related to PQ TTLB than PQ TTFB/handshake.
>
> Snippet from the paper
>
> > Google's PageSpeed Insights [12] uses a set of metrics to measure
> > the user experience and webpage performance. The First Contentful
> > Paint (FCP), Largest Contentful Paint (LCP), First Input Delay (FID),
> > Interaction to Next Paint (INP), Total Blocking Time (TBT), and
> > Cumulative Layout Shift (CLS) metrics include this work's TTLB along
> > with other client-side, browser application-specific execution
> > delays. The PageSpeed Insights TTFB metric measures the total time
> > up to the point the first