Re: [TLS] Next steps for key share prediction

2024-03-08 Thread David Benjamin
On Thu, Mar 7, 2024 at 6:34 PM Watson Ladd  wrote:

> On Thu, Mar 7, 2024 at 2:56 PM David Benjamin 
> wrote:
> >
> > Hi all,
> >
> > With the excitement about, sometime in the far future, possibly
> transitioning from a hybrid, or to a to-be-developed better PQ algorithm, I
> thought it would be a good time to remind folks that, right now, we have no
> way to effectively transition between PQ-sized KEMs at all.
> >
> > At IETF 118, we discussed draft-davidben-tls-key-share-prediction, which
> aims to address this. For a refresher, here are some links:
> >
> https://davidben.github.io/tls-key-share-prediction/draft-davidben-tls-key-share-prediction.html
> >
> https://datatracker.ietf.org/meeting/118/materials/slides-118-tls-key-share-prediction-00
> > (Apologies, I forgot to cut a draft-01 with some of the outstanding
> changes on GitHub, so the link above is probably better than draft-00.)
> >
> > If I recall, the outcome from IETF 118 was two-fold:
> >
> > First, we'd clarify in rfc8446bis that the "key_share first" selection
> algorithm is not quite what you want. This was done in
> https://github.com/tlswg/tls13-spec/pull/1331
> >
> > Second, there was some discussion over whether what's in the draft is
> the best way to resolve a hypothetical future transition, or if there was
> another formulation. I followed up with folks briefly offline afterwards,
> but an alternative never came to fruition.
> >
> > Since we don't have another solution yet, I'd suggest we move forward
> with what's in the draft as a starting point. (Or if this email inspires
> folks to come up with a better solution, even better! :-D) In particular,
> whatever the rfc8446bis guidance is, there are still TLS implementations
> out there with the problematic selection algorithm. Concretely, OpenSSL's
> selection algorithm is incompatible with this kind of transition. See
> https://github.com/openssl/openssl/issues/22203
>
> Is that asking whether or not we want adoption? I want adoption.
>

I suppose that would be the next step. :-) I think, last meeting, we were a
little unclear what we wanted the document to be, so I was trying to take
stock first. Though MT prompted me to ponder this a bit more in
https://github.com/davidben/tls-key-share-prediction/issues/5, and now I'm
coming around to the idea that we don't need to do anything special to
account for the "wrong" server behavior. Since RFC 8446 already explicitly
said that clients are allowed to not predict their most preferred groups, we
can reasonably infer that such servers actively believe all their groups are
comparable in security. OpenSSL, at least, seems to be taking that position.
I... question whether taking that position is wise, given the ongoing
post-quantum transition, but so it goes. Hopefully your TLS server software,
if it advertises pluggable cryptography with a PQ use case and yet opted for
a PQ-incompatible selection criterion, has clearly documented this so it
isn't a surprise to you. ;-)
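To make the distinction concrete, here is a hypothetical sketch (the group
names, function names, and data structures are mine for illustration; they
are not from the draft, RFC 8446, or any real implementation) contrasting the
"key_share first" selection strategy with one that honors the client's
preference order:

```python
# Hypothetical model of the two server-side group-selection strategies.
# A client that prefers a PQ hybrid but only predicted a key share for
# the cheap classical group exposes the difference.

CLIENT_SUPPORTED = ["x25519_mlkem768", "x25519", "p256"]  # preference order
CLIENT_KEY_SHARES = ["x25519"]  # client predicted only the classical group
SERVER_SUPPORTED = ["x25519_mlkem768", "x25519", "p256"]

def select_key_share_first(client_shares, client_supported, server_supported):
    """The problematic strategy: accept any mutually supported group that
    already has a key share, avoiding an HRR at the cost of ignoring the
    client's preference order."""
    for group in client_shares:
        if group in server_supported:
            return group, False  # no HelloRetryRequest needed
    for group in client_supported:
        if group in server_supported:
            return group, True  # HelloRetryRequest needed
    raise ValueError("no common group")

def select_preference_first(client_shares, client_supported, server_supported):
    """A preference-respecting strategy: pick the client's most preferred
    mutually supported group, accepting an HRR when it wasn't predicted."""
    for group in client_supported:
        if group in server_supported:
            return group, group not in client_shares
    raise ValueError("no common group")
```

With these inputs, the key_share-first server silently lands on plain x25519
(no HRR, but no PQ protection), while the preference-first server selects the
hybrid at the cost of one HelloRetryRequest round trip.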

Between all that, we can probably reasonably say that's the server operator's
responsibility? I'm going to take some time to draft a hopefully simpler
version of the draft that only defines the DNS hint and just includes some
rough text warning about the implications. Maybe also some SHOULD-level text
calling out that servers should be sure their policy is what they want.
Hopefully, in drafting that, it'll be clearer what the options are. If
nothing else, I'm sure writing it will help me crystallize my own
preferences!


> > Given that, I don't see a clear way to avoid some way to separate the
> old behavior (which impacts the existing groups) from the new behavior. The
> draft proposes to do it by keying on the codepoint, and doing our future
> selves a favor by ensuring that the current generation of PQ codepoints are
> ready for this. That's still the best solution I see right now for this
> situation.
> >
> > Thoughts?
>
> I think letting the DNS signal also be an indicator the server
> implements the correct behavior would be a good idea.


I'm afraid DNS is typically unauthenticated. In most TLS deployments, we
have to assume that the attacker has influence over DNS, which makes it
unsuitable for such a signal. Of course, if we end up settling on not
needing a signal, this is moot.

David
___
TLS mailing list
TLS@ietf.org
https://www.ietf.org/mailman/listinfo/tls


Re: [TLS] Time to first byte vs time to last byte

2024-03-08 Thread Kampanakis, Panos
Hi Martin,

I think we are generally in agreement, but I want to push back on the argument 
that the PQ slowdown for a page transferring 72KB is going to be the problem. I 
will try to quantify this below (look for [72KBExample]). 

Btw, if you have any stats on Web content size distribution, I am interested. 
Other than averages, I could not find any data on what Web content sizes look 
like today.

Note that our paper is not bashing TTFB as a metric; we are just saying TTFB 
is more relevant for use cases that send little data, which is not the case 
for most applications today. Snippet from the Conclusion of the paper:
> Connections that transfer <10-20KB of data will probably be more impacted by 
> the new data-heavy handshakes  
This study picked data sizes based on public data on Web content sizes (HTTP 
Archive) and other data for other cloud uses. Of course, if we reached a world 
where most use cases (Web connections, IoT sensor measurement connections, 
cloud connections) typically sent <50KB, then the TTFB would become more 
relevant. I am not sure we are there or ever will be. Even the page you 
referenced (thx, I did not know of it) cites "~100KiB of HTML/CSS/fonts and 
~300-350KiB of JS" as of 2021.

[72KBExample] 
I think your 20-25% for the 72KB example page probably came from reading Fig 
4b, which includes an extra RTT due to initcwnd=10. Given that CDNs use much 
higher initcwnds, especially for the Web, let's focus on Figure 10. Based on 
Fig 10, for 50-100KB of data over a PQ connection, the TTLB would be 10-15% 
slower at 1Mbps and 200ms RTT. At higher speeds, this percentage is much 
smaller (1-1.5% based on Fig 9b), but let's focus on the slow link.

If we consider the same scenario for the handshake, the PQ handshake slowdown 
is 30-35%, which definitely looks like a very impactful slowdown. A 10-15% 
slowdown for the TTLB is much less, though someone could argue that even that 
is significant. Note we are still on a slow link, so even a classical 
connection transferring 72KB is probably suffering. To quantify that, I looked 
at my data from these experiments. The classical-connection TTLB for 50-100KB 
of data at 1Mbps, 200ms RTT, and 0% loss was ~1.25s. This is not shown in the 
paper because I only included text about the 10% loss case. So a 72KB page 
starts getting rendered in a browser at ~1.25s over a classical connection vs 
1.25*1.15=1.44s over a PQ one. I am not sure a user willing to wait 1.25s 
will close the browser at 1.44s.
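For what it's worth, the back-of-envelope TTLB arithmetic above checks out; a
tiny sketch, using only the values quoted in the text (the 15% figure is the
upper end of the 10-15% slowdown range):

```python
# Recomputing the TTLB slowdown figures quoted above.
classical_ttlb_s = 1.25  # classical TTLB, 50-100KB at 1Mbps / 200ms RTT, 0% loss
ttlb_slowdown = 0.15     # PQ TTLB slowdown, upper end of the 10-15% range
pq_ttlb_s = classical_ttlb_s * (1 + ttlb_slowdown)
print(f"PQ TTLB: {pq_ttlb_s:.2f}s")  # ~1.44s, matching the text
```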

Btw, the Google PageSpeed Insights TTFB metric, which includes DNS lookup, 
redirects, and more, considers 0.8s-1.8s as "Needs improvement". In our 
experiments, the handshake time at 1Mbps and 200ms RTT amounted to 436ms and 
576ms for the classical and PQ handshakes respectively. I am not sure the 
extra 140ms (30-35% slowdown) for the PQ handshake would even push the Google 
PageSpeed Insights TTFB metric into the "Needs improvement" category.
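The same sanity check works for the handshake numbers (again, just
recomputing the values quoted in the text):

```python
# Checking the handshake slowdown figures quoted above.
classical_hs_ms = 436  # classical handshake, 1Mbps / 200ms RTT
pq_hs_ms = 576         # PQ handshake, same link
extra_ms = pq_hs_ms - classical_hs_ms
slowdown_pct = 100 * extra_ms / classical_hs_ms
print(f"extra: {extra_ms}ms, slowdown: {slowdown_pct:.0f}%")  # 140ms, ~32%
```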



-----Original Message-----
From: Martin Thomson  
Sent: Thursday, March 7, 2024 10:26 PM
To: Kampanakis, Panos ; David Benjamin 
; Deirdre Connolly ; Rob Sayre 

Cc: TLS@ietf.org; Childs-Klein, Will 
Subject: RE: [EXTERNAL] [TLS] Time to first byte vs time to last byte

Hi Panos,

I realize that TTLB might correlate well for some types of web content, but 
it's important to recognize that lots of web content is badly bloated (if you 
can tolerate the invective, this is a pretty good look at the situation, with 
numbers: https://infrequently.org/series/performance-inequality/).

I don't want to call out your employer's properties in particular, but at 
over 3 MB and with relatively few connections, handshakes really don't play 
much into page load performance. That might be typical, but just being 
typical doesn't mean that it's a case we should be optimizing for.

The 72K page I linked above looks very different.  There, your paper shows a 
20-25% hit on TTLB.  TTFB is likely more affected due to the way congestion 
controllers work and the fact that you never leave slow start.

Cheers,
Martin

On Fri, Mar 8, 2024, at 13:56, Kampanakis, Panos wrote:
> Thx Deirdre for bringing it up.
>
> David,
>
> ACK. I think the overall point of our paper is that application 
> performance is more closely related to PQ TTLB than PQ TTFB/handshake.
>
> Snippet from the paper
>
> Google’s PageSpeed Insights [12] uses a set of metrics to measure 
> the user experience and webpage performance. The First Contentful 
> Paint (FCP), Largest Contentful Paint (LCP), First Input Delay (FID), 
> Interaction to Next Paint (INP), Total Blocking Time (TBT), and 
> Cumulative Layout Shift (CLS) metrics include this work’s TTLB along 
> with other client-side, browser application-specific execution delays.
> The PageSpeed Insights TTFB metric measures the total time up to the 
> point the first