Note

This supersedes and replaces the prior proposal in Intent to Experiment:
TCP Socket Pool per-Top-Level-Site
<https://groups.google.com/a/chromium.org/g/blink-dev/c/DStqis1UcXo/m/plJ7NlQAAgAJ?e=48417069>.
Unlike that proposal, this one uses randomization to mitigate attacks
rather than partitioning the socket pool by top-level site. Like the prior
approach, this experiment offers a comparable probabilistic mitigation of
the new-tab socket-observation attack. Unlike the prior approach, it also
extends the same (incomplete) mitigation to the new-iframe
socket-observation attack. Further, this experiment will be significantly
easier to implement, as it does away with the need to track partitioning
information.

Contact emails

[email protected], [email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected]

Explainer/Specification

None

Summary

This experiment takes the fixed per-profile maximum of 256
<https://source.chromium.org/chromium/chromium/src/+/main:net/socket/client_socket_pool_manager.cc;drc=8b81608b6457dfef865f46e509c79dc60fe3c69b;l=35>
and increases it by a randomized amount ranging from 1 to a chosen upper
bound (we will experiment with using 64, 128, and 256) as determined by the
algorithm below:


(1) Define a function NEXT_POOL_STATE(STATE, MIN, MAX, VALUE) with the
constraints MIN < MAX, VALUE in range [MIN, MAX], and STATE being either
‘capped’ or ‘uncapped’. This function returns ‘capped’ or ‘uncapped’ based
on some probability distribution across [MIN, MAX] which might vary
depending on STATE. If VALUE == MIN the return value will always be
‘uncapped’ and if VALUE == MAX the return value will always be ‘capped’.
Although the exact distribution will be experimented with, in general we
want transitions from ‘uncapped’ to ‘capped’ to be likely at the upper end
of the range and transitions from ‘capped’ to ‘uncapped’ to be likely at
the lower end of the range.

(2) Define LOWER_LIMIT as 256.

(3) Define UPPER_LIMIT as 320, 384, or 512 depending on the experiment arm.

(4) Consider a socket pool to have a STATE that is either ‘uncapped’ or
‘capped’, and that starts as ‘uncapped’.

(5) If a socket pool is ‘uncapped’ then it still processes socket releases
as before, but for socket requests:

(5a) Define X as the number of active sockets before the allocation.

(5b) If X > LOWER_LIMIT, update STATE to the result of
NEXT_POOL_STATE(STATE, LOWER_LIMIT, UPPER_LIMIT, X).

(5c) If the value of STATE is ‘uncapped’, allocate the socket; otherwise,
queue the socket request for later processing once the STATE is ‘uncapped’.

(6) If a socket pool is ‘capped’ then it queues all socket requests for
later processing once the STATE is ‘uncapped’, but for socket releases:

(6a) Define Y as the number of active sockets after the release occurs.

(6b) Update STATE to the result of NEXT_POOL_STATE(STATE, LOWER_LIMIT,
UPPER_LIMIT, Y).

(6c) Release the socket.
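
For concreteness, here is a minimal C++ sketch of the state machine above.
The linear probability ramp, the 384 upper limit, and all names are
illustrative assumptions rather than the actual implementation; per step
(1), the real distribution will be tuned during the experiment.

#include <cassert>
#include <random>

enum class PoolState { kUncapped, kCapped };

// NEXT_POOL_STATE from step (1): always ‘uncapped’ at MIN, always ‘capped’
// at MAX, probabilistic in between.
PoolState NextPoolState(PoolState state, int min, int max, int value) {
  assert(min < max);
  if (value <= min) return PoolState::kUncapped;
  if (value >= max) return PoolState::kCapped;
  // A real implementation might skew the distribution based on `state`;
  // this sketch uses a simple linear ramp and ignores it.
  (void)state;
  double p_capped = static_cast<double>(value - min) / (max - min);
  static std::mt19937 rng{std::random_device{}()};
  return std::bernoulli_distribution(p_capped)(rng) ? PoolState::kCapped
                                                    : PoolState::kUncapped;
}

constexpr int kLowerLimit = 256;  // step (2)
constexpr int kUpperLimit = 384;  // step (3); 320 or 512 in other arms

// Step (5): x is the number of active sockets before the allocation.
// Returns true to allocate now; false means queue the request until the
// pool transitions back to ‘uncapped’.
bool ShouldAllocate(PoolState& state, int x) {
  if (state == PoolState::kUncapped && x > kLowerLimit)
    state = NextPoolState(state, kLowerLimit, kUpperLimit, x);
  return state == PoolState::kUncapped;
}

// Step (6): y is the number of active sockets after the release.
void OnRelease(PoolState& state, int y) {
  if (state == PoolState::kCapped)
    state = NextPoolState(state, kLowerLimit, kUpperLimit, y);
}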


The feasibility of raising the per-profile limit to 512 was already studied
<https://groups.google.com/a/chromium.org/g/blink-dev/c/1r-i4Koc5nM?e=48417069>
and yielded neither negative nor positive results, so there should be no
issue with raising the limit to random values between 257 and 320/384/512.


This new randomized limit will be imposed independently for the WebSocket
pool and the normal (HTTP) socket pool.


Limits on UDP sockets (e.g., HTTP/3), multiplexed streams over a single
socket (e.g., HTTP/2), proxies, and HEv3
<https://datatracker.ietf.org/doc/draft-ietf-happy-happyeyeballs-v3/> will
not be evaluated in this experiment. An experiment applying this approach
to them will likely be considered in the future.


The intent is to roll this experiment directly into a full launch if no ill
effects are seen. See the motivation section for more.


Blink component

Blink>Network
<https://issues.chromium.org/issues?q=customfield1222907:%22Blink%3ENetwork%22>

TAG review

https://github.com/w3ctag/design-reviews/issues/1151


Motivation

Having a fixed pool of TCP sockets available to an entire profile allows
attackers to effectively infer the number of network requests made by
other tabs/frames, and thereby learn enough about them that any given site
can be profiled. For example, if a site makes X network requests when the
user is logged in and Y when logged out, then by saturating the TCP socket
pool and watching movement after calling window.open, the state of the
other site can be gleaned. This sort of attack is outlined in more detail
here: https://xsleaks.dev/docs/attacks/timing-attacks/connection-pool/

To address this sort of attack, we randomize both the point at which a
socket pool is considered full and the point at which it is subsequently
considered sufficiently drained to allow new allocations. We don’t want
sites to be able to detect the point at which the pool becomes full
without triggering a drain, and vice versa. Allocating sockets
probabilistically as the pool approaches the chosen upper bound, and
delaying queued allocations until the pool drains past the randomized
lower bound, ensures this, and even makes it difficult for a site to walk
all the way up to a known maximum socket count on the assumption that the
final socket use will cause a drain.

Risks

Interoperability and Compatibility

While other user agents may wish to follow the results, we only anticipate
compatibility issues with local machines or remote servers when the number
of available TCP sockets in the browser fluctuates upward (256 ->
320/384/512) in a way Chrome did not previously allow. This will be
monitored carefully, and any experiment yielding significant negative
impact on the browsing experience will be terminated early.

Gecko: https://github.com/mozilla/standards-positions/issues/1299; current
global cap of 128-900
<https://github.com/mozilla-firefox/firefox/blob/4bd4e4c595499ee51c2e6f4c9f780fe720f454e8/modules/libpref/init/all.js#L1138>
(as allowed by OS)

WebKit: https://github.com/WebKit/standards-positions/issues/550; current
global cap of 256
<https://github.com/WebKit/WebKit/blob/d323b2fc4cd2686c828bd8976fae6ec2d2b6311c/Source/WebCore/platform/network/soup/SoupNetworkSession.cpp#L104>

Debuggability

This will be gated behind the base::Feature
kTcpSocketPoolLimitRandomizationTrial, so if breakage is suspected, the
flag can be disabled to confirm the impact. For how to control feature
flags, see this
<https://source.chromium.org/chromium/chromium/src/+/main:base/feature_list.h;drc=159a65729cf8fca4d9f453d12d97ab6515360491;l=259>.
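
As a purely illustrative sketch of the gating, assuming the feature is
declared with Chromium's BASE_FEATURE macro (the actual declaration site
and default state are not specified here):

#include "base/feature_list.h"

// Hypothetical declaration; the real one would live with the socket pool
// code (likely under //net) and may differ.
BASE_FEATURE(kTcpSocketPoolLimitRandomizationTrial,
             "TcpSocketPoolLimitRandomizationTrial",
             base::FEATURE_DISABLED_BY_DEFAULT);

// At the decision point, randomization applies only when enabled:
//   if (base::FeatureList::IsEnabled(
//           kTcpSocketPoolLimitRandomizationTrial)) {
//     // ...use the randomized limit...
//   }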

Measurement

The existing SOCKET_POOL_STALLED_MAX_SOCKETS event can be tracked to
detect any uptick.

The existing metric Net.TcpConnectAttempt.Latency.{Result} will be used to
detect increases in overall connection failure rates.

New metrics,
Net.TCPSocketPoolSize.{UpperBound|LowerBound}.{Skipped|Enforced}, will be
added to track usage of the NEXT_POOL_STATE function.
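
For illustration, one way such a histogram might be recorded using
Chromium's UMA helpers; the exact recording points, suffix semantics, and
bucketing below are assumptions:

#include "base/metrics/histogram_functions.h"

// Hypothetical helper: records the pool size observed when NEXT_POOL_STATE
// made (Enforced) or did not make (Skipped) a capping decision at the
// upper bound. The LowerBound variants would mirror this on release.
void RecordUpperBoundDecision(bool enforced, int pool_size) {
  base::UmaHistogramCounts1000(
      enforced ? "Net.TCPSocketPoolSize.UpperBound.Enforced"
               : "Net.TCPSocketPoolSize.UpperBound.Skipped",
      pool_size);
}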

Will this feature be supported on all six Blink platforms (Windows, Mac,
Linux, ChromeOS, Android, and Android WebView)?

No, not WebView. That will have to be studied independently due to the
differing constraints.

Is this feature fully tested by web-platform-tests?

No. As this is a Blink networking-focused change, browser tests or unit
tests are more appropriate.

Flag name on about://flags

None

Finch feature name

TcpSocketPoolLimitRandomizationTrial

Rollout plan

We will never test on more than 5% of stable in each group, and will stay
on canary/dev/beta for a while to detect issues before testing on stable.

Requires code in //chrome?

No

Tracking bug

https://crbug.com/415691664

Estimated milestones

143

Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/6496757559197696
