Reaching out to site owners was mostly a sanity check that the resource isn't
expected to be partitioned for some reason (even though the payloads are known
to be identical). If it helps, we can replace the reach-out step with a
requirement that the responses be "Cache-Control: public" (and hard-enforce it
in the browser by not writing the resource to cache if it isn't). That is an
explicit indicator that the resources are cacheable in shared upstream caches.
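
To make the enforcement idea concrete, here's a rough sketch of the kind of
check I have in mind (purely illustrative, not actual Chromium code; the struct
and helper names are made up):

// Purely illustrative sketch (not actual Chromium code); names are made up.
// Only admit a response into the shared, single-keyed cache when it carries
// an explicit "public" directive; otherwise fall back to the normal
// partitioned cache behavior.
#include <string>

struct ResponseInfo {
  std::string cache_control;  // e.g. "public, max-age=31536000"
};

bool EligibleForSharedCache(const ResponseInfo& response) {
  // A real implementation would tokenize the directives rather than doing a
  // substring match, but this captures the intent: no "public", no sharing.
  return response.cache_control.find("public") != std::string::npos;
}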

I removed the 2 items from the design doc that were specifically targeted
at direct fingerprinting since that's moot with the 3PC link (as well as
the fingerprinting bits from the validation with resource owners).

On the site-preferencing concern, it doesn't actually preference large sites,
but it does preference currently-popular third-party resources (most of which
are provided by large corporations). The benefit is spread across all of the
sites that they are embedded in (funnily enough, most large sites won't benefit
because they don't tend to use third parties).

Determining the common resources at a local level exposes the same XS leak
issues as allowing all resources (e.g. your local map tiles will show up in
multiple cache partitions because they all reference your current location,
but since they are not globally common they can be used to identify your
location). Instead of using the HTTP Archive to collect the candidates, we
could presumably build a centralized list based on aggregated common resources
seen across cache partitions by each user, but that feels like an awful lot of
complexity for a very small number of resulting resources.
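
For what it's worth, the "globally common" bar boils down to something like the
following check (again just a hypothetical sketch; the 20k figure matches the
HTTP Archive cutoff mentioned further down in the thread, and the types are
made up):

// Hypothetical sketch, not actual code: a (URL, content hash) pair only
// becomes a sharing candidate if the identical payload is embedded by a very
// large number of distinct sites. Locally common resources (like map tiles
// for your own location) never clear this bar.
#include <cstddef>
#include <set>
#include <string>

struct CandidateStats {
  std::set<std::string> embedding_sites;  // distinct sites serving this exact payload
};

bool IsGloballyCommon(const CandidateStats& stats) {
  constexpr std::size_t kMinDistinctSites = 20000;  // mirrors the HTTP Archive cutoff
  return stats.embedding_sites.size() >= kMinDistinctSites;
}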

On the test results, sorry, I thought I had included the experiment results
in the I2S but it looks like I may not have.

The test was specifically just with the patterns for the Google ads scripts
because we aren't expecting this feature to impact the vitals for the main
page/content: most of the pervasive resources are third-party content that is
usually async already and not critical-path. It's possible some video or map
embeds might trigger LCP in some cases, but that's the exception rather than
the norm. This is more geared to making those supporting things work better
while maintaining the user experience. Ads has the kind of instrumentation
we'd need to get visibility into the success (or failure) of that assumption
and to measure small changes.

The results were stat-sig positive but relatively small. The ad iframes
displayed their content slightly faster and transmitted fewer bytes for
each frame (very low single digit percentages).

The guardrail metrics (including vitals) were all neutral, which is what we
were hoping for (improvement without the cost of increased contention).

If you'd feel more comfortable gathering more data, I wouldn't be opposed to
running the full list at 1% to check the guardrail metrics again before fully
launching. We wouldn't necessarily expect to see positive movement to justify
a launch since the resources are still async, but we can at least validate
that assumption with the full list (if that is the only remaining concern).


On Thu, Oct 30, 2025 at 5:28 PM Rick Byers <[email protected]> wrote:

> Thanks Erik and Patrick, of course that makes sense. Sorry for the naive
> question. My naive reading of the design doc suggested to me that a lot of
> the privacy mitigations were about preventing the cross-site tracking risk.
> Could the design be simplified by removing some of those mitigations? For
> example, the section about reaching out to the resource owners, to what
> extent is that really necessary when all we're trying to mitigate is XS
> leaks? Don't the popularity properties alone mitigate that sufficiently?
>
> What can you share about the magnitude of the performance benefit in
> practice in your experiments? In particular for LCP, since we know
> <https://wpostats.com/> that correlates well with user engagement (and
> against abandonment) and so presumably user value.
>
> The concern about not wanting to further advantage more popular sites over
> less popular ones resonates with me. Part of that argument seems to apply
> broadly to the idea of any LRU cache (especially one with a reuse bias
> which I believe ours has
> <https://www.chromium.org/developers/design-documents/network-stack/disk-cache/#eviction>?).
> But perhaps an important distinction here is that the benefits are
> determined globally vs. on a user-by-user basis? But I think any solution
> that worked on a user-by-user basis would have the XS leak problem, right?
> Perhaps it's worth reflecting on our stance on using crowd-sourced data to
> try to improve the experience for all users while still being fair to sites
> broadly. In general I think this is something Chromium is much more open to
> (where it brings significant user benefit) than other engines. For example,
> our Media Engagement Index <https://developer.chrome.com/blog/autoplay>
> system has some similar properties in terms of using aggregate user
> behaviour to help decide which sites have the power to play audio on page
> load and which don't. I was personally uncertain at the time if the
> complexity would prove to be worth the benefit, but now I'm quite convinced
> it is. Playing audio on load is just something users and developers want in
> a few cases, but not most cases. I wonder if perhaps cross-site caching is
> similar?
>
> Rick
>
> On Thu, Oct 30, 2025 at 9:09 AM Matt Menke <[email protected]> wrote:
>
>> Note that even with Vary: Origin, we still have to load the HTTP request
>> headers from the disk cache to apply the vary header, which leaks timing
>> information, so "Vary: Origin" is not a sufficient security mechanism to
>> prevent that sort of cross-site attack.
>>
>> On Wednesday, October 29, 2025 at 5:08:42 PM UTC-4 Erik Anderson wrote:
>>
>>> My understanding was that there was believed to be a meaningful security
>>> benefit with partitioning the cache. That’s because it would limit a party
>>> from being able to infer that you’ve visited some other site by measuring
>>> a side effect tied to how quickly a resource loads. That observation could
>>> potentially be made even if that specific adversary doesn’t have any of
>>> their own content loaded on the other site.
>>>
>>>
>>>
>>> Of course, if there is an entity with a resource loaded across both
>>> sites with a 3p cookie *and* they’re willing to share that
>>> info/collude, there’s not much benefit. And even when partitioned, if 3p
>>> cookies are enabled, there are potentially measurable side effects that
>>> differ based on if the resource request had some specific state in a 3p
>>> cookie.
>>>
>>>
>>>
>>> Does that incremental security benefit of partitioning the cache justify
>>> the performance costs when 3p cookies are still enabled? I’m not sure.
>>>
>>>
>>>
>>> Even if partitioning was eliminated, a site could protect themselves a
>>> bit by specifying Vary: Origin, but that probably doesn’t sufficiently
>>> cover iframe scenarios (nor would I expect most sites to get it right).
>>>
>>>
>>>
>>> *From:* Rick Byers <[email protected]>
>>> *Sent:* Wednesday, October 29, 2025 11:56 AM
>>> *To:* Patrick Meenan <[email protected]>
>>> *Cc:* Mike Taylor <[email protected]>; blink-dev <
>>> [email protected]>
>>> *Subject:* [EXTERNAL] Re: [blink-dev] Intent to ship: Cache sharing for
>>> extremely-pervasive resources
>>>
>>>
>>>
>>> If this is enabled only when 3PCs are enabled, then what are the
>>> tradeoffs of going through all this complexity and governance vs. just
>>> broadly coupling HTTP cache keying behavior to 3PC status in some way? What
>>> can a tracker credibly do with a single-keyed HTTP cache that they cannot
>>> do with 3PCs? Are there also concerns about accidental cross-site resource
>>> sharing which could be mitigated more simply by other means, e.g. by scoping
>>> just to ETag-based caching?
>>>
>>>
>>>
>>> I remember the controversy and some real evidence of harm to users and
>>> businesses in 2020 when we partitioned the HTTP cache, but I was convinced
>>> that we had to accept that harm in order to credibly achieve 3PCD. At the
>>> time I was personally a fan of a proposal like this (even for users without
>>> 3PCs) in order to mitigate the harm. But now it seems to me that if we're
>>> going to start talking about poking holes in that decision, perhaps we
>>> should be doing a larger review of the options in that space with the
>>> knowledge that most Chrome users are likely to continue to have 3PCs
>>> enabled. WDYT?
>>>
>>>
>>>
>>> Thanks,
>>>
>>>    Rick
>>>
>>>
>>>
>>> On Mon, Oct 27, 2025 at 10:27 AM Patrick Meenan <[email protected]>
>>> wrote:
>>>
>>> I don't believe the security/privacy protections actually rely on the
>>> assertions (and it's unlikely those would be public). It's more for
>>> awareness and to make sure they don't accidentally break something with
>>> their app if they were relying on the responses being partitioned by site.
>>>
>>>
>>>
>>> As far as query params go, the browser code already only filters for
>>> requests with no query params so any that do rely on query params won't get
>>> included anyway.
>>>
>>>
>>>
>>> The same goes for cookies. Since the feature is only enabled when
>>> third-party cookies are enabled, adding cookies to these responses or
>>> putting unique content in them won't actually pierce any new boundaries but
>>> it goes against the intent of only using it for public/static resources and
>>> they'd lose the benefit of the shared cache when it gets updated. Same goes
>>> for the fingerprinting risks if the pattern was abused.
>>>
>>>
>>>
>>> On Mon, Oct 27, 2025 at 9:39 AM Mike Taylor <[email protected]>
>>> wrote:
>>>
>>> On 10/22/25 5:48 p.m., Patrick Meenan wrote:
>>>
>>> The candidate list goes down to 20k occurrences in order to catch
>>> resources that were updated mid-crawl and may have multiple entries with
>>> different hashes that add up to 100k+ occurrences. In the candidate list,
>>> without any filtering, the 100k cutoff is around 600; I'd estimate that
>>> well less than 25% of the candidates make it through the filtering for
>>> stable pattern, correct resource type and reliable pattern. First release
>>> will likely be 100-200 and I don't expect it will ever grow above 500.
>>>
>>> Thanks - I see the living document has been updated to mention 500 as a
>>> ceiling.
>>>
>>>
>>>
>>> As far as cadence goes, I expect there will be a lot of activity for the
>>> next few releases as individual patterns are coordinated with the origin
>>> owners but then it will settle down to a much more bursty pattern of
>>> updates every few Chrome releases (likely linked with an origin changing
>>> their application and adding more/different resources). And yes, it is
>>> manual.
>>>
>>> As far as the process goes, resource owners need to actively assert that
>>> their resource is appropriate for the single-keyed cache and that they
>>> would like it included (usually in response to active outreach from us but
>>> we have the external-facing list for owner-initiated contact as well).  The
>>> design doc has the documentation for what it means to be appropriate (and
>>> the doc will be moved to a readme page in the repository next to the actual
>>> list so it's not a hard-to-find Google doc):
>>>
>>> Will there be any kind of public record of this assertion? What happens
>>> if a site starts using query params or sending cookies? Does the person in
>>> charge of manual list curation discover that in the next release? Does that
>>> require a new release (I don't know if this lives in component updater, or
>>> in the binary itself)?
>>>
>>>
>>>
>>> *5. Require resource owner opt-in*
>>> For each URL to be included, reach out to the team/company responsible
>>> for the resource to validate the URL pattern and get assurances that the
>>> pattern will always serve the same content to all sites and not be abused
>>> for tracking (by using unique URLs within the pattern mask as a bit-mask
>>> for fingerprinting). They will also need to validate that the URLs covered
>>> by the pattern will not rely on being able to set cookies over HTTP using a
>>> Set-Cookie HTTP response header because they will not be re-applied
>>> across cache boundaries (the set-cookie is not cached with the resource).
>>>
>>>
>>>
>>> On Wed, Oct 22, 2025 at 5:31 PM Mike Taylor <[email protected]>
>>> wrote:
>>>
>>> On 10/18/25 8:34 a.m., Patrick Meenan wrote:
>>>
>>> Sorry, I missed a step in making the candidate resource list public. I
>>> have moved it to my chromium account and made it public here
>>> <https://docs.google.com/spreadsheets/d/1TgWhdeqKbGm6hLM9WqnnXLn-iiO4Y9HTjDXjVO2aBqI/edit?usp=sharing>.
>>>
>>>
>>>
>>>
>>> Not everything in that list meets all of the criteria - it's just the
>>> first step in the manual curation (same URL served the same content across
>>> > 20k sites in the HTTP Archive dataset).
>>>
>>>
>>>
>>> The manual steps from there for meeting the criteria are basically:
>>>
>>>
>>>
>>> - Cull the list for scripts, stylesheets and compression dictionaries.
>>>
>>> - Remove any URLs that use query parameters.
>>>
>>> - Exclude any responses that set cookies.
>>>
>>> - Identify URLs that are not manually versioned by site embedders (i.e.
>>> the embedded resource cannot get stale). This is either in-place updating
>>> resources or automatically versioned resources.
>>>
>>> - Only include URLs that can reliably target a single resource by
>>> pattern (i.e. ..../<hash>-common.js but not ..../<hash>.js)
>>>
>>> - Get confirmation from the resource owner that the given URL Pattern is
>>> and will continue to be appropriate for the single-keyed cache
>>>
>>> A few questions on list curation:
>>>
>>> Can you clarify how big the list will be? The privacy review at
>>> https://chromestatus.com/feature/5202380930678784?gate=5174931459145728 
>>> mentions
>>> ~500, while the design doc mentions 1000. I see the candidate resource list
>>> starts at ~5000, then presumably manual curation begins to get to one of
>>> those numbers.
>>>
>>> What is the expected list curation/update cadence? Is it actually manual?
>>>
>>> Is there any recourse process for owners of resources that don't want to
>>> be included? Do we have documentation on what it means to be appropriate for
>>> the single-keyed cache?
>>>
>>> thanks,
>>> Mike
>>>

