Re: [EXTERNAL] Re: [blink-dev] Intent to ship: Cache sharing for extremely-pervasive resources

Patrick Meenan Sun, 09 Nov 2025 07:46:41 -0800

On Sat, Nov 8, 2025 at 1:32 PM Yoav Weiss (@Shopify) <[email protected]>
wrote:


> I'm extremely supportive of this effort, with multiple hats on.
>
> I'd have loved if this wasn't restricted to users with 3P cookies enabled,
> but one can imagine abuse where pervasive resource *patterns* are used, but
> with unique hashes that are not deployed in the wild, and where each such
> URL is used as a cross-origin bit of entropy.
>

Yep, there are 2 risks for explicit tracking (both effectively moot when
you can track directly anyway): varying the content of some of the
responses some of the time (maybe for a slightly different URL than the
"current" version that still matches the pattern), and using a broad sample
of not-current URLs across a bunch of origins as a fingerprint. We can make
some of that harder but I couldn't think of any way to completely eliminate
the risk.


> On Sat, Nov 8, 2025 at 7:04 AM Patrick Meenan <[email protected]>
> wrote:
>
>> The list construction should already be completely objective. I changed
>> the manual origin-owner validation to trust and require "cache-control:
>> public" instead. The rest of the criteria
>> <https://docs.google.com/document/d/1xaoF9iSOojrlPrHZaKIJMK4iRZKA3AD6pQvbSy4ueUQ/edit?tab=t.0>
>> should be well-defined and objective. I'm not sure if they can be fully
>> automated yet (though that might just be my pre-AI thinking).
>>
>> The main need for humans in the loop right now is to create the patterns
>> so that they each represent a "single" resource that is stable over time
>> with URL changes (version/hash) and distinguishing those stable files from
>> random hash bundles that aren't stable from release to release. That's
>> fairly easy for a human to do (and get right).
>>
>
> This is something that origins that use compression dictionaries already
> do by themselves - define the "match" pattern that covers the URL's
> semantics. Can we somehow use that for automation where it exists?
>

We can use the match patterns for script and style destinations as an input
when defining the patterns. If the resource URL matches the match pattern
and the match pattern is reasonably long (not /app/*.js) then it's probably
a good pattern (and could be validated across months of HTTP Archive logs).
There are patterns where dictionaries aren't used as strict delta updates
for the same file (e.g. a script with a lot of common code, portions of
which might also be in other scripts used on other pages) so I wouldn't want
to use it blindly but it is a very strong possibility.
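
As a rough sketch of the "reasonably long match pattern" check described above: the code below treats `*` as the only wildcard, which is a simplification of the real dictionary match syntax. The function names and the 16-character threshold are assumptions for illustration, not anything from the design doc.

```python
import re


def literal_length(match_pattern: str) -> int:
    """Count the non-wildcard characters in a simplified match pattern."""
    return len(match_pattern.replace("*", ""))


def pattern_to_regex(match_pattern: str) -> "re.Pattern[str]":
    """Translate a pattern whose only wildcard is '*' into a regex."""
    literal_parts = [re.escape(part) for part in match_pattern.split("*")]
    return re.compile(".*".join(literal_parts))


def is_candidate_pattern(match_pattern: str, resource_path: str,
                         min_literal_chars: int = 16) -> bool:
    """Accept a pattern only if it is specific enough and covers the path.

    min_literal_chars is a hypothetical threshold; a real cutoff would need
    to be validated against historical crawl data.
    """
    if literal_length(match_pattern) < min_literal_chars:
        return False  # too broad, e.g. /app/*.js
    return pattern_to_regex(match_pattern).fullmatch(resource_path) is not None
```

A pattern that passes this check would still need the stability validation over time, since a dictionary match pattern does not guarantee that the matched URLs are versions of a single file.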


>
>
>>
>>
>>
>> On Fri, Nov 7, 2025 at 4:47 PM Rick Byers <[email protected]> wrote:
>>
>>> Thanks Pat. I am personally a big fan of things which increase publisher
>>> ad revenue across the web broadly without hurting (or ideally improving)
>>> the user experience, and this seems likely to do exactly that. In
>>> particular I recall all the debate around stale-while-revalidate
>>> <https://web.dev/articles/stale-while-revalidate> and am proud that we
>>> pushed
>>> <https://groups.google.com/a/chromium.org/g/blink-dev/c/rspPrQHfFkI/m/c5j3xJQRDAAJ?e=48417069>
>>> through it with urgency and confirmed it indeed increased publisher ad
>>> revenue across the web
>>> <https://web.dev/case-studies/ads-case-study-stale-while-revalidate>.
>>>
>>> Reading the Mozilla feedback carefully the point that resonates most
>>> with me is the risk of "gatekeeping" and the potential to mitigate that by
>>> establishing objective rules for inclusion. Is it plausible to imagine a
>>> version of this where the list construction would be entirely objective?
>>> What would the tradeoffs be?
>>>
>>> Thanks,
>>>    Rick
>>>
>>>
>>>
>>>
>>> On Thu, Oct 30, 2025 at 3:50 PM Patrick Meenan <[email protected]>
>>> wrote:
>>>
>>>> Reaching out to site owners was mostly for a sanity check that the
>>>> resource is not expecting to be partitioned for some reason (even though
>>>> the payloads are known to be identical). If it helps, we can replace the
>>>> reach-out step with a requirement that the responses be "Cache-Control:
>>>> public" (and hard-enforce it in the browser by not writing the resource to
>>>> cache if it isn't). That is an explicit indicator that the resources are
>>>> cacheable in shared upstream caches.
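
A minimal sketch of the "Cache-Control: public" requirement described above, assuming a naive comma split of the header value. The function name is hypothetical and this is not the browser's actual implementation; it only illustrates refusing the shared-cache write when the directive is absent.

```python
from typing import Optional


def allows_shared_cache_write(cache_control: Optional[str]) -> bool:
    """Return True only if the response explicitly says 'public'.

    Naive parsing: splits on commas and ignores directive arguments, so a
    quoted argument containing a comma is not handled. Other directives are
    deliberately not considered in this sketch.
    """
    if not cache_control:
        return False
    directives = set()
    for part in cache_control.split(","):
        name = part.strip().split("=", 1)[0].strip().lower()
        if name:
            directives.add(name)
    return "public" in directives
```

With this check, a response carrying only `max-age=600` would be kept out of the shared cache even if its URL matched a listed pattern.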
>>>>
>>>> I removed the 2 items from the design doc that were specifically
>>>> targeted at direct fingerprinting since that's moot with the 3PC link (as
>>>> well as the fingerprinting bits from the validation with resource owners).
>>>>
>>>> On the site-preferencing concern, it doesn't actually preference large
>>>> sites but it does preference currently-popular third-party resources (most
>>>> of which are provided by large corporations). The benefit is spread across
>>>> all of the sites that they are embedded in (funnily enough, most large
>>>> sites won't benefit because they don't tend to use third-parties).
>>>>
>>>> Determining the common resources at a local level exposes the same XS
>>>> Leak issues as allowing all resources (i.e. your local map tiles will show
>>>> up in multiple cache partitions because they all reference your current
>>>> location but they can be used to identify your location since they are not
>>>> globally common). Instead of using the HTTP Archive to collect the
>>>> candidates, we could presumably build a centralized list based on
>>>> aggregated common resources that are seen across cache partitions by each
>>>> user but that feels like an awful lot of complexity for a very small number
>>>> of resulting resources.
>>>>
>>>> On the test results, sorry, I thought I had included the experiment
>>>> results in the I2S but it looks like I may not have.
>>>>
>>>> The test was specifically just with the patterns for the Google ads
>>>> scripts because we aren't expecting this feature to impact the vitals for
>>>> the main page/content since most of the pervasive resources are third-party
>>>> content that is usually async already and not critical-path. It's possible
>>>> some video or map embeds might trigger LCP in some cases but that's the
>>>> exception more than the norm. This is more geared to making those
>>>> supporting things work better while maintaining the user experience. Ads
>>>> has the kind of instrumentation that we'd need to be able to get visibility
>>>> into the success (or failure) of that assumption and to be able to measure
>>>> small changes.
>>>>
>>>> The results were stat-sig positive but relatively small. The ad iframes
>>>> displayed their content slightly faster and transmitted fewer bytes for
>>>> each frame (very low single digit percentages).
>>>>
>>>> The guardrail metrics (including vitals) were all neutral which is what
>>>> we were hoping for (improvement without a cost of increased contention).
>>>>
>>>> If you'd feel more comfortable with gathering more data, I wouldn't be
>>>> opposed to running the full list at 1% to check the guardrail metrics again
>>>> before fully launching. We won't necessarily expect to see positive
>>>> movement to justify a launch since the resources are still async but we can
>>>> validate that assumption with the full list at least (if that is the only
>>>> remaining concern).
>>>>
>>>>
>>>> On Thu, Oct 30, 2025 at 5:28 PM Rick Byers <[email protected]> wrote:
>>>>
>>>>> Thanks Erik and Patrick, of course that makes sense. Sorry for the
>>>>> naive question. My naive reading of the design doc suggested to me that a
>>>>> lot of the privacy mitigations were about preventing the cross-site
>>>>> tracking risk. Could the design be simplified by removing some of those
>>>>> mitigations? For example, the section about reaching out to the resource
>>>>> owners, to what extent is that really necessary when all we're trying to
>>>>> mitigate is XS leaks? Don't the popularity properties alone mitigate that
>>>>> sufficiently?
>>>>>
>>>>> What can you share about the magnitude of the performance benefit in
>>>>> practice in your experiments? In particular for LCP, since we know
>>>>> <https://wpostats.com/> that correlates well with user engagement
>>>>> (and against abandonment) and so presumably user value.
>>>>>
>>>>> The concern about not wanting to further advantage more popular sites
>>>>> over less popular ones resonates with me. Part of that argument seems to
>>>>> apply broadly to the idea of any LRU cache (especially one with a reuse
>>>>> bias which I believe ours has
>>>>> <https://www.chromium.org/developers/design-documents/network-stack/disk-cache/#eviction>?).
>>>>> But perhaps an important distinction here is that the benefits are
>>>>> determined globally vs. on a user-by-user basis? But I think any solution
>>>>> that worked on a user-by-user basis would have the XS leak problem, right?
>>>>> Perhaps it's worth reflecting on our stance on using crowd-sourced data to
>>>>> try to improve the experience for all users while still being fair to sites
>>>>> broadly. In general I think this is something Chromium is much more open to
>>>>> (where it brings significant user benefit) than other engines. For example,
>>>>> our Media Engagement Index
>>>>> <https://developer.chrome.com/blog/autoplay> system has some similar
>>>>> properties in terms of using aggregate user behaviour to help decide which
>>>>> sites have the power to play audio on page load and which don't. I was
>>>>> personally uncertain at the time if the complexity would prove to be worth
>>>>> the benefit, but now I'm quite convinced it is. Playing audio on load is
>>>>> just something users and developers want in a few cases, but not most
>>>>> cases. I wonder if perhaps cross-site caching is similar?
>>>>>
>>>>> Rick
>>>>>
>>>>> On Thu, Oct 30, 2025 at 9:09 AM Matt Menke <[email protected]> wrote:
>>>>>
>>>>>> Note that even with Vary: Origin, we still have to load the HTTP
>>>>>> request headers from the disk cache to apply the vary header, which leaks
>>>>>> timing information, so "Vary: Origin" is not a sufficient security
>>>>>> mechanism to prevent that sort of cross-site attack.
>>>>>>
>>>>>> On Wednesday, October 29, 2025 at 5:08:42 PM UTC-4 Erik Anderson
>>>>>> wrote:
>>>>>>
>>>>>>> My understanding was that there was believed to be a meaningful
>>>>>>> security benefit with partitioning the cache. That’s because it would limit
>>>>>>> a party from being able to infer that you’ve visited some other site by
>>>>>>> measuring a side effect tied to how quickly a resource loads. That
>>>>>>> observation could potentially be made even if that specific adversary
>>>>>>> doesn’t have any of their own content loaded on the other site.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Of course, if there is an entity with a resource loaded across both
>>>>>>> sites with a 3p cookie *and* they’re willing to share that
>>>>>>> info/collude, there’s not much benefit. And even when partitioned, if 3p
>>>>>>> cookies are enabled, there are potentially measurable side effects that
>>>>>>> differ based on if the resource request had some specific state in a 3p
>>>>>>> cookie.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Does that incremental security benefit of partitioning the cache
>>>>>>> justify the performance costs when 3p cookies are still enabled? I’m not
>>>>>>> sure.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Even if partitioning was eliminated, a site could protect themselves
>>>>>>> a bit by specifying Vary: Origin, but that probably doesn’t
>>>>>>> sufficiently cover iframe scenarios (nor would I expect most sites to hold
>>>>>>> it right).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From:* Rick Byers <[email protected]>
>>>>>>> *Sent:* Wednesday, October 29, 2025 11:56 AM
>>>>>>> *To:* Patrick Meenan <[email protected]>
>>>>>>> *Cc:* Mike Taylor <[email protected]>; blink-dev <
>>>>>>> [email protected]>
>>>>>>> *Subject:* [EXTERNAL] Re: [blink-dev] Intent to ship: Cache sharing
>>>>>>> for extremely-pervasive resources
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> If this is enabled only when 3PCs are enabled, then what are the
>>>>>>> tradeoffs of going through all this complexity and governance vs. just
>>>>>>> broadly coupling HTTP cache keying behavior to 3PC status in some way? What
>>>>>>> can a tracker credibly do with a single-keyed HTTP cache that they cannot
>>>>>>> do with 3PCs? Are there also concerns about accidental cross-site resource
>>>>>>> sharing which could be mitigated more simply by other means, e.g. by scoping
>>>>>>> to just ETag-based caching?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I remember the controversy and some real evidence of harm to users
>>>>>>> and businesses in 2020 when we partitioned the HTTP cache, but I was
>>>>>>> convinced that we had to accept that harm in order to credibly achieve
>>>>>>> 3PCD. At the time I was personally a fan of a proposal like this (even for
>>>>>>> users without 3PCs) in order to mitigate the harm. But now it seems to me
>>>>>>> that if we're going to start talking about poking holes in that decision,
>>>>>>> perhaps we should be doing a larger review of the options in that space
>>>>>>> with the knowledge that most Chrome users are likely to continue to
>>>>>>> have 3PCs enabled. WDYT?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>    Rick
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 27, 2025 at 10:27 AM Patrick Meenan <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> I don't believe the security/privacy protections actually rely on
>>>>>>> the assertions (and it's unlikely those would be public). It's more for
>>>>>>> awareness and to make sure they don't accidentally break something with
>>>>>>> their app if they were relying on the responses being partitioned by site.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> As far as query params go, the browser code already only filters for
>>>>>>> requests with no query params so any that do rely on query params won't get
>>>>>>> included anyway.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The same goes for cookies. Since the feature is only enabled when
>>>>>>> third-party cookies are enabled, adding cookies to these responses or
>>>>>>> putting unique content in them won't actually pierce any new boundaries but
>>>>>>> it goes against the intent of only using it for public/static resources and
>>>>>>> they'd lose the benefit of the shared cache when it gets updated. Same goes
>>>>>>> for the fingerprinting risks if the pattern was abused.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 27, 2025 at 9:39 AM Mike Taylor <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> On 10/22/25 5:48 p.m., Patrick Meenan wrote:
>>>>>>>
>>>>>>> The candidate list goes down to 20k occurrences in order to catch
>>>>>>> resources that were updated mid-crawl and may have multiple entries with
>>>>>>> different hashes that add up to 100k+ occurrences. In the candidate list,
>>>>>>> without any filtering, the 100k cutoff is around 600. I'd estimate that
>>>>>>> well under 25% of the candidates make it through the filtering for
>>>>>>> stable pattern, correct resource type and reliable pattern. First release
>>>>>>> will likely be 100-200 and I don't expect it will ever grow above 500.
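
The reason for the 20k floor can be sketched as follows: entries for the same resource with different hashes are summed under one pattern before the 100k cutoff is applied. The row shape `(url, content_hash, occurrences)` and the `pattern_for` callback are assumptions for illustration; in practice the URL-to-pattern mapping is the manually curated part.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Optional, Tuple

CANDIDATE_FLOOR = 20_000    # lowest per-entry count kept in the candidate list
INCLUSION_CUTOFF = 100_000  # combined count needed to stay under consideration

Row = Tuple[str, str, int]  # (url, content_hash, occurrences), hypothetical shape


def aggregate_by_pattern(
    rows: Iterable[Row],
    pattern_for: Callable[[str], Optional[str]],
) -> Dict[str, int]:
    """Sum occurrences per pattern and keep patterns at or above the cutoff."""
    totals: Dict[str, int] = defaultdict(int)
    for url, _content_hash, occurrences in rows:
        if occurrences < CANDIDATE_FLOOR:
            continue  # below the candidate floor, never enters the list
        pattern = pattern_for(url)
        if pattern is not None:
            totals[pattern] += occurrences
    return {p: n for p, n in totals.items() if n >= INCLUSION_CUTOFF}
```

For example, a resource updated mid-crawl might appear as two entries of 60k and 50k; neither reaches 100k alone, but together they do.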
>>>>>>>
>>>>>>> Thanks - I see the living document has been updated to mention 500
>>>>>>> as a ceiling.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> As far as cadence goes, I expect there will be a lot of activity for
>>>>>>> the next few releases as individual patterns are coordinated with the
>>>>>>> origin owners but then it will settle down to a much more bursty pattern of
>>>>>>> updates every few Chrome releases (likely linked with an origin changing
>>>>>>> their application and adding more/different resources). And yes, it is
>>>>>>> manual.
>>>>>>>
>>>>>>> As far as the process goes, resource owners need to actively assert
>>>>>>> that their resource is appropriate for the single-keyed cache and that they
>>>>>>> would like it included (usually in response to active outreach from us but
>>>>>>> we have the external-facing list for owner-initiated contact as well). The
>>>>>>> design doc has the documentation for what it means to be appropriate (and
>>>>>>> the doc will be moved to a readme page in the repository next to the actual
>>>>>>> list so it's not a hard-to-find Google doc):
>>>>>>>
>>>>>>> Will there be any kind of public record of this assertion? What
>>>>>>> happens if a site starts using query params or sending cookies? Does the
>>>>>>> person in charge of manual list curation discover that in the next release?
>>>>>>> Does that require a new release (I don't know if this lives in component
>>>>>>> updater, or in the binary itself)?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *5. Require resource owner opt-in*
>>>>>>> For each URL to be included, reach out to the team/company
>>>>>>> responsible for the resource to validate the URL pattern and get assurances
>>>>>>> that the pattern will always serve the same content to all sites and not be
>>>>>>> abused for tracking (by using unique URLs within the pattern mask as a
>>>>>>> bit-mask for fingerprinting). They will also need to validate that the URLs
>>>>>>> covered by the pattern will not rely on being able to set cookies over HTTP
>>>>>>> using a Set-Cookie HTTP response header because they will not be
>>>>>>> re-applied across cache boundaries (the set-cookie is not cached with the
>>>>>>> resource).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 22, 2025 at 5:31 PM Mike Taylor <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> On 10/18/25 8:34 a.m., Patrick Meenan wrote:
>>>>>>>
>>>>>>> Sorry, I missed a step in making the candidate resource list public.
>>>>>>> I have moved it to my chromium account and made it public here
>>>>>>> <https://docs.google.com/spreadsheets/d/1TgWhdeqKbGm6hLM9WqnnXLn-iiO4Y9HTjDXjVO2aBqI/edit?usp=sharing>.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Not everything in that list meets all of the criteria - it's just
>>>>>>> the first step in the manual curation (same URL served the same content
>>>>>>> across > 20k sites in the HTTP Archive dataset).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The manual steps from there for meeting the criteria are basically:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> - Cull the list for scripts, stylesheets and compression
>>>>>>> dictionaries.
>>>>>>>
>>>>>>> - Remove any URLs that use query parameters.
>>>>>>>
>>>>>>> - Exclude any responses that set cookies.
>>>>>>>
>>>>>>> - Identify URLs that are not manually versioned by site embedders
>>>>>>> (i.e. the embedded resource can not get stale). This is either in-place
>>>>>>> updating resources or automatically versioned resources.
>>>>>>>
>>>>>>> - Only include URLs that can reliably target a single resource by
>>>>>>> pattern (i.e. ..../<hash>-common.js but not ..../<hash>.js)
>>>>>>>
>>>>>>> - Get confirmation from the resource owner that the given URL
>>>>>>> Pattern is and will continue to be appropriate for the single-keyed cache
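
The mechanical parts of the steps above (resource type, query parameters, cookies) could be sketched as a simple filter. The `Candidate` record and the destination labels are hypothetical; the staleness, single-resource pattern, and owner-confirmation steps are not covered because they need human judgment.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import urlsplit

# Hypothetical labels for scripts, stylesheets and compression dictionaries.
ALLOWED_DESTINATIONS = {"script", "style", "dictionary"}


@dataclass
class Candidate:
    url: str
    destination: str
    response_headers: Dict[str, str] = field(default_factory=dict)


def passes_mechanical_filters(candidate: Candidate) -> bool:
    """Apply the filters that do not require manual review."""
    if candidate.destination not in ALLOWED_DESTINATIONS:
        return False
    if urlsplit(candidate.url).query:
        return False  # URLs that use query parameters are removed
    header_names = {name.lower() for name in candidate.response_headers}
    if "set-cookie" in header_names:
        return False  # responses that set cookies are excluded
    return True
```

Anything that survives this filter would still go through the manual steps in the list.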
>>>>>>>
>>>>>>> A few questions on list curation:
>>>>>>>
>>>>>>> Can you clarify how big the list will be? The privacy review at
>>>>>>> https://chromestatus.com/feature/5202380930678784?gate=5174931459145728
>>>>>>> mentions ~500, while the design doc mentions 1000. I see the candidate
>>>>>>> resource list starts at ~5000, then presumably manual curation begins to
>>>>>>> get to one of those numbers.
>>>>>>>
>>>>>>> What is the expected list curation/update cadence? Is it actually
>>>>>>> manual?
>>>>>>>
>>>>>>> Is there any recourse process for owners of resources that don't
>>>>>>> want to be included? Do we have documentation on what it means to be
>>>>>>> appropriate for the single-keyed cache?
>>>>>>>
>>>>>>> thanks,
>>>>>>> Mike
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "blink-dev" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To view this discussion visit
>>>>>>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAPq58w6UFSnxxzhGKBnY1BJKiZZeH7BUm7PmcjQm_%2BLjGyrtYg%40mail.gmail.com
>>>>>>> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAPq58w6UFSnxxzhGKBnY1BJKiZZeH7BUm7PmcjQm_%2BLjGyrtYg%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>>
>>>>>>>
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAPq58w5R56xfGBsnOknw1Ha0ns%2BQW%2BQhtvPkR0aqHZAmnhiOOg%40mail.gmail.com.
