> On Jul 13, 2016, at 4:21 PM, Михаил Голубев <[email protected]> wrote:
> 
> Right, sorry, that initial question wasn't clear about that. 
> 
> We need the latest versions only for installed packages. Nonetheless, as you 
> noted, it's still several dozens consecutive requests to 
> "/simple/<package_name>" for each PyCharm session of every user. 
> 
> Can you handle that?


The short answer is yes.

The longer answer is that we have Fastly acting as a CDN in front of PyPI, and 
serving an item out of the cache in Fastly is essentially free for us in terms 
of resources (obviously Fastly needs to handle that load, but they’re well 
equipped to handle much larger loads than we are). Thus, the more cacheable a 
URL is (and the longer lived a particular cache item can be), the easier it is 
for us to scale that URL on PyPI.

The URL you’re currently using has a few downsides that prevent it from being 
cached effectively:

* The URL is a “UI” URL, so it includes information like the currently logged 
in user, and thus we need to send Vary: Cookie. That makes it less likely to be 
cached at all, since each unique Cookie header adds another response to be 
cached for that URL, and Fastly will only save ~200 responses per URL before it 
starts to evict some.

* Similarly, since it’s a “UI” URL, people expect it to update fairly quickly. 
Because legacy PyPI wasn’t implemented with long lived caching and purging on 
updates in mind, it was easier to simply give the cached object a short 
(5 minute IIRC) TTL rather than a long lived TTL with purging (as we do for the 
“API” URLs).

* Responses that act as collections of projects need to be invalidated anytime 
something changes that may invalidate that collection. For an API that lists 
every project and its latest version, that means it needs to be invalidated 
anytime any project releases a new version.
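
To make the Vary: Cookie point concrete, here is a toy model of a cache that 
keeps a bounded number of variants per URL. It is only a sketch: the class and 
function names are made up for illustration, the 200 limit is the rough figure 
quoted above, and the least-recently-used eviction is an assumption rather than 
a description of what Fastly actually does.

```python
from collections import OrderedDict

MAX_VARIANTS_PER_URL = 200  # rough figure from the text, not an exact constant


class ToyVaryCache:
    """Toy cache holding a bounded set of response variants per URL."""

    def __init__(self, vary_on_cookie, max_variants=MAX_VARIANTS_PER_URL):
        self.vary_on_cookie = vary_on_cookie
        self.max_variants = max_variants
        self.store = {}  # url -> OrderedDict(variant key -> body)
        self.hits = 0
        self.misses = 0

    def get(self, url, cookie, origin):
        # With Vary: Cookie every distinct cookie is a separate variant;
        # without it all users share a single cached response.
        variant = cookie if self.vary_on_cookie else ""
        variants = self.store.setdefault(url, OrderedDict())
        if variant in variants:
            variants.move_to_end(variant)
            self.hits += 1
            return variants[variant]
        self.misses += 1
        body = origin(url)
        variants[variant] = body
        if len(variants) > self.max_variants:
            variants.popitem(last=False)  # assumed policy: evict oldest variant
        return body


def simulate(vary_on_cookie, users=300, rounds=2):
    """Each user requests the same URL once per round; return (hits, misses)."""
    cache = ToyVaryCache(vary_on_cookie)
    for _ in range(rounds):
        for user in range(users):
            cache.get("/some-ui-page", f"session={user}", lambda url: "body")
    return cache.hits, cache.misses
```

In this model, 300 users with distinct cookies never get a hit on the varying 
URL, while the same traffic against a URL that does not vary on Cookie reaches 
the origin exactly once.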

Compare that to looking at /simple/ and then either accessing /simple/<foo>/ or 
/pypi/<foo>/json (all of which are cached for long periods of time and purged 
on demand).

* None of those are “UI” URLs, so they have long cache times and they do not 
Vary on Cookie.

* For /simple/ we don’t list any versions, we only list the projects themselves. 
This means that we only need to invalidate this page whenever a brand new 
project is added to PyPI or an existing project is completely deleted. This 
occurs far less often than a new release of an existing project.

* For /simple/ we don’t need to do any particularly heavy duty querying. It’s a 
simple select on an ~80k row table (versus a select on an 80k row table with a 
join to a 500k row table) and is fairly quick to render.

* /simple/<foo>/ and /pypi/<foo>/json are scoped to an individual project, so 
they can be cached for a very long time and only invalidated when that 
particular project releases, not when _any_ project releases. This means that 
the likelihood we can serve one of these out of cache is VERY high.

* For /simple/<foo>/ and /pypi/<foo>/json our SQL queries are relatively quick 
because they don’t need to operate over the entire table, only over the records 
for a single project.
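
The purge scoping described in the list above can be sketched roughly as 
follows. This is not PyPI’s actual purge code: the function name and the event 
names are hypothetical, and purging the per-project pages when a project is 
created or deleted is my assumption.

```python
def urls_to_purge(event, project):
    """Return the cached URLs a hypothetical purge hook would invalidate."""
    per_project = {f"/simple/{project}/", f"/pypi/{project}/json"}
    if event == "release":
        # A new release only touches the pages scoped to that one project.
        return per_project
    if event in ("project_created", "project_deleted"):
        # Only these events change the list of projects shown on /simple/.
        return per_project | {"/simple/"}
    raise ValueError(f"unknown event: {event!r}")
```

A release of one project leaves /simple/ and every other project’s pages in the 
cache, which is why the hit rate on these URLs stays so high.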

Given all of the above, and the fact that listing every project and its latest 
version is *slow* and resource intensive, yes, it’s very likely that making one 
request per installed package will be far better for our ability to serve your 
requests, because those extra requests will almost certainly be served straight 
from the Fastly caches and never hit our origin servers at all.
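
On the client side, a minimal sketch of the per-package approach might look 
like the following. The host in PYPI_BASE and all of the function names are my 
assumptions for illustration. The only thing it relies on from the JSON 
response is the "info" -> "version" field, which holds the latest release.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PYPI_BASE = "https://pypi.org"  # assumption: adjust to the host you target


def json_url(project, base=PYPI_BASE):
    """Build the per-project JSON URL, /pypi/<project>/json."""
    return f"{base}/pypi/{quote(project, safe='')}/json"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a real HTTP request)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def latest_versions(projects, fetch=fetch_json):
    """Map each installed project name to its latest version on the index."""
    result = {}
    for project in projects:
        payload = fetch(json_url(project))
        result[project] = payload["info"]["version"]
    return result
```

The fetch parameter is only there so the lookup can be exercised without 
network access. Called as latest_versions(["requests", "six"]), it issues one 
small, cacheable request per installed package.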

—
Donald Stufft



_______________________________________________
Distutils-SIG maillist  -  [email protected]
https://mail.python.org/mailman/listinfo/distutils-sig
