On November 11, 2015 at 1:30:57 AM, Thomas Güttler 
([email protected]) wrote:
>  
> Maybe I am missing something, but still think server side dependency 
> resolution is possible.  
>  

I don’t believe it’s either possible or desirable to have the server handle 
dependency resolution, at least not without removing some currently supported 
features and locking out some future features from ever happening.

Currently pip can be configured with multiple repository locations that it will 
use when resolving dependencies. By default this only includes PyPI but people 
can either remove that, or add additional repository locations. In order to 
support this we need a resolver that can union multiple repositories together 
before doing the resolving. If the repository itself were the one handling the 
resolution, then we would be locked into a single repository per invocation of 
pip.
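
As a rough illustration of that union step (a minimal sketch, not pip's actual 
code; the `union_candidates` helper and the repository contents are made up for 
the example), the client merges what each repository offers before any 
resolution happens:

```python
from collections import defaultdict

def union_candidates(repositories):
    """Merge {project: [versions]} mappings from several repositories.

    Hypothetical helper: each repository is modelled as a plain dict,
    standing in for whatever an index or directory listing returns.
    """
    merged = defaultdict(set)
    for repo in repositories:
        for project, versions in repo.items():
            merged[project].update(versions)
    # Versions are tuples here, so sorting compares them numerically.
    return {project: sorted(versions) for project, versions in merged.items()}

# Made-up contents for two configured repository locations.
pypi = {"requests": [(2, 7, 0), (2, 8, 1)], "six": [(1, 10, 0)]}
internal = {"requests": [(2, 8, 1), (2, 9, 0)], "acme-tools": [(0, 1, 0)]}

candidates = union_candidates([pypi, internal])
```

Only the client sees both `pypi` and `internal`, so only the client can build 
`candidates`; a resolver running inside either repository would be missing half 
of the picture.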

Additionally, pip can also be configured to use a simple directory full of 
files as a repository. Since this is just a simple directory, there *is* no 
server process running that could perform server side resolution, so pip either 
*must* handle the resolution itself in this case or it must disallow this 
feature altogether.
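
To make the directory case concrete, here is a small sketch (not how pip 
actually does it) that treats a directory listing as the whole "repository". 
It assumes every file is an sdist named `<name>-<version>.tar.gz`; the 
`candidates_from_directory` helper and the file names are hypothetical:

```python
import os
import tempfile

def candidates_from_directory(path):
    """Collect (name, version) pairs from sdist filenames in a directory.

    Simplifying assumption: files are named "<name>-<version>.tar.gz".
    Real tools also have to handle wheels and other archive formats.
    """
    found = []
    for filename in sorted(os.listdir(path)):
        if not filename.endswith(".tar.gz"):
            continue
        stem = filename[: -len(".tar.gz")]
        # Split on the last dash: everything before it is the project name.
        name, _, version = stem.rpartition("-")
        if name and version:
            found.append((name, version))
    return found

# Build a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as directory:
    for filename in ("requests-2.8.1.tar.gz", "six-1.10.0.tar.gz", "README.txt"):
        open(os.path.join(directory, filename), "w").close()
    local_candidates = candidates_from_directory(directory)
```

Everything here runs in the client process; there is nothing on the other end 
that could be asked to resolve anything.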

Additionally, the fact that we currently treat the server as a “dumb” server 
means that someone can implement a PEP 503 compatible repository very trivially 
with pretty much any web server that supports static files and automatically 
generating an index for static files. Switching to server side resolution would 
require removing this capability and force everyone to run dedicated 
repository software that can handle that resolution.
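
For a sense of how little a static repository needs, the sketch below 
generates per-project pages that any static file server could host. The name 
normalization is the rule given in PEP 503; the `build_static_index` helper, 
the project, and the assumption that each file sits next to its page are 
illustrative only:

```python
import html
import re

def normalize(name):
    # Name normalization as specified in PEP 503.
    return re.sub(r"[-_.]+", "-", name).lower()

def build_static_index(files_by_project):
    """Return {relative path: HTML} for a bare-bones static index.

    Hypothetical helper: hrefs assume the files live beside each page.
    """
    pages = {}
    for project, filenames in files_by_project.items():
        links = "\n".join(
            '<a href="{0}">{0}</a><br/>'.format(html.escape(f, quote=True))
            for f in filenames
        )
        path = "simple/{}/index.html".format(normalize(project))
        pages[path] = "<!DOCTYPE html>\n<html><body>\n{}\n</body></html>\n".format(links)
    return pages

pages = build_static_index({"Acme_Tools": ["acme_tools-0.1.0.tar.gz"]})
```

The output is just files on disk, computed once, with no per-request logic 
anywhere.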

Additionally, we want there to be as little variance in the requests that 
people make to the repository as possible. We utilize a caching CDN layer which 
handles more than 80% of the total traffic to PyPI, which is the primary reason 
we’ve been able to scale to handling 5 TB and ~50 million requests a day with a 
skeleton crew of people. If we move to server side dependency resolution, then 
we reduce our ability to ensure that as many requests as possible are served 
directly out of the cache rather than having to go back to our backend 
servers.
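
A toy model of the caching argument (not a measurement of PyPI; the users, 
the requirement sets, and the `/resolve?` endpoint are all invented for the 
example) compares how often a cache can answer when requests are keyed per 
project versus per requirement set:

```python
def hit_rate(request_keys):
    """Fraction of requests answered by a cache that never evicts."""
    seen = set()
    hits = 0
    for key in request_keys:
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(request_keys)

# Three hypothetical users with different, overlapping requirement sets.
users = [
    ("requests", "six"),
    ("requests", "flask"),
    ("six", "flask", "requests"),
]

# Dumb server: one cacheable URL per project page.
per_project = ["/simple/{}/".format(p) for reqs in users for p in reqs]

# Server side resolution: the response depends on the whole requirement set.
# The "/resolve?" endpoint is made up for this comparison.
per_resolution = ["/resolve?" + ",".join(sorted(reqs)) for reqs in users]

static_rate = hit_rate(per_project)       # 4 of 7 requests repeat a URL
resolver_rate = hit_rate(per_resolution)  # every requirement set is unique
```

In this toy setup the per-project URLs repeat across users while every 
resolution request is distinct, which is the kind of variance that pushes 
traffic past the cache.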

Finally, we want to move further away from trusting the actual repository where 
we can. In the future we’ll be allowing package signing that will make it 
possible to survive a compromise of the repository. However, there is no way to 
do that if the repository needs to dynamically generate the list of packages to 
install as part of a resolution process. By definition that list is produced on 
the fly, so if it’s signed at all it must be signed by a key that the 
repository has access to. The metadata for a package, on the other hand, can be 
signed once and then never changes, so it can be signed by a human when they 
are uploading to PyPI, and then pip can verify the signature on that metadata 
before feeding it into the resolver. This would allow us to treat PyPI as just 
an untrusted middleman instead of something that is essentially allowed to 
force us to execute arbitrary code whenever someone does a pip install (because 
it would be able to instruct us to install any package, and packages can 
contain arbitrary code).
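
The verify-before-resolve flow could look roughly like the sketch below. It is 
not a design for the eventual signing scheme: HMAC from the standard library 
is used purely as a stand-in for a real public-key signature (with real 
signing, pip would hold only a public key), and the helpers, key, and package 
data are hypothetical:

```python
import hashlib
import hmac
import json

def sign(metadata, key):
    # Stand-in for a real signature: HMAC over a canonical JSON encoding.
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verified_metadata(entries, key):
    """Keep only metadata whose signature checks out.

    Only the returned entries would be handed to the resolver.
    """
    return [
        metadata
        for metadata, signature in entries
        if hmac.compare_digest(sign(metadata, key), signature)
    ]

uploader_key = b"held by the uploader, never by the repository"

genuine = {"name": "acme-tools", "version": "0.1.0", "requires": ["six"]}
signature = sign(genuine, uploader_key)

# A compromised repository rewrites the dependency list but cannot re-sign it.
tampered = dict(genuine, requires=["evil-package"])

trusted = verified_metadata(
    [(genuine, signature), (tampered, signature)], uploader_key
)
```

Because the metadata is static, it can be signed once at upload time; a 
resolution result computed per request has no equivalent moment at which a 
human could sign it.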

Hopefully that answers your question about why it’s unlikely that we’ll ever 
move to a server side dependency resolver: even though it is possible to do so, 
doing it would severely regress a number of very important features.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


_______________________________________________
Distutils-SIG maillist  -  [email protected]
https://mail.python.org/mailman/listinfo/distutils-sig
