Re: [Distutils] Serverside Dependency Resolution and Virtualenv Build Server

Thomas Güttler Thu, 12 Nov 2015 00:56:20 -0800

On 11.11.2015 at 13:59, Donald Stufft wrote:

On November 11, 2015 at 1:30:57 AM, Thomas Güttler 
([email protected]) wrote:


Maybe I am missing something, but I still think server-side dependency
resolution is possible.


I don’t believe it’s possible nor desirable to have the server handle 
dependency resolution, at least not without
removing some currently supported features and locking out some future features 
from ever happening.


I can understand it if you say it is not desirable.

I like the general concept of simple clients and solving complicated stuff at 
the server.

Now to "possible":

 - What features are not supported if you do resolve dependencies on the server?
 - What features are not possible in the future?


Currently pip can be configured with multiple repository locations that it will 
use when resolving dependencies. By
default this only includes PyPI but people can either remove that, or add 
additional repository locations. In order
to support this we need a resolver that can union multiple repositories 
together before doing the resolving. If the
repository itself was the one handling the resolution then we are locked into a 
single repository per invocation of
pip.
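
To make the "union multiple repositories" step concrete, here is a minimal
sketch. It assumes each repository can be represented as a plain mapping from
project name to available versions; `union_candidates` and the two example
repositories are hypothetical names, not part of pip.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for a repository: project name -> available versions.
Repo = Dict[str, List[str]]


def union_candidates(repos: List[Repo]) -> Dict[str, Set[str]]:
    """Merge the versions offered by every configured repository.

    Project names are compared case-insensitively here; real index
    clients use a stricter normalization.
    """
    merged: Dict[str, Set[str]] = {}
    for repo in repos:
        for name, versions in repo.items():
            merged.setdefault(name.lower(), set()).update(versions)
    return merged


# Two assumed repositories: a PyPI mirror and a company-internal index.
pypi_mirror = {"Django": ["1.8.2", "1.8.3"], "requests": ["2.8.1"]}
company_index = {"django": ["1.8.3.post1"], "internal-tool": ["0.4"]}

candidates = union_candidates([pypi_mirror, company_index])
print(sorted(candidates["django"]))
```

The resolver then works on `candidates` as a whole, which is why the merge has
to happen wherever the resolver runs.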

I am aware of that. In our company the CI system has no access to pypi.org.
All packages come from our package server, which contains a mirror of some
PyPI packages.


If this can be done on the client side today, I see no problem doing this on 
the server-side tomorrow.

Additionally, pip can also be configured to use a simple directory full of 
files as a repository. Since this is just
a simple directory, there *is* no server process running that would allow for a 
server side resolver to happen and
pip either *must* handle the resolution itself in this case or it must disallow 
this feature altogether.


Same as above: can be done on a server, too.


Additionally, the fact that we currently treat the server as a “dumb” server, 
means that someone can implement a PEP
503 compatible repository very trivially with pretty much any web server that 
supports static files and automatically
generating an index for static files. Switching to server side resolution would 
require removing this capability and
force everyone to run a dedicated repository software that can handle that 
resolution.
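
As an illustration of how little a static PEP 503 repository needs, the sketch
below renders one project page as plain HTML. The name normalization follows
the rule given in PEP 503; `project_page` and the file names are hypothetical
and only show the shape of the output.

```python
import html
import re
from typing import Iterable


def normalize(name: str) -> str:
    """Project name normalization as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_page(filenames: Iterable[str]) -> str:
    """Render a static page that lists a project's files as anchors."""
    links = "\n".join(
        '    <a href="{0}">{0}</a><br/>'.format(html.escape(f, quote=True))
        for f in sorted(filenames)
    )
    return "<!DOCTYPE html>\n<html>\n  <body>\n{}\n  </body>\n</html>\n".format(links)


# The page would be stored under /simple/<normalized name>/ on any static host.
path = "/simple/{}/".format(normalize("Django"))
page = project_page(["Django-1.8.3.tar.gz", "Django-1.8.3-py2.py3-none-any.whl"])
print(path)
print(page)
```

Nothing here needs a running application; the pages can be generated once and
served as files.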


You currently treat the server as a "dumb" server. That's ok.

Did you think I want to replace your server with my idea? I am very sorry if I 
gave that impression.

My solution is optional and just an idea. I never meant that pypi.org or the new 
wheel server should use my idea.

You use the word "force". Nobody gets forced just because there is an 
alternative.

Additionally, we want there to be as little variance in the requests that 
people make to the repository as possible.
We utilize a caching CDN layer which handles > 80% of the total traffic to PyPI 
which is the primary reason we’ve
been able to scale to handling 5TB and ~50 million requests a day with a 
skeleton crew of people. If we move to a
server side dependency resolution then we reduce our ability to ensure that as 
many requests as possible are served
directly out of the cache rather than having to go back to our backend 
servers.


I think you jumped ahead too fast. There are a lot of private package hosting 
servers on company intranets.

In this context the load can be handled very well. And if you have CI systems 
asking for the same things over and over again, caching could improve the 
speed considerably. You can cache at a high level: all projects going through 
CI in one company benefit.
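
A rough sketch of such high-level caching, assuming the resolver is an
ordinary function from a requirement list to a pinned list. `cache_key` and
`resolve_cached` are hypothetical names, and a real cache would also have to
be invalidated when new releases reach the index.

```python
import hashlib
from typing import Callable, Dict, List

# In-memory cache shared by all CI jobs that talk to this server process.
_cache: Dict[str, List[str]] = {}


def cache_key(requirements: List[str]) -> str:
    """Build an order-independent key for a requirement list."""
    canonical = "\n".join(sorted(r.strip().lower() for r in requirements))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def resolve_cached(
    requirements: List[str],
    resolve: Callable[[List[str]], List[str]],
) -> List[str]:
    """Return the cached result, calling the resolver only on a miss."""
    key = cache_key(requirements)
    if key not in _cache:
        _cache[key] = resolve(requirements)
    return _cache[key]


calls = []


def fake_resolve(requirements: List[str]) -> List[str]:
    """Assumed stand-in for the expensive resolution step."""
    calls.append(requirements)
    return ["django==1.8.3"]


first = resolve_cached(["Django>=1.8", "requests"], fake_resolve)
second = resolve_cached(["requests", "django>=1.8"], fake_resolve)
print(len(calls))
```

Two CI jobs that ask for the same requirements in a different order share one
resolution.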

Finally, we want to move further away from trusting the actual repository where 
we can. In the future we’ll be
allowing package signing that will make it possible to survive a compromise of 
the repository. However there is no
way to do that if the repository needs to be able to dynamically generate a 
list of packages that need to be
installed as part of a resolution process because by definition that needs to 
be done on the fly and thus must be
signed by a key that the repository has access to if it’s signed at all. 
However, since the metadata for a package
can be signed once and then it never changes, that can be signed by a human 
when they are uploading to PyPI and then
pip can verify the signature on that metadata before feeding it into the 
resolver. This would allow us to treat PyPI
as just an untrusted middleman instead of something that is essentially going 
to be allowed to force us to execute
arbitrary code whenever someone does a pip install (because it’ll be able to 
instruct us to install any package, and
packages can contain arbitrary code).


My idea is made of two parts which don't depend on each other.

The main (first) part is dependency resolution on the server:

Input: install_requires list with fuzzy version requirements
Output: version pinned package list.
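
A toy sketch of that input/output contract, under strong assumptions: the
index is a hard-coded dictionary, versions are purely numeric and dotted (this
is not PEP 440), and transitive dependencies are ignored. `INDEX`, `satisfies`
and `pin` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical index: project name -> released versions.
INDEX: Dict[str, List[str]] = {
    "django": ["1.7.10", "1.8.2", "1.8.3", "1.9"],
    "requests": ["2.7.0", "2.8.1"],
}

OPS = {
    ">=": lambda v, bound: v >= bound,
    "<=": lambda v, bound: v <= bound,
    "==": lambda v, bound: v == bound,
    ">": lambda v, bound: v > bound,
    "<": lambda v, bound: v < bound,
}


def vkey(version: str) -> Tuple[int, ...]:
    """Turn '1.8.3' into (1, 8, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, specifiers: str) -> bool:
    """Check a version against a comma-separated list such as '>=1.8,<1.9'."""
    for spec in filter(None, (s.strip() for s in specifiers.split(","))):
        match = re.fullmatch(r"(>=|<=|==|<|>)\s*([0-9.]+)", spec)
        if match is None:
            raise ValueError("unsupported specifier: {!r}".format(spec))
        op, bound = match.groups()
        if not OPS[op](vkey(version), vkey(bound)):
            return False
    return True


def pin(install_requires: List[str]) -> List[str]:
    """Map fuzzy requirements to 'name==version' lines (newest match wins)."""
    pinned = []
    for requirement in install_requires:
        match = re.fullmatch(r"\s*([A-Za-z0-9._-]+)\s*(.*)", requirement)
        if match is None:
            raise ValueError("cannot parse: {!r}".format(requirement))
        name, specifiers = match.group(1).lower(), match.group(2)
        matching = [v for v in INDEX[name] if satisfies(v, specifiers)]
        if not matching:
            raise LookupError("no version of {} matches".format(name))
        pinned.append("{}=={}".format(name, max(matching, key=vkey)))
    return pinned


print(pin(["Django>=1.8,<1.9", "requests"]))
```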

If the server were hacked, what could a black hat hacker do?
He could send you an evil line in the result. Instead of "Django==1.8.3" he 
could send you "Django-with-my-evil-hacks-included==1.8.3".

It is still up to the client whether it installs the requirements that the 
server returned. These packages can be downloaded individually and checked in 
whatever way you want pip to check packages in the future.
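
One cheap check the client could run on the server's answer before downloading
anything is sketched below. It only verifies that every line is a strict pin
and that each requested project comes back under its own name; it says nothing
about transitive dependencies, which would still need the per-package checks
mentioned above. `check_pins` is a hypothetical helper.

```python
import re
from typing import Dict, List


def normalize(name: str) -> str:
    """Project name normalization as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def check_pins(requested_names: List[str], pinned_lines: List[str]) -> Dict[str, str]:
    """Parse 'name==version' lines and make sure every request is answered."""
    pins: Dict[str, str] = {}
    for line in pinned_lines:
        match = re.fullmatch(r"([A-Za-z0-9._-]+)==([A-Za-z0-9.!+]+)", line.strip())
        if match is None:
            raise ValueError("not a pinned requirement: {!r}".format(line))
        pins[normalize(match.group(1))] = match.group(2)
    missing = [n for n in requested_names if normalize(n) not in pins]
    if missing:
        raise ValueError("server did not pin: {}".format(", ".join(missing)))
    return pins


print(check_pins(["Django"], ["Django==1.8.3"]))

try:
    check_pins(["Django"], ["Django-with-my-evil-hacks-included==1.8.3"])
except ValueError as error:
    print("rejected:", error)
```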

I understand your concern about the second part: creating one package from a 
list of version-pinned requirements.

Hopefully that answers your question about why it’s unlikely that we’ll ever 
move to a server side dependency
resolver because even though it is possible to do so, doing it would severely 
regress a number of very important
features.


I just wanted to share my idea: 
https://github.com/guettli/virtualenv-build-server

The idea is in the public domain. I will happily coach developers who
want to implement it. I won't implement the idea myself :-)

Regards,
  Thomas Güttler



--
Thomas Guettler http://www.thomas-guettler.de/
_______________________________________________
Distutils-SIG maillist  -  [email protected]
https://mail.python.org/mailman/listinfo/distutils-sig
