Re: [Distutils] Serverside Dependency Resolution and Virtualenv Build Server

Wes Turner Thu, 12 Nov 2015 05:00:08 -0800

On Nov 12, 2015 6:32 AM, "Leonardo Rochael Almeida" <leoroch...@gmail.com>
wrote:
>
> Hi Thomas,
>
> I think your idea could be very useful as an accelerator for installation
> in closed environments, as you suggested in your last e-mail, though that
> wasn't clear in your first.
>
> After all, in closed environments you have control of the machine
> architecture of all clients, and can be reasonably sure that the wheels you
> build server-side are installable client-side.
>
> By default, when proposing ideas on this list, people tend to assume
> they're ideas being proposed to PyPI itself, unless there is a very clear
> mention that this is not the case, hence Donald's answer.
>
> My only comment about your idea would be that since packages get upgraded
> all the time, then the "fuzzy set of requirements" can't be treated as the
> cache key, otherwise your pre-built virtualenvs will get stale all the
> time...
>
> Rather, the cache key of the pre-built virtual environments should be the
> "fixed set of packages with exactly pinned versions" that was resolved from
> the fuzzy set.


* [(PKG, VERSTR)]
* {sys.platform: platform strings}
* [or] the revision of a meta-(package/module) and build options
  * e.g. --make-relocatable, prefix

... like a PPA build farm with a parameterized test 'grid'?
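
A minimal sketch of such a cache key, assuming the inputs are the pinned (name, version) pairs, a platform string, and a list of build options. The function name `env_cache_key` and the payload layout are hypothetical, not taken from virtualenv-build-server; the name normalization follows the rule described in PEP 503.

```python
import hashlib
import json
import re
import sys


def normalize(name):
    # Name normalization as described in PEP 503.
    return re.sub(r"[-_.]+", "-", name).lower()


def env_cache_key(pinned, platform=sys.platform, build_options=()):
    """Return a stable hex digest for one fully pinned environment.

    pinned        -- iterable of (project_name, exact_version) pairs
    platform      -- platform string the environment was built for
    build_options -- e.g. ["--make-relocatable"] (illustrative values)
    """
    payload = {
        # Sorting makes the key independent of the order of the pins.
        "packages": sorted((normalize(n), v) for n, v in pinned),
        "platform": platform,
        "build_options": sorted(build_options),
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

With a key like this, a new upstream release changes the resolved pins and therefore the key, so a stale pre-built environment is never served for a fresh resolution.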

>
> Regards,
>
> Leo
>
>
> On 12 November 2015 at 06:55, Thomas Güttler <guettl...@thomas-guettler.de> wrote:
>>
>> On 11.11.2015 at 13:59, Donald Stufft wrote:
>>>
>>> On November 11, 2015 at 1:30:57 AM, Thomas Güttler (guettl...@thomas-guettler.de) wrote:
>>>>
>>>>
>>>> Maybe I am missing something, but I still think server-side dependency
>>>> resolution is possible.
>>>>
>>>
>>> I don’t believe it’s possible nor desirable to have the server handle
>>> dependency resolution, at least not without removing some currently
>>> supported features and locking out some future features from ever
>>> happening.
>>
>>
>> I can understand if you say it is not desirable.
>>
>> I like the general concept of simple clients and solving complicated
>> stuff on the server.
>>
>> Now to "possible":
>>
>>  - What features are not supported if you resolve dependencies on the
>>    server?
>>  - What features are not possible in the future?
>>
>>
>>>
>>> Currently pip can be configured with multiple repository locations that
>>> it will use when resolving dependencies. By default this only includes
>>> PyPI, but people can either remove that or add additional repository
>>> locations. In order to support this we need a resolver that can union
>>> multiple repositories together before doing the resolving. If the
>>> repository itself were the one handling the resolution, then we would be
>>> locked into a single repository per invocation of pip.
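
To illustrate the "union before resolving" step described above, here is a rough sketch. It assumes each repository can be summarized as a mapping from project name to the set of versions it offers; that data model and the name `union_candidates` are assumptions for illustration, not pip's internals.

```python
def union_candidates(*repositories):
    """Merge {project: {version, ...}} mappings from several repositories.

    The resolver then sees one combined view, so a version that exists
    only on a private index can still be selected.
    """
    merged = {}
    for repo in repositories:
        for project, versions in repo.items():
            merged.setdefault(project, set()).update(versions)
    return merged


# Hypothetical contents of a public index and a company-internal index.
public_index = {"django": {"1.8.3", "1.8.4"}}
internal_index = {"django": {"1.8.3"}, "ourlib": {"0.1"}}
combined = union_candidates(public_index, internal_index)
```

A single server could only offer this combined view if it knew about every other repository the client is configured to use.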
>>
>>
>> I am aware of that. In our company the CI system has no access to
>> pypi.org. All packages come from our package server, which contains a
>> mirror of some PyPI packages.
>>
>> If this can be done on the client side today, I see no problem doing
>> this on the server side tomorrow.
>>
>>> Additionally, pip can also be configured to use a simple directory full
>>> of files as a repository. Since this is just a simple directory, there
>>> *is* no server process running that would allow for a server side
>>> resolver to happen, and pip either *must* handle the resolution itself in
>>> this case or it must disallow this feature altogether.
>>
>>
>> Same as above: can be done on a server, too.
>>
>>>
>>> Additionally, the fact that we currently treat the server as a “dumb”
>>> server means that someone can implement a PEP 503 compatible repository
>>> very trivially with pretty much any web server that supports static files
>>> and automatically generating an index for static files. Switching to
>>> server side resolution would require removing this capability and force
>>> everyone to run dedicated repository software that can handle that
>>> resolution.
>>
>>
>> You currently treat the server as a "dumb" server. That's ok.
>>
>> Did you think I want to replace your server with my idea? I am very sorry
>> if you thought this way.
>>
>> My solution is optional and just an idea. I never meant that pypi.org or
>> the new wheel server should use my idea.
>>
>> You use the word "force". Nobody gets forced just because there is an
>> alternative.
>>
>>> Additionally, we want there to be as little variance in the requests
>>> that people make to the repository as possible. We utilize a caching CDN
>>> layer which handles > 80% of the total traffic to PyPI, which is the
>>> primary reason we’ve been able to scale to handling 5TB and ~50 million
>>> requests a day with a skeleton crew of people. If we move to server side
>>> dependency resolution, then we reduce our ability to ensure that as many
>>> requests as possible are served directly out of the cache rather than
>>> having to go back to our backend servers.
>>
>>
>> Your thoughts were too fast. There are a lot of private package hosting
>> servers in company intranets.
>>
>> In this context the load can be handled very well. And if you have
>> CI systems asking for the same stuff over and over again, caching could
>> improve the speed very much. You can do caching at a high level: all
>> projects going through CI in one company benefit.
>>
>>> Finally, we want to move further away from trusting the actual repository
>>> where we can. In the future we’ll be allowing package signing that will
>>> make it possible to survive a compromise of the repository. However, there
>>> is no way to do that if the repository needs to be able to dynamically
>>> generate a list of packages that need to be installed as part of a
>>> resolution process, because by definition that needs to be done on the fly
>>> and thus must be signed by a key that the repository has access to, if it’s
>>> signed at all. However, since the metadata for a package can be signed once
>>> and then never changes, it can be signed by a human when they are
>>> uploading to PyPI, and then pip can verify the signature on that metadata
>>> before feeding it into the resolver. This would allow us to treat PyPI as
>>> just an untrusted middleman instead of something that is essentially going
>>> to be allowed to force us to execute arbitrary code whenever someone does a
>>> pip install (because it’ll be able to instruct us to install any package,
>>> and packages can contain arbitrary code).
>>
>>
>> My idea is made of two parts which don't depend on each other.
>>
>> The main (first) part is dep resolution on server:
>>
>> Input: install_requires list with fuzzy version requirements
>> Output: version pinned package list.
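
The input/output contract above can be sketched with a toy function. This is only an illustration under strong assumptions: it ignores transitive dependencies, handles only purely numeric dotted versions (so "1.8" and "1.8.0" compare as different), and the names `pin` and `index` are hypothetical. A real resolver would follow PEP 440 and read each package's own requirements.

```python
import operator
import re

OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}
SPEC = re.compile(r"(==|!=|>=|<=|>|<)\s*([0-9.]+)")


def vkey(version):
    # "1.8.3" -> (1, 8, 3); numeric dotted versions only.
    return tuple(int(part) for part in version.split("."))


def pin(requirements, index):
    """Turn fuzzy requirements into an exactly pinned list.

    requirements -- e.g. ["Django>=1.8,<1.9", "pytz"]
    index        -- {project_name: [available version strings]}
    """
    pinned = []
    for req in requirements:
        name = re.match(r"[A-Za-z0-9._-]+", req).group(0)
        specs = SPEC.findall(req[len(name):])
        matching = [v for v in index[name]
                    if all(OPS[op](vkey(v), vkey(bound)) for op, bound in specs)]
        if not matching:
            raise LookupError("no version of %s satisfies %r" % (name, req))
        # Pick the highest version that satisfies every constraint.
        pinned.append("%s==%s" % (name, max(matching, key=vkey)))
    return pinned
```

The returned list is plain text, so the client can inspect it and fetch and verify each pinned package on its own before installing anything.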
>>
>> If the server was hacked, what could a black hat hacker do?
>> He could send you an evil line in the result. Instead of "Django==1.8.3"
>> he could send you "Django-with-my-evil-hacks-included==1.8.3".
>>
>> It is still up to the client whether it installs the requirements that the
>> server returned. These packages can be downloaded individually and checked
>> in whatever way you want pip to check packages in the future.
>>
>> I understand your concern about the second part: creating one package
>> from a list of version-pinned requirements.
>>
>>> Hopefully that answers your question about why it’s unlikely that we’ll
>>> ever move to a server side dependency resolver, because even though it is
>>> possible to do so, doing it would severely regress a number of very
>>> important features.
>>
>>
>> I just wanted to share my idea: https://github.com/guettli/virtualenv-build-server
>>
>> The idea is in the public domain. I will happily coach developers who
>> want to implement it. I won't implement the idea myself :-)
>>
>> Regards,
>>   Thomas Güttler
>>
>>
>>
>> --
>> Thomas Guettler http://www.thomas-guettler.de/
>>
>> _______________________________________________
>> Distutils-SIG maillist  -  Distutils-SIG@python.org
>> https://mail.python.org/mailman/listinfo/distutils-sig
