Re: [Distutils] Reproducible builds (Sdist)

2017-10-01 Thread Nick Coghlan
On 30 September 2017 at 06:02, Thomas Kluyver  wrote:
> On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote:
>> Second, is there a convention for storing the SDE value? I can't seem
>> to find one. Reproducible builds are nice, but if it's a pain for
>> reproducers to find the SDE value, that greatly decreases the value
>> of an SDE build.
>
> Does it make sense to add a new optional metadata field to store the
> value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I
> guess it could cause problems if unpacking & repacking a tarball means
> that its metadata is no longer accurate, though.

For distro level reproducible build purposes, we typically treat the
published tarball *as* the original sources, and don't really worry
about the question of "Can we reproduce that tarball, from that VCS
tree?".

This stems from the original model of open source distribution, where
publication *was* a matter of putting a tarball up on a website
somewhere, and it was an open question as to whether or not the
publisher was even using a version control system at all (timeline:
RCS=1982, CVS=1986, SVN=2000, git/hg=2005, with Linux distributions
getting their start in the early-to-mid 1990s).

So SOURCE_DATE_EPOCH gets applied *after* unpacking the original
tarball, rather than being used to *create* the tarball (we already
know when the publisher created it, since that's part of the tarball
metadata).

Python's sdists mess with that assumption a bit, since it's fairly
common to include generated C files that aren't part of the original
source tree, and Cython explicitly recommends doing so in order to
avoid requiring Cython as a build time dependency:
http://docs.cython.org/en/latest/src/reference/compilation.html#distributing-cython-modules

So in many ways, this isn't the problem that SOURCE_DATE_EPOCH on its
own is designed to solve - instead, it's asking the question of "How
do I handle the case where my nominal source archive is itself a built
artifact?", which means you not only need to record source timestamps
of the original inputs you used to build the artifact (which the
version control system will give you), you also need to record details
of the build tools used (e.g. using a different version of Cython will
generate different code, and hence different "source" archives), and
decide what to do with any timestamps on the *output* artifacts you
generate (e.g. you may decide to force them to match the commit date
from the VCS).

So saying "SOURCE_DATE_EPOCH will be set to the VCS commit date when
creating an sdist" would be a reasonable thing for an sdist creation
tool to decide to do, and combined with something like `Pipfile.lock`
in `pipenv`, or a `dev-requirements.txt` with fully pinned versions,
*would* go a long way towards giving you reproducible sdist archives.

However, it's not a problem to be solved by adding anything to the
produced sdist: it's a property of the publishing tools that create
sdists to aim to ensure that given the same inputs, on a different
machine, at a different time, you will nevertheless still get the same
result.
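
A minimal sketch of that tool-side approach, assuming the sdist contents
are already available as an in-memory mapping of names to bytes. The
helper name `build_sdist_bytes` is hypothetical and not part of any
existing tool. It sorts the members and pins every timestamp and
ownership field to a single epoch, so the same inputs give byte-identical
archives. The epoch would typically come from SOURCE_DATE_EPOCH, which a
tool could set from something like `git log -1 --format=%ct`; that wiring
is an assumption and is left out here.

```python
import gzip
import io
import tarfile


def build_sdist_bytes(files, epoch):
    """Return .tar.gz bytes for ``files`` (a dict mapping name -> bytes).

    Hypothetical helper: all timestamps are set to ``epoch`` and members
    are written in sorted order, so the output depends only on the inputs.
    """
    raw = io.BytesIO()
    # mtime= pins the timestamp stored in the gzip header; filename=""
    # keeps any file name out of that header.
    with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=epoch) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.USTAR_FORMAT) as tar:
            for name in sorted(files):
                data = files[name]
                info = tarfile.TarInfo(name)
                info.size = len(data)
                info.mtime = epoch           # e.g. the VCS commit date
                info.mode = 0o644
                info.uid = info.gid = 0      # drop builder-specific ownership
                info.uname = info.gname = ""
                tar.addfile(info, io.BytesIO(data))
    return raw.getvalue()
```

Identical output on a different machine also assumes the same Python and
zlib versions, since the compressed bytes can differ between them, which
is the same reason the build tools need to be pinned.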

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Extracting distutils into setuptools

2017-10-01 Thread Donald Stufft


> On Oct 1, 2017, at 1:53 PM, xoviat  wrote:
> 
> After thinking again about the possibilities that we've discussed here, I 
> realized that a previously proposed alternative would eliminate external 
> build-time dependencies and allow us to merge setuptools with distutils: an 
> "ensuresetuptools" module. This was proposed by @zooba, but basically the 
> idea would be to bundle a wheel of setuptools (setuptools is 
> py2.py3.none-any) that Python could install without requiring network access 
> or other modules. If distutils is required during the build process, then 
> this idea should conform to all of the requirements proposed here and take 
> distutils off of the CPython release schedule.


This isn’t as easy as ensurepip, because ensurepip can wait until the end of
the build process, when the entire Python installation has been built. The same
isn’t true for a hypothetical ensuresetuptools module, because we end up in a
circular dependency: if installing a wheel requires a C extension (like, say,
zlib), then we can’t install that wheel prior to building zlib, but if we need
to install that wheel to build zlib, then we end up stuck.
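
The cycle can be made concrete with a small sketch. The step names below
are purely illustrative and do not correspond to real targets in the
CPython build; `build_order` is a hypothetical helper that uses the
standard library's graphlib module (Python 3.9+) to report whether a set
of build steps can be ordered at all.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(deps):
    """Return a valid order for the steps in ``deps``, or None on a cycle.

    ``deps`` maps each step to the set of steps it needs first.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# ensurepip runs last, after everything it needs has been built.
ensurepip_style = {
    "zlib extension": {"interpreter"},
    "ensurepip": {"interpreter", "zlib extension"},
}

# Hypothetical ensuresetuptools: installing the wheel needs zlib, while
# building zlib needs the installed wheel.
ensuresetuptools_style = {
    "zlib extension": {"setuptools wheel installed"},
    "setuptools wheel installed": {"zlib extension"},
}
```

Here `build_order(ensurepip_style)` yields an ordering, while
`build_order(ensuresetuptools_style)` returns None because no step can
go first.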

I’m not sure which C extensions are used in the process of installing a
wheel. Certainly zlib is, but we could maybe build a special wheel that only
uses the STORED algorithm and doesn’t do compression (does the zipfile module
work when zlib doesn’t exist?). I’m going to guess there are others, though,
and I have no idea whether they can be avoided.
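
A minimal sketch of the STORED-only idea, using the standard zipfile
module. The helper names and the file names in the usage note are
hypothetical, and this only shows that an archive can be written and read
back without any compressed members; whether zipfile imports cleanly on
an interpreter built without zlib is not confirmed here.

```python
import io
import zipfile


def make_stored_archive(files):
    """Write ``files`` (name -> bytes) to a zip with no compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in sorted(files):
            zf.writestr(name, files[name])
    return buf.getvalue()


def uses_only_stored(archive_bytes):
    """Return True if every member of the archive is uncompressed."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return all(
            info.compress_type == zipfile.ZIP_STORED for info in zf.infolist()
        )
```

For example, `make_stored_archive({"pkg/__init__.py": b""})` produces an
archive for which `uses_only_stored` returns True, so a build step could
check a bundled wheel this way before trying to install it.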

I still think a better idea if we want to go down that route is to modify the 
CPython build process to not depend on distutils at all.




Re: [Distutils] Extracting distutils into setuptools

2017-10-01 Thread xoviat
After thinking again about the possibilities that we've discussed here, I
realized that a previously proposed alternative would eliminate external
build-time dependencies and allow us to merge setuptools with distutils: an
"ensuresetuptools" module. This was proposed by @zooba, but basically the
idea would be to bundle a wheel of setuptools (setuptools is
py2.py3.none-any) that Python could install without requiring network
access or other modules. If distutils is required during the build process,
then this idea should conform to all of the requirements proposed here and
take distutils off of the CPython release schedule.

Although it may seem more complicated at first, this approach has several
advantages once implemented. Maintenance from the CPython side would be
minimal (a bot could update the wheel, although I'm not sure whether this is
done with ensurepip), and there is precedent for it in ensurepip. End users
could continue to use distutils without any modifications to their scripts
(ensuresetuptools could be run during the installation process, and even if
it isn't, a single command line could install distutils/setuptools). It
would also allow simplified setuptools maintenance (no monkeypatching).

2017-09-30 20:14 GMT-05:00 Donald Stufft :

> I think that CPython builds a python executable, then uses that built
> executable to finish the installation.
>
>
> On Sep 30, 2017, at 9:11 PM, xoviat  wrote:
>
> It would be nice to know whether this information is correct, or whether I
> hold an invalid belief.
>
> 2017-09-30 20:09 GMT-05:00 xoviat :
>
>> I have personally not built Python myself (though I've built many an
>> extension), but what I can say is that I got the idea from Larry Hastings.
>> According to him (this is for the Gilectomy fork):
>>
>> "Second, as you hack on the Gilectomy you may break your "python"
>> executable rather badly. This is of course expected. However, the python
>> Makefile itself depends on having a working local python interpreter, so
>> when you break that you often break your build too."
>>
>> 2017-09-30 19:59 GMT-05:00 Donald Stufft :
>>
>>>
>>>
>>> On Sep 30, 2017, at 3:52 PM, xoviat  wrote:
>>>
>>> I don't think CPython needs to bundle all of its build-time
>>> dependencies. That principle doesn't really apply to other Python programs
>>> nor most other programs in general. AFAIK, CPython already has a build-time
>>> dependency on another, external, Python, so it wouldn't be too much to
>>> require the external Python to have setuptools installed with something
>>> like pyproject.toml (other programming languages usually bootstrap
>>> themselves with previous versions of the language along with some
>>> associated build tools).
>>>
>>>
>>> As far as I can tell, CPython does *not* have a build time dependency on
>>> having Python available. I just spun up a bare alpine linux container and
>>> compiled CPython `master` branch on it. As far as I can tell the only
>>> Python that exists in this container is the one I just compiled.
>>>
>>> That means that in order for CPython to depend on distutils to build as
>>> you indicate, it would also need to start depending on an existing version
>>> of Python being available. I don’t think that’s a great idea. I think
>>> Python should not depend on Python to build.
>>>
>>
>>
>
>