Re: [Distutils] Reproducible builds (Sdist)
Hi all,

On Fri, Sep 29, 2017 at 12:04 PM, Jakub Wilk wrote:
> It's not enough to normalize timestamps. You need to normalize permissions and
> ownership, too.
>
> (I'm using https://pypi.python.org/pypi/distutils644 for normalizing
> permissions/ownership in my own packages.)

Thanks Jakub, this will be helpful for me.

> Yeah, I don't believe distutils honors SOURCE_DATE_EPOCH at the moment.
>
>> Second; is there a convention to store the SDE value ?
>
> In the changelog.

I'll consider that as well.

On Sun, Oct 1, 2017 at 10:31 PM, Nick Coghlan wrote:
> On 30 September 2017 at 06:02, Thomas Kluyver wrote:
>> On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote:
>
> For distro level reproducible build purposes, we typically treat the
> published tarball *as* the original sources, and don't really worry
> about the question of "Can we reproduce that tarball, from that VCS
> tree?".

Thanks for the detailed explanation, Nick. Even if this was not the original goal of SDE, I would still like to have a reproducible build of the sdist, even though my package does not have source generation like Cython.

I'll embed the timestamp in the commit for now, and see if I can also extract the timestamp from the commit log. AFAICT it's `git log -1 --pretty=format:%ct`, if it's of interest to anyone.

My interest in this is to have CI build the sdist, and to make sure independent machines can get the same artifact, in order to have a potentially distributed agreement on what the sdist is.

Is there any plan (or would it be accepted) to try to upstream patches like the distutils644 package Jakub linked to?

Thanks,
-- 
Matthias
___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
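For anyone who wants to script the `git log -1 --pretty=format:%ct` idea above, here is a minimal sketch. The function names (`parse_timestamp`, `commit_timestamp`, `build_environment`) are made up for illustration and are not part of distutils or setuptools; it assumes `git` is on the PATH and that the build is started as a child process that reads SOURCE_DATE_EPOCH.

```python
import os
import subprocess


def parse_timestamp(raw):
    """Turn the raw bytes printed by git into an integer number of seconds."""
    return int(raw.decode("ascii").strip())


def commit_timestamp(repo_dir="."):
    """Committer date of HEAD, as seconds since the epoch (requires git)."""
    raw = subprocess.check_output(
        ["git", "log", "-1", "--pretty=format:%ct"], cwd=repo_dir
    )
    return parse_timestamp(raw)


def build_environment(timestamp, base=None):
    """Copy of the environment with SOURCE_DATE_EPOCH set for a child build."""
    env = dict(os.environ if base is None else base)
    env["SOURCE_DATE_EPOCH"] = str(timestamp)
    return env
```

A CI job could then pass `env=build_environment(commit_timestamp())` to whatever process creates the sdist. Whether the build tool honors the variable is a separate question, as noted in this thread.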
Re: [Distutils] Reproducible builds (Sdist)
On 30 September 2017 at 06:02, Thomas Kluyver wrote: > On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote: >> Second; is there a convention to store the SDE value ? I don't seem to >> be able to find one. It is nice to have reproducible build; but if >> it's a pain for reproducers to find the SDE value that highly decrease >> the value of SDE build. > > Does it make sense to add a new optional metadata field to store the > value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I > guess it could cause problems if unpacking & repacking a tarball means > that its metadata is no longer accurate, though. For distro level reproducible build purposes, we typically treat the published tarball *as* the original sources, and don't really worry about the question of "Can we reproduce that tarball, from that VCS tree?". This stems from the original model of open source distribution, where publication *was* a matter of putting a tarball up on a website somewhere, and it was an open question as to whether or not the publisher was even using a version control system at all (timeline: RCS=1982, CVS=1986, SVN=2000, git/hg=2005, with Linux distributions getting their start in the early-to-mid 1990's). So SOURCE_DATE_EPOCH gets applied *after* unpacking the original tarball, rather than being used to *create* the tarball (we already know when the publisher created it, since that's part of the tarball metadata). 
Python's sdists mess with that assumption a bit, since it's fairly common to include generated C files that aren't part of the original source tree, and Cython explicitly recommends doing so in order to avoid requiring Cython as a build time dependency: http://docs.cython.org/en/latest/src/reference/compilation.html#distributing-cython-modules So in many ways, this isn't the problem that SOURCE_DATE_EPOCH on its own is designed to solve - instead, it's asking the question of "How do I handle the case where my nominal source archive is itself a built artifact?", which means you not only need to record source timestamps of the original inputs you used to build the artifact (which the version control system will give you), you also need to record details of the build tools used (e.g. using a different version of Cython will generate different code, and hence different "source" archives), and decide what to do with any timestamps on the *output* artifacts you generate (e.g. you may decide to force them to match the commit date from the VCS). So saying "SOURCE_DATE_EPOCH will be set to the VCS commit date when creating an sdist" would be a reasonable thing for an sdist creation tool to decide to do, and combined with something like `Pipfile.lock` in `pipenv`, or a `dev-requirements.txt` with fully pinned versions, *would* go a long way towards giving you reproducible sdist archives. However, it's not a problem to be solved by adding anything to the produced sdist: it's a property of the publishing tools that create sdists to aim to ensure that given the same inputs, on a different machine, at a different time, you will nevertheless still get the same result. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] Reproducible builds (Sdist)
* Matthias Bussonnier, 2017-09-29, 11:16:
> I'm interested in the reproducible build of an _sdist_. That is to say the
> process of going from a given commit to the corresponding TGZ file.
>
> It is my understanding that setting SOURCE_DATE_EPOCH (SDE for short) should
> allow a reproducible building of an Sdist;

It's not enough to normalize timestamps. You need to normalize permissions and ownership, too.

(I'm using https://pypi.python.org/pypi/distutils644 for normalizing permissions/ownership in my own packages.)

> I cannot seem to be able to do that without unpacking and repacking the tgz
> myself;

Yeah, I don't believe distutils honors SOURCE_DATE_EPOCH at the moment.

> Second; is there a convention to store the SDE value ?

In the changelog.

-- 
Jakub Wilk
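To illustrate the point about permissions and ownership, here is a rough sketch using the standard `tarfile` module. It is not how distutils644 is implemented; the helper names `normalized` and `build_tar` and the choice of 0644/0755 modes with uid/gid 0 are assumptions made for the example.

```python
import io
import tarfile


def normalized(info, mtime):
    """Return `info` with ownership, permissions and mtime made machine-independent."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = int(mtime)
    # Keep the owner-executable bit, drop everything else that depends on the umask.
    if info.isdir() or info.mode & 0o100:
        info.mode = 0o755
    else:
        info.mode = 0o644
    return info


def build_tar(files, mtime):
    """Build an uncompressed tar archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            tar.addfile(normalized(info, mtime), io.BytesIO(files[name]))
    return buf.getvalue()
```

When archiving a real directory tree, the same function can be passed as the `filter` argument of `TarFile.add`, for example `tar.add(path, filter=lambda ti: normalized(ti, mtime))`. Note that `TarFile.add` does not itself guarantee a stable member order across machines, so sorting the inputs is still needed.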
Re: [Distutils] Reproducible builds (Sdist)
> Does it make sense to add a new optional metadata field to store the
> value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I
> guess it could cause problems if unpacking & repacking a tarball means
> that its metadata is no longer accurate, though.

That makes sense, and it would be useful, but then that means you need to have the sdist in order to reproduce the sdist...

I was thinking more of a location in the source tree/commit, for example in pyproject.toml's tool section. So if I give you only that, you can tell me "When I build the sdist I get this sha256", and I can do the same independently.

-- 
M

On Fri, Sep 29, 2017 at 1:02 PM, Thomas Kluyver wrote:
> On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote:
>> Second; is there a convention to store the SDE value ? I don't seem to
>> be able to find one. It is nice to have reproducible build; but if
>> it's a pain for reproducers to find the SDE value that highly decrease
>> the value of SDE build.
>
> Does it make sense to add a new optional metadata field to store the
> value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I
> guess it could cause problems if unpacking & repacking a tarball means
> that its metadata is no longer accurate, though.
>
> Thomas
Re: [Distutils] Reproducible builds (Sdist)
On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote: > Second; is there a convention to store the SDE value ? I don't seem to > be able to find one. It is nice to have reproducible build; but if > it's a pain for reproducers to find the SDE value that highly decrease > the value of SDE build. Does it make sense to add a new optional metadata field to store the value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I guess it could cause problems if unpacking & repacking a tarball means that its metadata is no longer accurate, though. Thomas ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
[Distutils] Reproducible builds (Sdist)
Hello there,

I'm going to ask questions about reproducible builds. A previous thread was started in March [1], but it does not cover some of the questions I have.

In particular, I'm interested in the reproducible build of an _sdist_, that is to say the process of going from a given commit to the corresponding TGZ file.

It is my understanding that setting SOURCE_DATE_EPOCH (SDE for short) should allow a reproducible build of an sdist. By reproducible I mean that the tgz itself is the same byte for byte (the unpacked content being the same is a weaker form that I'm less interested in).

Is this assumption correct? In particular, I cannot seem to be able to do that without unpacking and repacking the tgz myself, because the copy_tree/tarring step and the gzipping step by default embed the current timestamp of when these functions were run. Am I missing something?

Second, is there a convention for storing the SDE value? I don't seem to be able to find one. It is nice to have a reproducible build, but if it's a pain for reproducers to find the SDE value, that greatly decreases the value of an SDE build.

Also, congrats on PEP 517, and thanks to everyone who participated.

Thanks
-- 
Matthias

1: https://mail.python.org/pipermail/distutils-sig/2017-March/030284.html
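On the gzip half of the question: the gzip header carries a modification time (and optionally a file name), and `gzip.GzipFile` accepts an explicit `mtime` argument. The sketch below shows that pinning it makes the compressed bytes repeatable on one machine. The helper name `gzip_bytes` is made up for illustration; identical output across machines additionally assumes the same zlib version and compression level.

```python
import gzip
import io


def gzip_bytes(data, mtime):
    """Compress `data` with a fixed header timestamp and no embedded file name."""
    buf = io.BytesIO()
    # filename="" keeps the original file name out of the header;
    # mtime replaces the default of "now".
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()
```

Feeding the bytes of an already normalized tar archive through a function like this is one way to avoid the unpack-and-repack step, under the assumptions stated above.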
Re: [Distutils] reproducible builds
On 21/03/2017 16:52, Brett Cannon wrote: On Tue, 21 Mar 2017 at 04:54 Marius Gedminas wrote: . Python 3.6 changed the dict implementation so the ordering is always stable (and matches insertion order). Do realize that is an implementation detail and not guaranteed by the language specification, so it won't necessarily hold in the future or for other interpreters. -Brett one of the main issues in the reportlab pdf variability are the dict objects which come out as << /Key1 value . /Key n >> I think we have these coming out in sorted order without reliance on the underlying dicts. Up to now we used pixel equality ie the appearance, but as I understand it, reproducibility means byte equality which is harder. A bit of work has been done making the variation between Python 2.7 & 3.6 renderings go away. This reproducibility effort has revealed several bugs which is in itself useful. -- Robin Becker ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
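As an illustration of emitting `<< /Key1 value ... >>` entries in sorted order without relying on the underlying dict, here is a tiny sketch. It is not reportlab's code; `pdf_dict` is a hypothetical helper and it assumes the values are already formatted as PDF tokens.

```python
def pdf_dict(mapping):
    """Serialise a mapping as a PDF dictionary with keys in sorted order."""
    items = " ".join("/%s %s" % (key, mapping[key]) for key in sorted(mapping))
    return "<< %s >>" % items
```

Because the keys are sorted explicitly, the output is the same whichever order the entries were inserted in and whichever interpreter is used.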
Re: [Distutils] reproducible builds
On Tue, 21 Mar 2017 at 04:54 Marius Gedminas wrote: > On Mon, Mar 20, 2017 at 11:30:59AM +, Robin Becker wrote: > > thanks for this; it seems the emphasis is on security. If the intent is > that > > reportlab should be able to reliably reproduce the same binary output > then I > > think I need to do more than just fix a couple of dates. We use many > > dictionary like objects to produce PDF and I am not sure all are sorted > by > > key during output. > > I'm sure the reproducible builds folks will send you patches if they > find any spots that you missed. ;-) > > > Is there a way to excite dictionary ordering changes? I believe there was > > some way to modify the hashing introduced when the dos dictionary attacks > > were an issue. Would it be sufficient to generate documents with say > Python > > 2.7 and check against 3.6? > > Python 3.6 changed the dict implementation so the ordering is always stable > (and matches insertion order). > Do realize that is an implementation detail and not guaranteed by the language specification, so it won't necessarily hold in the future or for other interpreters. -Brett ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] reproducible builds
On 21/03/2017 11:46, Marius Gedminas wrote: On Mon, Mar 20, 2017 at 11:30:59AM +, Robin Becker wrote: . I'm sure the reproducible builds folks will send you patches if they find any spots that you missed. ;-) Is there a way to excite dictionary ordering changes? I believe there was some way to modify the hashing introduced when the dos dictionary attacks were an issue. Would it be sufficient to generate documents with say Python 2.7 and check against 3.6? Python 3.6 changed the dict implementation so the ordering is always stable (and matches insertion order). You'll want to test with Python 3.5, which perturbs the dict ordering randomly, as a side effect of the randomized string/bytes hashes (unless you fix it by setting the PYTHONHASHSEED environment variable[*]) [*] https://docs.python.org/3.3/using/cmdline.html#envvar-PYTHONHASHSEED ... thanks for this Marius; having started on the reproducibility trail I find the python 3.x output has more mismatches than I like ('cos of missed bugs). -- Robin Becker ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] reproducible builds
On Mon, Mar 20, 2017 at 11:30:59AM +, Robin Becker wrote: > thanks for this; it seems the emphasis is on security. If the intent is that > reportlab should be able to reliably reproduce the same binary output then I > think I need to do more than just fix a couple of dates. We use many > dictionary like objects to produce PDF and I am not sure all are sorted by > key during output. I'm sure the reproducible builds folks will send you patches if they find any spots that you missed. ;-) > Is there a way to excite dictionary ordering changes? I believe there was > some way to modify the hashing introduced when the dos dictionary attacks > were an issue. Would it be sufficient to generate documents with say Python > 2.7 and check against 3.6? Python 3.6 changed the dict implementation so the ordering is always stable (and matches insertion order). You'll want to test with Python 3.5, which perturbs the dict ordering randomly, as a side effect of the randomized string/bytes hashes (unless you fix it by setting the PYTHONHASHSEED environment variable[*]) [*] https://docs.python.org/3.3/using/cmdline.html#envvar-PYTHONHASHSEED Regards, Marius Gedminas -- Yes, always begin work on inherited code by removing comments. Even if they were maintained (they are not) they are natural language written by engineers who cannot be understood ordering coffee in a diner. Getting back to comments not being maintained, my saying on that one is, "Comments do not run." -- Kenny Tilton signature.asc Description: PGP signature ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
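A small sketch of how one might provoke hash-dependent ordering from a test harness: run the same snippet in child interpreters with different PYTHONHASHSEED values and compare the output. The snippet iterates over a set of strings, whose order depends on the string hashes. `SNIPPET` and `run_with_seed` are names made up for this example.

```python
import os
import subprocess
import sys

# Iteration order of a set of strings depends on the (randomized) string hashes.
SNIPPET = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def run_with_seed(seed):
    """Run SNIPPET in a fresh interpreter with PYTHONHASHSEED fixed to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output([sys.executable, "-c", SNIPPET], env=env)
    return out.decode()
```

Running with the same seed twice should give identical output, while different seeds will usually (not always) give a different order. A document generator whose output changes between seeds still has an unsorted container somewhere on its output path.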
Re: [Distutils] reproducible builds
On 20 March 2017 at 23:34, Thomas Kluyver wrote: > On Mon, Mar 20, 2017, at 01:02 PM, Robin Becker wrote: > > I guess the algorithm variation across pythons would make dictionary > order quite variable. > > For a Python based tool, I think it's reasonable that reproducing a > build requires running with the same version of Python. > > The requirement would be that, with enough information about the build > environment, you *can* produce an identical PDF. It needn't (AFAIK) be > identical every time anyone builds it. > Right, one of the other aspects of reproducible-builds is looking into ways to define and distribute build environments in addition to the application source code: https://reproducible-builds.org/docs/definition-strategies/ Within a given binary context (e.g. Debian packages), that may be a text description, like Debian's buildinfo files: https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles For Fedora/RHEL/CentOS, the equivalent would probably be to extract a suitable config from the build system: https://fedoraproject.org/wiki/Using_the_Koji_build_system#Using_koji_to_generate_a_mock_config_to_replicate_a_buildroot In other cases, the build environment may itself by a binary artifact (e.g. the manylinux1 container images, or the "Holy Build Box" machine images). Fully eliminating non-determinism usually does requiring switching to explicit sorting and ordered containers in build tools and scripts, as otherwise even things like directory listings or JSON serialisation can introduce variations in output when a build is run on a different machine. 
The reproducible-builds project offers some interesting tools to identify and analyse cases of non-reproducible outputs: https://reproducible-builds.org/tools/ However, nobody can reasonably expect arbitrary upstream projects (especially volunteer run ones) to be going out and pre-emptively solving that kind of problem - the most it's realistic to aim for is to encourage projects to be accommodating when upstream changes are proposed to introduce more determinism into the build processes for particular projects, as well as into the artifact generation process for tools that may be used as part of the build process for other projects. (And I agree with Thomas that it's likely the latter case that applies for reportlab-generated PDFs) Cheers, Nick. P.S. Prompted by Gary Berhnhardt, one of the ways I've started thinking about the whole question of "built artifacts" in general is as a complex distributed caching problem, with reproducible builds being a way of ensuring that it's possible to check the validity of particular cache entries -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
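The "explicit sorting" point for directory listings and JSON serialisation can be sketched as follows. Both helper names (`stable_listing`, `stable_json`) are made up for illustration; they use only `os.walk` and `json.dumps` from the standard library.

```python
import json
import os


def stable_listing(root):
    """List files under `root` in an order that does not depend on the filesystem."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk descends in a fixed order
        for name in sorted(filenames):
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def stable_json(obj):
    """Serialise to JSON with sorted keys and fixed separators."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))
```

The in-place sort of `dirnames` matters: `os.walk` yields entries in whatever order the operating system returns them, which can differ between machines and filesystems.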
Re: [Distutils] reproducible builds
On Mon, Mar 20, 2017, at 01:02 PM, Robin Becker wrote: > Well now I am confused. The date / times mentioned in the debian patch > are those > we force into the documents produced by the reportlab package when it is > used. > > They would not normally be part of the package itself. Although the > reportlab > documentation is available in the source I'm fairly sure we don't include > it in > the wheels. I'm guessing, but I imagine that Debian may be using reportlab in the builds of other packages, to build documentation. It's normal for Debian packages to include built docs, unlike wheels. So they would want it to create PDFs reproducibly, but the PDFs generated in your test suite probably don't matter. > I guess the algorithm variation across pythons would make dictionary order > quite variable. For a Python based tool, I think it's reasonable that reproducing a build requires running with the same version of Python. The requirement would be that, with enough information about the build environment, you *can* produce an identical PDF. It needn't (AFAIK) be identical every time anyone builds it. Thomas ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] reproducible builds
On 20/03/2017 11:35, Thomas Kluyver wrote: On Mon, Mar 20, 2017, at 09:00 AM, Robin Becker wrote: Obviously if I have the ability to embed repr(some_object) into the document output then it will vary (unless the underlying python is reproducible). I'm not sure if debian runs the whole reportlab test suite, but it makes sense to get this kind of variablity out. AIUI, it's fine to have the *ability* to produce non-deterministic output, and it doesn't matter if your tests do that. The aim of reproducible builds is to be able to go from the same source code to an identical binary package. Documents generated by running the tests are presumably not included in binary packages, so it doesn't matter if they change. Well now I am confused. The date / times mentioned in the debian patch are those we force into the documents produced by the reportlab package when it is used. They would not normally be part of the package itself. Although the reportlab documentation is available in the source I'm fairly sure we don't include it in the wheels. Of course if the debian packaging includes output created by reportlab then that document would receive the current (ie variable) time. In addition any random behaviour created by the reportlab generation code would also be embedded in the document. If the debian variable is intended create reproducible PDF as part of their packaging of reportlab or some other package then I'm fairly sure that other variation will need to be checked in addition to the control that the SOURCE_DATE_EPOCH variable would give. Perhaps Matthias could comment; I know little about how the debian packaging works. I believe there was some way to modify the hashing introduced when the dos dictionary attacks were an issue. 
> The PYTHONHASHSEED environment variable:
> https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
>
> If you have non-determinism introduced by Python hashing, setting a
> constant value of PYTHONHASHSEED should be an easy way to work around it.

Well years ago we tried to get some random behaviour in text selection by setting a seed value eg 23..22 (but that doesn't work across pythons). I guess the algorithm variation across pythons would make dictionary order quite variable.

    C:\Users\rptlab>\python27\python
    Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:53:40) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import random
    >>> random.seed(23..22)
    >>> from random import randint, choice
    >>> randint(10,25)
    15

    C:\Users\rptlab>\python36\python
    Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import random
    >>> random.seed(23..22)
    >>> from random import randint, choice
    >>> randint(10,25)
    21

-- 
Robin Becker
Re: [Distutils] reproducible builds
As Thomas mentioned PYTHONHASHSEED is sufficient to solve non-determinism by the hashing. In my experience this hashing, along with datetimes (e.g. in the bytecode) are typically the only causes of non-determinism in Python packages. Someone from I think Debian did mention [1] that they cannot always set PYTHONHASHSEED and so in certain cases they apply patches to fix non-determinism. This is what they might be after in the case of `reportlab` but you best ask them. I'm not yet sure what to think of that patching approach. E.g., if one couldn't set PYTHONHASHSEED when building the bytecode in the interpreter itself, then one would have to convert all sets to lists with potential negative performance effects. On Mon, Mar 20, 2017 at 12:35 PM, Thomas Kluyver wrote: > On Mon, Mar 20, 2017, at 09:00 AM, Robin Becker wrote: > > Obviously if I have the ability to embed repr(some_object) > > into the document output then it will vary (unless the underlying python > > is reproducible). I'm not sure if debian runs the whole reportlab test > > suite, but it makes sense to get this kind of variablity out. > > AIUI, it's fine to have the *ability* to produce non-deterministic > output, and it doesn't matter if your tests do that. The aim of > reproducible builds is to be able to go from the same source code to an > identical binary package. Documents generated by running the tests are > presumably not included in binary packages, so it doesn't matter if they > change. > > > I believe there was some way to modify the hashing introduced when the > dos dictionary attacks were an issue. > > The PYTHONHASHSEED environment variable: > https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED > > If you have non-determinism introduced by Python hashing, setting a > constant value of PYTHONHASHSEED should be an easy way to work around > it. 
Re: [Distutils] reproducible builds
On Mon, Mar 20, 2017, at 09:00 AM, Robin Becker wrote: > Obviously if I have the ability to embed repr(some_object) > into the document output then it will vary (unless the underlying python > is reproducible). I'm not sure if debian runs the whole reportlab test > suite, but it makes sense to get this kind of variablity out. AIUI, it's fine to have the *ability* to produce non-deterministic output, and it doesn't matter if your tests do that. The aim of reproducible builds is to be able to go from the same source code to an identical binary package. Documents generated by running the tests are presumably not included in binary packages, so it doesn't matter if they change. > I believe there was some way to modify the hashing introduced when the dos > dictionary attacks were an issue. The PYTHONHASHSEED environment variable: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED If you have non-determinism introduced by Python hashing, setting a constant value of PYTHONHASHSEED should be an easy way to work around it. ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] reproducible builds
On 18/03/2017 07:20, Nick Coghlan wrote: ... While the reproducible builds effort started in Debian and is furthest advanced there, it's not distro specific - interested developers working on other distros were already looking into it, and the Core Infrastructure Initiative has backed it as one of their security assurance initiatives. Software Freedom Conservancy have a decent write-up on the current state of things after December's Reproducible Builds Summit: https://sfconservancy.org/blog/2016/dec/26/reproducible-builds-summit-report/ thanks for this; it seems the emphasis is on security. If the intent is that reportlab should be able to reliably reproduce the same binary output then I think I need to do more than just fix a couple of dates. We use many dictionary like objects to produce PDF and I am not sure all are sorted by key during output. Is there a way to excite dictionary ordering changes? I believe there was some way to modify the hashing introduced when the dos dictionary attacks were an issue. Would it be sufficient to generate documents with say Python 2.7 and check against 3.6? However, you'll probably want to make yourself a helper function that uses SOURCE_DATE_EPOCH if defined, and falls back to the current time otherwise. That way you'll get reproducible behaviour when a build system configures the setting, while retaining your current behaviour for environments that don't. good advice and that's what I am doing. Cheers, Nick. P.S. A question well worth asking for *us* is whether or not setting SOURCE_DATE_EPOCH appropriately (if it isn't already set in the current environment) should be part of the build system abstraction PEPs. -- Robin Becker ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] reproducible builds
On 17/03/2017 17:49, David Wilson wrote: Hey Robin, What happens if other distros decide not to use this environment variable? Do I really want distro specific code in the package? AFAIK this is seeing a great deal of use outside of Debian and even Linux, for instance GCC also supports this variable. In short where does the distro responsibility and package maintainers boundary need to be? I guess it mostly comes down to whether you'd like them to carry the debt of a vendor patch to implement the behaviour for you in a way you don't like, or you'd prefer to retain full control. :) So it's more a preference than a responsibility. David . I think I accept the need to support this variable. Our original use case was for testing purposes where we altered dates injected into the produced pdf meta data and also in some cases the content. However, if that is the implied intent of the debian variable then I will also need to modify the behaviour of some other tests eg in one case the produced pdf output looks like this The value of i is not larger than 3 The value of i is equal to 3 The value of i is not less than 3 The value of i is 3 The value of i is 2 The value of i is 1 {'doc': , 'currentFrame': 'normal', 'currentPageTemplate': 'First', 'aW': 439.27559055118104, 'aH': 685.8897637795275, 'aWH': (439.27559055118104, 685.8897637795275), 'i': 0, 'availableWidth': 439.27559055118104, 'availableHeight': 619.8897637795275} The current page number is 1 ie we are introspecting internals and injecting that into the document content. I imagine I need to clean up the reporting to avoid getting addresses etc etc into the documents. Obviously if I have the ability to embed repr(some_object) into the document output then it will vary (unless the underlying python is reproducible). I'm not sure if debian runs the whole reportlab test suite, but it makes sense to get this kind of variablity out. 
When we make significant changes to existing behaviours our current workflow consists of generating a large number of outputs and then rendering them into jpeg pages with ghost script. Differences in the jpegs can be used to spot problems. -- Robin Becker ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] reproducible builds
> On Mar 18, 2017, at 3:20 AM, Nick Coghlan wrote: > > P.S. A question well worth asking for *us* is whether or not setting > SOURCE_DATE_EPOCH appropriately (if it isn't already set in the current > environment) should be part of the build system abstraction PEPs. > If it’s getting standard use (and it sounds like it is), then I think it should yes. — Donald Stufft ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] reproducible builds
On 18 March 2017 at 03:19, Robin Becker wrote: > An issue has been raised for reportlab to support a specific environment > variable namely SOURCE_DATE_EPOCH. The intent is that we should get our > time from this variable rather than time.localtime(time.time()) so that > produced documents are more invariant. > > First off is this a reasonable request? The variable is defined by debian > here https://reproducible-builds.org/specs/source-date-epoch/ > > What happens if other distros decide not to use this environment variable? > Do I really want distro specific code in the package? > While the reproducible builds effort started in Debian and is furthest advanced there, it's not distro specific - interested developers working on other distros were already looking into it, and the Core Infrastructure Initiative has backed it as one of their security assurance initiatives. Software Freedom Conservancy have a decent write-up on the current state of things after December's Reproducible Builds Summit: https://sfconservancy.org/blog/2016/dec/26/reproducible-builds-summit-report/ However, you'll probably want to make yourself a helper function that uses SOURCE_DATE_EPOCH if defined, and falls back to the current time otherwise. That way you'll get reproducible behaviour when a build system configures the setting, while retaining your current behaviour for environments that don't. Cheers, Nick. P.S. A question well worth asking for *us* is whether or not setting SOURCE_DATE_EPOCH appropriately (if it isn't already set in the current environment) should be part of the build system abstraction PEPs. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
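The helper function suggested above could look roughly like this. It is a minimal sketch, not reportlab's actual implementation; the name `build_time` and the optional `environ` parameter (there only to make testing easier) are assumptions.

```python
import os
import time


def build_time(environ=None):
    """Seconds since the epoch: SOURCE_DATE_EPOCH if set, else the current time."""
    environ = os.environ if environ is None else environ
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is None:
        return time.time()  # current behaviour when no build system sets the variable
    return int(value)
```

Callers that previously used `time.localtime(time.time())` could switch to `time.gmtime(build_time())`. Using UTC rather than local time is a deliberate choice here, so that the result does not depend on the build machine's timezone.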
Re: [Distutils] reproducible builds
Hey Robin, > What happens if other distros decide not to use this environment variable? > Do I really want distro specific code in the package? AFAIK this is seeing a great deal of use outside of Debian and even Linux, for instance GCC also supports this variable. > In short where does the distro responsibility and package maintainers > boundary need to be? I guess it mostly comes down to whether you'd like them to carry the debt of a vendor patch to implement the behaviour for you in a way you don't like, or you'd prefer to retain full control. :) So it's more a preference than a responsibility. David ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] reproducible builds
Nixpkgs [1] uses SOURCE_DATE_EPOCH as well. We can reproducibly build the Python interpreter (and packages with [2]).

[1] https://github.com/NixOS/nixpkgs
[2] https://bitbucket.org/pypa/wheel/pull-requests/77

On Fri, Mar 17, 2017 at 6:46 PM, Matthias Klose wrote:
> On 17.03.2017 18:19, Robin Becker wrote:
> > An issue has been raised for reportlab to support a specific environment
> > variable, namely SOURCE_DATE_EPOCH. The intent is that we should get our
> > time from this variable rather than time.localtime(time.time()) so that
> > produced documents are more invariant.
> >
> > First off, is this a reasonable request? The variable is defined by
> > Debian here https://reproducible-builds.org/specs/source-date-epoch/
> >
> > What happens if other distros decide not to use this environment
> > variable? Do I really want distro-specific code in the package?
> >
> > In addition, we already have our own mechanism for making the produced
> > documents invariant, although it might require an extension to support
> > an externally specified date & time as in the Debian variable.
> >
> > In short, where does the boundary between distro and package maintainer
> > responsibility need to be?
>
> The reproducible-builds thing is not just a Debian thing; it's supported
> by other distros and upstream projects.
>
> Matthias
Re: [Distutils] reproducible builds
On 17.03.2017 18:19, Robin Becker wrote:
> An issue has been raised for reportlab to support a specific environment
> variable, namely SOURCE_DATE_EPOCH. The intent is that we should get our
> time from this variable rather than time.localtime(time.time()) so that
> produced documents are more invariant.
>
> First off, is this a reasonable request? The variable is defined by Debian
> here https://reproducible-builds.org/specs/source-date-epoch/
>
> What happens if other distros decide not to use this environment variable?
> Do I really want distro-specific code in the package?
>
> In addition, we already have our own mechanism for making the produced
> documents invariant, although it might require an extension to support an
> externally specified date & time as in the Debian variable.
>
> In short, where does the boundary between distro and package maintainer
> responsibility need to be?

The reproducible-builds thing is not just a Debian thing; it's supported by other distros and upstream projects.

Matthias
Re: [Distutils] reproducible builds
Flit already supports $SOURCE_DATE_EPOCH for building wheels.

I think the environment variable is a good idea: if it gets wide support, you will be able to set a single thing to affect lots of different build tools, rather than working out where you need to add command line arguments to half a dozen different build steps.

Thomas

On Fri, Mar 17, 2017, at 05:33 PM, Matthias Bussonnier wrote:
> On Fri, Mar 17, 2017 at 10:19 AM, Robin Becker wrote:
> > An issue has been raised for reportlab to support a specific environment
> > variable, namely SOURCE_DATE_EPOCH. The intent is that we should get our
> > time from this variable rather than time.localtime(time.time()) so that
> > produced documents are more invariant.
> >
> > First off, is this a reasonable request? The variable is defined by
> > Debian here https://reproducible-builds.org/specs/source-date-epoch/
> >
> > What happens if other distros decide not to use this environment
> > variable? Do I really want distro-specific code in the package?
>
> For what it is worth, it seems like it will make its way into CPython as
> well:
> https://github.com/python/cpython/pull/296
>
> And AFAICT, this environment variable name is already used by more than
> just Debian.
>
> --
> M
>
> > In addition, we already have our own mechanism for making the produced
> > documents invariant, although it might require an extension to support
> > an externally specified date & time as in the Debian variable.
> >
> > In short, where does the boundary between distro and package maintainer
> > responsibility need to be?
> > --
> > Robin Becker
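The "set a single thing" point can be illustrated with a small sketch in which one SOURCE_DATE_EPOCH value is placed in the environment once and inherited by every build step. The function `run_build_steps` and the `echo_step` command are hypothetical stand-ins; a real pipeline would pass the actual build tool invocations, and no particular tool's options are assumed here.

```python
import os
import subprocess
import sys


def run_build_steps(commands, source_date_epoch):
    """Run each command with the same SOURCE_DATE_EPOCH in its environment.

    Hypothetical driver: returns the captured stdout of every step.
    """
    # Copy the current environment and set the variable once for all steps.
    env = dict(os.environ, SOURCE_DATE_EPOCH=str(int(source_date_epoch)))
    outputs = []
    for command in commands:
        result = subprocess.run(
            command, env=env, check=True, capture_output=True, text=True
        )
        outputs.append(result.stdout)
    return outputs


# Stand-in "build step": a child Python process that echoes the variable.
echo_step = [
    sys.executable,
    "-c",
    "import os; print(os.environ['SOURCE_DATE_EPOCH'])",
]
```

Calling `run_build_steps([echo_step, echo_step], 1489771140)` runs both stand-in steps with the same value, with no per-step command line arguments needed.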
Re: [Distutils] reproducible builds
On Fri, Mar 17, 2017 at 10:19 AM, Robin Becker wrote:
> An issue has been raised for reportlab to support a specific environment
> variable, namely SOURCE_DATE_EPOCH. The intent is that we should get our
> time from this variable rather than time.localtime(time.time()) so that
> produced documents are more invariant.
>
> First off, is this a reasonable request? The variable is defined by Debian
> here https://reproducible-builds.org/specs/source-date-epoch/
>
> What happens if other distros decide not to use this environment variable?
> Do I really want distro-specific code in the package?

For what it is worth, it seems like it will make its way into CPython as well:
https://github.com/python/cpython/pull/296

And AFAICT, this environment variable name is already used by more than just Debian.

--
M

> In addition, we already have our own mechanism for making the produced
> documents invariant, although it might require an extension to support an
> externally specified date & time as in the Debian variable.
>
> In short, where does the boundary between distro and package maintainer
> responsibility need to be?
> --
> Robin Becker
[Distutils] reproducible builds
An issue has been raised for reportlab to support a specific environment variable, namely SOURCE_DATE_EPOCH. The intent is that we should get our time from this variable rather than time.localtime(time.time()) so that produced documents are more invariant.

First off, is this a reasonable request? The variable is defined by Debian here https://reproducible-builds.org/specs/source-date-epoch/

What happens if other distros decide not to use this environment variable? Do I really want distro-specific code in the package?

In addition, we already have our own mechanism for making the produced documents invariant, although it might require an extension to support an externally specified date & time as in the Debian variable.

In short, where does the boundary between distro and package maintainer responsibility need to be?

--
Robin Becker