Re: [Distutils] Reproducible builds (Sdist)

2017-10-02 Thread Matthias Bussonnier
Hi all,

On Fri, Sep 29, 2017 at 12:04 PM, Jakub Wilk  wrote:
> It's not enough to normalize timestamps. You need to normalize permissions and
> ownership, too.
>
> (I'm using https://pypi.python.org/pypi/distutils644 for normalizing
> permissions/ownership in my own packages.)
>
Thanks Jakub, this will be helpful for me.

> Yeah, I don't believe distutils honors SOURCE_DATE_EPOCH at the moment.
>
>> Second, is there a convention for storing the SDE value?
>
> In the changelog.

I'll consider that as well.


On Sun, Oct 1, 2017 at 10:31 PM, Nick Coghlan  wrote:
> On 30 September 2017 at 06:02, Thomas Kluyver  wrote:
>> On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote:
>
> For distro level reproducible build purposes, we typically treat the
> published tarball *as* the original sources, and don't really worry
> about the question of "Can we reproduce that tarball, from that VCS
> tree?".

Thanks for the detailed explanation, Nick. Even if this was not the
original goal of SDE, I would still like to have reproducible builds of
the sdist, even though my package has no source generation step like
Cython. I'll embed the timestamp in the commit for now, and see if I can
also extract the timestamp from the commit log.
AFAICT it's `git log -1 --pretty=format:%ct`, if that's of interest to anyone.

My interest in this is to have CI build the sdist, and to make sure independent
machines can get the same artifact, in order to have a potentially distributed
agreement on what the sdist is.
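
A minimal sketch of wiring that git command into a build, assuming `git` is on the PATH and the build runs inside a checkout. The function names `commit_timestamp` and `env_with_source_date_epoch` are made up for illustration; only the `git log` invocation comes from the message above.

```python
import os
import subprocess


def commit_timestamp(repo_dir="."):
    """Return the committer timestamp of HEAD as integer seconds since the epoch."""
    out = subprocess.check_output(
        ["git", "log", "-1", "--pretty=format:%ct"], cwd=repo_dir
    )
    return int(out.decode("ascii").strip())


def env_with_source_date_epoch(timestamp, base_env=None):
    """Return a copy of the environment with SOURCE_DATE_EPOCH set to timestamp."""
    env = dict(os.environ if base_env is None else base_env)
    env["SOURCE_DATE_EPOCH"] = str(timestamp)
    return env
```

The resulting mapping can be passed as `env=` to whatever subprocess builds the sdist, so every machine that checks out the same commit exports the same value.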

Is there any plan (or would it be accepted) to try to upstream patches like
the distutils644 one Jakub linked to?

Thanks,
-- 
Matthias
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Reproducible builds (Sdist)

2017-10-01 Thread Nick Coghlan
On 30 September 2017 at 06:02, Thomas Kluyver  wrote:
> On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote:
>> Second, is there a convention for storing the SDE value? I can't seem to
>> find one. It is nice to have reproducible builds, but if
>> it's a pain for reproducers to find the SDE value, that greatly decreases
>> the value of an SDE build.
>
> Does it make sense to add a new optional metadata field to store the
> value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I
> guess it could cause problems if unpacking & repacking a tarball means
> that its metadata is no longer accurate, though.

For distro level reproducible build purposes, we typically treat the
published tarball *as* the original sources, and don't really worry
about the question of "Can we reproduce that tarball, from that VCS
tree?".

This stems from the original model of open source distribution, where
publication *was* a matter of putting a tarball up on a website
somewhere, and it was an open question as to whether or not the
publisher was even using a version control system at all (timeline:
RCS=1982, CVS=1986, SVN=2000, git/hg=2005, with Linux distributions
getting their start in the early-to-mid 1990's).

So SOURCE_DATE_EPOCH gets applied *after* unpacking the original
tarball, rather than being used to *create* the tarball (we already
know when the publisher created it, since that's part of the tarball
metadata).

Python's sdists mess with that assumption a bit, since it's fairly
common to include generated C files that aren't part of the original
source tree, and Cython explicitly recommends doing so in order to
avoid requiring Cython as a build time dependency:
http://docs.cython.org/en/latest/src/reference/compilation.html#distributing-cython-modules

So in many ways, this isn't the problem that SOURCE_DATE_EPOCH on its
own is designed to solve - instead, it's asking the question of "How
do I handle the case where my nominal source archive is itself a built
artifact?", which means you not only need to record source timestamps
of the original inputs you used to build the artifact (which the
version control system will give you), you also need to record details
of the build tools used (e.g. using a different version of Cython will
generate different code, and hence different "source" archives), and
decide what to do with any timestamps on the *output* artifacts you
generate (e.g. you may decide to force them to match the commit date
from the VCS).

So saying "SOURCE_DATE_EPOCH will be set to the VCS commit date when
creating an sdist" would be a reasonable thing for an sdist creation
tool to decide to do, and combined with something like `Pipfile.lock`
in `pipenv`, or a `dev-requirements.txt` with fully pinned versions,
*would* go a long way towards giving you reproducible sdist archives.

However, it's not a problem to be solved by adding anything to the
produced sdist: it's a property of the publishing tools that create
sdists to aim to ensure that given the same inputs, on a different
machine, at a different time, you will nevertheless still get the same
result.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Reproducible builds (Sdist)

2017-09-29 Thread Jakub Wilk

* Matthias Bussonnier , 2017-09-29, 11:16:
> I'm interested in the reproducible build of an _sdist_.
> That is to say the process of going from a given commit to the
> corresponding TGZ file. It is my understanding that setting
> SOURCE_DATE_EPOCH (SDE for short) should allow a reproducible building
> of an Sdist;

It's not enough to normalize timestamps. You need to normalize permissions
and ownership, too.


(I'm using https://pypi.python.org/pypi/distutils644 for normalizing 
permissions/ownership in my own packages.)
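
A minimal sketch of what normalizing permissions and ownership can look like at the tar level, using only the standard `tarfile` module. This is an illustration and is not meant to describe how distutils644 is implemented; `normalize` and `single_file_tar` are hypothetical names, and the 0644/0755 policy is an assumption.

```python
import io
import tarfile


def normalize(info, epoch=0):
    """Strip TarInfo metadata that varies between machines and users."""
    info.mtime = epoch
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    # Collapse permissions to 0755 (directories, executables) or 0644 (the rest).
    info.mode = 0o755 if (info.isdir() or info.mode & 0o100) else 0o644
    return info


def single_file_tar(name, data, mode=0o644, uid=0, mtime=0):
    """Build a one-member tar in memory, passing the member through normalize()."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    info.mode, info.mtime = mode, mtime
    info.uid = info.gid = uid
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        tar.addfile(normalize(info), io.BytesIO(data))
    return buf.getvalue()
```

Because `normalize` takes and returns a `TarInfo`, the same function can also be passed as the `filter` argument of `TarFile.add()` when archiving files from disk.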


> I cannot seem to be able to do that without unpacking and repacking the
> tgz myself;

Yeah, I don't believe distutils honors SOURCE_DATE_EPOCH at the
moment.

> Second, is there a convention for storing the SDE value?

In the changelog.

--
Jakub Wilk


Re: [Distutils] Reproducible builds (Sdist)

2017-09-29 Thread Matthias Bussonnier
> Does it make sense to add a new optional metadata field to store the
> value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I
> guess it could cause problems if unpacking & repacking a tarball means
> that its metadata is no longer accurate, though.

That makes sense, and it would be useful, but then it means you need
to have the sdist to reproduce the sdist...
I was thinking more of a location in the source tree/commit, for
example in pyproject.toml's tool section.
So if I give you only that, you can tell me "When I build the sdist I
get this sha256", and I can do the same independently.
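
A minimal sketch of the "tell me your sha256" step, assuming each party has already built the sdist locally. `sha256_of` is a hypothetical helper name; the hashing itself uses the standard `hashlib` module.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two builders then only need to exchange the hex strings; equal digests mean the archives are byte-for-byte identical for practical purposes.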

-- 
M

On Fri, Sep 29, 2017 at 1:02 PM, Thomas Kluyver  wrote:
> On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote:
>> Second, is there a convention for storing the SDE value? I can't seem to
>> find one. It is nice to have reproducible builds, but if
>> it's a pain for reproducers to find the SDE value, that greatly decreases
>> the value of an SDE build.
>
> Does it make sense to add a new optional metadata field to store the
> value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I
> guess it could cause problems if unpacking & repacking a tarball means
> that its metadata is no longer accurate, though.
>
> Thomas


Re: [Distutils] Reproducible builds (Sdist)

2017-09-29 Thread Thomas Kluyver
On Fri, Sep 29, 2017, at 07:16 PM, Matthias Bussonnier wrote:
> Second, is there a convention for storing the SDE value? I can't seem to
> find one. It is nice to have reproducible builds, but if
> it's a pain for reproducers to find the SDE value, that greatly decreases
> the value of an SDE build.

Does it make sense to add a new optional metadata field to store the
value of SOURCE_DATE_EPOCH if it's set when a distribution is built? I
guess it could cause problems if unpacking & repacking a tarball means
that its metadata is no longer accurate, though.

Thomas


[Distutils] Reproducible builds (Sdist)

2017-09-29 Thread Matthias Bussonnier
Hello there,

I have some questions about reproducible builds. A previous thread was
started in March [1], but it does not cover some of the questions I have.

In particular, I'm interested in the reproducible build of an _sdist_,
that is to say, the process of going from a given commit to the
corresponding TGZ file. It is my understanding that setting
SOURCE_DATE_EPOCH (SDE for short) should allow reproducible building
of an sdist.
By reproducible I mean that the tgz itself is the same byte for
byte (the unpacked content being the same is a weaker form that I'm less
interested in).
Is this assumption correct?

In particular, I cannot seem to do that without unpacking
and repacking the tgz myself, because the copy_tree/tarring and
gzipping steps by default embed the timestamp of when these functions
were run. Am I missing something?
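
A minimal sketch of the gzip half of that problem, using only the standard `gzip` module: the gzip header carries a 4-byte modification time, which `gzip.GzipFile` fills with the current time unless `mtime` is passed. The helper names `gzip_bytes` and `header_mtime` are made up for illustration.

```python
import gzip
import io


def gzip_bytes(data, mtime=None):
    """Compress data in memory; mtime=None means 'use the current time'."""
    buf = io.BytesIO()
    # An empty filename keeps the optional FNAME field out of the header.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def header_mtime(blob):
    """Read the little-endian MTIME field stored at offset 4 of a gzip stream."""
    return int.from_bytes(blob[4:8], "little")
```

With a fixed `mtime` (for example the SDE value), two runs on the same interpreter and zlib produce identical bytes; with the default, the header differs whenever the clock has moved on.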

Second, is there a convention for storing the SDE value? I can't seem to
find one. It is nice to have reproducible builds, but if
it's a pain for reproducers to find the SDE value, that greatly decreases
the value of an SDE build.

Also, congrats on PEP 517, and thanks to everyone who participated.

Thanks
-- 
Matthias

1: https://mail.python.org/pipermail/distutils-sig/2017-March/030284.html


Re: [Distutils] reproducible builds

2017-03-22 Thread Robin Becker

On 21/03/2017 16:52, Brett Cannon wrote:
> On Tue, 21 Mar 2017 at 04:54 Marius Gedminas  wrote:
> ...
>> Python 3.6 changed the dict implementation so the ordering is always stable
>> (and matches insertion order).
>
> Do realize that is an implementation detail and not guaranteed by the
> language specification, so it won't necessarily hold in the future or for
> other interpreters.
>
> -Brett


One of the main issues in the reportlab PDF variability is the dict objects,
which come out as


<<
/Key1 value
.
/Key n
>>

I think we have these coming out in sorted order without reliance on the
underlying dicts. Up to now we have used pixel equality, i.e. the appearance,
but as I understand it, reproducibility means byte equality, which is harder.
A bit of work has been done making the variation between Python 2.7 & 3.6
renderings go away. This reproducibility effort has revealed several bugs,
which is in itself useful.
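
A minimal sketch of emitting such a dictionary with explicitly sorted keys, so the output does not depend on the interpreter's dict ordering. This is not reportlab's actual code; `format_pdf_dict` is a hypothetical function, and it assumes the values are already formatted as PDF tokens.

```python
def format_pdf_dict(mapping):
    """Serialize name -> preformatted value pairs as a PDF dictionary."""
    lines = ["<<"]
    for key in sorted(mapping):  # explicit sort, independent of insertion order
        lines.append("/%s %s" % (key, mapping[key]))
    lines.append(">>")
    return "\n".join(lines)
```

Sorting at the point of output means the same logical dictionary always serializes to the same bytes, whichever Python version built it.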

--
Robin Becker


Re: [Distutils] reproducible builds

2017-03-21 Thread Brett Cannon
On Tue, 21 Mar 2017 at 04:54 Marius Gedminas  wrote:

> On Mon, Mar 20, 2017 at 11:30:59AM +, Robin Becker wrote:
> > thanks for this; it seems the emphasis is on security. If the intent is that
> > reportlab should be able to reliably reproduce the same binary output then I
> > think I need to do more than just fix a couple of dates. We use many
> > dictionary like objects to produce PDF and I am not sure all are sorted by
> > key during output.
>
> I'm sure the reproducible builds folks will send you patches if they
> find any spots that you missed.  ;-)
>
> > Is there a way to excite dictionary ordering changes? I believe there was
> > some way to modify the hashing introduced when the dos dictionary attacks
> > were an issue. Would it be sufficient to generate documents with say Python
> > 2.7 and check against 3.6?
>
> Python 3.6 changed the dict implementation so the ordering is always stable
> (and matches insertion order).
>

Do realize that is an implementation detail and not guaranteed by the
language specification, so it won't necessarily hold in the future or for
other interpreters.

-Brett


Re: [Distutils] reproducible builds

2017-03-21 Thread Robin Becker

On 21/03/2017 11:46, Marius Gedminas wrote:
> On Mon, Mar 20, 2017 at 11:30:59AM +, Robin Becker wrote:
> ...
> I'm sure the reproducible builds folks will send you patches if they
> find any spots that you missed.  ;-)
>
>> Is there a way to excite dictionary ordering changes? I believe there was
>> some way to modify the hashing introduced when the dos dictionary attacks
>> were an issue. Would it be sufficient to generate documents with say Python
>> 2.7 and check against 3.6?
>
> Python 3.6 changed the dict implementation so the ordering is always stable
> (and matches insertion order).
>
> You'll want to test with Python 3.5, which perturbs the dict ordering
> randomly, as a side effect of the randomized string/bytes hashes (unless
> you fix it by setting the PYTHONHASHSEED environment variable[*])
>
>   [*] https://docs.python.org/3.3/using/cmdline.html#envvar-PYTHONHASHSEED

...
thanks for this Marius; having started on the reproducibility trail I find the 
python 3.x output has more mismatches than I like ('cos of missed bugs).

--
Robin Becker


Re: [Distutils] reproducible builds

2017-03-21 Thread Marius Gedminas
On Mon, Mar 20, 2017 at 11:30:59AM +, Robin Becker wrote:
> thanks for this; it seems the emphasis is on security. If the intent is that
> reportlab should be able to reliably reproduce the same binary output then I
> think I need to do more than just fix a couple of dates. We use many
> dictionary like objects to produce PDF and I am not sure all are sorted by
> key during output.

I'm sure the reproducible builds folks will send you patches if they
find any spots that you missed.  ;-)

> Is there a way to excite dictionary ordering changes? I believe there was
> some way to modify the hashing introduced when the dos dictionary attacks
> were an issue. Would it be sufficient to generate documents with say Python
> 2.7 and check against 3.6?

Python 3.6 changed the dict implementation so the ordering is always stable
(and matches insertion order).

You'll want to test with Python 3.5, which perturbs the dict ordering
randomly, as a side effect of the randomized string/bytes hashes (unless
you fix it by setting the PYTHONHASHSEED environment variable[*])

  [*] https://docs.python.org/3.3/using/cmdline.html#envvar-PYTHONHASHSEED
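
A minimal sketch of exercising hash-dependent ordering from a test, assuming the current interpreter can be re-run as a subprocess. It iterates a set of strings, because set order still depends on string hashes even on interpreters where dicts keep insertion order. `SNIPPET` and `run_with_seed` are hypothetical names.

```python
import os
import subprocess
import sys

# Child program: print a set of strings in whatever order iteration yields.
SNIPPET = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def run_with_seed(seed):
    """Run SNIPPET in a fresh interpreter with PYTHONHASHSEED fixed to seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output([sys.executable, "-c", SNIPPET], env=env)
    return out.decode().strip()
```

Running the same seed twice should give the same order, while trying a handful of different seeds is a cheap way to flush out code that silently relies on one particular ordering.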

Regards,
Marius Gedminas
-- 
Yes, always begin work on inherited code by removing comments. Even if they
were maintained (they are not) they are natural language written by engineers
who cannot be understood ordering coffee in a diner. Getting back to comments
not being maintained, my saying on that one is, "Comments do not run."
-- Kenny Tilton




Re: [Distutils] reproducible builds

2017-03-20 Thread Nick Coghlan
On 20 March 2017 at 23:34, Thomas Kluyver  wrote:

> On Mon, Mar 20, 2017, at 01:02 PM, Robin Becker wrote:
> > I guess the algorithm variation across pythons would make dictionary
> > order quite variable.
>
> For a Python based tool, I think it's reasonable that reproducing a
> build requires running with the same version of Python.
>
> The requirement would be that, with enough information about the build
> environment, you *can* produce an identical PDF. It needn't (AFAIK) be
> identical every time anyone builds it.
>

Right, one of the other aspects of reproducible-builds is looking into ways
to define and distribute build environments in addition to the application
source code: https://reproducible-builds.org/docs/definition-strategies/

Within a given binary context (e.g. Debian packages), that may be a text
description, like Debian's buildinfo files:
https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles

For Fedora/RHEL/CentOS, the equivalent would probably be to extract a
suitable config from the build system:
https://fedoraproject.org/wiki/Using_the_Koji_build_system#Using_koji_to_generate_a_mock_config_to_replicate_a_buildroot

In other cases, the build environment may itself be a binary artifact (e.g.
the manylinux1 container images, or the "Holy Build Box" machine images).

Fully eliminating non-determinism usually does require switching to
explicit sorting and ordered containers in build tools and scripts, as
otherwise even things like directory listings or JSON serialisation can
introduce variations in output when a build is run on a different machine.
The reproducible-builds project offers some interesting tools to identify
and analyse cases of non-reproducible outputs:
https://reproducible-builds.org/tools/
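
A minimal sketch of the "explicit sorting" point for the two cases named above, directory listings and JSON serialisation, using only the standard library. `stable_listing` and `stable_json` are hypothetical helper names.

```python
import json
import os


def stable_listing(path):
    """os.listdir() returns names in arbitrary order, so sort before use."""
    return sorted(os.listdir(path))


def stable_json(obj):
    """Serialize with sorted keys and fixed separators for byte-stable output."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))
```

Both helpers trade a small amount of work for output that no longer depends on the filesystem or on how a dict happened to be built.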

However, nobody can reasonably expect arbitrary upstream projects
(especially volunteer run ones) to be going out and pre-emptively solving
that kind of problem - the most it's realistic to aim for is to encourage
projects to be accommodating when upstream changes are proposed to
introduce more determinism into the build processes for particular
projects, as well as into the artifact generation process for tools that
may be used as part of the build process for other projects. (And I agree
with Thomas that it's likely the latter case that applies for
reportlab-generated PDFs)

Cheers,
Nick.

P.S. Prompted by Gary Bernhardt, one of the ways I've started thinking
about the whole question of "built artifacts" in general is as a complex
distributed caching problem, with reproducible builds being a way of
ensuring that it's possible to check the validity of particular cache
entries.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] reproducible builds

2017-03-20 Thread Thomas Kluyver
On Mon, Mar 20, 2017, at 01:02 PM, Robin Becker wrote:
> Well now I am confused. The date / times mentioned in the debian patch
> are those we force into the documents produced by the reportlab package
> when it is used.
>
> They would not normally be part of the package itself. Although the
> reportlab documentation is available in the source I'm fairly sure we
> don't include it in the wheels.

I'm guessing, but I imagine that Debian may be using reportlab in the
builds of other packages, to build documentation. It's normal for Debian
packages to include built docs, unlike wheels. So they would want it to
create PDFs reproducibly, but the PDFs generated in your test suite
probably don't matter.

> I guess the algorithm variation across pythons would make dictionary order 
> quite variable.

For a Python based tool, I think it's reasonable that reproducing a
build requires running with the same version of Python.

The requirement would be that, with enough information about the build
environment, you *can* produce an identical PDF. It needn't (AFAIK) be
identical every time anyone builds it.

Thomas


Re: [Distutils] reproducible builds

2017-03-20 Thread Robin Becker

On 20/03/2017 11:35, Thomas Kluyver wrote:
> On Mon, Mar 20, 2017, at 09:00 AM, Robin Becker wrote:
>> Obviously if I have the ability to embed  repr(some_object)
>> into the document output then it will vary (unless the underlying python
>> is reproducible). I'm not sure if debian runs the whole reportlab test
>> suite, but it makes sense to get this kind of variability out.
>
> AIUI, it's fine to have the *ability* to produce non-deterministic
> output, and it doesn't matter if your tests do that. The aim of
> reproducible builds is to be able to go from the same source code to an
> identical binary package. Documents generated by running the tests are
> presumably not included in binary packages, so it doesn't matter if they
> change.



Well now I am confused. The date / times mentioned in the debian patch are those 
we force into the documents produced by the reportlab package when it is used.


They would not normally be part of the package itself. Although the reportlab 
documentation is available in the source I'm fairly sure we don't include it in 
the wheels.


Of course if the debian packaging includes output created by reportlab then that 
document would receive the current (ie variable) time. In addition any random 
behaviour created by the reportlab generation code would also be embedded in the 
document.


If the debian variable is intended to create reproducible PDFs as part of their
packaging of reportlab or some other package, then I'm fairly sure that other
variation will need to be checked in addition to the control that the
SOURCE_DATE_EPOCH variable would give. Perhaps Matthias could comment; I know
little about how the debian packaging works.



>> I believe there was some way to modify the hashing introduced when the dos
>> dictionary attacks were an issue.
>
> The PYTHONHASHSEED environment variable:
> https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
>
> If you have non-determinism introduced by Python hashing, setting a
> constant value of PYTHONHASHSEED should be an easy way to work around
> it.



Well years ago we tried to get some random behaviour in text selection by 
setting a seed value eg 23..22 (but that doesn't work across  pythons). I 
guess the algorithm variation across pythons would make dictionary order quite 
variable.




C:\Users\rptlab>\python27\python
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:53:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> random.seed(23..22)
>>> from random import randint, choice
>>> randint(10,25)
15

C:\Users\rptlab>\python36\python
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> random.seed(23..22)
>>> from random import randint, choice
>>> randint(10,25)
21




--
Robin Becker


Re: [Distutils] reproducible builds

2017-03-20 Thread Freddy Rietdijk
As Thomas mentioned, PYTHONHASHSEED is sufficient to solve non-determinism
caused by hashing. In my experience this hashing, along with datetimes (e.g.
in the bytecode), is typically the only cause of non-determinism in Python
packages.

Someone from (I think) Debian mentioned [1] that they cannot always set
PYTHONHASHSEED and so in certain cases they apply patches to fix
non-determinism. This is what they might be after in the case of
`reportlab`, but you'd best ask them.

I'm not yet sure what to think of that patching approach. E.g., if one
couldn't set PYTHONHASHSEED when building the bytecode in the interpreter
itself, then one would have to convert all sets to lists with potential
negative performance effects.

On Mon, Mar 20, 2017 at 12:35 PM, Thomas Kluyver wrote:

> On Mon, Mar 20, 2017, at 09:00 AM, Robin Becker wrote:
> > Obviously if I have the ability to embed  repr(some_object)
> > into the document output then it will vary (unless the underlying python
> > is reproducible). I'm not sure if debian runs the whole reportlab test
> > suite, but it makes sense to get this kind of variability out.
>
> AIUI, it's fine to have the *ability* to produce non-deterministic
> output, and it doesn't matter if your tests do that. The aim of
> reproducible builds is to be able to go from the same source code to an
> identical binary package. Documents generated by running the tests are
> presumably not included in binary packages, so it doesn't matter if they
> change.
>
> > I believe there was some way to modify the hashing introduced when the
> > dos dictionary attacks were an issue.
>
> The PYTHONHASHSEED environment variable:
> https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
>
> If you have non-determinism introduced by Python hashing, setting a
> constant value of PYTHONHASHSEED should be an easy way to work around
> it.


Re: [Distutils] reproducible builds

2017-03-20 Thread Thomas Kluyver
On Mon, Mar 20, 2017, at 09:00 AM, Robin Becker wrote:
> Obviously if I have the ability to embed  repr(some_object) 
> into the document output then it will vary (unless the underlying python
> is reproducible). I'm not sure if debian runs the whole reportlab test
> suite, but it makes sense to get this kind of variability out.

AIUI, it's fine to have the *ability* to produce non-deterministic
output, and it doesn't matter if your tests do that. The aim of
reproducible builds is to be able to go from the same source code to an
identical binary package. Documents generated by running the tests are
presumably not included in binary packages, so it doesn't matter if they
change.

>  I believe there was some way to modify the hashing introduced when the dos 
> dictionary attacks were an issue. 

The PYTHONHASHSEED environment variable:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED

If you have non-determinism introduced by Python hashing, setting a
constant value of PYTHONHASHSEED should be an easy way to work around
it.


Re: [Distutils] reproducible builds

2017-03-20 Thread Robin Becker

On 18/03/2017 07:20, Nick Coghlan wrote:
...
> While the reproducible builds effort started in Debian and is furthest
> advanced there, it's not distro specific - interested developers working on
> other distros were already looking into it, and the Core Infrastructure
> Initiative has backed it as one of their security assurance initiatives.
> Software Freedom Conservancy have a decent write-up on the current state of
> things after December's Reproducible Builds Summit:
> https://sfconservancy.org/blog/2016/dec/26/reproducible-builds-summit-report/

thanks for this; it seems the emphasis is on security. If the intent is that
reportlab should be able to reliably reproduce the same binary output then I
think I need to do more than just fix a couple of dates. We use many dictionary
like objects to produce PDF and I am not sure all are sorted by key during output.

Is there a way to excite dictionary ordering changes? I believe there was some
way to modify the hashing introduced when the dos dictionary attacks were an
issue. Would it be sufficient to generate documents with say Python 2.7 and
check against 3.6?

> However, you'll probably want to make yourself a helper function that uses
> SOURCE_DATE_EPOCH if defined, and falls back to the current time otherwise.
> That way you'll get reproducible behaviour when a build system configures
> the setting, while retaining your current behaviour for environments that
> don't.

good advice and that's what I am doing.

> Cheers,
> Nick.
>
> P.S. A question well worth asking for *us* is whether or not setting
> SOURCE_DATE_EPOCH appropriately (if it isn't already set in the current
> environment) should be part of the build system abstraction PEPs.




--
Robin Becker


Re: [Distutils] reproducible builds

2017-03-20 Thread Robin Becker

On 17/03/2017 17:49, David Wilson wrote:
> Hey Robin,
>
>> What happens if other distros decide not to use this environment variable?
>> Do I really want distro specific code in the package?
>
> AFAIK this is seeing a great deal of use outside of Debian and even
> Linux, for instance GCC also supports this variable.
>
>> In short where does the distro responsibility and package maintainers
>> boundary need to be?
>
> I guess it mostly comes down to whether you'd like them to carry the
> debt of a vendor patch to implement the behaviour for you in a way you
> don't like, or you'd prefer to retain full control. :)  So it's more a
> preference than a responsibility.
>
> David

I think I accept the need to support this variable. Our original use case was 
for testing purposes where we altered dates injected into the produced pdf meta 
data and also in some cases the content.


However, if that is the implied intent of the debian variable then I will also 
need to modify the behaviour of some other tests eg in one case the produced pdf 
output looks like this




The value of i is not larger than 3
The value of i is equal to 3
The value of i is not less than 3
The value of i is 3
The value of i is 2
The value of i is 1
{'doc': , 'currentFrame': 'normal', 'currentPageTemplate': 'First', 
'aW':
439.27559055118104, 'aH': 685.8897637795275, 'aWH': (439.27559055118104,
685.8897637795275), 'i': 0, 'availableWidth': 439.27559055118104, 
'availableHeight':
619.8897637795275}
The current page number is 1


i.e. we are introspecting internals and injecting that into the document content.
I imagine I need to clean up the reporting to avoid getting addresses etc.
into the documents. Obviously if I have the ability to embed repr(some_object)
into the document output then it will vary (unless the underlying python is
reproducible). I'm not sure if debian runs the whole reportlab test suite, but
it makes sense to get this kind of variability out.


When we make significant changes to existing behaviours, our current workflow
consists of generating a large number of outputs and then rendering them into
jpeg pages with Ghostscript. Differences in the jpegs can be used to spot problems.

--
Robin Becker


Re: [Distutils] reproducible builds

2017-03-18 Thread Donald Stufft

> On Mar 18, 2017, at 3:20 AM, Nick Coghlan  wrote:
> 
> P.S. A question well worth asking for *us* is whether or not setting 
> SOURCE_DATE_EPOCH appropriately (if it isn't already set in the current 
> environment) should be part of the build system abstraction PEPs. 
> 


If it’s getting standard use (and it sounds like it is), then I think it
should, yes.

—
Donald Stufft





Re: [Distutils] reproducible builds

2017-03-18 Thread Nick Coghlan
On 18 March 2017 at 03:19, Robin Becker  wrote:

> An issue has been raised for reportlab to support a specific environment
> variable namely SOURCE_DATE_EPOCH. The intent is that we should get our
> time from this variable rather than time.localtime(time.time()) so that
> produced documents are more invariant.
>
> First off is this a reasonable request? The variable is defined by debian
> here https://reproducible-builds.org/specs/source-date-epoch/
>
> What happens if other distros decide not to use this environment variable?
> Do I really want distro specific code in the package?
>

While the reproducible builds effort started in Debian and is furthest
advanced there, it's not distro specific - interested developers working on
other distros were already looking into it, and the Core Infrastructure
Initiative has backed it as one of their security assurance initiatives.
Software Freedom Conservancy have a decent write-up on the current state of
things after December's Reproducible Builds Summit:
https://sfconservancy.org/blog/2016/dec/26/reproducible-builds-summit-report/

However, you'll probably want to make yourself a helper function that uses
SOURCE_DATE_EPOCH if defined, and falls back to the current time otherwise.
That way you'll get reproducible behaviour when a build system configures
the setting, while retaining your current behaviour for environments that
don't.
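
A minimal sketch of such a helper, assuming the caller wants a time tuple
like the one time.localtime(time.time()) returns. The name document_time
and its environ parameter are made up for illustration; they are not
reportlab API.

```python
import os
import time


def document_time(environ=None):
    """Return a struct_time for stamping generated output.

    Hypothetical helper: uses SOURCE_DATE_EPOCH when the build system
    has set it, and falls back to the current local time otherwise.
    """
    if environ is None:
        environ = os.environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch:
        # The variable holds seconds since the Unix epoch. Interpreting it
        # in UTC keeps the result independent of the machine's time zone.
        return time.gmtime(int(epoch))
    # Unset (or empty): keep the existing behaviour.
    return time.localtime(time.time())
```

Reading the variable on each call, rather than once at import time, means
a build tool that sets it late is still honoured.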

Cheers,
Nick.

P.S. A question well worth asking for *us* is whether or not setting
SOURCE_DATE_EPOCH appropriately (if it isn't already set in the current
environment) should be part of the build system abstraction PEPs.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] reproducible builds

2017-03-17 Thread David Wilson
Hey Robin,

> What happens if other distros decide not to use this environment variable?
> Do I really want distro specific code in the package?

AFAIK this is seeing a great deal of use outside of Debian and even
Linux; for instance, GCC also supports this variable.


> In short where does the distro responsibility and package maintainers
> boundary need to be?

I guess it mostly comes down to whether you'd like them to carry the
debt of a vendor patch to implement the behaviour for you in a way you
don't like, or you'd prefer to retain full control. :)  So it's more a
preference than a responsibility.


David


Re: [Distutils] reproducible builds

2017-03-17 Thread Freddy Rietdijk
Nixpkgs [1] uses SOURCE_DATE_EPOCH as well. We can reproducibly build the
Python interpreter (and packages with [2]).

[1] https://github.com/NixOS/nixpkgs
[2] https://bitbucket.org/pypa/wheel/pull-requests/77


On Fri, Mar 17, 2017 at 6:46 PM, Matthias Klose  wrote:

> On 17.03.2017 18:19, Robin Becker wrote:
> > An issue has been raised for reportlab to support a specific environment
> > variable namely SOURCE_DATE_EPOCH. The intent is that we should get our time
> > from this variable rather than time.localtime(time.time()) so that produced
> > documents are more invariant.
> >
> > First off is this a reasonable request? The variable is defined by debian here
> > https://reproducible-builds.org/specs/source-date-epoch/
> >
> > What happens if other distros decide not to use this environment variable? Do I
> > really want distro specific code in the package?
> >
> > In addition we already have our own mechanism for making the produced documents
> > invariant although it might require an extension to support externally specified
> > date & time as in the debian variable.
> >
> > In short where does the distro responsibility and package maintainers boundary
> > need to be?
>
> the reproducible-builds thing is not just a Debian thing, it's supported by
> other distros and upstream projects.
>
> Matthias


Re: [Distutils] reproducible builds

2017-03-17 Thread Matthias Klose
On 17.03.2017 18:19, Robin Becker wrote:
> An issue has been raised for reportlab to support a specific environment
> variable namely SOURCE_DATE_EPOCH. The intent is that we should get our time
> from this variable rather than time.localtime(time.time()) so that produced
> documents are more invariant.
> 
> First off is this a reasonable request? The variable is defined by debian here
> https://reproducible-builds.org/specs/source-date-epoch/
> 
> What happens if other distros decide not to use this environment variable? Do I
> really want distro specific code in the package?
> 
> In addition we already have our own mechanism for making the produced documents
> invariant although it might require an extension to support externally specified
> date & time as in the debian variable.
> 
> In short where does the distro responsibility and package maintainers boundary
> need to be?

the reproducible-builds thing is not just a Debian thing, it's supported by
other distros and upstream projects.

Matthias



Re: [Distutils] reproducible builds

2017-03-17 Thread Thomas Kluyver
Flit already supports $SOURCE_DATE_EPOCH for building wheels.

I think the environment variable is a good idea: if it gets wide
support, you will be able to set a single thing to affect lots of
different build tools, rather than working out where you need to add
command line arguments to half a dozen different build steps.
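
As a rough illustration of that point, here is a sketch of a driver that
pins the variable once and passes the same environment to every build
step. The function names and the example value are invented for this
sketch, and it assumes each step's tool actually reads SOURCE_DATE_EPOCH.

```python
import os
import subprocess


def pinned_env(source_date_epoch, base=None):
    """Return a copy of the environment with SOURCE_DATE_EPOCH set.

    Hypothetical helper: a value already present in the environment
    wins, so an outer build system keeps control of the timestamp.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("SOURCE_DATE_EPOCH", str(source_date_epoch))
    return env


def run_steps(steps, source_date_epoch):
    """Run each build step (an argv list) under the same pinned environment."""
    env = pinned_env(source_date_epoch)
    for argv in steps:
        # check=True stops the sequence at the first failing step.
        subprocess.run(argv, env=env, check=True)


# Example call (1489708800 is an arbitrary illustrative value):
# run_steps([["python", "setup.py", "sdist"]], 1489708800)
```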

Thomas

On Fri, Mar 17, 2017, at 05:33 PM, Matthias Bussonnier wrote:
> On Fri, Mar 17, 2017 at 10:19 AM, Robin Becker wrote:
> > An issue has been raised for reportlab to support a specific environment
> > variable namely SOURCE_DATE_EPOCH. The intent is that we should get our time
> > from this variable rather than time.localtime(time.time()) so that produced
> > documents are more invariant.
> >
> > First off is this a reasonable request? The variable is defined by debian
> > here https://reproducible-builds.org/specs/source-date-epoch/
> >
> > What happens if other distros decide not to use this environment variable?
> > Do I really want distro specific code in the package?
> 
> For what it is worth, it seems like it will make its way into CPython as
> well:
> https://github.com/python/cpython/pull/296
> 
> And AFAICT, this env variable naming is already more than just Debian.
> 
> -- 
> M
> 
> 
> >
> > In addition we already have our own mechanism for making the produced
> > documents invariant although it might require an extension to support
> > externally specified date & time as in the debian variable.
> >
> > In short where does the distro responsibility and package maintainers
> > boundary need to be?
> > --
> > Robin Becker


Re: [Distutils] reproducible builds

2017-03-17 Thread Matthias Bussonnier
On Fri, Mar 17, 2017 at 10:19 AM, Robin Becker  wrote:
> An issue has been raised for reportlab to support a specific environment
> variable namely SOURCE_DATE_EPOCH. The intent is that we should get our time
> from this variable rather than time.localtime(time.time()) so that produced
> documents are more invariant.
>
> First off is this a reasonable request? The variable is defined by debian
> here https://reproducible-builds.org/specs/source-date-epoch/
>
> What happens if other distros decide not to use this environment variable?
> Do I really want distro specific code in the package?

For what it is worth, it seems like it will make its way into CPython as well:
https://github.com/python/cpython/pull/296

And AFAICT, this env variable naming is already more than just Debian.

-- 
M


>
> In addition we already have our own mechanism for making the produced
> documents invariant although it might require an extension to support
> externally specified date & time as in the debian variable.
>
> In short where does the distro responsibility and package maintainers
> boundary need to be?
> --
> Robin Becker


[Distutils] reproducible builds

2017-03-17 Thread Robin Becker
An issue has been raised for reportlab to support a specific environment 
variable, namely SOURCE_DATE_EPOCH. The intent is that we should get our time 
from this variable rather than time.localtime(time.time()) so that produced 
documents are more invariant.


First off, is this a reasonable request? The variable is defined by Debian here: 
https://reproducible-builds.org/specs/source-date-epoch/


What happens if other distros decide not to use this environment variable? Do I 
really want distro-specific code in the package?


In addition, we already have our own mechanism for making the produced documents
invariant, although it might require an extension to support an externally specified 
date & time as in the Debian variable.


In short, where does the boundary between the distro's responsibility and the 
package maintainer's need to be?

--
Robin Becker