Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-20 Thread Brett Cannon
On Fri, 8 Sep 2017 at 10:53 Benjamin Peterson  wrote:

> Thank you all for the feedback. I've now updated the PEP to specify a
> 4-word pyc header with a bit field in every case.
>
> On Fri, Sep 8, 2017, at 09:43, Nick Coghlan wrote:
> > On 8 September 2017 at 07:55, Antoine Pitrou wrote:
> > > On Fri, 8 Sep 2017 07:49:46 -0700
> > > Nick Coghlan  wrote:
> > >> > I'd rather a single magic number and a separate bitfield that tells
> > >> > what the header encodes exactly.  We don't *have* to fight for a tiny
> > >> > size reduction of pyc files.
> > >>
> > >> One of Benjamin's goals was for the existing timestamp-based pyc
> > >> format to remain completely unchanged, so we need some kind of marker
> > >> in the magic number to indicate whether the file is using the new
> > >> format or not.
> > >
> > > I don't think that's a useful goal, as long as we bump the magic number.
> >
> > Yeah, we (me, Benjamin, Greg) discussed that here, and we agree -
> > there isn't actually any benefit to keeping the timestamp based pyc's
> > using the same layout, since the magic number is already going to
> > change anyway.
> >
> > Given that, I think your suggested 16 byte header layout would be a
> > good one: 4 byte magic number, 4 bytes reserved for format flags, 8
> > bytes with an interpretation that depends on the format flags.
> >
>

+1 from me!
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-08 Thread Benjamin Peterson
Thank you all for the feedback. I've now updated the PEP to specify a
4-word pyc header with a bit field in every case.

On Fri, Sep 8, 2017, at 09:43, Nick Coghlan wrote:
> On 8 September 2017 at 07:55, Antoine Pitrou  wrote:
> > On Fri, 8 Sep 2017 07:49:46 -0700
> > Nick Coghlan  wrote:
> >> > I'd rather a single magic number and a separate bitfield that tells
> >> > what the header encodes exactly.  We don't *have* to fight for a tiny
> >> > size reduction of pyc files.
> >>
> >> One of Benjamin's goals was for the existing timestamp-based pyc
> >> format to remain completely unchanged, so we need some kind of marker
> >> in the magic number to indicate whether the file is using the new
> >> format or not.
> >
> > I don't think that's a useful goal, as long as we bump the magic number.
> 
> Yeah, we (me, Benjamin, Greg) discussed that here, and we agree -
> there isn't actually any benefit to keeping the timestamp based pyc's
> using the same layout, since the magic number is already going to
> change anyway.
> 
> Given that, I think your suggested 16 byte header layout would be a
> good one: 4 byte magic number, 4 bytes reserved for format flags, 8
> bytes with an interpretation that depends on the format flags.
> 
> Cheers,
> Nick.
> 
> -- 
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-08 Thread Nick Coghlan
On 8 September 2017 at 07:55, Antoine Pitrou  wrote:
> On Fri, 8 Sep 2017 07:49:46 -0700
> Nick Coghlan  wrote:
>> > I'd rather a single magic number and a separate bitfield that tells
>> > what the header encodes exactly.  We don't *have* to fight for a tiny
>> > size reduction of pyc files.
>>
>> One of Benjamin's goals was for the existing timestamp-based pyc
>> format to remain completely unchanged, so we need some kind of marker
>> in the magic number to indicate whether the file is using the new
>> format or not.
>
> I don't think that's a useful goal, as long as we bump the magic number.

Yeah, we (me, Benjamin, Greg) discussed that here, and we agree -
there isn't actually any benefit to keeping the timestamp based pyc's
using the same layout, since the magic number is already going to
change anyway.

Given that, I think your suggested 16 byte header layout would be a
good one: 4 byte magic number, 4 bytes reserved for format flags, 8
bytes with an interpretation that depends on the format flags.
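
A minimal sketch of reading that 16-byte layout. The flag bit assignments
(bit 0 = hash-based, bit 1 = check the source) and the split of the last
8 bytes into a 4-byte mtime plus 4-byte source size are assumptions made
for illustration; this message only says the flags field is reserved.

```python
import struct

# Assumed bit assignments; the message above only reserves the field.
FLAG_HASH_BASED = 0x1
FLAG_CHECK_SOURCE = 0x2


def parse_pyc_header(header: bytes) -> dict:
    """Split a 16-byte pyc header into magic, flags, and the flag-dependent tail."""
    if len(header) < 16:
        raise ValueError("pyc header must be at least 16 bytes")
    magic = header[:4]                           # 4-byte magic number
    (flags,) = struct.unpack("<I", header[4:8])  # 4 bytes of format flags
    tail = header[8:16]                          # 8 bytes, meaning depends on flags
    if flags & FLAG_HASH_BASED:
        return {
            "magic": magic,
            "kind": "hash",
            "check_source": bool(flags & FLAG_CHECK_SOURCE),
            "source_hash": tail,
        }
    # Timestamp layout (assumed): little-endian mtime followed by source size.
    mtime, size = struct.unpack("<II", tail)
    return {"magic": magic, "kind": "timestamp", "mtime": mtime, "source_size": size}
```

The point of the fixed layout is that a reader always consumes 16 bytes and
only then decides how to interpret the last 8.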

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-08 Thread Antoine Pitrou
On Fri, 8 Sep 2017 07:49:46 -0700
Nick Coghlan  wrote:
> On 8 September 2017 at 03:04, Antoine Pitrou  wrote:
> > On Thu, 7 Sep 2017 18:47:20 -0700
> > Nick Coghlan  wrote:  
> >> However, I do wonder whether we could encode *all* the mode settings
> >> into the magic number, such that we did something like reserving the
> >> top 3 bits for format flags:
> >>
> >> * number & 0x1FFF -> the traditional magic number
> >> * number & 0x8000 -> timestamp or hash?
> >> * number & 0x4000 -> checked or not?
> >> * number & 0x2000 -> reserved for future format changes  
> >
> > I'd rather a single magic number and a separate bitfield that tells
> > what the header encodes exactly.  We don't *have* to fight for a tiny
> > size reduction of pyc files.  
> 
> One of Benjamin's goals was for the existing timestamp-based pyc
> format to remain completely unchanged, so we need some kind of marker
> in the magic number to indicate whether the file is using the new
> format or not.

I don't think that's a useful goal, as long as we bump the magic number.

Note the header format was already changed in the past when we added a
"size" field beside the "timestamp" field, to resolve collisions due to
timestamp granularity.

Regards

Antoine.


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-08 Thread Nick Coghlan
On 8 September 2017 at 03:04, Antoine Pitrou  wrote:
> On Thu, 7 Sep 2017 18:47:20 -0700
> Nick Coghlan  wrote:
>> However, I do wonder whether we could encode *all* the mode settings
>> into the magic number, such that we did something like reserving the
>> top 3 bits for format flags:
>>
>> * number & 0x1FFF -> the traditional magic number
>> * number & 0x8000 -> timestamp or hash?
>> * number & 0x4000 -> checked or not?
>> * number & 0x2000 -> reserved for future format changes
>
> I'd rather a single magic number and a separate bitfield that tells
> what the header encodes exactly.  We don't *have* to fight for a tiny
> size reduction of pyc files.

One of Benjamin's goals was for the existing timestamp-based pyc
format to remain completely unchanged, so we need some kind of marker
in the magic number to indicate whether the file is using the new
format or not.

I'd also be fine with using a single bit for that, such that the only
bitmasking needed was:

* number & 0x8000 -> legacy format or new format?
* number & 0x7FFF -> the magic number itself

And any further flags would go in a separate field.

That's essentially what PEP 552 already suggests, the only adjustment
is the idea of specifically using the high order bit in the magic
number field to indicate the pyc format in use rather than leaving the
explanation of how the two magic numbers will differ unspecified.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-08 Thread Antoine Pitrou
On Thu, 7 Sep 2017 18:47:20 -0700
Nick Coghlan  wrote:
> However, I do wonder whether we could encode *all* the mode settings
> into the magic number, such that we did something like reserving the
> top 3 bits for format flags:
> 
> * number & 0x1FFF -> the traditional magic number
> * number & 0x8000 -> timestamp or hash?
> * number & 0x4000 -> checked or not?
> * number & 0x2000 -> reserved for future format changes

I'd rather a single magic number and a separate bitfield that tells
what the header encodes exactly.  We don't *have* to fight for a tiny
size reduction of pyc files.

Regards

Antoine.




Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Barry Warsaw
On Sep 7, 2017, at 16:58, Gregory P. Smith  wrote:

> Input from OS package distributors would be interesting.  Would they use this?

I suspect it won’t be that interesting to the Debian ecosystem, since we 
generate pyc files on package install.  We do that because we can support 
multiple versions of Python installed simultaneously and we don’t know which 
versions are installed on the target machine.  I suppose our stdlib package 
could ship pycs, but we don’t.

Reproducible builds may still be interesting in other situations though, such 
as CI machines, but then SOURCE_DATE_EPOCH is probably good enough.

-Barry






Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Benjamin Peterson


On Thu, Sep 7, 2017, at 16:58, Gregory P. Smith wrote:
> +1 on this PEP.

Thanks!

> Questions:
> 
> Input from OS package distributors would be interesting.  Would they use
> this?  Which way would it impact their startup time (loading the .py file
> vs just statting it.  does that even matter?  source files are often
> eventually loaded for linecache use in tracebacks anyways)?

I anticipate distributors will use the mode where the pyc is simply
trusted and the source file isn't hashed. That would make the I/O
overhead identical to today.
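
A rough sketch of how the three modes differ in what they touch on disk.
The function name, the flag bit assignments (bit 0 = hash-based, bit 1 =
check the source) and the 16-byte header layout are assumptions for
illustration; importlib.util.source_hash is the helper that Python 3.7
later shipped. The magic number is deliberately not validated here.

```python
import importlib.util
import os
import struct


def pyc_header_is_fresh(header: bytes, source_path: str) -> bool:
    """Decide whether a 16-byte pyc header still matches its source file."""
    (flags,) = struct.unpack("<I", header[4:8])
    hash_based = bool(flags & 0x1)    # assumed bit assignment
    check_source = bool(flags & 0x2)  # assumed bit assignment

    if not hash_based:
        # Timestamp mode: a single stat() call, the source is never read.
        st = os.stat(source_path)
        mtime, size = struct.unpack("<II", header[8:16])
        return (mtime == (int(st.st_mtime) & 0xFFFFFFFF)
                and size == (st.st_size & 0xFFFFFFFF))

    if not check_source:
        # Unchecked hash mode: trust the pyc, no stat and no read.
        return True

    # Checked hash mode: read the whole source and compare hashes.
    with open(source_path, "rb") as f:
        return importlib.util.source_hash(f.read()) == header[8:16]
```

Only the checked-hash branch reads the source file, which is why the
trusted mode should cost no more than today's import path.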

> 
> Would they benefit from a pyc that can contain _both_ timestamp+length,
> and
> the source_hash?  if both were present, I assume that only one would be
> checked at startup.  i'm not sure what would make the decision of what to
> check.  one fails, check the other?  i personally do not have a use for
> this case so i'd omit the complexity without a demonstrated need.

Yeah, it could act as a multi-tiered cache key. I agree with your
conclusion to pass for now.

> 
> Something to also state in the PEP:
> 
> This is intentionally not a "secure" hash.  Security is explicitly a
> non-goal.

Added a sentence.


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Nick Coghlan
On 7 September 2017 at 16:58, Gregory P. Smith  wrote:
> +1 on this PEP.
>
> The TL;DR summary of this PEP:
>   The pyc date+length metadata check was a convenient hack.  It still works
> well for many people and use cases; it isn't going away.
>   PEP 552 proposes a new alternate hack that relies on file contents instead
> of OS and filesystem date metadata.
> Assumption: The hash function is significantly faster than re-parsing
> the source.  (guaranteed to be true)
>
> Questions:
>
> Input from OS package distributors would be interesting.  Would they use
> this?  Which way would it impact their startup time (loading the .py file vs
> just statting it.  does that even matter?  source files are often eventually
> loaded for linecache use in tracebacks anyways)?

Christian and I asked some of our security folks for their personal
wishlists recently, and one of the items that came up was "The
recompile is based on a timestamp. How do you know the pyc file on
disk really is related to the py file that is human readable? Can it
be based on a hash or something like that?"

This is a restating of the reproducible build use case: for a given
version of Python, a given source file should always give the same
source hash and marshaled code object, and once it does, it's easier
to do an independent compilation from the source file and check you
get the same answer.
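
As a sketch of that independent check, using the py_compile and
importlib.util APIs that Python 3.7 later shipped: the helper names below
are hypothetical, and the header layout (flags in bytes 4-8 with bit 0
meaning hash-based, hash in bytes 8-16) is assumed. It only confirms that
the pyc was built from these source bytes, not that the marshaled code
object is identical.

```python
import importlib.util
import py_compile


def compile_hash_pyc(source_path: str, pyc_path: str) -> str:
    """Write a hash-based pyc for source_path and return its path."""
    return py_compile.compile(
        source_path,
        cfile=pyc_path,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )


def pyc_matches_source(pyc_path: str, source_path: str) -> bool:
    """Return True if the pyc header carries the hash of the current source."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    with open(source_path, "rb") as f:
        expected = importlib.util.source_hash(f.read())
    flags = int.from_bytes(header[4:8], "little")
    # A timestamp-based pyc carries no hash, so it can never match here.
    return bool(flags & 0x1) and header[8:16] == expected
```

Because the key is derived from file contents alone, the answer does not
change when the source is copied, checked out again, or has its mtime reset.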

While you can implement that for timestamp based formats by adjusting
input file metadata (and that's exactly what distros do with
SOURCE_DATE_EPOCH), it's still pretty annoying, and not particularly
build cache friendly, since the same file in different source
artifacts may produce different build outputs.

> Would they benefit from a pyc that can contain _both_ timestamp+length, and
> the source_hash?  if both were present, I assume that only one would be
> checked at startup.  i'm not sure what would make the decision of what to
> check.  one fails, check the other?  i personally do not have a use for this
> case so i'd omit the complexity without a demonstrated need.

I don't see any way we'd benefit from having both items present.

However, I do wonder whether we could encode *all* the mode settings
into the magic number, such that we did something like reserving the
top 3 bits for format flags:

* number & 0x1FFF -> the traditional magic number
* number & 0x8000 -> timestamp or hash?
* number & 0x4000 -> checked or not?
* number & 0x2000 -> reserved for future format changes

By default we'd still produce the checked-timestamp format, but
managed build systems (including Linux distros) could opt-in to the
unchecked-hash format.
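
To make the masks above concrete, here is a small sketch of packing and
unpacking such a 16-bit magic field. The polarity of each bit (set means
"hash" and "checked") and the helper names are assumptions; the list above
leaves both open.

```python
from typing import NamedTuple

MAGIC_MASK = 0x1FFF    # low 13 bits: the traditional magic number
HASH_BIT = 0x8000      # assumed polarity: set means hash-based
CHECKED_BIT = 0x4000   # assumed polarity: set means "check the source"
RESERVED_BIT = 0x2000  # reserved for future format changes


class MagicFields(NamedTuple):
    magic: int
    hash_based: bool
    checked: bool
    reserved: bool


def encode_magic(magic: int, *, hash_based: bool = False, checked: bool = True) -> int:
    """Pack a magic number and two mode flags into one 16-bit value."""
    if not 0 <= magic <= MAGIC_MASK:
        raise ValueError("magic number does not fit in 13 bits")
    number = magic
    if hash_based:
        number |= HASH_BIT
    if checked:
        number |= CHECKED_BIT
    return number


def decode_magic(number: int) -> MagicFields:
    """Split a 16-bit magic field back into its four parts."""
    return MagicFields(
        magic=number & MAGIC_MASK,
        hash_based=bool(number & HASH_BIT),
        checked=bool(number & CHECKED_BIT),
        reserved=bool(number & RESERVED_BIT),
    )
```

The four masks are disjoint and together cover all 16 bits, so the scheme
costs no extra header space but caps the magic number at 0x1FFF.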

> Something to also state in the PEP:
>
> This is intentionally not a "secure" hash.  Security is explicitly a
> non-goal.

I don't think it's so much that security is a non-goal, as that the
(admittedly minor) security improvement comes from making it easier to
reproduce the expected machine-readable output from a given
human-readable input, rather than from the nature of the hashing
function used.

> Rationale behind my support:

+1 from me as well, for the reasons Greg gives (while Fedora doesn't
currently do any per-file build artifact caching, I hope we will in
the future, and output formats based on input artifact hashes will
make that much easier than formats based on input timestamps).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Gregory P. Smith
+1 on this PEP.

The TL;DR summary of this PEP:
  The pyc date+length metadata check was a convenient hack.  It still works
well for many people and use cases; it isn't going away.
  PEP 552 proposes a new alternate hack that relies on file contents
instead of OS and filesystem date metadata.
Assumption: The hash function is significantly faster than re-parsing
the source.  (guaranteed to be true)

Questions:

Input from OS package distributors would be interesting.  Would they use
this?  Which way would it impact their startup time (loading the .py file
vs just statting it.  does that even matter?  source files are often
eventually loaded for linecache use in tracebacks anyways)?

Would they benefit from a pyc that can contain _both_ timestamp+length, and
the source_hash?  if both were present, I assume that only one would be
checked at startup.  i'm not sure what would make the decision of what to
check.  one fails, check the other?  i personally do not have a use for
this case so i'd omit the complexity without a demonstrated need.

Something to also state in the PEP:

This is intentionally not a "secure" hash.  Security is explicitly a
non-goal.

Rationale behind my support:

We use a superset of Bazel at Google (unsurprising) and have had to jump
through a lot of messy hoops to deal with timestamp metadata winding up in
output files vs deterministic builds.  What Benjamin describes here sounds
exactly like what we would want.

It allows deterministic builds in distributed build and cached operation
systems where timestamps are never going to be guaranteed.

It allows the check to work on filesystems which do not preserve timestamps.

Also importantly, it allows the check to be disabled via the check_source
bit.  Today we use a modified importer at work that skips checking
timestamps anyway, because of the way we ship applications: the entire set
of dependencies present is already guaranteed at build time to be correct,
and modification at runtime is either not possible or not a concern.  This
PEP would avoid the need for an extra importer or modified interpreter
logic to make this happen.
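
For a build step of that kind, a sketch using the compileall API as it
later shipped in Python 3.7 might look like the following. The function
name is hypothetical; UNCHECKED_HASH is the mode in which the pyc records
the source hash but the importer does not re-check it.

```python
import compileall
import py_compile


def precompile_unchecked(tree: str) -> bool:
    """Byte-compile every .py file under tree into unchecked hash-based pycs."""
    ok = compileall.compile_dir(
        tree,
        quiet=1,     # print errors only
        force=True,  # rebuild even if an up-to-date pyc already exists
        invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
    )
    return bool(ok)
```

The resulting pycs depend only on source contents, so a distributed build
cache can reuse them across machines regardless of file timestamps.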

-G

On Thu, Sep 7, 2017 at 3:47 PM Benjamin Peterson 
wrote:

>
>
> On Thu, Sep 7, 2017, at 14:43, Guido van Rossum wrote:
> > On Thu, Sep 7, 2017 at 2:40 PM, Benjamin Peterson 
> > wrote:
> >
> > >
> > >
> > > On Thu, Sep 7, 2017, at 14:19, Guido van Rossum wrote:
> > > > Nice one.
> > > >
> > > > It would be nice to specify the various APIs needed as well.
> > >
> > > The compileall and py_compile ones?
> > >
> >
> > Yes, and the SipHash mod to specify the key you mentioned.
>
> Done.
>
> >
> > >
> > > > Why do you keep the mtime-based format as an option? (Maybe because it's
> > > > faster? Did you measure it?)
> > >
> > > I haven't actually measured anything, but statting a file will definitely
> > > be faster than reading it completely and hashing it. I suppose if the
> > > speed difference between timestamp-based and hash-based pycs turned out
> > > to be small we could feel good about dropping the timestamp format
> > > completely. However, that difference might be hard to determine
> > > definitively as I expect the speed hit will vary widely based on system
> > > parameters such as disk speed and page cache size.
> > >
> > > My goal in this PEP was to preserve the current pyc invalidation
> > > behavior, which works well today for many use cases, as the default. The
> > > hash-based pycs are reserved for distribution and other power use cases.
> >
> > OK, maybe you can clarify that a bit in the PEP.
>
> I've added a paragraph to the Rationale section.


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Benjamin Peterson


On Thu, Sep 7, 2017, at 14:43, Guido van Rossum wrote:
> On Thu, Sep 7, 2017 at 2:40 PM, Benjamin Peterson 
> wrote:
> 
> >
> >
> > On Thu, Sep 7, 2017, at 14:19, Guido van Rossum wrote:
> > > Nice one.
> > >
> > > It would be nice to specify the various APIs needed as well.
> >
> > The compileall and py_compile ones?
> >
> 
> Yes, and the SipHash mod to specify the key you mentioned.

Done.

> 
> >
> > > Why do you keep the mtime-based format as an option? (Maybe because it's
> > > faster? Did you measure it?)
> >
> > I haven't actually measured anything, but statting a file will definitely
> > be faster than reading it completely and hashing it. I suppose if the
> > speed difference between timestamp-based and hash-based pycs turned out
> > to be small we could feel good about dropping the timestamp format
> > completely. However, that difference might be hard to determine
> > definitively as I expect the speed hit will vary widely based on system
> > parameters such as disk speed and page cache size.
> >
> > My goal in this PEP was to preserve the current pyc invalidation
> > behavior, which works well today for many use cases, as the default. The
> > hash-based pycs are reserved for distribution and other power use cases.
> >
> 
> OK, maybe you can clarify that a bit in the PEP.

I've added a paragraph to the Rationale section.


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Benjamin Peterson


On Thu, Sep 7, 2017, at 14:54, Antoine Pitrou wrote:
> On Thu, 07 Sep 2017 14:32:19 -0700
> Benjamin Peterson  wrote:
> > > 
> > > Not sure how common that situation is (certainly the source tree wasn't
> > > read-only when you checked it out or untar'ed it), but isn't it easily
> > > circumvented by copying the source tree before building?  
> > 
> > Well, yes, in these kinds of "batch" build situations, copying is
> > probably fine. However, I want to be able to have pyc determinism even
> > when developing. Copying the entire source every time I change something
> > isn't nice.
> 
> Hmm... Are you developing from a read-only source tree?

No, but the build system is building from one (at least conceptually).

> 
> > The larger point is that while the SOURCE_DATE_EPOCH patch will likely work
> > for Linux distributions, I'm interested in being able to have
> > deterministic pycs in "normal" Python development workflows.
> 
> That's an interesting idea, but is there a concrete motivation or is it
> platonical?  After all, if you're changing something in the source tree
> it's expected that the overall "signature" of the build will be
> modified too.

Yes, I have used Bazel to build pycs. Having pycs be deterministic
allows interesting build system optimizations like Bazel distributed
caching to work well for Python.


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Antoine Pitrou
On Thu, 07 Sep 2017 14:40:33 -0700
Benjamin Peterson  wrote:
> On Thu, Sep 7, 2017, at 14:19, Guido van Rossum wrote:
> > Nice one.
> > 
> > It would be nice to specify the various APIs needed as well.  
> 
> The compileall and py_compile ones?
> 
> > 
> > Why do you keep the mtime-based format as an option? (Maybe because it's
> > faster? Did you measure it?)  
> 
> I haven't actually measured anything, but statting a file will definitely
> be faster than reading it completely and hashing it. I suppose if the
> speed difference between timestamp-based and hash-based pycs turned out
> to be small we could feel good about dropping the timestamp format
> completely. However, that difference might be hard to determine
> definitively as I expect the speed hit will vary widely based on system
> parameters such as disk speed and page cache size.

Also, while some/many of us have fast development machines with
performant SSDs, Python can be used in situations where "disk" I/O is
still slow (imagine a Raspberry Pi system or similar, grinding through
a SD card or USB key to load py and pyc files).

Regards

Antoine.




Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Antoine Pitrou
On Thu, 07 Sep 2017 14:32:19 -0700
Benjamin Peterson  wrote:
> > 
> > Not sure how common that situation is (certainly the source tree wasn't
> > read-only when you checked it out or untar'ed it), but isn't it easily
> > circumvented by copying the source tree before building?  
> 
> Well, yes, in these kinds of "batch" build situations, copying is
> probably fine. However, I want to be able to have pyc determinism even
> when developing. Copying the entire source every time I change something
> isn't nice.

Hmm... Are you developing from a read-only source tree?

> The larger point is that while the SOURCE_DATE_EPOCH patch will likely work
> for Linux distributions, I'm interested in being able to have
> deterministic pycs in "normal" Python development workflows.

That's an interesting idea, but is there a concrete motivation or is it
platonical?  After all, if you're changing something in the source tree
it's expected that the overall "signature" of the build will be
modified too.

Regards

Antoine.




Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Guido van Rossum
On Thu, Sep 7, 2017 at 2:40 PM, Benjamin Peterson 
wrote:

>
>
> On Thu, Sep 7, 2017, at 14:19, Guido van Rossum wrote:
> > Nice one.
> >
> > It would be nice to specify the various APIs needed as well.
>
> The compileall and py_compile ones?
>

Yes, and the SipHash mod to specify the key you mentioned.

>
> > Why do you keep the mtime-based format as an option? (Maybe because it's
> > faster? Did you measure it?)
>
> I haven't actually measured anything, but statting a file will definitely
> be faster than reading it completely and hashing it. I suppose if the
> speed difference between timestamp-based and hash-based pycs turned out
> to be small we could feel good about dropping the timestamp format
> completely. However, that difference might be hard to determine
> definitively as I expect the speed hit will vary widely based on system
> parameters such as disk speed and page cache size.
>
> My goal in this PEP was to preserve the current pyc invalidation
> behavior, which works well today for many use cases, as the default. The
> hash-based pycs are reserved for distribution and other power use cases.
>

OK, maybe you can clarify that a bit in the PEP.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Benjamin Peterson


On Thu, Sep 7, 2017, at 14:19, Guido van Rossum wrote:
> Nice one.
> 
> It would be nice to specify the various APIs needed as well.

The compileall and py_compile ones?

> 
> Why do you keep the mtime-based format as an option? (Maybe because it's
> faster? Did you measure it?)

I haven't actually measured anything, but statting a file will definitely
be faster than reading it completely and hashing it. I suppose if the
speed difference between timestamp-based and hash-based pycs turned out
to be small we could feel good about dropping the timestamp format
completely. However, that difference might be hard to determine
definitively as I expect the speed hit will vary widely based on system
parameters such as disk speed and page cache size.
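
One way to get a first number is a micro-benchmark along these lines. It
is a sketch only: time_checks is a hypothetical helper,
importlib.util.source_hash is the hashing function that Python 3.7 later
exposed, and repeated runs hit a warm page cache, so it says nothing about
the cold-disk case.

```python
import importlib.util
import os
import timeit


def time_checks(path: str, repeat: int = 200) -> tuple:
    """Return average seconds per call for (stat check, read-and-hash check)."""

    def stat_check():
        # What the timestamp format needs: metadata only.
        st = os.stat(path)
        return int(st.st_mtime), st.st_size

    def hash_check():
        # What a checked hash-based pyc needs: the full file contents.
        with open(path, "rb") as f:
            return importlib.util.source_hash(f.read())

    stat_time = timeit.timeit(stat_check, number=repeat) / repeat
    hash_time = timeit.timeit(hash_check, number=repeat) / repeat
    return stat_time, hash_time


if __name__ == "__main__":
    stat_time, hash_time = time_checks(os.__file__)
    print(f"stat: {stat_time:.2e}s  read+hash: {hash_time:.2e}s")
```

Running it against files of different sizes and on different storage would
show how much the gap depends on the system parameters mentioned above.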

My goal in this PEP was to preserve the current pyc invalidation
behavior, which works well today for many use cases, as the default. The
hash-based pycs are reserved for distribution and other power use cases.


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Benjamin Peterson


On Thu, Sep 7, 2017, at 14:21, Antoine Pitrou wrote:
> On Thu, 07 Sep 2017 14:08:58 -0700
> Benjamin Peterson  wrote:
> > On Thu, Sep 7, 2017, at 14:00, Antoine Pitrou wrote:
> > > On Thu, 07 Sep 2017 13:39:21 -0700
> > > Benjamin Peterson  wrote:  
> > > > Hello,
> > > > I've written a short PEP about an import extension to allow pycs to be
> > > > more deterministic by optionally replacing the timestamp with a hash of
> > > > the source file: https://www.python.org/dev/peps/pep-0552/  
> > > 
> > > Why isn't https://github.com/python/cpython/pull/296 a good enough
> > > solution to this problem?  It has a simple implementation, and requires
> > > neither maintaining two different pyc formats nor reading the entire
> > > source file to check whether the pyc file is up to date.  
> > 
> > The main objection to that model is that it requires modifying source
> > timestamps, which isn't possible for builds on read-only source trees.
> 
> Not sure how common that situation is (certainly the source tree wasn't
> read-only when you checked it out or untar'ed it), but isn't it easily
> circumvented by copying the source tree before building?

Well, yes, in these kinds of "batch" build situations, copying is
probably fine. However, I want to be able to have pyc determinism even
when developing. Copying the entire source every time I change something
isn't nice.

> 
> > This proposal also allows reproducible builds even if the files are
> > being modified in an edit-run-tests cycle.
> 
> I don't follow you here.  Could you elaborate?

If you require source timestamps to be fixed and deterministic, Python
won't notice when a file is modified.
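
A small sketch of that failure mode. The helper names are hypothetical,
the (mtime, size) pair stands in for what a timestamp-based pyc records,
and importlib.util.source_hash is the function Python 3.7 later shipped.

```python
import importlib.util
import os
import tempfile


def timestamp_key(path: str) -> tuple:
    """The (mtime, size) pair a timestamp-based pyc would be checked against."""
    st = os.stat(path)
    return int(st.st_mtime) & 0xFFFFFFFF, st.st_size & 0xFFFFFFFF


def hash_key(path: str) -> bytes:
    """The content hash a hash-based pyc would be checked against."""
    with open(path, "rb") as f:
        return importlib.util.source_hash(f.read())


def edit_goes_unnoticed() -> tuple:
    """Edit a file under a pinned mtime; return (timestamp key same, hash key same)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mod.py")
        with open(path, "wb") as f:
            f.write(b"x = 1\n")
        os.utime(path, (1, 1))  # pin the mtime, as a fixed-timestamp build would
        before = timestamp_key(path), hash_key(path)
        with open(path, "wb") as f:
            f.write(b"x = 2\n")  # same length, different contents
        os.utime(path, (1, 1))
        after = timestamp_key(path), hash_key(path)
    return before[0] == after[0], before[1] == after[1]
```

With the mtime pinned and the size unchanged, the timestamp key cannot see
the edit, while the content hash changes.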

The larger point is that while the SOURCE_DATE_EPOCH patch will likely work
for Linux distributions, I'm interested in being able to have
deterministic pycs in "normal" Python development workflows.


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Benjamin Peterson


On Thu, Sep 7, 2017, at 14:19, Freddy Rietdijk wrote:
> > The main objection to that model is that it requires modifying source
> > timestamps, which isn't possible for builds on read-only source trees.
> 
> Why not set the source timestamps of the source trees to say 1 first?

If the source tree is read-only (because you don't want your build system
to modify source files on principle), you cannot do that.


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Antoine Pitrou
On Thu, 07 Sep 2017 14:08:58 -0700
Benjamin Peterson  wrote:
> On Thu, Sep 7, 2017, at 14:00, Antoine Pitrou wrote:
> > On Thu, 07 Sep 2017 13:39:21 -0700
> > Benjamin Peterson  wrote:  
> > > Hello,
> > > I've written a short PEP about an import extension to allow pycs to be
> > > more deterministic by optionally replacing the timestamp with a hash of
> > > the source file: https://www.python.org/dev/peps/pep-0552/  
> > 
> > Why isn't https://github.com/python/cpython/pull/296 a good enough
> > solution to this problem?  It has a simple implementation, and requires
> > neither maintaining two different pyc formats nor reading the entire
> > source file to check whether the pyc file is up to date.  
> 
> The main objection to that model is that it requires modifying source
> timestamps, which isn't possible for builds on read-only source trees.

Not sure how common that situation is (certainly the source tree wasn't
read-only when you checked it out or untar'ed it), but isn't it easily
circumvented by copying the source tree before building?

> This proposal also allows reproducible builds even if the files are
> being modified in an edit-run-tests cycle.

I don't follow you here.  Could you elaborate?

Thanks

Antoine.


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Guido van Rossum
Nice one.

It would be nice to specify the various APIs needed as well.

Why do you keep the mtime-based format as an option? (Maybe because it's
faster? Did you measure it?)


On Thu, Sep 7, 2017 at 1:39 PM, Benjamin Peterson 
wrote:

> Hello,
> I've written a short PEP about an import extension to allow pycs to be
> more deterministic by optionally replacing the timestamp with a hash of
> the source file: https://www.python.org/dev/peps/pep-0552/
>
> Thanks for reading,
> Benjamin
>
> P.S. I came up with the idea for this PEP while awake.



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Freddy Rietdijk
> The main objection to that model is that it requires modifying source
> timestamps, which isn't possible for builds on read-only source trees.

Why not set the source timestamps of the source trees to say 1 first?
That's what is done with the Nix package manager. The Python interpreter is
patched (mostly similar to the referred PR) and checks whether
SOURCE_DATE_EPOCH is set, and if so, sets the mtime to 1.
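
For illustration only, the behaviour described above might look roughly like
the sketch below. This is not the actual Nix patch: the function name
`recorded_mtime` and its `environ` parameter are hypothetical, and the sketch
only covers the decision of which mtime gets recorded for a source file.

```python
import os


def recorded_mtime(source_path, environ=None):
    """Return the mtime a pyc writer would record for source_path.

    Hypothetical helper: if SOURCE_DATE_EPOCH is set, the real mtime is
    ignored and the constant 1 is used, as described in the message above.
    """
    if environ is None:
        environ = os.environ
    if "SOURCE_DATE_EPOCH" in environ:
        # Reproducible-build mode: every source file appears to have mtime 1.
        return 1
    # Normal mode: use the file's real modification time (whole seconds).
    return int(os.stat(source_path).st_mtime)
```

Note that for the pyc to be considered up to date at import time, the source
file's own mtime must also read as 1, which is the step that a read-only
source tree prevents.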

On Thu, Sep 7, 2017 at 11:08 PM, Benjamin Peterson 
wrote:

>
>
> On Thu, Sep 7, 2017, at 14:00, Antoine Pitrou wrote:
> > On Thu, 07 Sep 2017 13:39:21 -0700
> > Benjamin Peterson  wrote:
> > > Hello,
> > > I've written a short PEP about an import extension to allow pycs to be
> > > more deterministic by optionally replacing the timestamp with a hash of
> > > the source file: https://www.python.org/dev/peps/pep-0552/
> >
> > Why isn't https://github.com/python/cpython/pull/296 a good enough
> > solution to this problem?  It has a simple implementation, and requires
> > neither maintaining two different pyc formats nor reading the entire
> > source file to check whether the pyc file is up to date.
>
> The main objection to that model is that it requires modifying source
> timestamps, which isn't possible for builds on read-only source trees.
> This proposal also allows reproducible builds even if the files are
> being modified in an edit-run-tests cycle.


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Benjamin Peterson


On Thu, Sep 7, 2017, at 14:00, Antoine Pitrou wrote:
> On Thu, 07 Sep 2017 13:39:21 -0700
> Benjamin Peterson  wrote:
> > Hello,
> > I've written a short PEP about an import extension to allow pycs to be
> > more deterministic by optionally replacing the timestamp with a hash of
> > the source file: https://www.python.org/dev/peps/pep-0552/
> 
> Why isn't https://github.com/python/cpython/pull/296 a good enough
> solution to this problem?  It has a simple implementation, and requires
> neither maintaining two different pyc formats nor reading the entire
> source file to check whether the pyc file is up to date.

The main objection to that model is that it requires modifying source
timestamps, which isn't possible for builds on read-only source trees.
This proposal also allows reproducible builds even if the files are
being modified in an edit-run-tests cycle.


Re: [Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Antoine Pitrou
On Thu, 07 Sep 2017 13:39:21 -0700
Benjamin Peterson  wrote:
> Hello,
> I've written a short PEP about an import extension to allow pycs to be
> more deterministic by optionally replacing the timestamp with a hash of
> the source file: https://www.python.org/dev/peps/pep-0552/

Why isn't https://github.com/python/cpython/pull/296 a good enough
solution to this problem?  It has a simple implementation, and requires
neither maintaining two different pyc formats nor reading the entire
source file to check whether the pyc file is up to date.
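
The cost difference mentioned here can be sketched as follows. This is a
minimal illustration, not the import system's real code: the function names
and the `recorded_*` parameters are hypothetical, and
`importlib.util.source_hash` is assumed to be available (it was added in
Python 3.7, after this thread).

```python
import importlib.util
import os


def timestamp_is_fresh(source_path, recorded_mtime, recorded_size):
    """Timestamp check: one stat() call, the source is never read."""
    st = os.stat(source_path)
    return (int(st.st_mtime) & 0xFFFFFFFF) == recorded_mtime and (
        st.st_size & 0xFFFFFFFF
    ) == recorded_size


def hash_is_fresh(source_path, recorded_hash):
    """Hash check: the whole source file is read and hashed."""
    with open(source_path, "rb") as f:
        return importlib.util.source_hash(f.read()) == recorded_hash
```

The timestamp check stays cheap but breaks when only the mtime changes; the
hash check reads every byte but is unaffected by mtime changes.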

Regards

Antoine.




[Python-Dev] PEP 552: deterministic pycs

2017-09-07 Thread Benjamin Peterson
Hello,
I've written a short PEP about an import extension to allow pycs to be
more deterministic by optionally replacing the timestamp with a hash of
the source file: https://www.python.org/dev/peps/pep-0552/
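
As a rough illustration of the idea, the sketch below compares pyc headers
written in timestamp mode and in hash mode for the same source at two
different mtimes. It assumes Python 3.7 or later, where the PEP was
eventually implemented as `py_compile.PycInvalidationMode` and
`importlib.util.source_hash`; the helper names `pyc_header` and `demo` are
made up for this example.

```python
import importlib.util
import os
import py_compile
import tempfile


def pyc_header(source_path, pyc_path, mode, mtime):
    """Force the source mtime, compile, and return the 16-byte pyc header."""
    os.utime(source_path, (mtime, mtime))
    py_compile.compile(
        source_path, cfile=pyc_path, doraise=True, invalidation_mode=mode
    )
    with open(pyc_path, "rb") as f:
        return f.read(16)


def demo():
    """Return timestamp-mode and hash-mode headers at two mtimes."""
    source = b"x = 1\n"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        pyc = os.path.join(tmp, "mod.pyc")
        with open(src, "wb") as f:
            f.write(source)
        ts_mode = py_compile.PycInvalidationMode.TIMESTAMP
        hash_mode = py_compile.PycInvalidationMode.CHECKED_HASH
        ts_a = pyc_header(src, pyc, ts_mode, 1_000_000_000)
        ts_b = pyc_header(src, pyc, ts_mode, 1_500_000_000)
        hash_a = pyc_header(src, pyc, hash_mode, 1_000_000_000)
        hash_b = pyc_header(src, pyc, hash_mode, 1_500_000_000)
    return ts_a, ts_b, hash_a, hash_b, importlib.util.source_hash(source)
```

In this sketch the two timestamp-mode headers differ because the mtime is
embedded in them, while the two hash-mode headers match because they depend
only on the source bytes.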

Thanks for reading,
Benjamin

P.S. I came up with the idea for this PEP while awake.