Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Nick Coghlan
On 13 June 2017 at 08:55, Nathaniel Smith  wrote:
> On Mon, Jun 12, 2017 at 3:34 PM, Paul Moore  wrote:
>> But honestly, I think we're at the point where someone just needs to
>> make a decision - there's very little compelling evidence either way.
>
> I was the original PEP author, but Thomas has mostly taken it over at
> this point, so I'm not sure how much you should listen to me :-). But
> if you pointed at me and told me to make some decisions, then right
> now this is what I'd do:
>
> 1) Go back to having the wheel generation hook generate a .whl file
> Rationale: the optimization benefits of generating an unpacked wheel
> are unclear, but we know that reproducible builds are important, and
> filename encoding is tricky and important, and that having a common
> well-understood standard with tooling around it is important, and on
> those axes .whl unambiguously wins. And if there do later turn out to
> be compelling optimization benefits to generating unpacked wheels,
> then we can add an optional generate_unpacked_wheel hook in the
> future.

Despite being the one to originally propose standardisation on passing
directory paths around, I'm starting to lean back towards this
approach.

My rationale for this doesn't really have a lot to do with topics
we've discussed so far, and instead asks the question: what would work
best for an installation frontend that wanted to keep the actual build
tools off the system doing the installation, while still allowing for
transparent "from sdist" installations?

And that's where cross-platform archive formats really shine: since
they're incredibly convenient for network transfers, they make fewer
implicit assumptions about *where* the work is being done. Whereas if
we rely on directories in the baseline interface specification, we're
making a lot of assumptions about how all future frontends are going
to work.

It should also be relatively straightforward to add cross-platform
validators to frontends that check that archives are well-formed, but
it's harder to write such validators that work cross-platform on
arbitrary directories.
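
To illustrate the validator point, here is a minimal sketch of an archive check a frontend might run, using only the standard library `zipfile` module. The function name `check_wheel_archive` and the specific checks (CRC integrity, no path escapes, exactly one `.dist-info/METADATA`) are assumptions chosen for illustration, not something the PEP draft specifies.

```python
import io
import zipfile


def check_wheel_archive(data):
    """Return a list of problems found in a wheel-like zip archive (bytes).

    Hypothetical helper: the checks below are illustrative, not a full
    implementation of the wheel format rules.
    """
    try:
        zf = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a zip archive"]
    problems = []
    with zf:
        # testzip() reads every member and returns the first name whose
        # CRC check fails, or None if all members are intact.
        bad = zf.testzip()
        if bad is not None:
            problems.append("corrupt member: " + bad)
        names = zf.namelist()
        # Member names must be relative and must not escape the archive root.
        for name in names:
            if name.startswith("/") or ".." in name.split("/"):
                problems.append("unsafe path: " + name)
        # Expect exactly one top-level *.dist-info/METADATA file.
        metadata = [n for n in names
                    if n.count("/") == 1 and n.endswith(".dist-info/METADATA")]
        if len(metadata) != 1:
            problems.append("expected exactly one .dist-info/METADATA")
    return problems
```

The same function runs unchanged on any platform because it never touches the local filesystem, which is the property that is harder to get when validating an arbitrary directory.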

> 2) Specify that the wheel generation hook, metadata hook, and sdist
> hook return the name of the path that they created as a unicode string.
> Rationale: fixes a point of ambiguity in the current spec.

And still leaves the door open to supporting multiple wheels in the
future by returning a list instead of string.

> 3) Go back to having the sdist generation hook generate an archive
> file instead of an unpacked directory
> Rationale: Essentially the same as for point (1). The additional
> complication here is that we know that pip plans to use this as part
> of its standard build-from-unpacked-source pipeline, so if we're
> trying to optimize this case for developers with their fast SSD drives
> etc. then the compress/decompress cycle actually does matter.

I think you also raise a good point in bringing up the "First make it
work, then make it right, then make it fast" priority order.

We're currently in the "make it right" phase, and I think you're
correct that that's much easier to achieve cross platform with archive
based data exchange - otherwise we run the risk of running into
surprises with NTFS, HFS+, etc (let alone folks running builds over
NFS or CIFS).

So the core "make it right" API would be "build_sdist" and
"build_wheel" as mandatory backend hooks, with
"prepare_wheel_metadata" and "prepare_build_files" as optional "make
it faster" helpers.

We'd then reserve "prepare_sdist_content" and "prepare_wheel_content"
as possible future "make it faster" APIs, rather than adding them now.
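
As a rough sketch of what the two mandatory archive-producing hooks could look like in a backend: the hook names come from this thread, but the single target-directory argument, the project name `demo`, and the archive contents are assumptions made for illustration. The wheel written here is deliberately incomplete (no WHEEL or RECORD files), so it is not installable.

```python
import io
import os
import tarfile
import zipfile

NAME, VERSION = "demo", "0.1"  # hypothetical project


def build_wheel(wheel_directory):
    """Write a .whl archive into wheel_directory and return its file name."""
    filename = "{}-{}-py3-none-any.whl".format(NAME, VERSION)
    dist_info = "{}-{}.dist-info".format(NAME, VERSION)
    with zipfile.ZipFile(os.path.join(wheel_directory, filename), "w") as zf:
        zf.writestr(NAME + "/__init__.py", "")
        zf.writestr(dist_info + "/METADATA",
                    "Name: {}\nVersion: {}\n".format(NAME, VERSION))
    # The hook reports what it created as a plain (unicode) string.
    return filename


def build_sdist(sdist_directory):
    """Write a .tar.gz archive into sdist_directory and return its file name."""
    filename = "{}-{}.tar.gz".format(NAME, VERSION)
    payload = b"# placeholder source file\n"
    info = tarfile.TarInfo("{0}-{1}/{0}/__init__.py".format(NAME, VERSION))
    info.size = len(payload)
    with tarfile.open(os.path.join(sdist_directory, filename), "w:gz") as tf:
        tf.addfile(info, io.BytesIO(payload))
    return filename
```

Because both hooks return a single string naming an archive, a frontend can move the result across a network boundary or into a cache without knowing anything about the backend's working directory.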

> 4) Specify that the sdist archive file must be .tar.gz
> Rationale: we only need one and it reduces variation, and dstufft
> likes .tar.gz better than .zip.

+1

> 5) Drop the prepare_files hook
> Rationale: its purpose is somewhat unclear at this point, and it
> gives up the main advantage of going through sdist (the guarantee that
> building from sdist and building from unpacked tree give the same
> results) while still having the main disadvantages (you have to copy
> everything and lose incremental rebuilds).

The main reason this made it into the PEP is that build_sdist may have
additional dependencies that prepare_build_files doesn't (e.g. Thomas
has indicated that flit needs to call out to VCS tools to build a full
sdist, but can export its own input files directly).

So while it's technically a "make it faster" hook rather than an
essential "make it right" hook, I think just smoothing out the
interaction between pip & flit specifically provides sufficient value
to make it worth including in the initial version of the
specification.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Nathaniel Smith
On Mon, Jun 12, 2017 at 4:41 PM, Donald Stufft  wrote:

>
> On Jun 12, 2017, at 7:18 PM, Nathaniel Smith  wrote:
>
> On Mon, Jun 12, 2017 at 3:49 PM, Donald Stufft  wrote:
>
>>
>> On Jun 12, 2017, at 6:36 PM, Nathaniel Smith  wrote
>>
>> Another point is that tools that you might have in your build pipeline
>> -- like auditwheel -- currently use wheel files as their interchange
>> format, so you might end up having to zip, run auditwheel, unzip for
>> pip, and then pip zips again to cache the wheel…
>>
>>
>> How is that different from today? In the hypothetical build_wheel
>> producing a zip file… you produce a zip file, run auditwheel which unzips
>> it, which presumably has to zip it back up again for pip, and then pip
>> unzips it again on every single install.
>>
>> If auditwheel doesn’t start to accept unzipped wheels, then nothing
>> changes, if it does then suddenly we skip some round trips through
>> zip/unzip and things get faster for everyone.
>>
>
> I would strongly prefer auditwheel not have to accept unzipped wheels or
> generate unzipped wheels, because that just multiplies the number of cases
> that need to be supported, and as you've pointed out many times, more
> potential paths = more chances for bugs. So if you have auditwheel as the
> last step in your pipeline, that means that at the end of the build what
> you have is a zipped wheel. If pip accepts zipped wheels, then we can just
> hand this over and pip drops it in its cache and unzips it into the final
> location. If pip requires unpacked wheels, then first the backend has to
> unzip it, and then pip has to do something with the unpacked directory
> (either copy it file-by-file, or possibly even zip it up again to cache it).
>
>
> Unless auditwheel is calling this backend directly, or is trying to
> implement this API to be called by pip, then it never has to do that. This
> isn’t really meant to be an end user exposed UX, this is strictly for two
> tools to talk to each other. Thus auditwheel is free to continue to work as
> it does today and it can completely ignore this spec by just continuing to
> expect someone to invoke a command that builds a wheel first.
>

Yeah, I'm talking about the currently-hypothetical situation where the
build backend wants to call auditwheel as part of its build. Auditwheel's
current design as a secondary tool you run after the "real" build is
expedient, but it would be nice if someday build systems could generate
working wheels directly...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org 


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Donald Stufft

> On Jun 12, 2017, at 7:18 PM, Nathaniel Smith  wrote:
> 
> On Mon, Jun 12, 2017 at 3:49 PM, Donald Stufft wrote:
> 
>> On Jun 12, 2017, at 6:36 PM, Nathaniel Smith wrote:
>> Another point is that tools that you might have in your build pipeline
>> -- like auditwheel -- currently use wheel files as their interchange
>> format, so you might end up having to zip, run auditwheel, unzip for
>> pip, and then pip zips again to cache the wheel…
> 
> How is that different from today? In the hypothetical build_wheel producing a 
> zip file… you produce a zip file, run auditwheel which unzips it, which 
> presumably has to zip it back up again for pip, and then pip unzips it again 
> on every single install.
> 
> If auditwheel doesn’t start to accept unzipped wheels, then nothing changes, 
> if it does then suddenly we skip some round trips through zip/unzip and 
> things get faster for everyone.
> 
> I would strongly prefer auditwheel not have to accept unzipped wheels or 
> generate unzipped wheels, because that just multiplies the number of cases 
> that need to be supported, and as you've pointed out many times, more 
> potential paths = more chances for bugs. So if you have auditwheel as the 
> last step in your pipeline, that means that at the end of the build what you 
> have is a zipped wheel. If pip accepts zipped wheels, then we can just hand 
> this over and pip drops it in its cache and unzips it into the final 
> location. If pip requires unpacked wheels, then first the backend has to 
> unzip it, and then pip has to do something with the unpacked directory 
> (either copy it file-by-file, or possibly even zip it up again to cache it).


Unless auditwheel is calling this backend directly, or is trying to implement 
this API to be called by pip, then it never has to do that. This isn’t really 
meant to be an end user exposed UX, this is strictly for two tools to talk to 
each other. Thus auditwheel is free to continue to work as it does today and it 
can completely ignore this spec by just continuing to expect someone to invoke 
a command that builds a wheel first.


> 
>  
> 
>> 
>> The whole conversation feels a bit like we're falling into the
>> developer trap of "oo there's a thing that might be optimizable
>> therefore we MUST optimize it" without any real assessment of the
>> benefits (I'm as guilty as anyone!). It's not even clear to me that
>> copying a tree twice *is* faster than packing and then unpacking a
>> wheel in general – if your tree consists of lots of small files and
>> you're IO-bound, then the wheel version might well be faster. (E.g. on
>> an underprovisioned virtual server, especially if using spinning media
>> - while of course we're all benchmarking on laptops with fast SSD and
>> everything in cache :-).) And in any case, I'm generally very
>> skeptical of moving away from the well-specified wheel format that
>> already has lots of tooling and consensus around it towards anything
>> ad hoc, when AFAICT no-one has even identified this as an important
>> bottleneck.
>> 
> 
> I’ve measured that 50%-75% of the time taken by ``python setup.py 
> bdist_wheel`` + unzipping the resulting wheel can be eliminated for ``pip 
> install ./pip``.
> 
> Sure, but no-one noticed or cared about this until we started talking about 
> unpacked wheels for other reasons, and then we went hunting for benchmarks to 
> justify the idea :-). And even so your benchmark is a bit cherry-picked -- 
> that %age will go down if you include the 'setup.py sdist' / 'setup.py 
> unpacked_sdist' step that you want 'pip install ./pip' to do, and even more 
> so if you test on some system with a less robust IO layer than your fancy 
> developer laptop.


Well generating an unpacked sdist rather than a packed sdist saves roughly 50% 
of the time there too. I wouldn’t specifically say nobody cared, but rather the 
nature of things meant nobody was in a position to do anything about it until 
now. It’s not like any of the tooling provided a way to do it natively, so it 
wasn’t worth the cost of monkey patching.


—
Donald Stufft





Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Nathaniel Smith
On Mon, Jun 12, 2017 at 3:49 PM, Donald Stufft  wrote:

>
> On Jun 12, 2017, at 6:36 PM, Nathaniel Smith  wrote
>
> Another point is that tools that you might have in your build pipeline
> -- like auditwheel -- currently use wheel files as their interchange
> format, so you might end up having to zip, run auditwheel, unzip for
> pip, and then pip zips again to cache the wheel…
>
>
> How is that different from today? In the hypothetical build_wheel
> producing a zip file… you produce a zip file, run auditwheel which unzips
> it, which presumably has to zip it back up again for pip, and then pip
> unzips it again on every single install.
>
> If auditwheel doesn’t start to accept unzipped wheels, then nothing
> changes, if it does then suddenly we skip some round trips through
> zip/unzip and things get faster for everyone.
>

I would strongly prefer auditwheel not have to accept unzipped wheels or
generate unzipped wheels, because that just multiplies the number of cases
that need to be supported, and as you've pointed out many times, more
potential paths = more chances for bugs. So if you have auditwheel as the
last step in your pipeline, that means that at the end of the build what
you have is a zipped wheel. If pip accepts zipped wheels, then we can just
hand this over and pip drops it in its cache and unzips it into the final
location. If pip requires unpacked wheels, then first the backend has to
unzip it, and then pip has to do something with the unpacked directory
(either copy it file-by-file, or possibly even zip it up again to cache it).



>
>
> The whole conversation feels a bit like we're falling into the
> developer trap of "oo there's a thing that might be optimizable
> therefore we MUST optimize it" without any real assessment of the
> benefits (I'm as guilty as anyone!). It's not even clear to me that
> copying a tree twice *is* faster than packing and then unpacking a
> wheel in general – if your tree consists of lots of small files and
> you're IO-bound, then the wheel version might well be faster. (E.g. on
> an underprovisioned virtual server, especially if using spinning media
> - while of course we're all benchmarking on laptops with fast SSD and
> everything in cache :-).) And in any case, I'm generally very
> skeptical of moving away from the well-specified wheel format that
> already has lots of tooling and consensus around it towards anything
> ad hoc, when AFAICT no-one has even identified this as an important
> bottleneck.
>
>
> I’ve measured that 50%-75% of the time taken by ``python setup.py
> bdist_wheel`` + unzipping the resulting wheel can be eliminated for ``pip
> install ./pip``.
>

Sure, but no-one noticed or cared about this until we started talking about
unpacked wheels for other reasons, and then we went hunting for benchmarks
to justify the idea :-). And even so your benchmark is a bit cherry-picked
-- that %age will go down if you include the 'setup.py sdist' / 'setup.py
unpacked_sdist' step that you want 'pip install ./pip' to do, and even more
so if you test on some system with a less robust IO layer than your fancy
developer laptop.

(I heard a rumor recently that the reason Travis-CI's MacOS builds are so
terribly behind all the time is that their hosting provider has plenty of
CPU but their SAN is at its absolute limit in terms of IOPS, so they can't
add any more capacity. Packed wheels are much friendlier than unpacked ones
when it comes to IOPS...)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org 


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Nathaniel Smith
On Mon, Jun 12, 2017 at 3:34 PM, Paul Moore  wrote:
> But honestly, I think we're at the point where someone just needs to
> make a decision - there's very little compelling evidence either way.

I was the original PEP author, but Thomas has mostly taken it over at
this point, so I'm not sure how much you should listen to me :-). But
if you pointed at me and told me to make some decisions, then right
now this is what I'd do:

1) Go back to having the wheel generation hook generate a .whl file
Rationale: the optimization benefits of generating an unpacked wheel
are unclear, but we know that reproducible builds are important, and
filename encoding is tricky and important, and that having a common
well-understood standard with tooling around it is important, and on
those axes .whl unambiguously wins. And if there do later turn out to
be compelling optimization benefits to generating unpacked wheels,
then we can add an optional generate_unpacked_wheel hook in the
future.

2) Specify that the wheel generation hook, metadata hook, and sdist
hook return the name of the path that they created as a unicode string.
Rationale: fixes a point of ambiguity in the current spec.

3) Go back to having the sdist generation hook generate an archive
file instead of an unpacked directory
Rationale: Essentially the same as for point (1). The additional
complication here is that we know that pip plans to use this as part
of its standard build-from-unpacked-source pipeline, so if we're
trying to optimize this case for developers with their fast SSD drives
etc. then the compress/decompress cycle actually does matter.
However... from discussion upthread it sounds like the decision was
that for developers doing repeated rebuilds, plain 'pip install .' is
just not the tool to use; they should be using something like develop
installs (which make re-installation instantaneous), or
backend-specific mechanisms (that can do incremental builds). These
give an order of magnitude improvement over even the 'optimized'
version of 'pip install .' where the backend generates an unpacked
tree, and "special cases aren't special enough to break the rules".

4) Specify that the sdist archive file must be .tar.gz
Rationale: we only need one and it reduces variation, and dstufft
likes .tar.gz better than .zip.

5) Drop the prepare_files hook
Rationale: its purpose is somewhat unclear at this point, and it
gives up the main advantage of going through sdist (the guarantee that
building from sdist and building from unpacked tree give the same
results) while still having the main disadvantages (you have to copy
everything and lose incremental rebuilds).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Donald Stufft

> On Jun 12, 2017, at 6:36 PM, Nathaniel Smith  wrote:
> 
> On Mon, Jun 12, 2017 at 1:57 PM, Thomas Kluyver  wrote:
>> On Mon, Jun 12, 2017, at 09:45 PM, Daniel Holth wrote:
>> 
>> I think all my wheel generators except bdist_wheel build the zipfile
>> directly.
>> 
>> 
>> There is a certain appeal to using the zipped .whl file as the canonical
>> format for all tools that produce or consume wheels, rather than defining a
>> closely related but distinct 'unpacked wheel' format. A directory and a zip
>> file do not have 100% identical features (filename encodings may differ,
>> entries in a zip file are ordered, there may be metadata in one format
>> that's not present in the other, and so on).
> 
> I find the reproducible builds argument to be a pretty compelling
> argument for generating wheels directly. (It also applies to sdists.)

We’re not preventing backends from having a stand alone tool that produces 
reproducible wheels if they’re able/willing to do that.


> Another point is that tools that you might have in your build pipeline
> -- like auditwheel -- currently use wheel files as their interchange
> format, so you might end up having to zip, run auditwheel, unzip for
> pip, and then pip zips again to cache the wheel…

How is that different from today? In the hypothetical build_wheel producing a 
zip file… you produce a zip file, run auditwheel which unzips it, which 
presumably has to zip it back up again for pip, and then pip unzips it again on 
every single install.

If auditwheel doesn’t start to accept unzipped wheels, then nothing changes, if 
it does then suddenly we skip some round trips through zip/unzip and things get 
faster for everyone.


> 
> The whole conversation feels a bit like we're falling into the
> developer trap of "oo there's a thing that might be optimizable
> therefore we MUST optimize it" without any real assessment of the
> benefits (I'm as guilty as anyone!). It's not even clear to me that
> copying a tree twice *is* faster than packing and then unpacking a
> wheel in general – if your tree consists of lots of small files and
> you're IO-bound, then the wheel version might well be faster. (E.g. on
> an underprovisioned virtual server, especially if using spinning media
> - while of course we're all benchmarking on laptops with fast SSD and
> everything in cache :-).) And in any case, I'm generally very
> skeptical of moving away from the well-specified wheel format that
> already has lots of tooling and consensus around it towards anything
> ad hoc, when AFAICT no-one has even identified this as an important
> bottleneck.
> 

I’ve measured that 50%-75% of the time taken by ``python setup.py bdist_wheel`` 
+ unzipping the resulting wheel can be eliminated for ``pip install ./pip``.


—
Donald Stufft





Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Donald Stufft

> On Jun 12, 2017, at 6:34 PM, Paul Moore  wrote:
> 
> On 12 June 2017 at 21:57, Thomas Kluyver  wrote:
>> There is a certain appeal to using the zipped .whl file as the canonical
>> format for all tools that produce or consume wheels, rather than defining a
>> closely related but distinct 'unpacked wheel' format. A directory and a zip
>> file do not have 100% identical features (filename encodings may differ,
>> entries in a zip file are ordered, there may be metadata in one format
>> that's not present in the other, and so on).
> 
> This is a reasonable point. As I understand it, zipfiles are
> guaranteed to support the full Unicode range for filenames, via UTF-8.
> But it's not impossible for filesystems to only support a limited
> subset (for example, I believe the encoding used for FAT32 filesystems
> is not clearly defined, but is probably some 8-bit codepage, and Unix
> systems rely on whatever encoding the user has specified via the
> locale settings).


As always, it’s complicated — 
https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/ 


—
Donald Stufft





Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Nathaniel Smith
On Mon, Jun 12, 2017 at 1:57 PM, Thomas Kluyver  wrote:
> On Mon, Jun 12, 2017, at 09:45 PM, Daniel Holth wrote:
>
> I think all my wheel generators except bdist_wheel build the zipfile
> directly.
>
>
> There is a certain appeal to using the zipped .whl file as the canonical
> format for all tools that produce or consume wheels, rather than defining a
> closely related but distinct 'unpacked wheel' format. A directory and a zip
> file do not have 100% identical features (filename encodings may differ,
> entries in a zip file are ordered, there may be metadata in one format
> that's not present in the other, and so on).

I find the reproducible builds argument to be a pretty compelling
argument for generating wheels directly. (It also applies to sdists.)
Another point is that tools that you might have in your build pipeline
-- like auditwheel -- currently use wheel files as their interchange
format, so you might end up having to zip, run auditwheel, unzip for
pip, and then pip zips again to cache the wheel...

The whole conversation feels a bit like we're falling into the
developer trap of "oo there's a thing that might be optimizable
therefore we MUST optimize it" without any real assessment of the
benefits (I'm as guilty as anyone!). It's not even clear to me that
copying a tree twice *is* faster than packing and then unpacking a
wheel in general – if your tree consists of lots of small files and
you're IO-bound, then the wheel version might well be faster. (E.g. on
an underprovisioned virtual server, especially if using spinning media
- while of course we're all benchmarking on laptops with fast SSD and
everything in cache :-).) And in any case, I'm generally very
skeptical of moving away from the well-specified wheel format that
already has lots of tooling and consensus around it towards anything
ad hoc, when AFAICT no-one has even identified this as an important
bottleneck.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Paul Moore
On 12 June 2017 at 21:57, Thomas Kluyver  wrote:
> There is a certain appeal to using the zipped .whl file as the canonical
> format for all tools that produce or consume wheels, rather than defining a
> closely related but distinct 'unpacked wheel' format. A directory and a zip
> file do not have 100% identical features (filename encodings may differ,
> entries in a zip file are ordered, there may be metadata in one format
> that's not present in the other, and so on).

This is a reasonable point. As I understand it, zipfiles are
guaranteed to support the full Unicode range for filenames, via UTF-8.
But it's not impossible for filesystems to only support a limited
subset (for example, I believe the encoding used for FAT32 filesystems
is not clearly defined, but is probably some 8-bit codepage, and Unix
systems rely on whatever encoding the user has specified via the
locale settings).

I can't offhand imagine a practical situation where the filesystem
encoding of the build system would cause wheel generation to fail, but
would work for the rest of the build chain - but on the other hand,
this whole question is pretty borderline in any case, as far as I can
tell, so I'm somewhat inclined to go for using the format that
*doesn't* have potential encoding issues built in (I'm pretty sick of
dealing with encoding issues with pip...)
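
As a small illustration of how filename encoding is recorded inside a zip archive: the zip format reserves general purpose flag bit 11 (0x800) to mark a member name as UTF-8, and Python's `zipfile` sets it when a name is not pure ASCII. The helper name `utf8_flags` is hypothetical, and this only shows what one writer does, not how every zip tool behaves.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: member name is stored as UTF-8


def utf8_flags(names):
    """Write empty members with the given names, then report which ones
    were stored with the UTF-8 flag set."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    # Re-open the archive so the flags are read back from the stored headers.
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}
```

A directory on disk carries no equivalent per-file marker, so the interpretation of a name there depends on the filesystem and locale rather than on the artifact itself.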

But honestly, I think we're at the point where someone just needs to
make a decision - there's very little compelling evidence either way.
Paul


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Donald Stufft

> On Jun 12, 2017, at 6:10 PM, Thomas Kluyver  wrote:
> 
> On Mon, Jun 12, 2017, at 10:23 PM, Donald Stufft wrote:
>> because it allows the front ends more flexibility in how they use the wheels
> 
> I don't get this? Why is it more flexible?

I went into some detail here: 
https://mail.python.org/pipermail/distutils-sig/2017-June/030684.html 

> 
>> and allows us to avoid doing work, making the process involved faster for 
>> everyone.
> 
> This is true so long as backends build a directory and then skip zipping it 
> up. If backends are building a zip file and then unpacking it (which, from 
> what Daniel and I have described, may be common), then for tasks which want a 
> zip file, you're now unpacking and repacking it.

We can’t force backends (or frontend) to implement things in the most 
performant way, we can only provide the tools to make it possible for them to 
do so. They could just as easily choose to add a time.sleep(5) just for kicks 
too. However, even when the author of a backend tool chooses to implement it 
less efficiently than they could, that’s fine: in one of the cases we’re no 
worse off than we would be if the backend were doing the zipping, and in the 
two cases where we are worse off, the work is done once and then shared many 
times, so the amortized cost is relatively low anyway.

> 
> So it hinges both on what backends do and on what tasks are common for 
> frontends using this interface. You might assume that the most common task 
> will be installation, which uses the files unpacked. But most installs will 
> use a pre-built wheel, so it's not obvious to me that the typical use of the 
> build interface will be to install a package.
> 

No, I assume that for 2/3 of the cases *today* it basically does not matter if 
the front end or the backend does the zipping; the zipping is going to happen 
either way. For 1 of those 2/3 cases there is a possibility of changing that in 
the future to optimize a common path more (via the wheel cache) where moving 
the zipping into the front end shaved off time. For 1/3 of those cases today it 
will cut off processing time.

So to sum up:

Today: 1/3 of the cases will be faster, 2/3 of the cases it will make no 
difference.
Possible Future: 2/3 of the cases will be faster, 1/3 of the cases it will make 
no difference.

There is basically no benefit to having the backend do it, there is no case 
where it is faster to do so unless you assume a modern CPU with an ancient and 
slow hard drive to the point that copying a file is significantly slower than 
compressing it. Just for kicks I went and did a quick test on pip, when 
generating a wheel, ~50% of the time taken running ``python setup.py 
bdist_wheel`` on my computer is taken up by compressing the file and another 
~25% of that time is taken up by decompressing the file again. 
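
For anyone wanting to repeat this kind of measurement on their own hardware, here is a minimal timing sketch. It is not the benchmark described above: the function name `time_copy_vs_zip` is hypothetical, it times a single tree copy against one zip-then-unzip round trip, and it makes no attempt to control for disk caches, so the numbers are only indicative.

```python
import os
import shutil
import tempfile
import time
import zipfile


def time_copy_vs_zip(src_dir):
    """Return (copy_seconds, zip_roundtrip_seconds) for one source tree."""
    work = tempfile.mkdtemp()
    try:
        # Option A: hand over an unpacked tree by copying it file by file.
        start = time.perf_counter()
        shutil.copytree(src_dir, os.path.join(work, "copied"))
        copy_seconds = time.perf_counter() - start

        # Option B: pack the tree into a compressed archive, then unpack it.
        archive = os.path.join(work, "tree.zip")
        start = time.perf_counter()
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for root, _dirs, files in os.walk(src_dir):
                for name in files:
                    full = os.path.join(root, name)
                    zf.write(full, os.path.relpath(full, src_dir))
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(os.path.join(work, "unzipped"))
        zip_seconds = time.perf_counter() - start
        return copy_seconds, zip_seconds
    finally:
        shutil.rmtree(work)
```

Running it on trees of different shapes (many small files versus a few large ones) and on different storage is the only way to tell which side of this argument applies to a given setup.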


—
Donald Stufft





Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Thomas Kluyver
On Mon, Jun 12, 2017, at 10:23 PM, Donald Stufft wrote:
> because it allows the front ends more flexibility in how they use
> the wheels
I don't get this? Why is it more flexible?

> and allows us to avoid doing work, making the process involved faster
> for everyone.
This is true so long as backends build a directory and then skip zipping
it up. If backends are building a zip file and then unpacking it (which,
from what Daniel and I have described, may be common), then for tasks
which want a zip file, you're now unpacking and repacking it.
So it hinges both on what backends do and on what tasks are common for
frontends using this interface. You might assume that the most common
task will be installation, which uses the files unpacked. But most
installs will use a pre-built wheel, so it's not obvious to me that the
typical use of the build interface will be to install a package.
Thomas


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Donald Stufft

> On Jun 12, 2017, at 5:21 PM, Thomas Kluyver  wrote:
> 
> On Mon, Jun 12, 2017, at 10:09 PM, Donald Stufft wrote:
>> It’s pretty hard to screw up zipping up a directory.
> 
> If you want reproducible builds, it's very easy to screw up, and your 
> response doesn't inspire confidence that frontends will do it carefully. But 
> I see flit mainly as something you use directly to build and publish wheels, 
> and these hooks of secondary interest for e.g. installing from a VCS URL. So 
> I'll keep my zip-archive-building code in flit, and let frontend tools 
> duplicate the part of that functionality they care about.
> 
> It's also, of course, hard to screw up unzipping to a directory. ;-)
> 
> 


Sure! I’m not advocating for unpacked wheels because I think either one is 
harder or easier to screw up, but rather because it allows the front ends more 
flexibility in how they use the wheels and allows us to avoid doing work, 
making the process involved faster for everyone.

—
Donald Stufft





Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Thomas Kluyver
On Mon, Jun 12, 2017, at 10:09 PM, Donald Stufft wrote:
> It’s pretty hard to screw up zipping up a directory.

If you want reproducible builds, it's very easy to screw up, and your
response doesn't inspire confidence that frontends will do it carefully.
But I see flit mainly as something you use directly to build and publish
wheels, and these hooks of secondary interest for e.g. installing from a
VCS URL. So I'll keep my zip-archive-building code in flit, and let
frontend tools duplicate the part of that functionality they care about.
It's also, of course, hard to screw up unzipping to a directory. ;-)
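
To illustrate why zipping a directory reproducibly takes some care, here is a
minimal sketch using only the standard library. The function name and the
choice of fixed timestamp and permissions are assumptions made for this
example; it is not flit's code, and it covers only the archive step (a real
wheel also needs a correct RECORD file). Byte-identical output additionally
assumes the same zlib version on both runs.

```python
import os
import zipfile

# Assumption for this sketch: pin every entry to the earliest timestamp
# the zip format can represent, so file mtimes never leak into the archive.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def zip_directory_reproducibly(src_dir, zip_path):
    """Zip src_dir so that identical file contents give identical archives."""
    entries = []
    for root, dirs, files in os.walk(src_dir):
        for name in files:
            full = os.path.join(root, name)
            # Archive names always use forward slashes, whatever the platform.
            arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
            entries.append((arcname, full))
    # os.walk order depends on the filesystem, so sort explicitly.
    entries.sort()

    with zipfile.ZipFile(zip_path, "w") as zf:
        for arcname, full in entries:
            info = zipfile.ZipInfo(arcname, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3            # record "Unix" on every platform
            info.external_attr = 0o644 << 16  # normalised mode; drops exec bits
            with open(full, "rb") as f:
                zf.writestr(info, f.read())
```

A plain `shutil.make_archive` call would skip the sorting and the metadata
normalisation, which is the kind of detail a frontend could easily miss.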




Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Donald Stufft

> On Jun 12, 2017, at 5:05 PM, Daniel Holth  wrote:
> 
> Yes, and I worry that certain front ends will generate the zipfile 
> incorrectly. Better to do it in the back end.



It’s pretty hard to screw up zipping up a directory.

—
Donald Stufft





Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Daniel Holth
Yes, and I worry that certain front ends will generate the zipfile
incorrectly. Better to do it in the back end.

On Mon, Jun 12, 2017 at 4:57 PM Thomas Kluyver  wrote:

> On Mon, Jun 12, 2017, at 09:45 PM, Daniel Holth wrote:
>
> I think all my wheel generators except bdist_wheel build the zipfile
> directly.
>
>
> There is a certain appeal to using the zipped .whl file as the canonical
> format for all tools that produce or consume wheels, rather than defining a
> closely related but distinct 'unpacked wheel' format. A directory and a zip
> file do not have 100% identical features (filename encodings may differ,
> entries in a zip file are ordered, there may be metadata in one format
> that's not present in the other, and so on).
>
> I think we're also making this change on the assumption that frontends
> will be few and backends numerous, so it makes sense to shift more work
> into frontends. That may not necessarily be true - I could imagine more
> frontends emerging while people standardise on just a few backends.
> Jupyter's frontend/kernel separation was initially designed in the belief
> that it would support one kernel and many frontends, but we've ended up
> getting a lot of kernels with just a handful of popular frontends.
>
> I don't feel strongly about this - I can build a wheel and then unzip it
> again if that's what the spec says. But given the choice, I'd specify it in
> terms of a zipped .whl file rather than a directory.
>
> Thomas
>


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Thomas Kluyver
On Mon, Jun 12, 2017, at 09:45 PM, Daniel Holth wrote:
> I think all my wheel generators except bdist_wheel build the zipfile
> directly.

There is a certain appeal to using the zipped .whl file as the canonical
format for all tools that produce or consume wheels, rather than
defining a closely related but distinct 'unpacked wheel' format. A
directory and a zip file do not have 100% identical features (filename
encodings may differ, entries in a zip file are ordered, there may be
metadata in one format that's not present in the other, and so on).

I think we're also making this change on the assumption that
frontends will be few and backends numerous, so it makes sense to
shift more work into frontends. That may not necessarily be true - I
could imagine more frontends emerging while people standardise on
just a few backends. Jupyter's frontend/kernel separation was
initially designed in the belief that it would support one kernel and
many frontends, but we've ended up getting a lot of kernels with just
a handful of popular frontends.

I don't feel strongly about this - I can build a wheel and then unzip it
again if that's what the spec says. But given the choice, I'd specify it
in terms of a zipped .whl file rather than a directory.

Thomas
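
As a small illustration of the point that a zip file and a directory are not
interchangeable, the sketch below builds two in-memory archives holding the
same (empty) files in a different order. The helper name and file names are
made up for the example. Unpacked to disk, both would give the same directory
tree, yet the archives themselves differ.

```python
import io
import zipfile


def make_zip(names):
    """Return the bytes of a zip containing empty files in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            # Fixed timestamp so that entry order is the only difference.
            info = zipfile.ZipInfo(name, date_time=(2017, 6, 12, 0, 0, 0))
            zf.writestr(info, b"")
    return buf.getvalue()


zip_a = make_zip(["pkg/a.py", "pkg/b.py"])
zip_b = make_zip(["pkg/b.py", "pkg/a.py"])

names_a = zipfile.ZipFile(io.BytesIO(zip_a)).namelist()
names_b = zipfile.ZipFile(io.BytesIO(zip_b)).namelist()

# Same set of files, so an unpacked directory could not tell them apart...
print(sorted(names_a) == sorted(names_b))
# ...but the archives record a different entry order and differ as bytes.
print(names_a == names_b, zip_a == zip_b)
```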


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Daniel Holth
On Mon, Jun 12, 2017 at 4:41 PM Donald Stufft  wrote:

>
> On Jun 12, 2017, at 4:36 PM, Daniel Holth  wrote:
>
> It's certainly easier to build a zipfile correctly than to build a
> directory tree. Might even be faster if your filesystem is slow. Surely if
> there are multiple *.dist-info it is an error?
>
>
>
> Sure, but it’s an error that we currently run into and it tends to occur
> for the people who are least able to do anything about it. Why not design
> the interface to make the error impossible?
>

Live dangerously.

I think all my wheel generators except bdist_wheel build the zipfile
directly.


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Daniel Holth
It's certainly easier to build a zipfile correctly than to build a
directory tree. Might even be faster if your filesystem is slow. Surely if
there are multiple *.dist-info it is an error?

On Mon, Jun 12, 2017 at 4:15 PM Donald Stufft  wrote:

> On Jun 12, 2017, at 4:01 PM, Thomas Kluyver  wrote:
>
> On Sat, Jun 10, 2017, at 06:14 PM, Nick Coghlan wrote:
>
> Thomas - I agree with Donald's reasoning here, so would you mind
> updating the PEP accordingly?
>
>
> I've done so here:
> https://github.com/python/peps/pull/290
>
> There are still a couple of questions on which I wasn't quite sure what
> the consensus is:
>
> -Do we want to rename the build_wheel hook now that it makes an
> unpacked wheel, e.g. export_wheel_contents to match
> export_sdist_contents?
>
>
>
> I’m neutral on this, this is just a total bike shed I think so I’m happy
> to go with whatever you prefer.
>
>
> -I have assumed that the wheel hook puts its contents in the
> directory it's passed, rather than creating a subfolder. This is in
> keeping with the structure of wheels, which do not have a single
> top-level directory (unlike sdists), but it wouldn't fit with a future
> hypothetical extension to build multiple wheels at once; we would need a
> separate hook for that.
>
>
> I don’t think having a separate hook is a bad thing here since we don’t
> really know specifically what that would look like. However I also don’t
> think doing something like what we’ve done with prepare_wheel_metadata is
> out of the question either?
>
> One thing I notice is that prepare_wheel_metadata still doesn’t provide a
> way for the backend to communicate to the frontend what .dist-info folder
> it should be looking for, but it’s currently possible (mistakenly or not)
> to end up with more than one .dist-info directory in that directory, so you
> can’t just glob looking for any dist-info.
>
> Perhaps the answer for both of these hooks is to just put the contents
> into the passed-in directory (so remove the {name}-{version}.dist-info
> directory from prepare_wheel_metadata, and leave
> build_wheel/export_wheel_contents just putting things in the root of the
> directory) and only build this API to handle a single wheel at a time.
> If/when we add support for multiple wheels at a time, we can then add a new
> hook to handle that which we can make sure actually supports everything we
> need at that point, rather than trying to guess what that might look like
> today?
>
>
> —
>
> Donald Stufft
>


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Daniel Holth
Once I thought it might be useful to define 'wheel internal manifest'
(whim) format which would just be a list or mapping that looks like
[('category', source path or filelike, destination path), ... ], since that
is basically what wheel is doing, putting paths in categories. Then you get
the data model without worrying about the file format.
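
A rough sketch of what such a manifest could look like, and how one writer
might turn it into an archive. Everything here is hypothetical: the function
names, the sample entries, and the assumption that the wheel is pure Python
(so 'purelib' files sit at the archive root and other categories go under
{name}-{version}.data/, as in the wheel format). The .dist-info metadata is
left out to keep the example short.

```python
import io
import zipfile


def archive_path(name_version, category, destination):
    """Map one manifest entry to its path inside a pure-Python wheel."""
    if category == "purelib":
        return destination
    return "{}.data/{}/{}".format(name_version, category, destination)


def write_manifest(manifest, name_version, fileobj):
    """Write (category, source, destination) entries to fileobj as a zip."""
    with zipfile.ZipFile(fileobj, "w", zipfile.ZIP_DEFLATED) as zf:
        for category, source, destination in manifest:
            if hasattr(source, "read"):
                data = source.read()          # file-like source
            else:
                with open(source, "rb") as f:  # path on disk
                    data = f.read()
            zf.writestr(archive_path(name_version, category, destination), data)


# Example manifest with file-like sources; paths on disk work the same way.
example_manifest = [
    ("purelib", io.BytesIO(b"VALUE = 1\n"), "demo/__init__.py"),
    ("scripts", io.BytesIO(b"#!python\nprint('hi')\n"), "demo-cli"),
]
```

The same list could just as well be handed to a writer that creates a
directory tree, which is the appeal of separating the data model from the
file format.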

On Mon, Jun 12, 2017 at 4:36 PM Daniel Holth  wrote:

> It's certainly easier to build a zipfile correctly than to build a
> directory tree. Might even be faster if your filesystem is slow. Surely if
> there are multiple *.dist-info it is an error?
>
> On Mon, Jun 12, 2017 at 4:15 PM Donald Stufft  wrote:
>
>> On Jun 12, 2017, at 4:01 PM, Thomas Kluyver  wrote:
>>
>> On Sat, Jun 10, 2017, at 06:14 PM, Nick Coghlan wrote:
>>
>> Thomas - I agree with Donald's reasoning here, so would you mind
>> updating the PEP accordingly?
>>
>>
>> I've done so here:
>> https://github.com/python/peps/pull/290
>>
>> There are still a couple of questions on which I wasn't quite sure what
>> the consensus is:
>>
>> -Do we want to rename the build_wheel hook now that it makes an
>> unpacked wheel, e.g. export_wheel_contents to match
>> export_sdist_contents?
>>
>>
>>
>> I’m neutral on this, this is just a total bike shed I think so I’m happy
>> to go with whatever you prefer.
>>
>>
>> -I have assumed that the wheel hook puts its contents in the
>> directory it's passed, rather than creating a subfolder. This is in
>> keeping with the structure of wheels, which do not have a single
>> top-level directory (unlike sdists), but it wouldn't fit with a future
>> hypothetical extension to build multiple wheels at once; we would need a
>> separate hook for that.
>>
>>
>> I don’t think having a separate hook is a bad thing here since we don’t
>> really know specifically what that would look like. However I also don’t
>> think doing something like what we’ve done with prepare_wheel_metadata is
>> out of the question either?
>>
>> One thing I notice is that prepare_wheel_metadata still doesn’t provide a
>> way for the backend to communicate to the frontend what .dist-info folder
>> it should be looking for, but it’s currently possible (mistakenly or not)
>> to end up with more than one .dist-info directory in that directory, so you
>> can’t just glob looking for any dist-info.
>>
>> Perhaps the answer for both of these hooks is to just put the contents
>> into the passed-in directory (so remove the {name}-{version}.dist-info
>> directory from prepare_wheel_metadata, and leave
>> build_wheel/export_wheel_contents just putting things in the root of the
>> directory) and only build this API to handle a single wheel at a time.
>> If/when we add support for multiple wheels at a time, we can then add a new
>> hook to handle that which we can make sure actually supports everything we
>> need at that point, rather than trying to guess what that might look like
>> today?
>>
>>
>> —
>>
>> Donald Stufft
>>
>


Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Donald Stufft

> On Jun 12, 2017, at 4:36 PM, Daniel Holth  wrote:
> 
> It's certainly easier to build a zipfile correctly than to build a directory 
> tree. Might even be faster if your filesystem is slow. Surely if there are 
> multiple *.dist-info it is an error?
> 


Sure, but it’s an error that we currently run into and it tends to occur for 
the people who are least able to do anything about it. Why not design the 
interface to make the error impossible?


—
Donald Stufft





Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Donald Stufft

> On Jun 12, 2017, at 4:01 PM, Thomas Kluyver  wrote:
> 
> On Sat, Jun 10, 2017, at 06:14 PM, Nick Coghlan wrote:
>> Thomas - I agree with Donald's reasoning here, so would you mind
>> updating the PEP accordingly?
> 
> I've done so here:
> https://github.com/python/peps/pull/290
> 
> There are still a couple of questions on which I wasn't quite sure what
> the consensus is:
> 
> -Do we want to rename the build_wheel hook now that it makes an
> unpacked wheel, e.g. export_wheel_contents to match
> export_sdist_contents?


I’m neutral on this, this is just a total bike shed I think so I’m happy to go 
with whatever you prefer.


> -I have assumed that the wheel hook puts its contents in the
> directory it's passed, rather than creating a subfolder. This is in
> keeping with the structure of wheels, which do not have a single
> top-level directory (unlike sdists), but it wouldn't fit with a future
> hypothetical extension to build multiple wheels at once; we would need a
> separate hook for that.

I don’t think having a separate hook is a bad thing here since we don’t really 
know specifically what that would look like. However I also don’t think doing 
something like what we’ve done with prepare_wheel_metadata is out of the 
question either?

One thing I notice is that prepare_wheel_metadata still doesn’t provide a way 
for the backend to communicate to the frontend what .dist-info folder it should 
be looking for, but it’s currently possible (mistakenly or not) to end up 
with more than one .dist-info directory in that directory, so you can’t just glob 
looking for any dist-info.
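
To make the globbing problem concrete, here is a sketch of what a frontend
would have to do if the backend cannot say which .dist-info directory it
wrote. The function name is made up for this example. The only safe option
is to refuse to guess when the match is not unique.

```python
import glob
import os


def find_dist_info(directory):
    """Return the single *.dist-info path in directory, or raise."""
    matches = sorted(glob.glob(os.path.join(directory, "*.dist-info")))
    if len(matches) != 1:
        # Zero matches means the hook wrote nothing; several means the
        # frontend has no way to tell which one belongs to this build.
        raise RuntimeError(
            "expected exactly one .dist-info in {!r}, found {}: {}".format(
                directory, len(matches), matches
            )
        )
    return matches[0]
```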

Perhaps the answer for both of these hooks is to just put the contents into the 
passed-in directory (so remove the {name}-{version}.dist-info directory from 
prepare_wheel_metadata, and leave build_wheel/export_wheel_contents just 
putting things in the root of the directory) and only build this API to handle a 
single wheel at a time. If/when we add support for multiple wheels at a time, 
we can then add a new hook to handle that which we can make sure actually 
supports everything we need at that point, rather than trying to guess what 
that might look like today?
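
A toy sketch of that single-wheel shape, assuming a hook that receives a
directory and writes one unpacked wheel straight into its root. The hook
name follows the export_wheel_contents suggestion from this thread; the
signature, the package name "demo" and the file contents are assumptions for
illustration, and RECORD is omitted.

```python
import os


def export_wheel_contents(wheel_directory):
    """Toy backend hook: write one unpacked wheel into wheel_directory."""
    files = {
        "demo/__init__.py": "VALUE = 1\n",
        "demo-0.1.dist-info/METADATA": (
            "Metadata-Version: 2.1\nName: demo\nVersion: 0.1\n"
        ),
        "demo-0.1.dist-info/WHEEL": (
            "Wheel-Version: 1.0\nGenerator: toy\n"
            "Root-Is-Purelib: true\nTag: py3-none-any\n"
        ),
    }
    for relpath, text in files.items():
        # No wrapping subfolder: paths are relative to the directory passed in.
        target = os.path.join(wheel_directory, *relpath.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as f:
            f.write(text)
```

Because the frontend supplies a fresh directory and the hook writes exactly
one wheel into it, there is at most one .dist-info directory to find.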


—
Donald Stufft





Re: [Distutils] PEP 517: Open questions around artifact export directories

2017-06-12 Thread Thomas Kluyver
On Sat, Jun 10, 2017, at 06:14 PM, Nick Coghlan wrote:
> Thomas - I agree with Donald's reasoning here, so would you mind
> updating the PEP accordingly?

I've done so here:
https://github.com/python/peps/pull/290

There are still a couple of questions on which I wasn't quite sure what
the consensus is:

- Do we want to rename the build_wheel hook now that it makes an
  unpacked wheel, e.g. export_wheel_contents to match
  export_sdist_contents?
- I have assumed that the wheel hook puts its contents in the
  directory it's passed, rather than creating a subfolder. This is in
  keeping with the structure of wheels, which do not have a single
  top-level directory (unlike sdists), but it wouldn't fit with a future
  hypothetical extension to build multiple wheels at once; we would need a
  separate hook for that.