A minor detail for pip strategy option #2 is that sdists do not have to
have PKG-INFO.

On Mon, Jul 17, 2017 at 9:02 AM Nathaniel Smith <n...@pobox.com> wrote:

> Hi all,
>
> I happened to talk to Donald on IRC last night, and said I'd write up
> some notes on that discussion. (He's also seen these notes, but is
> partially-offline dealing with jury duty.) [Edit: but apparently still
> replying to email on the list, so some of this repeats things he's
> posted since I sent it to him to look at, but oh well :-). That's life
> in the fast-paced world of F/OSS development I guess.]
>
> One thing we tried to get a handle on is what, exactly, are the
> constraints that PEP 517 is trying to deal with.
>
> As far as pip goes, these are some of the situations that Donald's
> worried about handling gracefully when it comes to 'pip install .':
>
> a) The source tree is on read-only media
>
> b) People do an editable install in a source tree, and then follow it
> up with a regular install ('pip install -e . && pip install .').
> Apparently this can cause problems because of the residual metadata
> from the editable install confusing setuptools.
>
> c) Someone does 'pip install .' on a complex package, then discovers
> that while the build succeeded, it's missing some optional feature
> because some necessary system library was missing, so they install
> that library and re-try the 'pip install .'. Will the build backend
> notice that the library has appeared, or does it only do environment
> sniffing the first time it's run in a given tree?
>
> d) Mounting the same source tree into multiple different docker
> containers: 'docker run -v .:/io ubuntu pip install /io && docker run
> -v .:/io alpine pip install /io'. Will the build backend notice that these
> two systems have incompatible C toolchains and C ABIs, so you can't
> share .o files between them? In principle a build system could notice
> this by detecting that the compiler and system headers have changed,
> but in traditional build systems it's common to intentionally *not*
> include these in your build cache key, because you don't want to
> rebuild the world after every apt upgrade. (For example, gcc provides
> the -M switch to generate make rules to trigger rebuilds on header
> changes, and the -MM switch to do the same but ignoring changes in
> system headers; lots of projects choose to use -MM. This is related to
> the thing where incremental builds are traditionally a bit sloppy and
> intended for experts only.)
>
> e) Build systems that have different sources of truth for what goes
> into an sdist versus what goes into a wheel, and thus can easily end
> up in a situation where a direct VCS->wheel build gives a different
> result than a VCS->sdist->wheel build. The most prominent offender
> here is distutils/setuptools with its MANIFEST.in mess, but Donald is
> nervous that other systems might also reproduce this mistake.
>
> f) And finally Donald feels "it's just more hygienic to have ``pip
> install .`` not modify ., similarly to how [he] would be upset if
> ``pip install foo-1.0.tar.gz`` modified foo-1.0.tar.gz in some way".
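
To illustrate item (d), here is a minimal sketch of a build cache key that does include the toolchain identity, so that object files built in one container would not be reused in another. The function name `build_cache_key` and its inputs are assumptions made up for this example; nothing here is taken from pip or from any existing build backend.

```python
import hashlib
import platform


def build_cache_key(source_digest, compiler_version, libc=None):
    """Return a hex key that changes when the source or the toolchain changes.

    All inputs are hypothetical: a real build system would decide for itself
    how to fingerprint sources, compilers, and system headers.
    """
    if libc is None:
        # platform.libc_ver() returns a (name, version) tuple; both may be "".
        libc = "-".join(platform.libc_ver())
    parts = [source_digest, compiler_version, libc, platform.machine()]
    return hashlib.sha256("\0".join(parts).encode("utf-8")).hexdigest()
```

The trade-off described in (d) is visible here: the more toolchain detail goes into the key, the more often a routine system upgrade invalidates the whole cache.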
>
> Of course, no system can avoid every problem; the overall goal here is
> harm reduction and minimizing spurious bugs filed on pip, not
> perfection.
>
> None of these cases arise for 'pip install name' or 'pip install
> sdist.tar.gz'; it's really only 'pip install .' on a user-provided
> source tree.
>
> ----------
>
> For reference, here's my analysis of how these particular desiderata
> relate to some possible approaches:
>
> - Pip's current system for handling builds from existing source trees
> (copytree + setup.py bdist_wheel -or- setup.py install) handles (a),
> (c), (d), (f), but not (b) or (e), unless someone has previously done
> an in-place build in the tree, in which case it handles (a) and (f)
> but not (b), (c), (d), or (e). Which is kind of unfortunate, since (a)
> and (f) are probably the least important items.
>
> - When sdist->unpack->wheel is possible, it automatically handles all
> of these cases.
>
> - If a build backend *only* does out-of-place builds (like meson) and
> *does not support* in-place or editable installs, then having an
> out-of-place build_wheel hook automatically takes care of everything
> except (e).
>
> - Otherwise, an out-of-place build_wheel hook handles (a), (c),
> (d), (f) but not (b) or (e), unless someone has previously done an
> in-place build in the tree, in which case it handles (a) and (f) but
> not (b), (c), (d), or (e) (assuming that the build system can get
> confused between artifacts left behind by in-place builds when doing
> an out-of-place build, which appears to be a common feature of
> existing systems).
>
>   Notice that this means that as far as this score card is concerned,
> out-of-place build_wheel is identical to pip's current copytree +
> in-place build_wheel. The key insight here is that an out-of-tree
> build is nicer for the *next* person to use this source tree, but when
> pip runs what it cares about is whether the *last* build was in-tree
> or out-of-tree, and that's not something that it has any control over.
>
> - If you have a system that supports in-place builds but not
> build_sdist, then copytree + a "clean build" hook would cover
> everything except (e). Notice that, for the same reason as above, it
> doesn't matter whether the clean build is in-tree or out-of-tree, just
> that it has the ability to avoid any interference from junk left
> inside the tree by any previous build.
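
The score card above can be written down as data, which makes the "identical" observation easy to check. This is only a restatement of the bullets; the dictionary keys are paraphrased labels chosen for this sketch, not names used by pip or PEP 517.

```python
ALL_CASES = set("abcdef")

# Which of the situations (a)-(f) each approach handles, per the bullets above.
SCORECARD = {
    "copytree + setup.py, clean tree": set("acdf"),
    "copytree + setup.py, after in-place build": set("af"),
    "sdist -> unpack -> wheel": set("abcdef"),
    "meson-like backend, no in-place or editable installs": ALL_CASES - {"e"},
    "out-of-place build_wheel, clean tree": set("acdf"),
    "out-of-place build_wheel, after in-place build": set("af"),
    "copytree + clean-build hook": ALL_CASES - {"e"},
}


def unhandled(approach):
    """Return the sorted list of cases the given approach does not handle."""
    return sorted(ALL_CASES - SCORECARD[approach])
```

For example, `unhandled("out-of-place build_wheel, clean tree")` and `unhandled("copytree + setup.py, clean tree")` give the same list, which is the point made in the paragraph about the two being identical on this score card.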
>
> ----------
>
> Anyway. Back to things we discussed.
>
> As far as we understand, the way that flit comes into this is that
> flit cannot generate an sdist:
> - from a VCS directory when the VCS tools are unavailable
> - from an unpacked sdist
>
> (Thomas, is that right?)
>
> So on that basis, we considered a few different strategies that pip
> might take for handling 'pip install .':
>
> Pip Strategy Option 1: First make sure that the setuptools backend
> build_wheel hook takes responsibility for doing the
> sdist->unpack->wheel dance when running from a VCS checkout. And then,
> now that we don't have to worry about setuptools messing things up,
> make pip's strategy be to just do a plain in-place or out-of-place
> build_wheel.
>
> As you can probably guess from the list above, Donald didn't feel
> comfortable with this, at least to start with -- he thinks that it
> might be possible for pip to transition to this in the future if it
> turns out that new backends are generally of high quality, but until
> we have more experience, he's nervous. This makes sense to me too -- I
> think we can plausibly mandate that build backends take care of (e)
> (MANIFEST.in problems -- that's basically the idea of making
> setuptools's build backend responsible for doing the
> sdist->unpack->wheel dance), and (a) (read-only source tree) is a rare
> edge case that can reasonably be handled by a special case in the
> frontend, but all the other items actually are regular occurrences in
> modern build systems.
>
> Really he wants pip to go via sdist 99% of the time, because that's
> the case that Just Works, and then have whatever fallback is necessary
> for the 1%. In some sense it doesn't even matter that much what the
> fallback is, because if this rare edge case is a bit slow or a bit
> more error prone, well... it's a rare edge case. Even for flit, 99% of
> the time people will just be installing wheels, not downloading and
> manually unpacking sdists, so as long as it *works* then it doesn't
> have to be perfect. Anyway, this leads to the last two possible
> strategies we discussed:
>
> Pip Strategy Option 2: Check to see if a PKG-INFO file is present. If
> so, do an in-place build_wheel; otherwise, do
> build_sdist->unpack->build_wheel. The key thing here is that because
> it's pip checking for the PKG-INFO instead of querying the build
> backend, then every source tree has to either *be* an sdist, or be
> able to *produce* an sdist; there's no fallback for non-sdist trees that
> can't produce sdists. (At least if you want to support 'pip install
> .')
>
> Pip Strategy Option 3: Ask the build backend whether it can do a
> build_sdist. If so, do build_sdist->unpack->build_wheel; otherwise, do
> copytree->in-place build_wheel, or out-of-place build_wheel, or
> whatever, it doesn't matter that much to pip.
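
A rough sketch of how the two decision procedures differ, assuming a PKG-INFO file at the top of the tree is what marks an unpacked sdist. The function names, the step labels, and the `backend_can_build_sdist` flag are all hypothetical; how pip would actually obtain that answer from a backend is exactly what is still under discussion.

```python
import os

# Step labels are illustrative only; they are not pip internals.
VIA_SDIST = ("build_sdist", "unpack", "build_wheel")


def plan_option_2(source_dir):
    """Option 2: pip itself inspects the tree for PKG-INFO."""
    if os.path.isfile(os.path.join(source_dir, "PKG-INFO")):
        return ("build_wheel",)
    # No fallback here: a tree that is not an sdist must be able to make one.
    return VIA_SDIST


def plan_option_3(backend_can_build_sdist):
    """Option 3: the backend says whether it can build an sdist."""
    if backend_can_build_sdist:
        return VIA_SDIST
    return ("copytree", "build_wheel")
```

The difference is where the knowledge lives: in option 2 the frontend decides from the file system alone, while in option 3 a backend that cannot produce an sdist still gets a working path.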
>
> As far as Donald is concerned, either of these options would be fine.
> For "option 2" (having pip check for PKG-INFO), the main potential
> flaw we noted is that while it's OK with flit not being able to build
> an sdist from an sdist, it will error out in the case where you try to
> do 'pip install .' on a flit VCS checkout and flit can't find the VCS
> tools. Donald is fine with that, on the grounds that oh well, if
> you're working with a VCS checkout and need VCS tools to get
> consistent results then printing an error message telling people to
> install them is a fine outcome. (In particular, he observes that flit
> *can* produce sdist-inconsistent results in this case: if there's a
> .py file present in the source but not added to the VCS, and you do
> build_wheel, then he thinks that the .py file will be installed iff
> the VCS tools are unavailable, and would never be included in any
> sdist. Thomas, is this right?) However, he thinks that Thomas objected
> to raising an error in this case, and wants to allow flit to work.
> (Thomas, is that right?) So possibly that's an argument for preferring
> "option 3" (ask the backend if it can do an sdist).
>
> Another point about option 2 that we didn't discuss but that I'll
> mention now is that it does kinda enforce that all build backends
> support building sdists in general, which might be good or bad
> depending on what you think about that.
>
> -----------
>
> A few more general points that Donald made:
>
> - He doesn't care much about in-place vs out-of-place in general, we
> can provide either or both as far as he's concerned. It doesn't matter
> to pip.
>
> - But, if we do support both in-place and out-of-place, then he
> strongly feels that there should *not* be any semantic difference
> between them "because it just adds another "path" and possible
> inconsistency", plus if a build backend is able to do something smart
> it should just do it all the time. So I think this is a point of
> disagreement between Donald and Nick.
>
> - If pip is going to go with "option 3" (i.e., attempting to do
> build_sdist->unpack->build_wheel, unless the build backend says it
> can't), then he strongly feels that this fallback should be triggered
> by some *explicit* signal from the backend saying that build_sdist is
> not supported; in general he wants build_sdist raising an error to be
> an immediate failure, rather than triggering the fallback path.
> Basically what he's saying is that he wants something like my
> NotImplemented proposal or equivalent.
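
A minimal sketch of the last point, assuming the explicit signal is the backend returning `NotImplemented` from `build_sdist` (one reading of the proposal mentioned above; an equivalent mechanism would work the same way). The function name `choose_path` and the returned labels are made up for this example.

```python
def choose_path(backend, sdist_directory):
    """Decide between the sdist path and the fallback path.

    Returns ("via-sdist", filename) when the backend built an sdist, or
    ("fallback", None) when it explicitly declined. Exceptions raised by
    build_sdist are deliberately not caught: an error is an immediate
    failure and must not trigger the fallback.
    """
    result = backend.build_sdist(sdist_directory)
    if result is NotImplemented:
        return ("fallback", None)
    return ("via-sdist", result)
```

With this shape, a backend that never returns the explicit signal can never end up on the fallback path by accident, for instance because of a bug in its sdist code.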
>
> Regarding this last point: I want to point out this would also mean
> that when building via pip, the fallback could *only* be triggered by
> backends that explicitly ask for it, so they're the only ones who
> could possibly end up exposing their users to any weirdness in the
> fallback path, and it's up to them to figure out how to deal with it.
> In particular, it means that setuptools and flit are both handled fine
> regardless of what we decide about in-place vs. out-of-place, because
> setuptools will always go via the sdist path and thus be built from a
> pristine directory, and flit doesn't care about in-place vs.
> out-of-place anyway. Basically, as long as pip has some kind of
> build_sdist and build_wheel, and build_sdist can signal "not
> implemented" in a clean way, then pip is happy.
>
> My analysis above suggests that beyond this, there's *one* possible
> extension that pip might want: In the future, it might turn out that
> there is a build backend that is like flit in that it can't always
> generate an sdist, but is like setuptools in that it can produce
> broken results when run in an unclean directory. If that happens, and
> if we decide that it's important for such a backend to play nicely
> with pip *and* to support incremental builds *and* to do both of these
> through the PEP 517 hook interface, then it might be useful to have
> something like a "make_clean_then_build_wheel" hook for pip to call.
> My feeling is we can probably defer this for now though? This is like
> 3 edge cases on top of each other and we don't even know if such a
> backend will ever exist.
>
> ----------
>
> So none of the above actually has much to do with the in-place vs
> out-of-place debate; basically the conclusion is that pip doesn't
> care. But we still have to pick something.
>
> Donald and I both have some preference for only having one way to do
> it; given that the semantics are supposed to be identical, why bother
> making everyone implement both? So: should we start with in-place (and
> possibly add out-of-place as an option later for specific use cases
> like build pipelines), or should we start by mandating out-of-place?
>
> Neither of us has a particularly strong opinion on this. I guess I
> have a mild preference for starting with in-place, because:
>
> - Out-of-place is more complicated to implement than in-place, and
> it's going to be difficult to explain to build system authors why
> we're forcing them to do this extra work for some obscure cases they
> might not care about and that we haven't articulated well.
>
> - It might seem like only supporting out-of-place builds would
> simplify pip's life, because the problems above that are caused by
> in-place build detritus would go away. But this isn't really true,
> because even if we don't *expose* in-place build functionality through
> the PEP 517 hooks, then most build systems are still going to
> implement this, and that means we still need to be prepared to handle
> the case where the user has done an in-place build, which is the
> tricky one. Plus, editable installs intrinsically leave detritus in
> the source tree.
>
> - For me, one of the major goals of PEP 517 is to make life easier for
> *really* gnarly projects -- and here I'm not thinking of like numpy or
> scipy, they have it easy; I'm thinking of the folks who have, like, a
> dozen different C libraries with build systems from the 80s vendored
> inside their source tree. I'm nervous that forcing these folks with
> complicated embedded multi-package builds to support out-of-tree
> builds would be a significant burden and present an obstacle to
> uptake. Out-of-place builds are definitely a more advanced feature and
> the direction good build systems move in, but I don't want to leave
> anyone behind.
>
> - We have zero users for this functionality right now, which is
> usually not a good situation for trying to write a spec.
>
> - I don't think it will be a big deal to add out-of-place support
> later if we want to.
>
> But really it's all that other stuff that's important to sort out first.
>
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> Distutils-SIG maillist  -  Distutils-SIG@python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>