A minor detail for pip strategy option #2 is that sdists do not have to have PKG-INFO.
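
To make the option #2 caveat concrete, here is a minimal sketch of the check as Nathaniel describes it below. The function name and the returned labels are mine, purely for illustration; this is not pip's actual code.

```python
from pathlib import Path


def choose_option2_path(source_dir):
    """Pick the build path under "option 2" (illustrative sketch only)."""
    # Assumption: a PKG-INFO file at the tree root is what marks
    # the tree as an unpacked sdist.
    if (Path(source_dir) / "PKG-INFO").is_file():
        return "in-place build_wheel"
    # No PKG-INFO: treat the tree like a VCS checkout.
    return "build_sdist -> unpack -> build_wheel"
```

Under this check, an unpacked sdist that happens to lack PKG-INFO would be treated like a VCS checkout and sent down the build_sdist path.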
On Mon, Jul 17, 2017 at 9:02 AM Nathaniel Smith <n...@pobox.com> wrote:
> Hi all,
>
> I happened to talk to Donald on IRC last night, and said I'd write up
> some notes on that discussion. (He's also seen these notes, but is
> partially-offline dealing with jury duty.) [Edit: but apparently still
> replying to email on the list, so some of this repeats things he's
> posted since I sent it to him to look at, but oh well :-). That's life
> in the fast-paced world of F/OSS development I guess.]
>
> One thing we tried to get a handle on is what, exactly, are the
> constraints that PEP 517 is trying to deal with.
>
> As far as pip goes, these are some of the situations that Donald's
> worried about handling gracefully when it comes to 'pip install .':
>
> a) The source tree is on read-only media
>
> b) People do an editable install in a source tree, and then follow it
> up with a regular install ('pip install -e . && pip install .').
> Apparently this can cause problems because of the residual metadata
> from the editable install confusing setuptools.
>
> c) Someone does 'pip install .' on a complex package, then discovers
> that while the build succeeded, it's missing some optional feature
> because some necessary system library was missing, so they install
> that library and re-try the 'pip install .'. Will the build backend
> notice that the library has appeared, or does it only do environment
> sniffing the first time it's run in a given tree?
>
> d) Mounting the same source tree into multiple different docker
> containers: 'docker run -v .:/io ubuntu pip install /io && docker run
> -v .:/io alpine pip install /io'. Will the build backend notice that
> these two systems have incompatible C toolchains and C ABIs, so you
> can't share .o files between them?
> In principle a build system could notice
> this by detecting that the compiler and system headers have changed,
> but in traditional build systems it's common to intentionally *not*
> include these in your build cache key, because you don't want to
> rebuild the world after every apt upgrade. (For example, gcc provides
> the -M switch to generate make rules to trigger rebuilds on header
> changes, and the -MM switch to do the same but ignoring changes in
> system headers; lots of projects choose to use -MM. This is related to
> the thing where incremental builds are traditionally a bit sloppy and
> intended for experts only.)
>
> e) Build systems that have different sources of truth for what goes
> into an sdist versus what goes into a wheel, and thus can easily end
> up in a situation where a direct VCS->wheel build gives a different
> result than a VCS->sdist->wheel build. The most prominent offender
> here is distutils/setuptools with its MANIFEST.in mess, but Donald is
> nervous that other systems might also reproduce this mistake.
>
> f) And finally Donald feels "it's just more hygienic to have ``pip
> install .`` not modify ., similarly to how [he] would be upset if
> ``pip install foo-1.0.tar.gz`` modified foo-1.0.tar.gz in some way".
>
> Of course, no system can avoid every problem; the overall goal here is
> harm reduction and minimizing spurious bugs filed on pip, not
> perfection.
>
> None of these cases arise for 'pip install name' or 'pip install
> sdist.tar.gz'; it's really only 'pip install .' on a user-provided
> source tree.
>
> ----------
>
> For reference, here's my analysis of how these particular desiderata
> relate to some possible approaches:
>
> - Pip's current system for handling builds from existing source trees
> (copytree + setup.py bdist_wheel -or- setup.py install) handles (a),
> (c), (d), (f), but not (b) or (e), unless someone has previously done
> an in-place build in the tree, in which case it handles (a) and (f)
> but not (b), (c), (d), or (e). Which is kind of unfortunate, since (a)
> and (f) are probably the least important items.
>
> - When sdist->unpack->wheel is possible, it automatically handles all
> of these cases.
>
> - If a build backend *only* does out-of-place builds (like meson) and
> *does not support* in-place or editable installs, then having an
> out-of-place build_wheel hook automatically takes care of everything
> except (e).
>
> - Otherwise, an out-of-place build_wheel hook handles (a), (c),
> (d), (f) but not (b) or (e), unless someone has previously done an
> in-place build in the tree, in which case it handles (a) and (f) but
> not (b), (c), (d), or (e) (assuming that the build system can get
> confused between artifacts left behind by in-place builds when doing
> an out-of-place build, which appears to be a common feature of
> existing systems).
>
> Notice that this means that as far as this score card is concerned,
> out-of-place build_wheel is identical to pip's current copytree +
> in-place build_wheel. The key insight here is that an out-of-tree
> build is nicer for the *next* person to use this source tree, but when
> pip runs what it cares about is whether the *last* build was in-tree
> or out-of-tree, and that's not something that it has any control over.
>
> - If you have a system that supports in-place builds but not
> build_sdist, then copytree + a "clean build" hook would cover
> everything except (e).
> Notice that, for the same reason as above, it
> doesn't matter whether the clean build is in-tree or out-of-tree, just
> that it has the ability to avoid any interference from junk left
> inside the tree by any previous build.
>
> ----------
>
> Anyway. Back to things we discussed.
>
> As far as we understand, the way that flit comes into this is that
> flit cannot generate an sdist:
> - from a VCS directory when the VCS tools are unavailable
> - from an unpacked sdist
>
> (Thomas, is that right?)
>
> So on that basis, we considered a few different strategies that pip
> might take for handling 'pip install .':
>
> Pip Strategy Option 1: First make sure that the setuptools backend
> build_wheel hook takes responsibility for doing the
> sdist->unpack->wheel dance when running from a VCS checkout. And then,
> now that we don't have to worry about setuptools messing things up,
> make pip's strategy be to just do a plain in-place or out-of-place
> build_wheel.
>
> As you can probably guess from the list above, Donald didn't feel
> comfortable with this, at least to start with -- he thinks that it
> might be possible for pip to transition to this in the future if it
> turns out that new backends are generally of high quality, but until
> we have more experience, he's nervous. This makes sense to me too -- I
> think we can plausibly mandate that build backends take care of (e)
> (MANIFEST.in problems -- that's basically the idea of making
> setuptools's build backend responsible for doing the
> sdist->unpack->wheel dance), and (a) (read-only source tree) is a rare
> edge case that can reasonably be handled by a special case in the
> frontend, but all the other items actually are regular occurrences in
> modern build systems.
>
> Really he wants pip to go via sdist 99% of the time, because that's
> the case that Just Works, and then have whatever fallback is necessary
> for the 1%.
> In some sense it doesn't even matter that much what the
> fallback is, because if this rare edge case is a bit slow or a bit
> more error prone, well... it's a rare edge case. Even for flit, 99% of
> the time people will just be installing wheels, not downloading and
> manually unpacking sdists, so as long as it *works* then it doesn't
> have to be perfect. Anyway, this leads to the last two possible
> strategies we discussed:
>
> Pip Strategy Option 2: Check to see if a PKG-INFO file is present. If
> so, do an in-place build_wheel; otherwise, do
> build_sdist->unpack->build_wheel. The key thing here is that because
> it's pip checking for the PKG-INFO instead of querying the build
> backend, then every source tree has to either *be* an sdist, or be
> able to *produce* an sdist; there's no fallback for non-sdist trees
> that can't produce sdists. (At least if you want to support 'pip
> install .')
>
> Pip Strategy Option 3: Ask the build backend whether it can do a
> build_sdist. If so, do build_sdist->unpack->build_wheel; otherwise, do
> copytree->in-place build_wheel, or out-of-place build_wheel, or
> whatever, it doesn't matter that much to pip.
>
> As far as Donald is concerned, either of these options would be fine.
> For "option 2" (having pip check for PKG-INFO), the main potential
> flaw we noted is that while it's OK with flit not being able to build
> an sdist from an sdist, it will error out in the case where you try to
> do 'pip install .' on a flit VCS checkout and flit can't find the VCS
> tools. Donald is fine with that, on the grounds that oh well, if
> you're working with a VCS checkout and need VCS tools to get
> consistent results then printing an error message telling people to
> install them is a fine outcome.
> (In particular, he observes that flit
> *can* produce sdist-inconsistent results in this case: if there's a
> .py file present in the source but not added to the VCS, and you do
> build_wheel, then he thinks that the .py file will be installed iff
> the VCS tools are unavailable, and would never be included in any
> sdist. Thomas, is this right?) However, he thinks that Thomas objected
> to raising an error in this case, and wants to allow flit to work.
> (Thomas, is that right?) So possibly that's an argument for preferring
> "option 3" (ask the backend if it can do an sdist).
>
> Another point about option 2 that we didn't discuss but that I'll
> mention now is that it does kinda enforce that all build backends
> support building sdists in general, which might be good or bad
> depending on what you think about that.
>
> -----------
>
> A few more general points that Donald made:
>
> - He doesn't care much about in-place vs out-of-place in general, we
> can provide either or both as far as he's concerned. It doesn't matter
> to pip.
>
> - But, if we do support both in-place and out-of-place, then he
> strongly feels that there should *not* be any semantic difference
> between them "because it just adds another "path" and possible
> inconsistency", plus if a build backend is able to do something smart
> it should just do it all the time. So I think this is a point of
> disagreement between Donald and Nick.
>
> - If pip is going to go with "option 3" (i.e., attempting to do
> build_sdist->unpack->build_wheel, unless the build backend says it
> can't), then he strongly feels that this fallback should be triggered
> by some *explicit* signal from the backend saying that build_sdist is
> not supported; in general he wants build_sdist raising an error to be
> an immediate failure, rather than triggering the fallback path.
> Basically what he's saying is that he wants something like my
> NotImplemented proposal or equivalent.
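
For what it's worth, the explicit-signal behaviour in that last bullet could look roughly like the sketch below. The hook names build_sdist and build_wheel come from the thread; everything else is an assumption of mine: that the signal is the hook *returning* NotImplemented, that the hooks take no arguments, and the helper and class names, which are all hypothetical.

```python
def build_via_option3(backend, unpack_and_build):
    """Sketch of "option 3" with an explicit "not supported" signal.

    `backend` is any object with build_sdist() and build_wheel() methods.
    `unpack_and_build` is a hypothetical callable that unpacks the sdist
    and builds a wheel from the resulting pristine tree.
    """
    # A genuine error raised here propagates: immediate failure, no fallback.
    sdist = backend.build_sdist()
    if sdist is NotImplemented:
        # Assumed convention: the backend explicitly opts out of sdists,
        # so (and only then) take the fallback path.
        return backend.build_wheel()
    return unpack_and_build(sdist)


class NoSdistBackend:
    """Hypothetical backend that explicitly declines to build sdists."""

    def build_sdist(self):
        return NotImplemented

    def build_wheel(self):
        return "wheel-from-tree"


class BrokenBackend:
    """Hypothetical backend whose sdist build fails with a real error."""

    def build_sdist(self):
        raise RuntimeError("VCS tools not found")

    def build_wheel(self):
        return "wheel-from-tree"
```

With this shape, NoSdistBackend reaches the fallback, while BrokenBackend's error surfaces to the user instead of being silently papered over by a direct build_wheel call.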
>
> Regarding this last point: I want to point out this would also mean
> that when building via pip, the fallback could *only* be triggered by
> backends that explicitly ask for it, so they're the only ones who
> could possibly end up exposing their users to any weirdness in the
> fallback path, and it's up to them to figure out how to deal with it.
> In particular, it means that setuptools and flit are both handled fine
> regardless of what we decide about in-place vs. out-of-place, because
> setuptools will always go via the sdist path and thus be built from a
> pristine directory, and flit doesn't care about in-place vs.
> out-of-place anyway. Basically, as long as pip has some kind of
> build_sdist and build_wheel and build_sdist can signal "not
> implemented" in a clean way, then pip is happy.
>
> My analysis above suggests that beyond this, there's *one* possible
> extension that pip might want: In the future, it might turn out that
> there is a build backend that is like flit in that it can't always
> generate an sdist, but is like setuptools in that it can produce
> broken results when run in an unclean directory. If that happens, and
> if we decide that it's important for such a backend to play nicely
> with pip *and* to support incremental builds *and* to do both of these
> through the PEP 517 hook interface, then it might be useful to have
> something like a "make_clean_then_build_wheel" hook for pip to call.
> My feeling is we can probably defer this for now though? This is like
> 3 edge cases on top of each other and we don't even know if such a
> backend will ever exist.
>
> ----------
>
> So none of the above actually has much to do with the in-place vs
> out-of-place debate; basically the conclusion is that pip doesn't
> care. But we still have to pick something.
>
> Donald and I both have some preference for only having one way to do
> it; given that the semantics are supposed to be identical, why bother
> making everyone implement both?
> So: should we start with in-place (and
> possibly add out-of-place as an option later for specific use cases
> like build pipelines), or should we start by mandating out-of-place?
>
> Neither of us has a particularly strong opinion on this. I guess I
> have a mild preference for starting with in-place, because:
>
> - Out-of-place is more complicated to implement than in-place, and
> it's going to be difficult to explain to build system authors why
> we're forcing them to do this extra work for some obscure cases they
> might not care about and that we haven't articulated well.
>
> - It might seem like only supporting out-of-place builds would
> simplify pip's life, because the problems above that are caused by
> in-place build detritus would go away. But this isn't really true,
> because even if we don't *expose* in-place build functionality through
> the PEP 517 hooks, then most build systems are still going to
> implement this, and that means we still need to be prepared to handle
> the case where the user has done an in-place build, which is the
> tricky one. Plus, editable installs intrinsically leave detritus in
> the source tree.
>
> - For me, one of the major goals of PEP 517 is to make life easier for
> *really* gnarly projects -- and here I'm not thinking of like numpy or
> scipy, they have it easy; I'm thinking of the folks who have, like, a
> dozen different C libraries with build systems from the 80s vendored
> inside their source tree. I'm nervous that forcing these folks with
> complicated embedded multi-package builds to support out-of-tree
> builds would be a significant burden and present an obstacle to
> uptake. Out-of-place builds are definitely a more advanced feature and
> the direction good build systems move in, but I don't want to leave
> anyone behind.
>
> - We have zero users for this functionality right now, which is
> usually not a good situation for trying to write a spec.
>
> - I don't think it will be a big deal to add out-of-place support
> later if we want to.
>
> But really it's all that other stuff that's important to sort out first.
>
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> Distutils-SIG maillist - Distutils-SIG@python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>