On Sat, Nov 7, 2015 at 6:57 AM, Paul Moore <p.f.mo...@gmail.com> wrote:
> 2. (For here) Builds are not isolated from what's in the development
> directory. So if you have your sdist definition wrong, what you build
> locally may work, but when you release it it may fail. Obviously that
> can be fixed by proper development and testing practices, but pip is
> designed currently to isolate builds to protect against mistakes like
> this, we'd need to remove that protection for cases where we wanted to
> do in-place builds.
I agree that it would be nice to make sdist generation more reliable
and tested by default, but I don't think this quite works as a
solution.

1) There's no guarantee that building an sdist from some dirty working
tree will produce anything like what you'd have for a release sdist, or
even a clean isolated build. (E.g. a very common mistake is adding a
new file to the working directory but forgetting to run 'git/hg add'.
To protect against this, you either have to have a build system that's
smart enough to talk to the VCS when figuring out what files to
include, or better yet you have to work from a clean checkout.) And as
currently specified these "isolated" build trees might even end up
including partial build detritus from previous in-place builds, copied
from the source directory into the temporary directory.

2) Sometimes people will want to download an sdist, unpack it, and then
run 'pip install .' from it. In your proposal this would require first
building a new sdist from the unpacked working tree. But there's no
guarantee that you can generate an sdist from an sdist. None of the
proposals for a new build system interface have contemplated adding an
"sdist" command, and even if they did, a clever sdist command might
well fail, e.g. because it is only designed to build sdists from a
checkout with full VCS metadata that it can use to figure out what
files to include :-).

3) And anyway, it's pretty weird logically to include a mandatory sdist
command inside an interface that 99% of the time will be working *from*
an sdist :-). The rule of thumb I've used for the build interface stuff
so far is that it should be the minimal stuff needed to provide a
convenient interface for people who just want to install packages,
because the actual devs on a particular project can use whatever
project/build-system-specific interfaces make sense for their workflow.
And end-users don't build sdists.
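To make the "forgot to run 'git/hg add'" failure mode in (1) concrete,
here is a minimal sketch. The helper name is made up for illustration;
a real build system would get the tracked list by shelling out to
'git ls-files' or 'hg files':

```python
import os

# Hypothetical helper (not part of any real tool): given the file list
# the VCS reports (e.g. the output of 'git ls-files') and a source
# directory, find files that exist on disk but that a VCS-driven sdist
# would silently leave out -- the "forgot to run 'git add'" mistake.
def untracked_sources(tracked, srcdir):
    on_disk = set()
    for root, dirs, files in os.walk(srcdir):
        # Skip VCS metadata and build output; real tools have more rules.
        dirs[:] = [d for d in dirs if d not in {".git", ".hg", "build"}]
        for name in files:
            on_disk.add(os.path.relpath(os.path.join(root, name), srcdir))
    return sorted(on_disk - set(tracked))
```

A sanity check like this is the best a VCS-driven sdist command can do;
building from a clean checkout sidesteps the problem entirely.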
But for the operations that pip does provide, like 'pip wheel' and 'pip
install', they should be usable by devs, because devs will use them.

> 3. The logic inside pip for doing builds is already pretty tricky.
> Adding code to sometimes build in place and sometimes in a temporary
> directory is going to make it even more complex. That might not be a
> concern for end users, but it makes maintaining pip harder, and risks
> there being subtle bugs in the logic that could bite end users. If you
> want specifics, I can't give them at the moment, because I don't know
> what the code to do the proposed in-place building would look like.

Yeah, this is always a concern for any change. The tradeoff is that you
get to delete the code for "downloading" unpacked directories into a
temporary directory (which currently doesn't even use sdist -- it just
blindly copies everything, including e.g. the full git history). And
you get to skip specifying a standard build-an-sdist interface that pip
and every build system backend would all have to support and
interoperate on.

Basically AFAICT the logic should be:

1) Arrange for the existence of a build directory:
   if building from a directory: great, we have one, use that
   else if building from a file/url: download it and unpack it, then
   use that
2) do the build using the build directory
3) if it's a temporary directory and the build succeeded, clean up

(Possibly with some complications like providing options for people to
specify a non-temporary directory to use for unpacking downloaded
sdists.)

It might need a bit of refactoring so that the "arrange for the
existence of a build directory" step returns the chosen build directory
instead of taking it as a parameter like I assume it does now, but it
doesn't seem like the intrinsic complexity is very high.

> I hope that helps. It's probably not as specific or explicit as you'd
> like, but to be fair, nor is the proposal.
>
> What we currently have on the table is "If 'pip (install/wheel) .' is
> supposed to become the standard way to build things, then it should
> probably build in-place by default." For my personal use cases, I
> don't actually agree with any of that, but my use cases are not even
> remotely like those of numpy developers, so I don't want to dismiss
> the requirement. But if it's to go anywhere, it needs to be better
> explained.
>
> Just to be clear, *my* position (for projects simpler than numpy and
> friends) is:
>
> 1. The standard way to install should be "pip install <requirement or wheel>".
> 2. The standard way to build should be "pip wheel <sdist or
> directory>". The directory should be a clean checkout of something you
> plan to release, with a unique version number.
> 3. The standard way to develop should be "pip install -e ."
> 4. Builds (pip wheel) should always unpack to a temporary location and
> build there. When building from a directory, in effect build a sdist
> and unpack it to the temporary location.
>
> I hear the message that for things like numpy these rules won't work.
> But I'm completely unclear on why. Sure, builds take ages unless done
> incrementally. That's what pip install -e does, I don't understand why
> that's not acceptable.

To me this feels like mixing two orthogonal issues. 'pip install' and
'pip install -e' have different *semantics* -- one installs a snapshot
into an environment, and one installs a symlink-like-thing into an
environment -- and that's orthogonal to the question of whether you
want to implement that using a "clean build" or not. (Also, it's
totally reasonable to want partial builds in 'pip wheel': 'pip wheel
.', get a compiler error, fix it, try again...)

Furthermore, I actually really dislike 'pip install -e' and am
surprised to see so many people talking about it as if it were the
obvious choice for all development :-).
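The three-step build-directory logic I sketched earlier in this message
(find or create a build directory, build in it, clean up only temporary
directories, and only on success) is compact enough to write down
directly. This is an illustration, not pip's actual internals: the
function names and the build callback are my own, and real code would
also handle URLs (by downloading first) and zip-format sdists:

```python
import os
import shutil
import tarfile
import tempfile

def obtain_build_dir(source):
    """Step 1: arrange for the existence of a build directory.

    Returns (build_dir, is_temporary): a local directory is used in
    place; an sdist archive is unpacked into a fresh temporary dir.
    """
    if os.path.isdir(source):
        return source, False
    tmp = tempfile.mkdtemp(prefix="build-")
    with tarfile.open(source) as tf:
        tf.extractall(tmp)
    return tmp, True

def build_from(source, build):
    build_dir, temporary = obtain_build_dir(source)
    result = build(build_dir)  # step 2: do the build in that directory
    if temporary:              # step 3: on success, clean up temp dirs
        shutil.rmtree(build_dir, ignore_errors=True)
    return result
```

Note that a failed build deliberately leaves a temporary unpack
directory behind for inspection, matching "if ... the build succeeded,
clean up" above.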
I understand it takes all kinds, etc., and I'm not arguing that it
should be removed or anything (though I probably would if I thought it
had any chance of getting consensus :-)). But from my point of view,
'pip install -e' is a weird intrinsically-kinda-broken wart that
provides no value outside of some rare use cases that most people never
encounter. I say "intrinsically-kinda-broken" because as soon as you do
an editable install, the metadata in .egg/dist-info starts to drift out
of sync with your actual source tree, so it necessarily makes the
installed package database less reliable, undermining a lot of the work
that's being done to make installation and resolution more robust.

I'm also really unsure why people use it. I generally don't *want* to
install code-under-development into a full-fledged virtualenv. I see
lots of people who have a primary virtualenv that they use for
day-to-day work, and they 'pip install -e' all the packages that they
work on into this environment, and then run into all kinds of weird
problems because they're using a bunch of untested code together, or
they switch to a different branch of one package to check something and
then forget about it when they context switch to some other project,
and everything is broken. And then they try to install some other
package, and it depends on foo >= 1.2, and they have an editable
install of foo that claims to be 1.1 (because that was the last time
the .egg-info was regenerated) but really it's 1.3, and all kinds of
weird things happen.

And for packages with binary extensions, it doesn't really work anyway,
because you still have to rebuild every time (and you can get extra
bonus forms of weird skew, where when you import the package you get
the up-to-date version of some source files -- the .py ones -- combined
with out-of-date versions of others -- the .pyx / .c / .cpp ones).
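The foo 1.1-vs-1.3 skew above is worth spelling out: the resolver only
ever sees the stale metadata, never the working tree. A toy
illustration (the version comparison is deliberately naive; real tools
use proper version parsing, e.g. the 'packaging' library):

```python
# A naive >= check on dotted version strings, just to make the failure
# mode concrete.
def satisfies_minimum(claimed, minimum):
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(claimed) >= as_tuple(minimum)

# The scenario above: an editable install of foo whose stale .egg-info
# still says 1.1, while the working tree is really 1.3. A resolver that
# trusts the installed metadata rejects 'foo >= 1.2' even though the
# code actually on sys.path would satisfy it.
stale_metadata_version = "1.1"   # written at 'pip install -e' time
actual_source_version = "1.3"    # what the working tree really is
```

The point is not the comparison itself but which version string gets
compared: with an editable install those two values drift apart
silently.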
Even if I do decide that I want to install a non-official release into
some virtualenv, I'd like to install a consistent snapshot that gets
upgraded or uninstalled all together as an atomic unit. What I actually
do when working on NumPy is use a little script [1] that does the
equivalent of:

  $ rm -rf ./.tmpdir
  $ pip install . -t ./.tmpdir
  $ cd ./.tmpdir
  $ python -c 'import numpy; numpy.test()'

OTOH, for packages without binary extensions, I just run my tests or
start a REPL from the root of my source dir, and that works fine
without the hassle of creating and activating a virtualenv, or
polluting my normal environment with untested code.

Also, 'pip install -e' intrinsically pollutes your source tree with
build artifacts. I come from the build system tradition that says that
build artifacts should all be shunted to the side, leaving the actual
directories uncluttered:

  https://www.gnu.org/software/automake/manual/html_node/VPATH-Builds.html

and I think a valid approach that build system authors might want to
take is to enforce the invariant that the build system never writes
anywhere outside of $srcdir/build/ or similar. If we insist that
editable installs are the only way to work, then we take this option
away from projects.

So there simply isn't any problem I have where editable installs are
the best solution, and I see them causing problems for people all the
time. That said, there are two theoretical advantages I can see to
editable installs:

1) Unlike starting an interpreter from the root of your source tree,
they trigger the install of runtime dependencies. I solve this by just
installing those into my working environment myself, but for projects
with complex dependencies I guess 'install -e' might ATM be the most
convenient way to get this set up. This isn't a very compelling
argument, though, because one could trivially provide better support
for just this ('pip install-dependencies .' or something) without
bringing along the intrinsically tricky bits of editable installs.

2) For people working on complex projects that involve multiple
pure-python packages that are distributed separately but require
coordinated changes in sync (maybe OpenStack is like this?), so that
each round of your edit/test cycle involves edits to multiple
different projects, 'pip install -e' kinda solves a genuine problem,
because it lets you assemble a single working environment that
contains the editable versions of everything together. This seems like
a genuine use case -- but it's what I meant at the top about how
editable installs seem like a very specialized tool for rare cases,
because very few people are working on meta-projects composed of
multiple pure-python sub-projects evolving in lock-step.

Anyway, like I said, I'm not trying to argue that 'pip install -e'
should be deprecated -- I understand that many people love it for
reasons that I don't fully understand. My goal is just to help those
who think 'pip install -e' is obviously the one-and-only way to do
python development to understand my perspective, and why we might want
to support other options as well.

I think the actual bottom line for pip as a project is: we all agree
that sooner or later we have to move users away from running 'setup.py
install'. Practically speaking, that's only going to happen if 'pip
install' actually functions as a real replacement and doesn't create
regressions in people's workflows. Right now it does create such
regressions. The thing that started this whole thread is that numpy
had actually settled on going ahead and making the switch to requiring
pip install, but then got derailed by issues like these...

-n

[1] https://github.com/numpy/numpy/blob/master/runtests.py

--
Nathaniel J. Smith -- http://vorpus.org

_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig