I've noticed that there seems to be a lot of confusion out there about what setuptools is and does, at least among Python-Dev folks, so I thought it might be a good idea to give an overview of its structure, to give people a better idea of what is and isn't "magic".
Setuptools began as a fairly routine collection of distutils extensions, to do the same boring things that everybody needs distutils extensions to do -- basic stuff like installing data with your packages, running unit tests, that sort of thing.

At some point, I was getting tired of having to deal with dependencies by making people install them manually, or else having to bundle them. I wanted a more automated way to deal with this, so in 2004 I brought the problem to the distutils-sig and planned a PyCon sprint to try to address it. Tim Peters encouraged me to move the preliminary work I'd done to the Python sandbox, where others could follow the work and improve upon it, and he sponsored me for CVS privileges so I could do so. As it turned out, I wasn't able to go to PyCon, but I produced some crude stuff to try to implement dependency handling, based on some previous work by Bob Ippolito. Bob's stuff used imports to check version strings, and mine was a bit more sophisticated in that it could scan .py or .pyc files without actually importing them. But there was no reasonable way to track download URLs, or to deal with the myriad package formats (source, RPM, etc.), platform-specific variations, and so on, and PyPI didn't really exist yet.

To top it all off, within a couple of months I was laid off, so the problem ceased to be of immediate practical interest for me. I decided to take a six-month sabbatical and work on RuleDispatch, after which I began contracting for OSAF.

OSAF's Chandler application has a plugin platform akin to Eclipse, and I saw that it was going to need a cross-platform plugin format. I put out the call to distutils-sig, and Bob Ippolito took up the challenge. We designed the first egg format, agreeing that it should support Python libraries, not just plugins; that it should be possible to treat .egg zipfiles and directories interchangeably; and that it should be possible to put more than one conceptual egg into one physical zipfile. The true "egg" was the project release, not the zipfile itself. (We called a zipfile containing multiple eggs a "basket", which we thought would be useful for things like py2exe. pkg_resources still supports baskets today, but there are no tools for generating them -- you just have to zip up a bunch of .egg directories to make one.)

Bob wrote the prototype pkg_resources module to support accessing resources in zipfiles and regular directories, while I worked on creating a bdist_egg command, which I added to the then-dormant setuptools package, figuring that the experimental dependency stuff could later be refactored to resolve dependencies using eggs. We had a general notion that there would be some kind of web pages you could use to list packages on, since at that time PyPI didn't allow uploads yet -- or at any rate, we didn't know about it until PyCon in 2005.

After PyCon, I kept hearing about projects to make a CPAN-like tool for Python, such as the Uragas project. However, all of these projects sounded like they were going to reinvent everything from scratch, including a lot of stuff that Bob and I had just done. It then occurred to me for the first time that the .egg format could solve both the problem of having a local package database and that of uninstalling and upgrading packages. In fact, the only missing piece was a way to find and download the packages to be installed; if I could solve that, the CPAN problem would be solved.
So, I did some research by taking a random sample of packages from PyPI, to find out what information people were actually registering. I found that, more often than not, at least one of their PyPI URLs would point to a page with links to packages that could be downloaded directly. That was basically enough to permit writing a very simple spider that would follow only "download" or "homepage" links from PyPI pages, and would also inspect URLs to see whether they were recognizable as distutils-generated filenames, from which it could extract package name and version info (see the sketch below). Thus, easy_install was born, completing what some people now call the eggs/setuptools/easy_install trifecta.

If you are going to work on or support these tools, it's important that you understand that these three things are related, but distinct. Setuptools is at heart just an ordinary collection of distutils enhancements that happens to include a bdist_egg command. EasyInstall is an enhanced installation command built using setuptools, one that leverages setuptools to build eggs for packages that don't have them. But setuptools in its turn depends on EasyInstall, so that packages can have dependencies. So the components are:

 * pkg_resources: a standalone module for working with project releases, dependency specification and resolution, and bundled resources

 * setuptools: a package of distutils extensions, including ones to build eggs with

 * easy_install: a distutils extension built using setuptools, that finds, downloads, builds eggs for, and installs packages that use either distutils or setuptools

And if you look at that list, it's pretty easy to see which part is the most magical, implicit, heuristic, etc. It's easy_install, no question. If it weren't for the fact that easy_install tries to support non-setuptools packages, there would be little need for monkeypatching or sandboxing. If it weren't for the fact that easy_install tries to interpret web pages, there would be no need for heuristics or guessing. So, in a perfect world where everybody neatly files everything with PyPI, easy_install would not have anything implicit about it.

But this isn't a perfect world, and to gain adoption, it had to have backward compatibility. If easy_install could handle *enough* existing packages, then it would encourage package authors to use it so that they could depend on those existing packages. These authors would end up using setuptools, which would then tend to ensure that *their* packages would be easy_install-able as well. And, since the user needs setuptools to install these new packages, the user now has setuptools, and the option to try using it to install other packages. Users then encourage package authors to have correct PyPI information so their packages can be easy_install-ed as well, and the network effect grows from there.

So, I bundled all three things (pkg_resources, setuptools, and easy_install) into a single distribution precisely so it would have this "viral" network effect. I knew that if everybody had to be made to get their PyPI entries straight *first*, it would never work. But if I could leverage an ever-growing user population to put pressure on authors and system packagers, and an ever-growing author population to increase the number of users, then the natural course of things should be that packages that don't play along will die off, be forked, etc., and those that do play along will be rewarded with more users.
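Here's the promised sketch of that kind of filename matching -- purely illustrative, not the actual setuptools.package_index code; the regular expression and helper name are invented for this example, and the real thing handles far more cases:

import re

# Purely illustrative: a crude approximation of recognizing a
# distutils-generated source archive name like "FooBar-1.2.3.tar.gz"
# and splitting it into a project name and a version.
SDIST_NAME = re.compile(
    r'^(?P<name>[A-Za-z0-9_.]+)-(?P<version>\d[^-]*)'
    r'\.(?:tar\.gz|tar\.bz2|tgz|zip)$'
)

def guess_name_and_version(filename):
    """Return (name, version) if filename looks like an sdist, else None."""
    match = SDIST_NAME.match(filename)
    if match:
        return match.group('name'), match.group('version')
    return None

print(guess_name_and_version("FooBar-1.2.3.tar.gz"))   # ('FooBar', '1.2.3')
print(guess_name_and_version("download.html"))         # None

The point is just that this particular heuristic is simple pattern matching on names that package authors already publish; nothing more mysterious than that.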
I made an explicit, conscious, and cold-blooded decision to bundle things that way, knowing full well that it would immediately kill off all the competing "CPAN for Python" projects, and that it would also force lots of people to deal with setuptools who didn't care about it one way or another. The community as a whole would benefit immensely, even if the costs would be borne by people who didn't agree with what I was doing. So, yes, I'm a cold, calculating bastard.

EasyInstall is #1 in the field because it was designed to make its competition irrelevant and to virally spread itself across the entire Python ecosphere. I'm pointing these things out now because I think it's better not to mince words; easy_install was designed with Total World Domination in mind from day one, and that is exactly what it's here to do. Compatibility at any cost is its watchword, because that is what fuels its adoption. End-users are its market, because what the end users want ultimately controls what the developers and the packagers do.

Thus, if you look at the history of setuptools, you'll see that the vast majority of work I do on it is increasing the Just-Works-iness of easy_install. The majority of changes to non-easy_install code (and both setuptools.package_index and setuptools.sandbox are there only for easy_install) are architectural or format changes intended to support greater Just-Works-iness for easy_install. (There are also changes to enhance setuptools' usefulness as a distutils extension, driven mainly by user requests and Chandler's needs, but there aren't nearly as many of those.)

So, if you took easy_install and its support modules entirely out of setuptools, you would be left with a modest assortment of distutils extensions, most of which don't have any backward compatibility issues. They could be merged into the distutils with nary a complaint. The only significant change is the "sdist" command, which in setuptools supports a cleaner (and extensible) way of managing the source distribution manifest, one that frees developers from messing with the MANIFEST file and from remembering to constantly add junk to MANIFEST.in. And we could probably decide either to keep the old behavior, or to make it an option for anybody who's relying on the way it worked before.

And that's all well and good, but now you don't have the features that are the real reason end users want the whole thing: easy_install. And it's not just the users; package authors want it too. TurboGears really couldn't exist without this. It's easy to argue that, oh, they could've made distribution packages for six formats and nine platforms, or they could've made tarballs to bundle all the dependencies in, but those approaches really just don't scale -- especially for the single package author just starting to build something new. None of these options are economically viable for the author of a new package, especially if their core competency isn't packaging and distribution.

Now that there's a TurboGears community, yes, there are probably people available who can do a lot of those distribution-related tasks. But there wouldn't have *been* a community if Kevin couldn't have shipped the software by himself! This is the *real* problem that I always meant to address, from the very beginning: Python development and distribution *cost too much* for the community to flourish as it should.
It's too hard for non-experts, and until now it has required bundling, system packaging, or asking users to install their own dependencies. But asking users to install dependencies doesn't scale to large numbers of dependencies, and not being able to reuse packages leads to proliferating wheel-reinvention, because installation cost is a barrier to entry.

So, the work that I've done is simply social engineering through economic leverage. The goal is to change the cost equations so that entry barriers for package distribution are low, so that users can try different packages, so they can switch, so market forces can choose winners. Because switching and installation costs are low, interoperability and reuse become more attractive choices, and more likely to be demanded by users. You can already see these forces taking effect in such developments as the joint CherryPy/TurboGears template plugin interface, which uses another setuptools innovation (entry points) to allow plug-and-play.

I am doing all this because I got tired of reinventing wheels. When you add in installation costs, writing your own package looks more attractive than reusing the other guy's. But if installation is cheap, then people are more inclined to overlook the minor differences between how the other guy did it and how they would have done it, and are more likely to say to the "other guy", "Hey, I like this, but would you add X?" And it's more likely that the "other guy" will say yes, because getting another published package to depend on his project *multiplies* his install base.

So, my question to all of you is: is that worth a little implicitness, a little magic? My answer, of course, is yes. It will probably be a multi-year effort to get the state of community practice up to a level where all the heuristics and web-scraping can be removed from easy_install without negatively affecting the cost equation. Or maybe not -- maybe we're just hitting the turn of the hockey stick now, and inclusion in 2.5 is just what the doctor ordered to kick the number of users so high that anybody would be crazy not to have clean PyPI listings; I don't know. To be honest, though, I think the outstanding proposal on Catalog-SIG to merge Grig's "CheeseCake" rating system into PyPI (so that package authors will be shown what they can do to improve their listing quality) will actually have more direct impact on this than 2.5 inclusion will.

Guido's choice to bless setuptools is important so that system packagers and developers have confidence that this is the direction Python is taking; it doesn't have to actually go *in* 2.5 to do that. install_egg_info already shows the direction we're taking.

So, after reading all the other stuff that's gone by in the last few days, this is what I think should happen. First, setuptools should be withdrawn from inclusion in Python 2.5 -- not directly because of the opposition, but because of the simple truth that it's just not ready. Some of that is because I've spent way too much time on the discussions this week, to the point of significant sleep deprivation. But when Guido first asked about it, I had concerns about getting everything done that really needed to be done, and I effectively only agreed because I figured out a way to allow new versions to be distributed after the fact. With the latest Python 2.5 release schedule, I'd be hard-pressed to get 0.7 to stability before the 2.5 betas go out, certainly if I'm the only one working on it.
And a stable version of 0.7 is really the minimum that should go in the standard library, because the package management side of things really needs to have commands to list, uninstall, upgrade, etc., and they need to be easy to understand -- not the confusing mishmash that is easy_install's current assortment of options, which grew organically rather than being designed ahead of time. And Fredrik is right to bring up concerns about both easy_install's confusing array of options and the general support issues of asking Python-Dev to adopt setuptools. These are things that can be addressed, and *are* being addressed, but they're not going to happen by Tuesday, when the alpha release is scheduled.

I hate to say this, because I really don't want to disappoint Guido or anyone on Python-Dev or elsewhere who has been calling for it to go in. I really appreciate all your support, but Fredrik is right, and I can't let my desire to please all of you get in the way of what's right.

What *should* happen now instead is a plan for merging setuptools into the distutils for 2.6. That includes making the decisions about what "install" and "sdist" should do, and whether backward compatibility of internal behaviors should be implicit or explicit. I don't want to start *that* thread right now, and we've already heard plenty of arguments on both sides. Indeed, since Martin and Marc seem to be diametrically opposed on that issue, it is guaranteed that *somebody* will be unhappy with whatever decision is made. :)

Between 2.5 and 2.6, setuptools should continue to be developed in the sandbox, and keep the name 'setuptools'. For 2.6, however, we should merge the codebases and have setuptools just be an alias. Or perhaps what is now called setuptools should be called "distutils2" and distributed as such, with "setuptools" only being a legacy name. But regardless, the plan should be to have only one codebase for 2.6, and to issue backported releases of that codebase for at least Python 2.4 and 2.5.

These ideas are new for me, because I hadn't thought that anybody would care enough to want to get into the code and share any of the work. That being the case, it seems to make more sense for me to back off a little on development in order to work on developer documentation of the kind Fredrik has been asking for, and on a development roadmap, so we can coordinate who will work on what, and when, to get 0.7 to stability.

In the meantime, Python 2.5 *does* have install_egg_info, and it should definitely not be pulled out. install_egg_info ensures that every package installed by the distutils is detectable by setuptools, and thus will not be reinstalled just because it wasn't installed by setuptools. And there is one other thing that should go into 2.5: PKG-INFO files for each package we're bundling into the standard library that is also distributed separately for older Python versions and is API-compatible. So, for example, if ctypes 0.9.6 is going into Python 2.5, it should have a PKG-INFO in the appropriate directory to say so. Thus, programs written for Python 2.4 that say they depend on something like "ctypes>=0.9" will work with Python 2.5 without needing their setup scripts changed to remove the dependency. Last, but not least, we need to find an appropriate spot to add documentation for install_egg_info.
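To make the ctypes example concrete, here's a hedged sketch of what this looks like from a package author's point of view. The project name "MyApp", its contents, and the metadata snippet in the comment are made up for illustration, and the exact file name install_egg_info writes may differ:

from setuptools import setup, find_packages

# A hypothetical project written for Python 2.4 that declares its
# dependency on the separately-distributed ctypes package:
setup(
    name="MyApp",                       # made-up project name
    version="1.0",
    packages=find_packages(),
    install_requires=["ctypes>=0.9"],
)

# At runtime or install time, the same requirement can be checked via
# pkg_resources. With metadata recorded for the ctypes bundled in 2.5
# (roughly "Metadata-Version: 1.0 / Name: ctypes / Version: 0.9.6"),
# the requirement is seen as already satisfied, with nothing extra to install:
#
#   import pkg_resources
#   pkg_resources.require("ctypes>=0.9")

The point is that the dependency declaration stays the same across Python versions; the recorded metadata just tells pkg_resources that it's already satisfied.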
The tasks I've just described can all be accomplished for 2.5, they are reasonably noncontroversial, and they do not add any new support requirements or stability issues that I can think of.

One final item that is a possibility: we could leave pkg_resources in for 2.5, and add its documentation. This would allow people to begin using its API to check for installed packages, access bundled resources, and so on. I'd be interested in hearing folks' opinions about that, one way or the other.
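For anyone who hasn't looked at pkg_resources yet, here's a rough sketch of the kind of thing its API covers -- checking what's installed, reading resources bundled inside a package (zipped .egg or plain directory alike), and discovering entry-point plugins like the template engines mentioned earlier. The package name "mypackage", the resource path, and the entry-point group "example.templating" are all made up for illustration:

import pkg_resources

# Is a given project installed, and at what version?
try:
    dist = pkg_resources.get_distribution("setuptools")
    print("found %s %s at %s" % (dist.project_name, dist.version, dist.location))
except pkg_resources.DistributionNotFound:
    print("setuptools is not installed")

# Read a data file bundled with a package, whether the package lives in a
# plain directory or inside a zipped .egg ("mypackage" and the path are
# hypothetical -- substitute your own):
#
#   data = pkg_resources.resource_string("mypackage", "templates/page.kid")

# Discover plugins advertised under a (made-up) entry point group; each
# plugin project would declare these in its setup() call via entry_points:
for ep in pkg_resources.iter_entry_points("example.templating"):
    plugin = ep.load()   # imports and returns the advertised object
    print("%s -> %r" % (ep.name, plugin))

pkg_resources stands on its own for all of this; none of it requires easy_install or the rest of setuptools at runtime.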