On Tue, Jun 20, 2006 at 11:46:35AM -0400, Laszlo (Laca) Peter wrote:

<snip>

Thanks a lot for writing this up; I think it will serve as a valuable
guide for future contributors. A few thoughts inline.

> FWIW, I do believe that this system works great for ON and other
> in-house developed projects where developers can update the Makefiles
> and the pkgmaps when they introduce a new header or library or source
> file. However, for Open Source packages we have to do this for the
> community developers: lots of possible mistakes and in fairness, they
> know it better than us, so it's better to use their Makefiles.

In general, that's logical, but unfortunately they also make a lot of
assumptions that aren't valid for an integrated offering like
OpenSolaris. They may use nonstandard (or differently-standardised)
directories, some of which may not be possible to change at build time;
they may deliver private headers that should not be delivered at all;
they may overwrite existing files in the shared proto area, making
package construction difficult.

Using a per-component proto area is helpful in solving this last
problem, but it has two major drawbacks as well: it makes it
effectively impossible to manage intra-consolidation, inter-component
flag days (because you can't build dependents against their current
dependencies), and it requires additional work to deliver packages
consisting of parts or all of multiple components (such as SUNWsfwhea,
unfortunate though that package may be). I'd be interested to
understand how you would address these shortcomings: the objective is
to end up with a single internally-consistent shared proto area with
known contents that can be packaged in arbitrary ways, without
requiring root privileges during the build.

> In my day job, I work on JDS and I maintain the JDS build system,
> so allow me to make a comparison. (I'm obviously biased, but I'll
> try my best to be fair.) Our builds are also based on community
> tarballs and source patches.
> In JDS, we use the Makefile system
> supplied by the originating community, which usually means
> configure; make; make install. The "make install" part
> is redirected (using the DESTDIR Makefile variable) to a per-package
> proto area[3]. This means that by default, any files that the community

We've often suggested that make install (when available) is the better
route for the Companion, and perhaps even for SFW, but only once its
entire effects are thoroughly understood. For example, many makefiles
are broken and attempt (and succeed, if building as root) to install
files to /, ignoring DESTDIR or similar variables. Other more subtle
brokenness is common as well.

> maintainers intended to install are installed, and to the correct
> (relative) locations. Then we remove any files that we decided not to
> deliver, for example lib*.a, lib*.la. This is the opposite of SFW's
> philosophy, where we decide which files we wish to deliver and deal with
> them one by one. We are now ready to create packages. Pkgmaps are
> created dynamically from glob lists.

And here's where we really come to the crux of the philosophical
differences between JDS and the rest of the Solaris organisation,
especially but not only ON. The procedure you're describing is one
which asserts that "upstream is assumed to be correct" and if in doubt,
the effects of using the autotools-based build system are assumed to be
desirable. That's perfect for compiling what the GPL so eloquently
terms "mere aggregations" of software, but it's dead wrong for building
a tightly integrated polished product.

As a concrete point, why should the default be to include all new files
built by a component's makefiles? Shouldn't we at least know what
we're shipping to customers?
And if we know what we're shipping, is it so much to ask that we
explicitly specify it (in packaging files, if not in install-sfw or
equivalent) so that upon future updates engineers are reminded that
this is *their* product, and they are expected to know what *they* are
shipping to customers (to say nothing of noticing the changes and
perhaps making necessary changes elsewhere in the system and/or
alerting customers to them)?

Why should change be so cavalier? It's bad enough that huge portions
of Solaris - a Sun product, with the Sun brand affixed to it - go out
the door without anyone except (possibly) the authors having read the
code, but to dispense with even the rudiments of change control in the
name of expediency is unconscionable.

A perfect example of this is 6434055 python 2.4 SSL support is missing.
Had JDS been using the SFW approach to delivering software, the missing
ssl.so would have been immediately noticed (your build would fail), and
the defective diffs would have been readily apparent. If the ON
approach were in use, it would probably have been impossible from the
beginning, since you would have been maintaining makefiles that you
wrote yourself, forcing you to understand both the makefiles themselves
and at least the rough outline of the product you're building.

But this bug highlights an even worse problem: how do we know there
aren't other pieces of functionality missing? The simple answer is
that we don't, and can't. Even construction of a comprehensive
after-the-fact test suite is made impossible by the assumption that the
software itself is a moving target, changing in ways we don't take the
time to understand. If a test that passed with the previous delivery
fails with the new, do we write this off to mistaken assumptions in the
test, or do we assume the new code is broken? Without understanding
the intended changes to the code, it's simply not possible to know.

This all boils down to pride of ownership.
ON is so good in large part because the people who write it deliver it
under their name and their brand, and are held individually accountable
for it by their peers. That just doesn't happen with GNOME, because
we're simply accepting a block of code without considering its
suitability for our purposes, without understanding its technical
characteristics, and without taking any pride of ownership in it. This
isn't your fault; it's the result of unfortunate business decisions,
and it has little or nothing to do with OpenSolaris (which will have
the exact same problems any time third-party code is incorporated
without due consideration for its correctness for *Open*Solaris). But
it's still true.

> The above steps are basically the same for all packages and all versions,
> so updating to a newer version is often as simple as changing the
> version number in the spec file[4]. Once the packages are built, it's
> the same in both systems: we need to test the new packages.

But how do you test them? You've already said that you just assume
whatever files the makefiles install are the right ones, less any you
already know we don't want. How can you possibly test new
functionality, since you don't know what it is? And if you do know
what it is, why is it so onerous a burden to list the files that
provide it?

> I realise we have long-standing traditions at Sun for doing things
> a certain way and it's difficult to change. I'm just offering my
> experiences and recommend that the SFW and CCD builds be made more
> automatic and more convenient. In its present form, SFW/CCD isn't
> encouraging or inviting contributions, because of the tedious work
> involved.[5]

Inviting contributions is not an end in itself but a means to an end.
The goal, ultimately, is to make the software delivered as OpenSolaris
(and, for Sun employees, as Solaris) the best it can possibly be for
the people who use it.
Inviting more people, with more knowledge and skill and talent, to
contribute to the software they use, and by doing so to improve it, can
be - and, I believe, is and will remain for OpenSolaris - a huge win
for quality. But it is not itself the goal; if a tradeoff exists
between making work easier (so that more people will want to do it) and
ensuring that work is of acceptable quality, quality must always take
precedence.

I'm in no way suggesting that the SFW strategy is ideal, or even that
it's the right one; it's, as you've noted, time-consuming, obscure, and
in some cases even error-prone. But the JDS solution is worse - it
represents a belief that saving engineers time and effort is more
important than delivering a well-understood, high-quality product.
That philosophy I can never accept, and I will, for as long as I can
type, use every means within my power to contain and extinguish it.

Instead, we need to look for ways to invite contributions to SFW (and
the Companion) within a framework that supports the delivery of
quality, well-understood software, and to invite contributions that
would improve that framework itself. If the most inviting acceptable
framework we can collectively devise is still unattractive to would-be
contributing engineers, so be it. Quality is a constraint;
participation is a goal; participation in improvement of
quality-oriented processes is welcome.

--
Keith M Wesolowski		"Sir, we're surrounded!"
Solaris Kernel Team		"Excellent; we can attack in any direction!"
