Daniel Kahn Gillmor <d...@fifthhorseman.net> writes:
> On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:
>> I personally lean towards 2, which is consistent with what's in
>> Policy right now, but I can see definite merits in 3. I believe the
>> reproducible builds project is currently sort of doing 1, but I have
>> a hard time seeing how to make that viable on the testing side.

> Thanks for raising this question, Russ!

> I'm not sure that we should let lack of exhaustive testing push us
> away from (1). (1) is in principle the right thing -- it's easy to
> make a build reproducible if we tell people that they have to do
> exactly one specific thing. But we generally want people to be able
> to run heterogenous systems, and not to force them into one
> particular environment.

Well... I would argue that the amount of time and effort that's gone
into this project shows that it's not that easy to make a build
reproducible even when telling people to do exactly one thing. :) But
I get your point.

> Consider someone who wants to see more logging from a build, for
> example. There could be an environment variable that encourages the
> toolchain to log more, but doesn't affect the binary objects created
> by the build. By going with choices (2) or (3) we effectively dismiss
> even considering the reproducibility of those builds, which seems
> like a shame.

This is the case for (2), but not for (3). Indeed, this is exactly the
distinction between (2) and (3).

It does mean that discovery of any new such environment variable would
require a change to our whitelist in approach (3), so there would be
some lag and the whitelist would become long over time (with a
corresponding testing load). But (3) does try to achieve that use case
without trying to anticipate every possible environment variable
setting. It lets us be reactive to newly-discovered environment
variables across which we want to stay reproducible.

> Does everything in policy need to be rigorously testable? or is it ok
> to have Policy state the desired outcome even if we don't know how
> (or don't have the resources) to test it fully today.

I don't think everything has to be rigorously testable, but I do think
it's a useful canary. If I can't test something, I start wondering
whether that means I have problems with my underlying assumptions.

In particular, for (1), we have no comprehensive list of environment
variables that affect the behavior of tools, and that list would be
difficult to create. Many pieces of software add their own environment
variables with little coordination, and many of those variables could
possibly affect tool output.

I feel like the work for (1) and for (3) ends up being comparable; for
(1) we have to maintain a blacklist, and for (3) we have to maintain a
whitelist. But (3) is testable, whereas (1) is inherently aspirational
and will always have to be aspirational. We're endlessly going to be
discovering some other environment variable that changes tool output.

I'm also unsure that (1) is even what we want to claim. Do we really
want to say that builds are always reproducible if you don't change
this short list of environment variables, no matter what other
environment variables you set? There's some appeal in this for the end
user, but it feels very frustrating for the package maintainer. At
first glance, as a package maintainer, I'd think I'd have to maintain
a huge blacklist of environment variables that I've discovered affect
my toolchain somewhere, and explicitly unset them all in debian/rules.
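
Concretely, I'd expect debian/rules files to start sprouting hunks
like this (a sketch only -- the variable names are invented stand-ins
for whatever variables turn out to affect a given package's
toolchain):

    #!/usr/bin/make -f
    # Hypothetical blacklist of environment variables I've found to
    # change my toolchain's output. unexport keeps them out of the
    # environment of every command this makefile runs, even if the
    # person doing the build had them set.
    unexport SOME_TOOLCHAIN_DEBUG_FLAG
    unexport SOME_OTHER_SUSPECT_VARIABLE

    %:
            dh $@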

This doesn't feel like a good use of anyone's time (and may actually
*break* other, non-reproducibility-related things that people want to
do with my package).

-- 
Russ Allbery (r...@debian.org)              <http://www.eyrie.org/~eagle/>