Jeff Law:
> On 07/21/2017 10:15 AM, Ximin Luo wrote:
>> (Please keep me on CC, I am not subscribed)
>>
>> Proposal
>> ========
>>
>> This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP.
>> When this is set, GCC will treat this as extra implicit
>> "-fdebug-prefix-map=$value" command-line arguments that precede any
>> explicit ones. This makes the final binary output reproducible, and also
>> hides the unreproducible value (the source path prefixes) from CFLAGS
>> et. al. which many build tools (understandably) embed as-is into their
>> build output.
> I'd *really* avoid doing this with magic environment variables. Make it
> a first class option to the compiler. Yes, it means projects that want
> this behavior have to arrange to pass that flag to their compiler, but
> IMHO that's much preferred over environment variables.
>
> Jeff
>

Hi Jeff,

If by "first class option" you meant a command-line flag, GCC *already has*
that (-fdebug-prefix-map), and it wasn't enough to achieve reproducibility in
many of the cases we tested. dpkg-buildflags already adds these flags to
CFLAGS, CXXFLAGS, etc. on Debian. However, with this patch using the
environment variable, we are able to reproduce 1800 more packages out of
26000.

GCC already supports a similar environment variable, SOURCE_DATE_EPOCH, which
was accepted about 2 years ago in a patch written by one of our GSoC
students. We are not planning any more environment variables like this, and
are committed to fixing other sources of non-determinism by patching the
relevant build scripts.

However, for build paths, we think that this environment variable is the best
practical approach. The build-path problem is even more prevalent than the
timestamps issue: when SOURCE_DATE_EPOCH was accepted into GCC, we were able
to reproduce about 400 more packages. With BUILD_PATH_PREFIX_MAP, this number
is about 1800.

I understand that environment variables have a bit of a "magical feel" to
them, but I think that this scenario fits their use-cases well. The idea of
an "environment" or a "context" is used in many scenarios in programming
(including pure functional programming) when one wants to avoid passing the
same arguments down multiple layers of function calls, especially when only
some of these layers (and often, only the bottom one) use these arguments and
the other layers totally ignore them (except to pass them down).

I think this fits the situation here. Patching many, many build scripts
simply to take a prefix-map and pass it down one layer in the "buildsystem
stack", while doing nothing else with this information, does not seem like a
clean or scalable solution to me.

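To make the "layers" point concrete, here is a toy sketch in Python. It is not
how GCC or any real build system is implemented: all function names are
hypothetical, and it assumes, purely for illustration, that the variable holds
a single old=new pair that can be handed to -fdebug-prefix-map unchanged (the
actual encoding is whatever the patch series defines). It only contrasts
forwarding an argument through every layer with letting the bottom layer read
the environment.

```python
import os

# Approach 1: every layer must accept the prefix map and forward it,
# even though only the bottom layer ever uses it.
def top_explicit(prefix_map):
    return middle_explicit(prefix_map)         # forwarded, otherwise unused

def middle_explicit(prefix_map):
    return compiler_argv_explicit(prefix_map)  # forwarded, otherwise unused

def compiler_argv_explicit(prefix_map):
    return ["gcc", "-fdebug-prefix-map=" + prefix_map, "-g"]

# Approach 2: the upper layers are untouched; only the bottom layer
# consults the (inherited) environment.
def top_env():
    return middle_env()

def middle_env():
    return compiler_argv_env()

def compiler_argv_env():
    argv = ["gcc"]
    # Assumption for this sketch: the value is one old=new mapping.
    value = os.environ.get("BUILD_PATH_PREFIX_MAP")
    if value:
        # Implicit mapping goes before any explicit flags.
        argv.append("-fdebug-prefix-map=" + value)
    argv.append("-g")
    return argv

if __name__ == "__main__":
    os.environ["BUILD_PATH_PREFIX_MAP"] = "/build/pkg-1.0=."
    print(top_env())
    print(top_explicit("/build/pkg-1.0=."))
```

Both approaches end up with the same compiler invocation, but in the second
one top_env() and middle_env() never needed to be changed.
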
I can empathise that UNIX global environment variables are not the most
elegant manifestation of this general idea of "an environment", but I think
the overall end result in this case is still cleaner than "boilerplate
argument passing".

On top of all of this, we've seen that some build scripts save GCC's
command-line arguments into the build output. This defeats the purpose of
setting -fdebug-prefix-map for reproducibility, since the flag itself
contains the unreproducible value that is meant to be stripped out of the
build output. Indeed, GCC itself used to do this in DW_AT_producer, until we
submitted a patch over a year ago to specifically ignore -fdebug-prefix-map
when writing DW_AT_producer.

If we could only set -fdebug-prefix-map via the CLI and not this envvar, we'd
have to teach this logic to every other build script that wants to save
compiler flags as well. By contrast, if we use a previously-unused envvar,
and explicitly say that it should not be saved into build output, this
problem goes away. (I think it is reasonable to save CLI flags into build
output, because they very obviously affect what the program does. However, it
is much less reasonable to save arbitrary envvars into build output, and we
currently would treat this as a reproducibility bug and potential privacy
leak.)

For all of these reasons, I decided an envvar approach was best here. If
anything, the arguments for this one are even stronger than those for
SOURCE_DATE_EPOCH.

X

--
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git