Jeff Law:
> On 07/21/2017 10:15 AM, Ximin Luo wrote:
>> (Please keep me on CC, I am not subscribed)
>>
>>
>> Proposal
>> ========
>>
>> This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When
>> this is set, GCC will treat this as extra implicit
>> "-fdebug-prefix-map=$value" command-line arguments that precede any explicit
>> ones. This makes the final binary output reproducible, and also hides the
>> unreproducible value (the source path prefixes) from CFLAGS et al., which many
>> build tools (understandably) embed as-is into their build output.
> I'd *really* avoid doing this with magic environment variables.  Make it
> a first class option to the compiler.  Yes, it means projects that want
> this behavior have to arrange to pass that flag to their compiler, but
> IMHO that's much preferred over environment variables.
> 
> Jeff
> 

Hi Jeff,

If by "first class option" you meant a command-line flag, GCC *already has*
that (-fdebug-prefix-map), and it wasn't enough to achieve reproducibility in
many cases we tested. dpkg-buildflags already adds these flags to CFLAGS,
CXXFLAGS, etc. on Debian. However, with this patch using the environment
variable, we are able to reproduce 1800 more packages out of 26000.

GCC already supports a similar environment variable SOURCE_DATE_EPOCH, which 
was accepted about 2 years ago in a patch written by one of our GSoC students. 
We are not planning any more environment variables like this, and are committed 
to fixing other sources of non-determinism by patching the relevant build 
scripts.

However, for build paths, we think that this environment variable is the best
practical approach. The build-path problem is even more prevalent than the
timestamps issue: when SOURCE_DATE_EPOCH was accepted into GCC, we were able to
reproduce about 400 more packages. With BUILD_PATH_PREFIX_MAP, this number is
about 1800.

I understand that environment variables have a bit of a "magical feel" to them,
but I think that this scenario fits their use cases well. The idea of an
"environment" or a "context" is used in many scenarios in programming
(including pure functional programming) when one wants to avoid passing the
same arguments down multiple layers of function calls, especially when only
some of these layers (and often, only the bottom one) use these arguments and
the other layers totally ignore them (except to pass them down). I think this
fits the situation here: patching a great many build scripts simply to take a
prefix-map and pass it downwards one layer in the "buildsystem stack", and do
nothing else with this information, does not seem like a clean or scalable
solution to me. I can empathise that UNIX global environment variables are not
the most elegant manifestation of this general idea of "an environment", but I
think the overall end result in this case is still cleaner than "boilerplate
argument passing".
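
To illustrate the layering argument, here is a minimal sketch in Python. It is
only a model, not part of the patch: COMPILER and BUILD_SCRIPT are hypothetical
stand-ins for GCC and for an unmodified build script, run_build is a made-up
helper, and the value "/tmp/build=." is a placeholder rather than the real
encoding of the variable. The point is that the middle layer never mentions the
prefix map, yet the bottom layer still sees it.

```python
import os
import subprocess
import sys

# Hypothetical bottom layer: stands in for the compiler, the only layer
# that actually reads the variable.
COMPILER = 'import os; print(os.environ.get("BUILD_PATH_PREFIX_MAP", ""))'

# Hypothetical middle layer: stands in for a build script that knows
# nothing about prefix maps and simply runs the "compiler" it is given.
BUILD_SCRIPT = (
    "import subprocess, sys; "
    "subprocess.run([sys.executable, '-c', sys.argv[1]], check=True)"
)


def run_build(prefix_map):
    """Top layer: set the variable once and start the unmodified build script."""
    env = dict(os.environ, BUILD_PATH_PREFIX_MAP=prefix_map)
    result = subprocess.run(
        [sys.executable, "-c", BUILD_SCRIPT, COMPILER],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # What the bottom layer printed, i.e. the value it observed.
    return result.stdout.strip()
```

With explicit argument passing, BUILD_SCRIPT (and every layer like it) would
need a new parameter whose only job is to be forwarded.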

On top of all of this, we've seen that some build scripts will save GCC's
command-line arguments into the build output. But this defeats the purpose of
setting -fdebug-prefix-map for reproducibility purposes, since the flag itself
contains the unreproducible value that is meant to be stripped out of the
build output.
Indeed, GCC itself used to do this in DW_AT_producer, until we submitted a 
patch over a year ago to specifically ignore -fdebug-prefix-map when writing 
DW_AT_producer. If we could only set -fdebug-prefix-map via the CLI and not 
this envvar, we'd have to teach this logic to all other build scripts that want 
to save compiler flags as well. By contrast, if we use a previously-unused 
envvar, and explicitly say that this should not be saved into build output, 
this problem goes away. (I think it is reasonable to save CLI flags into build 
output, because they very obviously affect what the program does. However, it 
is much less reasonable to save arbitrary envvars into build output, and we 
currently would treat this as a reproducibility bug and potential privacy leak.)
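
As a rough sketch of the logic that each flag-saving build script would
otherwise have to carry, assuming the script holds its compiler flags as a list
of strings (flags_to_record is a hypothetical helper, not taken from any
existing tool):

```python
def flags_to_record(cflags):
    """Return the compiler flags that are safe to embed in build output.

    Drops every -fdebug-prefix-map=... argument, because its value carries
    the unreproducible build path.
    """
    return [f for f in cflags if not f.startswith("-fdebug-prefix-map=")]
```

For example, flags_to_record(["-O2", "-g", "-fdebug-prefix-map=/home/user/src=."])
keeps only "-O2" and "-g". With the envvar approach, no such filter is needed,
because the path never appears among the flags in the first place.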

For all of these reasons, I decided that an envvar approach was best here. At
the very least, the arguments for this one are even stronger than those for
SOURCE_DATE_EPOCH.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
