Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)
Boris Kolpackov: > Ximin Luo writes: > >> -I to an absolute path is not that common for system / distro-built >> stuff. > > Ok, thanks for clarifying. > > >> In the cases that it occurs, indeed it could and should be fixed >> by the package buildsystem, e.g. by stripping a prefix when they >> add -I flags to CFLAGS. But that's a separate issue from what >> we're talking about here. > > I believe it is the same issue: any package that blindly embeds > information about how it was built into the result of the build > does not care about reproducible builds. > I don't think that's realistic position to take, and it seems like a fairly arbitrary assertion made merely to "brush this issue under the carpet". I think this because of the "implicit contract" of how CFLAGS is generally used: CFLAGS (and other flags) is a way to pass information from multiple layers of programs down to GCC. It is necessary to "blindly embed" it if you want to save it *at all*, otherwise you are forced to violate the separation-of-concerns between layers in your buildsystem stack. - Program A sets CFLAGS += --flags_A, then calls program B - Program B sets CFLAGS += --flags_B, then calls program C - Program C sets CFLAGS += --flags_C, then calls Make - Make reads CFLAGS then calls $(CC) $(CFLAGS) [etc] In general, you are not supposed to edit what was already in CFLAGS, you are supposed to append to it. That is why it's good practise to add "--no" options for various command-line flags in gcc (and other tools). So each program X has its own flags_X that it is "responsible for". If all programs in the stack makes sure their own flags_X contain reproducible values, then the whole build will be reproducible even when recording CFLAGS. However, this breaks in the case of -f*-prefix-map. If Program A sets this flag in flags_A, then this interferes with Program C, even though flags_C was fully reproducible. Note that program B and C don't even know that they are being used with GCC, and so really have no business filtering -f*-prefix-map out of the CFLAGS that program A already set. My proposal would allow all the flags_{A,B,C} to be reproducible, and avoid different programs having to embed layer-violating logic into themselves. > If others do agree that this should be "fixed" in GCC, then I > would suggest that you add a separate option for the environment > variable case (e.g., -ffile-prefix-map-env) and sidestep the > whole "how to portably distinguish a path from an environment > variable" issue. > That would also be fine for me sure. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)
Boris Kolpackov: > Ximin Luo writes: >> Boris Kolpackov: >> >>> This does feel like we are trying to fix the issue in the wrong place. >>> Also, won't such broken packages normally store all options (including >>> -I) rather than just CFLAGS? And if the answer is no, then perhaps you >>> could put -ffile-prefix-map into CPPFLAGS instead of CFLAGS? Kind of >>> even feels right seeing that it affects the preprocessor. >>> >> >> The broad issue affects -fdebug-prefix-map as well, not just __FILE__. > > -ffile-prefix-map is the "common" option that (currently) > takes care of both debug and macro remapping (and should we > add any new -f*-prefix-map, those as well). > >> Such packages are not "broken"; GCC itself stores command-line >> arguments in DW_AT_producer and as mentioned there was a bug to >> special-case filtering -fdebug-prefix-map out of that. >> >> Now we'll have to add exceptions for -ffile-prefix-map and all >> the other flags you added, and everything else that might be >> added in future. > > -ffile-prefix-map and -fmacro-prefix-map are already excluded. > And there is a note at the appropriate place in source code to > exclude any new -f*-prefix-map should they be added. > Fair enough, I didn't look at your entire patch. But don't you think stripping the whole value is unnecessary? One advantage of the ENV: approach is that we can remove this filtering, and then the =$to part can be present in DW_AT_producer, indicating the fact that the flag was indeed used, and what the $to value was. That would be helpful for debugging actually. > You also haven't answered my question about -I: don't the > projects that embed CFLAGS also embed CPPFLAGS (and thus > -I's with absolute/varying paths)? And if the answer is > no, then can't you use the same approach to make > -ffile-prefix-map a non-issue? > -I to an absolute path is not that common for system / distro-built stuff. In the cases that it occurs, indeed it could and should be fixed by the package buildsystem, e.g. by stripping a prefix when they add -I flags to CFLAGS. But that's a separate issue from what we're talking about here. One major use-case for this flag, is to be set for all builds by the distribution. (Setting it per-package would be much much more work.) In this case, even if the build is using relative paths for -I and setting $PWD to an relative path for __FILE__ (and this behaviour is not explicitly mentioned in any standard), as long as they save CFLAGS/CPPFLAGS their build would not be reproducible. I don't think it would be fair to impose this behaviour on packages, even if they are doing *everything else correctly* for reproducibility purposes. For example this one, the unreproducibility diff exists because we temporarily reverted to an unpatched dpkg (i.e. it sets -fdebug-prefix-map rather than using the envvar supported by our patched GCC): https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/freeradius.html (Then there are other programs that like to put cflags/cppflags in their --help text for debugging purposes, etc.) It does *everything else right* but the presence of that system-wide flag causes it to be unreproducible. If GCC does not support an envvar or other similar method to pass in this flag, then these packages are the ones that suffer. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)
Boris Kolpackov: > Ximin Luo writes: > >> Higher-level build scripts sometimes like to save CFLAGS etc into the build >> output, making the overall build output unreproducible even if GCC is >> playing >> nicely. Rather than add logic to strip -f{file,debug,macro,...}-prefix-map, >> into all possible higher-level programs that might want to save CFLAGS, it is >> simpler if GCC could read it from a user-specified environment variable. > > This does feel like we are trying to fix the issue in the wrong place. > Also, won't such broken packages normally store all options (including > -I) rather than just CFLAGS? And if the answer is no, then perhaps you > could put -ffile-prefix-map into CPPFLAGS instead of CFLAGS? Kind of > even feels right seeing that it affects the preprocessor. > The broad issue affects -fdebug-prefix-map as well, not just __FILE__. Such packages are not "broken"; GCC itself stores command-line arguments in DW_AT_producer and as mentioned there was a bug to special-case filtering -fdebug-prefix-map out of that. Now we'll have to add exceptions for -ffile-prefix-map and all the other flags you added, and everything else that might be added in future. The same goes for all possible buildsystems / Makefile / etc that are affected by the issue. Therefore, I concluded that the best and most appropriate solution was to fix it at the source. That is, allow it to be read from an environment variable, to "cancel out" the information that is already read from the filesystem - rather than forcing this build-specific non-deterministic information to be added to the command-line. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)
Hello, I see Boris' patch was accepted with some further changes. Any chance my additions for reproducible builds can be included as well? They are described below. It should still apply on top, minus the strchr -> strrchr change which was folded into Boris' accepted version. Ximin Ximin Luo: > Boris Kolpackov: >> Thanks for the review. Third revision of the patch attached. >> >> [..] > Here is a follow-up patch, meant to be applied on top of Boris' patch, to > address some issues we found based on discussions and experiments at the > Reproducible Builds project. > > Most of this patch is re-implementing the patch from [1] originally by Daniel > Kahn Gillmor. Back then it was rejected in favour of a simpler approach > because > its necessity was not realised. However, more experiments since then have > revealed that its basic approach is useful in more general scenarios: > > Higher-level build scripts sometimes like to save CFLAGS etc into the build > output, making the overall build output unreproducible even if GCC is playing > nicely. Rather than add logic to strip -f{file,debug,macro,...}-prefix-map, > into all possible higher-level programs that might want to save CFLAGS, it is > simpler if GCC could read it from a user-specified environment variable. The > fact that the name of the environment variable is still present on the > command-line, should make it obvious what is being done, and avoid confusion > that might arise if an implicit "magic" environment variable is used. > > In the patch thread (see [1] and follow-ups) there was concern that the > prefix > '$' would cause awkward escaping behaviour, so the prefix 'ENV:' was chosen. > A > concern about this conflicting with AmigaOS paths was voiced but not > explored; > of course it can be changed to something else if needed. > > (NetBSD is already carrying this patch in their system GCC [2], presumably for > the same reason mentioned above.) > > [1] https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01168.html > [2] > http://cvsweb.netbsd.org/bsdweb.cgi/src/external/gpl3/gcc/dist/gcc/final.c?rev=1.2&content-type=text/x-cvsweb-markup > > The other thing that this patch does, is to split the old and new prefixs by > the last '=' sign rather than the first. This allows arbitrary old prefixes to > be mapped, which would be more flexible than mapping to arbitrary new > prefixes. > The rust compiler is adopting this strategy too: [3]. > > [3] https://github.com/rust-lang/rust/issues/41555#issuecomment-321078174 > > ChangeLogs > -- > > gcc/ChangeLog: > > 2017-12-19 Ximin Luo > > PR other/70268 > * file-prefix-map.c (add_prefix_map): Support reading old prefix from > the > environment. Split old and new parts by the last '=' rather than the > first. > * file-prefix-map.h: Define macros for the ENV: prefix. > > gcc/testsuite/ChangeLog: > > 2017-12-19 Ximin Luo > > PR other/70268 > * c-c++-common/ffile-prefix-map-env.c: New test. > > Index: gcc-8-20171217/gcc/file-prefix-map.c > === > --- gcc-8-20171217.orig/gcc/file-prefix-map.c > +++ gcc-8-20171217/gcc/file-prefix-map.c > @@ -41,16 +41,40 @@ add_prefix_map (file_prefix_map *&maps, > { >file_prefix_map *map; >const char *p; > + char *env; > + const char *old; > + size_t oldlen; > > - p = strchr (arg, '='); > + p = strrchr (arg, '='); >if (!p) > { >error ("invalid argument %qs to %qs", arg, opt); >return; > } > + if (0 == strncmp(FILE_PREFIX_MAP_ENV_PREFIX, arg, > FILE_PREFIX_MAP_ENV_PREFIX_OFFSET)) > +{ > + const char *arg_offset = arg + FILE_PREFIX_MAP_ENV_PREFIX_OFFSET; > + env = xstrndup (arg_offset, p - arg_offset); > + old = getenv (env); > + if (!old) > + { > + warning (0, "environment variable %qs not set in argument to %s", > +env, opt); > + free (env); > + return; > + } > + oldlen = strlen (old); > + free (env); > +} > + else > +{ > + old = xstrndup (arg, p - arg); > + oldlen = p - arg; > +} > + >map = XNEW (file_prefix_map); > - map->old_prefix = xstrndup (arg, p - arg); > - map->old_len = p - arg; > + map->old_prefix = old; > + map->old_len = oldlen; >p++; >map->new_prefix = xstrdup (p); >map->new_len = strlen (p); > Index: gcc-8-20171217/gcc/file-prefix-map.h > ===
Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)
Boris Kolpackov: > Thanks for the review. Third revision of the patch attached. > > [..] Here is a follow-up patch, meant to be applied on top of Boris' patch, to address some issues we found based on discussions and experiments at the Reproducible Builds project. Most of this patch is re-implementing the patch from [1] originally by Daniel Kahn Gillmor. Back then it was rejected in favour of a simpler approach because its necessity was not realised. However, more experiments since then have revealed that its basic approach is useful in more general scenarios: Higher-level build scripts sometimes like to save CFLAGS etc into the build output, making the overall build output unreproducible even if GCC is playing nicely. Rather than add logic to strip -f{file,debug,macro,...}-prefix-map, into all possible higher-level programs that might want to save CFLAGS, it is simpler if GCC could read it from a user-specified environment variable. The fact that the name of the environment variable is still present on the command-line, should make it obvious what is being done, and avoid confusion that might arise if an implicit "magic" environment variable is used. In the patch thread (see [1] and follow-ups) there was concern that the prefix '$' would cause awkward escaping behaviour, so the prefix 'ENV:' was chosen. A concern about this conflicting with AmigaOS paths was voiced but not explored; of course it can be changed to something else if needed. (NetBSD is already carrying this patch in their system GCC [2], presumably for the same reason mentioned above.) [1] https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01168.html [2] http://cvsweb.netbsd.org/bsdweb.cgi/src/external/gpl3/gcc/dist/gcc/final.c?rev=1.2&content-type=text/x-cvsweb-markup The other thing that this patch does, is to split the old and new prefixs by the last '=' sign rather than the first. This allows arbitrary old prefixes to be mapped, which would be more flexible than mapping to arbitrary new prefixes. The rust compiler is adopting this strategy too: [3]. [3] https://github.com/rust-lang/rust/issues/41555#issuecomment-321078174 ChangeLogs -- gcc/ChangeLog: 2017-12-19 Ximin Luo PR other/70268 * file-prefix-map.c (add_prefix_map): Support reading old prefix from the environment. Split old and new parts by the last '=' rather than the first. * file-prefix-map.h: Define macros for the ENV: prefix. gcc/testsuite/ChangeLog: 2017-12-19 Ximin Luo PR other/70268 * c-c++-common/ffile-prefix-map-env.c: New test. Index: gcc-8-20171217/gcc/file-prefix-map.c === --- gcc-8-20171217.orig/gcc/file-prefix-map.c +++ gcc-8-20171217/gcc/file-prefix-map.c @@ -41,16 +41,40 @@ add_prefix_map (file_prefix_map *&maps, { file_prefix_map *map; const char *p; + char *env; + const char *old; + size_t oldlen; - p = strchr (arg, '='); + p = strrchr (arg, '='); if (!p) { error ("invalid argument %qs to %qs", arg, opt); return; } + if (0 == strncmp(FILE_PREFIX_MAP_ENV_PREFIX, arg, FILE_PREFIX_MAP_ENV_PREFIX_OFFSET)) +{ + const char *arg_offset = arg + FILE_PREFIX_MAP_ENV_PREFIX_OFFSET; + env = xstrndup (arg_offset, p - arg_offset); + old = getenv (env); + if (!old) + { + warning (0, "environment variable %qs not set in argument to %s", + env, opt); + free (env); + return; + } + oldlen = strlen (old); + free (env); +} + else +{ + old = xstrndup (arg, p - arg); + oldlen = p - arg; +} + map = XNEW (file_prefix_map); - map->old_prefix = xstrndup (arg, p - arg); - map->old_len = p - arg; + map->old_prefix = old; + map->old_len = oldlen; p++; map->new_prefix = xstrdup (p); map->new_len = strlen (p); Index: gcc-8-20171217/gcc/file-prefix-map.h === --- gcc-8-20171217.orig/gcc/file-prefix-map.h +++ gcc-8-20171217/gcc/file-prefix-map.h @@ -18,6 +18,9 @@ #ifndef GCC_FILE_PREFIX_MAP_H #define GCC_FILE_PREFIX_MAP_H +#define FILE_PREFIX_MAP_ENV_PREFIX "ENV:" +#define FILE_PREFIX_MAP_ENV_PREFIX_OFFSET (sizeof(FILE_PREFIX_MAP_ENV_PREFIX) - 1) + void add_macro_prefix_map (const char *); void add_debug_prefix_map (const char *); void add_file_prefix_map (const char *); Index: gcc-8-20171217/gcc/testsuite/c-c++-common/ffile-prefix-map-env.c === --- /dev/null +++ gcc-8-20171217/gcc/testsuite/c-c++-common/ffile-prefix-map-env.c @@ -0,0 +1,13 @@ +/* Test __builtin_FILE() with ENV: prefix-map. */ +/* { dg-do run } */ +/* { dg-set-compiler-env-var SRCDIR "$srcdir" } */ +/* { dg-options "-
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
Yury Gribov: > On Thu, Aug 3, 2017 at 12:45 PM, Ximin Luo wrote: >> Yury Gribov: >>> [..] >>> >>> Shouldn't -fdebug-prefix-map be updated to use the same syntax as >>> BUILD_PATH_PREFIX_MAP? >>> >> >> -fdebug-prefix-map is a CLI option and can be given multiple times, each >> flag given is in the form of $from=$to where $from can't contain a '='. >> >> BUILD_PATH_PREFIX_MAP is a single envvar that encodes a list-of-pairs of the >> form $to=$from:$to=$from with some escaping for flexibility and to support >> things like windows paths. Since it's a new envvar, Ian Jackson suggested >> $to=$from to emphasise the reproducible ($to) part. I liked the idea so I >> implemented it like that. (We did a lot of bikeshedding over on the >> rb-general mailing list about the exact format and this is what we settled >> on, I'd like to avoid getting into that again but would nevertheless do it, >> if it's necessary to get this patch accepted.) >> >> Because -fdebug-prefix-map currently only encodes one $from=$to pair, it >> would be a very disruptive and highly backward-incompatible change to make >> it use the same syntax as B_P_P_M. A slightly less disruptive but still >> backward-incompatible change would be to make it encode a single $to=$from >> pair, but I don't really see the advantage to doing so - what were your >> thoughts on this? > > I believe it would much easier to reason about environment variable > behavior when it boils down to "prepend some standard flag to > command-line flags". It would also simplify maintenance of local > compiler patch as core functionality can be merged to mainline GCC > whereas debatable environment variable part stays in the distro. > [answered in another email together with other related points] >> If by "first class option" you meant a command-line flag, GCC *already has* >> that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility in >> many cases we tested. >> dpkg-buildflags actually already adds these flags to CFLAGS CXXFLAGS etc on >> Debian. However, with this patch using the environment variable, we are able >> to reproduce 1800 more packages out of 26000. > > Just curious, why -fdebug-prefix-map (maybe modified to support > multiple renames) was not enough for these packages (and why they > can't be fixed instead)? > One important reason is that some packages embed CFLAGS/CXXFLAGS in build output such as pkg-config files or Makefiles to be installed as examples. To fix this, we'd have to add buildsystem-specific logic to strip out -fdebug-prefix-map when it was writing such output. This does not affect all of these 1800 packages, but I saw enough cases that I was convinced that the use of a new envvar was a better approach - I don't think buildsystems should be burdened with having to know "which flag values are reproducible vs not", this is not the case with other CFLAGS. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
Jakub Jelinek: > On Fri, Aug 04, 2017 at 08:32:33AM -0400, Matthias Klose wrote: GCC already supports a similar environment variable SOURCE_DATE_EPOCH, which was accepted about 2 years ago in a patch written by one of our GSoC students. We are not planning any more environment variables like this, and are committed to fixing other sources of non-determinism by patching the relevant build scripts. >>> I would have rejected that as well :-) One of the few times I would >>> have disagreed with Bernd. >> >> You can argue that gcc has command line options to set these, but most build >> systems can be influenced by well documented environment variables like >> CFLAGS, >> CXXFLAGS, LDFLAGS, so adding one more variable like SOURCE_DATE_EPOCH makes >> sense, considering that many tools dealing with timestamps don't even have >> command line options to do these (and there it's not just about compilers). > > Unlike SOURCE_DATE_EPOCH, the other env vars are autoconf/cmake etc. > related, we really shouldn't be adding more of these. > If some package has messed up build system, you can use > CXX='g++ -fwhatever' > or whatever other way to pass flags you want to the compiler or pick the > compiler you prefer to use. > As I said, this is the only envvar that we're planning to add, we're not planning to add any more. Please also see my point in my other email, about how this is really more of a "cancel-out already-existing environment" piece of data, different from other first-class options to GCC. It would be really nice not to have to patch 1800 packages to make them reproducible. X >> [..] -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
Yury Gribov: > [..] > > In general any distro provides some way to set default parameters for > every package. Adding hacks to compiler to work around limitations of > particular build system does not sound right. > > Here's a relevant discussion from binutils ML: > https://sourceware.org/ml/binutils/2009-05/msg00086.html > I would argue that what is being worked around, is not a limitation of a particular buildsystem but rather a limitation of the compiler and the language. It is GCC itself that generates the unreproducible output, and this issues affects all buildsystems. They do not explicitly request unreproducible output, so the fix should not be the responsibility of buildsystem. GCC takes the input from an environment-like object (the filesystem). Part of this input (some prefix) differs between different builds, through no fault of any particular buildsystem but rather through the "fault" of the user running the build - they put the source and/or build directory under that path. Since no buildsystem is responsible for this part of the filesystem layout, it does not seem appropriate to me, to require buildsystems to add logic to deal with this - e.g. to pass this information down through layers of the tooling stack, to be eventually consumed by the GCC command line. Instead, it seems much cleaner for the "fix" to be supplied by the user (the same entity that controls the filesystem) via some other environment-like object (here, a UNIX envvar), and for GCC to read this directly just like it reads the abspath() / realpath() from the filesystem environment. CFLAGS/CXXFLAGS is also not appropriate because these flags are meant to directly affect how GCC works. By contrast, these prefix-map options are meant to *cancel out* already-existing sources of unreproducibility from the filesystem environment; they do not affect the output pro-actively like other first-class options do. Also, adding -fdebug-prefix-map to CFLAGS/CXXFLAGS causes unreproducibility elsewhere. (from another email) > I believe it would much easier to reason about environment variable > behavior when it boils down to "prepend some standard flag to > command-line flags". It would also simplify maintenance of local > compiler patch as core functionality can be merged to mainline GCC > whereas debatable environment variable part stays in the distro. To re-iterate my point: I don't think it's right to think of these prefix maps as the same as other command line flags. They *cancel out* already-existing sources of unreproducibility from the filesystem environment that GCC is *already reading*, that already makes the output hard-to-reason-about (i.e. unreproducible). So I think it is more appropriate to pass them in an environment-like object, rather than via command-line flags. Normally, system-wide CFLAGS etc are a static expression of policy. What I mean by that is, writing logic to determine CFLAGS for particular package, would "look like" the actual wording of that policy. So for example, "all packages should have debug" => "CFLAGS+=-g", "packages that match property X should have flag Y" => "if X then CFLAGS += Y". OTOH if your policy is "packages should be reproducible" and we have to add these prefix-remapping flags to CFLAGS, you're adding a dependency from the value of CFLAGS itself, onto the build-time filesystem. So CFLAGS is no longer a static expression of policy, it depends on information not part of the policy. So, prefix-map is really different from other CFLAGS, it adds extra dependency relationships that other flags don't add. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
Jeff Law: > On 08/02/2017 08:06 PM, Ximin Luo wrote: >> Jeff Law: >>> On 07/21/2017 10:15 AM, Ximin Luo wrote: >>>> (Please keep me on CC, I am not subscribed) >>>> >>>> >>>> Proposal >>>> >>>> >>>> This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. >>>> When >>>> this is set, GCC will treat this as extra implicit >>>> "-fdebug-prefix-map=$value" >>>> command-line arguments that precede any explicit ones. This makes the final >>>> binary output reproducible, and also hides the unreproducible value (the >>>> source >>>> path prefixes) from CFLAGS et. al. which many build tools (understandably) >>>> embed as-is into their build output. >>> I'd *really* avoid doing this with magic environment variables. Make it >>> a first class option to the compiler. Yes, it means projects that want >>> this behavior have to arrange to pass that flag to their compiler, but >>> IMHO that's much preferred over environment variables. >>> >>> Jeff >>> >> >> Hi Jeff, >> >> If by "first class option" you meant a command-line flag, GCC *already has* >> that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility in >> many cases we tested. dpkg-buildflags actually already adds these flags to >> CFLAGS CXXFLAGS etc on Debian. However, with this patch using the >> environment variable, we are able to reproduce 1800 more packages out of >> 26000. > Then take what you've done with the environment variable and instead > implement it on top of a switch. An environment variable is absolutely the > wrong thing to do here. > >> >> GCC already supports a similar environment variable SOURCE_DATE_EPOCH, which >> was accepted about 2 years ago in a patch written by one of our GSoC >> students. We are not planning any more environment variables like this, and >> are committed to fixing other sources of non-determinism by patching the >> relevant build scripts. > I would have rejected that as well :-) One of the few times I would > have disagreed with Bernd. > > Could you go into some more detail on why you think an envvar is absolutely the wrong thing to do here? X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
Yury Gribov: > On 03.08.2017 3:06, Ximin Luo wrote: >> Jeff Law: >>> On 07/21/2017 10:15 AM, Ximin Luo wrote: >>>> (Please keep me on CC, I am not subscribed) >>>> >>>> >>>> Proposal >>>> >>>> >>>> This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. >>>> When >>>> this is set, GCC will treat this as extra implicit >>>> "-fdebug-prefix-map=$value" >>>> command-line arguments that precede any explicit ones. This makes the final >>>> binary output reproducible, and also hides the unreproducible value (the >>>> source >>>> path prefixes) from CFLAGS et. al. which many build tools (understandably) >>>> embed as-is into their build output. >>> I'd *really* avoid doing this with magic environment variables. Make it >>> a first class option to the compiler. Yes, it means projects that want >>> this behavior have to arrange to pass that flag to their compiler, but >>> IMHO that's much preferred over environment variables. >>> >>> Jeff >>> >> >> Hi Jeff, >> >> If by "first class option" you meant a command-line flag, GCC *already has* >> that (-fdebug-prefix-map) > and it wasn't enough to achieve reproducibility >> in many cases we tested. > > Shouldn't -fdebug-prefix-map be updated to use the same syntax as > BUILD_PATH_PREFIX_MAP? > -fdebug-prefix-map is a CLI option and can be given multiple times, each flag given is in the form of $from=$to where $from can't contain a '='. BUILD_PATH_PREFIX_MAP is a single envvar that encodes a list-of-pairs of the form $to=$from:$to=$from with some escaping for flexibility and to support things like windows paths. Since it's a new envvar, Ian Jackson suggested $to=$from to emphasise the reproducible ($to) part. I liked the idea so I implemented it like that. (We did a lot of bikeshedding over on the rb-general mailing list about the exact format and this is what we settled on, I'd like to avoid getting into that again but would nevertheless do it, if it's necessary to get this patch accepted.) Because -fdebug-prefix-map currently only encodes one $from=$to pair, it would be a very disruptive and highly backward-incompatible change to make it use the same syntax as B_P_P_M. A slightly less disruptive but still backward-incompatible change would be to make it encode a single $to=$from pair, but I don't really see the advantage to doing so - what were your thoughts on this? X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
Jeff Law: > On 07/21/2017 10:15 AM, Ximin Luo wrote: >> (Please keep me on CC, I am not subscribed) >> >> >> Proposal >> >> >> This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When >> this is set, GCC will treat this as extra implicit >> "-fdebug-prefix-map=$value" >> command-line arguments that precede any explicit ones. This makes the final >> binary output reproducible, and also hides the unreproducible value (the >> source >> path prefixes) from CFLAGS et. al. which many build tools (understandably) >> embed as-is into their build output. > I'd *really* avoid doing this with magic environment variables. Make it > a first class option to the compiler. Yes, it means projects that want > this behavior have to arrange to pass that flag to their compiler, but > IMHO that's much preferred over environment variables. > > Jeff > Hi Jeff, If by "first class option" you meant a command-line flag, GCC *already has* that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility in many cases we tested. dpkg-buildflags actually already adds these flags to CFLAGS CXXFLAGS etc on Debian. However, with this patch using the environment variable, we are able to reproduce 1800 more packages out of 26000. GCC already supports a similar environment variable SOURCE_DATE_EPOCH, which was accepted about 2 years ago in a patch written by one of our GSoC students. We are not planning any more environment variables like this, and are committed to fixing other sources of non-determinism by patching the relevant build scripts. However for build-paths, we think that this environment variable is the best practical approach here. It is even more prevalent than the timestamps issue - when SOURCE_DATE_EPOCH was accepted into GCC, we were able to reproduce about 400 more packages. With BUILD_PATH_PREFIX_MAP, this number is about 1800. I understand that environment variables have a bit of a "magical feel" to them, but I think that this scenario fits their use-cases well. The idea of an "environment" or a "context" is used in many scenarios in programming (including pure functional programming) when one wants to avoid passing the same arguments down multiple layers of function calls, especially when only some of these layers (and often, only the bottom one) use these arguments and the other layers totally ignore them (except to pass it down). I think this fits the situation here - patching many many build scripts simply to take a prefix-map and pass it downwards one layer in the "buildsystem stack", and do nothing else with this information, does not seem like a clean nor scalable solution to me. I can empathise that UNIX global environment variables is not the most elegant manifestation of this general idea of "an environment", but I think the overall end result in this case, is still cleaner than "boilerplate argument passing". On top of all of this, we've seen that some build scripts will save GCC's command-line arguments into the build output. But this defeats the purpose of setting -fdebug-prefix-map for reproducibility purposes, since it contains an unreproducible value that is meant to be stripped out of the build output. Indeed, GCC itself used to do this in DW_AT_producer, until we submitted a patch over a year ago to specifically ignore -fdebug-prefix-map when writing DW_AT_producer. If we could only set -fdebug-prefix-map via the CLI and not this envvar, we'd have to teach this logic to all other build scripts that want to save compiler flags as well. By contrast, if we use a previously-unused envvar, and explicitly say that this should not be saved into build output, this problem goes away. (I think it is reasonable to save CLI flags into build output, because they very obviously affect what the program does. However, it is much less reasonable to save arbitrary envvars into build output, and we currently would treat this as a reproducibility bug and potential privacy leak.) For all of these reasons, that is why I decided an envvar approach was best here. At least, there are even stronger arguments for this one, than for SOURCE_DATE_EPOCH. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
[PATCH 2/3] Use BUILD_PATH_PREFIX_MAP envvar to transform __FILE__
Use the BUILD_PATH_PREFIX_MAP environment variable when expanding the __FILE__ macro, in the same way that debug-prefix-map works for debugging symbol paths. This patch follows similar lines to the earlier patch for SOURCE_DATE_EPOCH. Specifically, we read the environment variable not in libcpp but via a hook which has an implementation defined in gcc/c-family. However, to achieve this is more complex than the earlier patch: we need to share the prefix_map data structure and associated functions between libcpp and c-family. Therefore, we need to move these to libiberty. (For comparison, the SOURCE_DATE_EPOCH patch did not need this because time_t et. al. are in the standard C library.) Acknowledgements Dhole who wrote the earlier patch for SOURCE_DATE_EPOCH which saved me a lot of time on figuring out what to edit. ChangeLogs -- gcc/ChangeLog: 2017-07-21 Ximin Luo * doc/invoke.texi (Environment Variables): Document form and behaviour of BUILD_PATH_PREFIX_MAP. gcc/c-family/ChangeLog: 2017-07-21 Ximin Luo * c-common.c (cb_get_build_path_prefix_map): Define new call target. * c-common.h (cb_get_build_path_prefix_map): Declare call target. * c-lex.c (init_c_lex): Set the get_build_path_prefix_map callback. libcpp/ChangeLog: 2017-07-21 Ximin Luo * include/cpplib.h (cpp_callbacks): Add get_build_path_prefix_map callback. * init.c (cpp_create_reader): Initialise build_path_prefix_map field. * internal.h (cpp_reader): Add new field build_path_prefix_map. * macro.c (_cpp_builtin_macro_text): Set the build_path_prefix_map field if unset and apply it when expanding __FILE__ macros. gcc/testsuite/ChangeLog: 2017-07-21 Ximin Luo * gcc.dg/cpp/build_path_prefix_map-1.c: New test. * gcc.dg/cpp/build_path_prefix_map-2.c: New test. Index: gcc-8-20170716/gcc/doc/invoke.texi === --- gcc-8-20170716.orig/gcc/doc/invoke.texi +++ gcc-8-20170716/gcc/doc/invoke.texi @@ -27197,6 +27197,26 @@ Recognize EUCJP characters. If @env{LANG} is not defined, or if it has some other value, then the compiler uses @code{mblen} and @code{mbtowc} as defined by the default locale to recognize and translate multibyte characters. + +@item BUILD_PATH_PREFIX_MAP +@findex BUILD_PATH_PREFIX_MAP +If this variable is set, it specifies an ordered map used to transform +filepaths output in debugging symbols and expansions of the @code{__FILE__} +macro. This may be used to achieve fully reproducible output. In the context +of running GCC within a higher-level build tool, it is typically more reliable +than setting command line arguments such as @option{-fdebug-prefix-map} or +common environment variables such as @env{CFLAGS}, since the build tool may +save these latter values into other output outside of GCC's control. + +The value is of the form +@samp{@var{dst@r{[0]}}=@var{src@r{[0]}}:@var{dst@r{[1]}}=@var{src@r{[1]}}@r{@dots{}}}. +If any @var{dst@r{[}i@r{]}} or @var{src@r{[}i@r{]}} contains @code{%}, @code{=} +or @code{:} characters, they must be replaced with @code{%#}, @code{%+}, and +@code{%.} respectively. + +Whenever GCC emits a filepath that starts with a whole path component matching +@var{src@r{[}i@r{]}} for some @var{i}, with rightmost @var{i} taking priority, +the matching part is replaced with @var{dst@r{[}i@r{]}} in the final output. @end table @noindent Index: gcc-8-20170716/gcc/c-family/c-common.c === --- gcc-8-20170716.orig/gcc/c-family/c-common.c +++ gcc-8-20170716/gcc/c-family/c-common.c @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. #include "config.h" #include "system.h" +#include "prefix-map.h" #include "coretypes.h" #include "target.h" #include "function.h" @@ -7905,6 +7906,25 @@ cb_get_source_date_epoch (cpp_reader *pf return (time_t) epoch; } +/* Read BUILD_PATH_PREFIX_MAP from environment to have deterministic relative + paths to replace embedded absolute paths to get reproducible results. + Returns NULL if BUILD_PATH_PREFIX_MAP is badly formed. */ + +prefix_map ** +cb_get_build_path_prefix_map (cpp_reader *pfile ATTRIBUTE_UNUSED) +{ + prefix_map **map = XCNEW (prefix_map *); + + const char *arg = getenv ("BUILD_PATH_PREFIX_MAP"); + if (!arg || prefix_map_parse (map, arg)) +return map; + + free (map); + error_at (input_location, "environment variable BUILD_PATH_PREFIX_MAP is " + "not well formed; see the GCC documentation for more details."); + return NULL; +} + /* Callback for libcpp for offering spelling suggestions for misspelled directives. GOAL is an unrecognized string; CANDIDATES is a NULL-terminated array of candidate strings. Return the closest Index: gcc-8-20170716/gcc/c-
[PATCH 1/3] Use BUILD_PATH_PREFIX_MAP envvar for debug-prefix-map
Define the BUILD_PATH_PREFIX_MAP environment variable, and treat it as implicit -fdebug-prefix-map CLI options specified before any explicit such options. Much of the generic code for applying and parsing prefix-maps is implemented in libiberty instead of the dwarf2 parts of the code, in order to make subsequent patches unrelated to debuginfo easier. Acknowledgements Daniel Kahn Gillmor who wrote the patch for r231835, which saved me a lot of time figuring out what to edit. HW42 for discussion on the details of the proposal, and for suggesting that we retain the ability to map the prefix to something other than ".". Other contributors to the BUILD_PATH_PREFIX_MAP specification, see https://reproducible-builds.org/specs/build-path-prefix-map/ ChangeLogs -- include/ChangeLog: 2017-07-21 Ximin Luo * prefix-map.h: New file implementing the BUILD_PATH_PREFIX_MAP specification; includes code from /gcc/final.c and code adapted from examples attached to the specification. libiberty/ChangeLog: 2017-07-21 Ximin Luo * prefix-map.c: New file implementing the BUILD_PATH_PREFIX_MAP specification; includes code from /gcc/final.c and code adapted from examples attached to the specification. * Makefile.in: Update for new files. gcc/ChangeLog: 2017-07-21 Ximin Luo * debug.h: Declare add_debug_prefix_map_from_envvar. * final.c: Define add_debug_prefix_map_from_envvar, and refactor prefix-map utilities to use equivalent code from libiberty instead. * opts-global.c: (handle_common_deferred_options): Call add_debug_prefix_map_from_envvar before processing options. gcc/testsuite/ChangeLog: 2017-07-21 Ximin Luo * lib/gcc-dg.exp: Allow dg-set-compiler-env-var to take only one argument in which case it unsets the given env var. * gcc.dg/debug/dwarf2/build_path_prefix_map-1.c: New test. * gcc.dg/debug/dwarf2/build_path_prefix_map-2.c: New test. Index: gcc-8-20170716/include/prefix-map.h === --- /dev/null +++ gcc-8-20170716/include/prefix-map.h @@ -0,0 +1,94 @@ +/* Declarations for manipulating filename prefixes. + Written 2017 by Ximin Luo + This code is in the public domain. */ + +#ifndef _PREFIX_MAP_H +#define _PREFIX_MAP_H + +#ifdef __cplusplus +extern "C" { +#endif + +#ifdef HAVE_STDLIB_H +#include +#endif + +/* Linked-list of mappings from old prefixes to new prefixes. */ + +struct prefix_map +{ + const char *old_prefix; + const char *new_prefix; + size_t old_len; + size_t new_len; + struct prefix_map *next; +}; + + +/* Find a mapping suitable for the given OLD_NAME in the linked list MAP.\ + + If a mapping is found, writes a pointer to the non-matching suffix part of + OLD_NAME in SUFFIX, and its length in SUF_LEN. + + Returns NULL if there was no suitable mapping. */ +struct prefix_map * +prefix_map_find (struct prefix_map *map, const char *old_name, +const char **suffix, size_t *suf_len); + +/* Prepend a prefix map before a given SUFFIX. + + The remapped name is written to NEW_NAME and returned as a const pointer. No + allocations are performed; the caller must ensure it can hold at least + MAP->NEW_LEN + SUF_LEN + 1 characters. */ +const char * +prefix_map_prepend (struct prefix_map *map, char *new_name, + const char *suffix, size_t suf_len); + +/* Remap a filename. + + Returns OLD_NAME unchanged if there was no remapping, otherwise returns a + pointer to newly-allocated memory for the remapped filename. The memory is + allocated by the given ALLOC function, which also determines who is + responsible for freeing it. */ +#define prefix_map_remap_alloc_(map_head, old_name, alloc)\ + __extension__ \ + ({ \ +const char *__suffix; \ +size_t __suf_len; \ +struct prefix_map *__map; \ +(__map = prefix_map_find ((map_head), (old_name), &__suffix, &__suf_len)) \ + ? prefix_map_prepend (__map,\ + (char *) alloc (__map->new_len + __suf_len + 1), \ + __suffix, __suf_len) \ + : (old_name); \ + }) + +/* Remap a filename. + + Returns OLD_NAME unchanged if there was no remapping, otherwise returns a + stack-allocated pointer to the newly-remapped filename. */ +#define prefix_map_remap_alloca(map_head, old_name) \ + prefix_map_remap_alloc_ (map_head, old_name,
[PATCH 3/3] When remapping paths, only match whole path components
Change the remapping algorithm so that each old_prefix only matches paths that have old_prefix as a whole path component prefix. (A whole path component is a part of a path that begins and ends at a directory separator or at either end of the path string.) This remapping algorithm is more predictable than the old algorithm, because there is no chance of mappings for one directory interfering with mappings for other directories. It contains less corner cases and therefore it is easier for users to figure out how to set the mapping appropriately. Therefore, I believe it is better as a standardised algorithm that other build tools might like to adopt, and so in our BUILD_PATH_PREFIX_MAP specification we recommend this algorithm - though we allow others, and explicitly mention GCC's current algorithm. But it would be good for GCC to adopt this newer and cleaner one. (The original idea came from discussions with rustc developers on this topic.) This does technically break backwards-compatibility, but I was under the impression that this option was not seen as such a critical feature, that this would be too important. Nevertheless, this part is totally independent from the other patches and may be included or excluded as GCC maintainers desire. Acknowledgements Discussions with Michael Woerister and other members of the Rust compiler team on Github, and discussions with Daniel Shahaf on the rb-general@ mailing list on lists.reproducible-builds.org. ChangeLogs -- libiberty/ChangeLog: 2017-07-21 Ximin Luo * prefix-map.c: When remapping paths, only match whole path components. Index: gcc-8-20170716/libiberty/prefix-map.c === --- gcc-8-20170716.orig/libiberty/prefix-map.c +++ gcc-8-20170716/libiberty/prefix-map.c @@ -87,12 +87,22 @@ struct prefix_map * prefix_map_find (struct prefix_map *map, const char *old_name, const char **suffix, size_t *suf_len) { + size_t len; + for (; map; map = map->next) -if (filename_ncmp (old_name, map->old_prefix, map->old_len) == 0) - { - *suf_len = strlen (*suffix = old_name + map->old_len); - break; - } +{ + len = map->old_len; + /* Ignore trailing path separators at the end of old_prefix */ + while (len > 0 && IS_DIR_SEPARATOR (map->old_prefix[len-1])) len--; + /* Check if old_name matches old_prefix at a path component boundary */ + if (! filename_ncmp (old_name, map->old_prefix, len) + && (IS_DIR_SEPARATOR (old_name[len]) + || old_name[len] == '\0')) + { + *suf_len = strlen (*suffix = old_name + len); + break; + } +} return map; }
[PING^4][PATCH v2] Generate reproducible output independently of the build-path
(Please keep me on CC, I am not subscribed) Proposal This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When this is set, GCC will treat this as extra implicit "-fdebug-prefix-map=$value" command-line arguments that precede any explicit ones. This makes the final binary output reproducible, and also hides the unreproducible value (the source path prefixes) from CFLAGS et. al. which many build tools (understandably) embed as-is into their build output. This environment variable also acts on the __FILE__ macro, mapping it in the same way that debug-prefix-map works for debug symbols. We have seen that __FILE__ is also a very large source of unreproducibility, and is represented quite heavily in the 3k+ figure given earlier. Finally, we tweak the mapping algorithm so that it applies only to whole path components when matching prefixes. This is justified in further detail in the patch header. It is an optional part of the patch series and could be dropped if the GCC maintainers are not convinced by our arguments there. Background == We have prepared a document that describes how this works in detail, so that projects can be confident that they are interoperable: https://reproducible-builds.org/specs/build-path-prefix-map/ The specification is currently in DRAFT status, awaiting some final feedback, including what the GCC maintainers think about it. We have written up some more detailed discussions on the topic, including a thorough justification on why we chose the mechanism of environment variables: https://wiki.debian.org/ReproducibleBuilds/StandardEnvironmentVariables The previous iteration of the patch series, essentially the same as the current re-submission, is here: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00513.html An older version, that explains some GCC-specific background, is here: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00182.html The current patch series applies cleanly to GCC-8 snapshot 20170716. Reproducibility testing === Over the past 3 months, we have tested this patch backported to Debian GCC-6. Together with a patched dpkg that sets the environment variable appropriately, it allows us to reproduce ~1800 extra packages. This is about 6.8% of ~26400 Debian source packages, and just over 1/2 of the ones whose irreproducibility is due to build-path issues. https://tests.reproducible-builds.org/debian/issues/unstable/gcc_captures_build_path_issue.html https://tests.reproducible-builds.org/debian/unstable/index_suite_amd64_stats.html The first major increase around 2017-04 is due to us deploying this patch. The next major increase later in 2017-04 is unrelated, due to us deploying a patch for R. The dip during the last part of 2017-06 is due to unpatched and patched packages getting out-of-sync partly because of extra admin work around the Debian stretch release, and we believe that the green will soon return to their previous high after this situation settles. Unit testing I've tested these patches on a Debian unstable x86_64-linux-gnu schroot running inside a Debian jessie system, on a full-bootstrap build. The output of contrib/compare_tests is as follows: gcc-8-20170716$ contrib/compare_tests ../gcc-build-{0,1} # Comparing directories ## Dir1=../gcc-build-0: 8 sum files ## Dir2=../gcc-build-1: 8 sum files # Comparing 8 common sum files ## /bin/sh contrib/compare_tests /tmp/gxx-sum1.13468 /tmp/gxx-sum2.13468 New tests that PASS: gcc.dg/cpp/build_path_prefix_map-1.c (test for excess errors) gcc.dg/cpp/build_path_prefix_map-1.c execution test gcc.dg/cpp/build_path_prefix_map-2.c (test for excess errors) gcc.dg/cpp/build_path_prefix_map-2.c execution test gcc.dg/debug/dwarf2/build_path_prefix_map-1.c (test for excess errors) gcc.dg/debug/dwarf2/build_path_prefix_map-1.c scan-assembler DW_AT_comp_dir: "DWARF2TEST/gcc gcc.dg/debug/dwarf2/build_path_prefix_map-2.c (test for excess errors) gcc.dg/debug/dwarf2/build_path_prefix_map-2.c scan-assembler DW_AT_comp_dir: "/ # No differences found in 8 common sum files I can also provide the full logs on request. Fuzzing === I've also fuzzed the prefix-map code using AFL with ASAN enabled. Due to how AFL works I did not fuzz this patch directly but a smaller program with just the parser and remapper, available here: https://anonscm.debian.org/cgit/reproducible/build-path-prefix-map-spec.git/tree/consume Over the course of about ~4k cycles, no crashes were found. To reproduce, you could run something like: $ echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor $ make CC=afl-gcc clean reset-fuzz-pecsplit.c fuzz-pecsplit.c Copyright disclaimer I've signed a copyright disclaimer and the FSF has this on record. (RT #1209764)
Re: [Ping ^3][PATCH v2] Generate reproducible output independently of the build-path
Dear GCC Global Reviewers, Could any of you please review my patch series? It's about being able to reproducibly build things, even when the build machines are executing the build under different paths. Overview: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00513.html Full thread, including individual patches: https://gcc.gnu.org/ml/gcc-patches/2017-04/threads.html#00513 Follow-up report: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00781.html In summary, this patch helps ~1800/26000 packages in Debian to become reproducible even when the build-path is varied across builds. I've signed a copyright disclaimer and the FSF has this on record. X Ximin Luo: > Ximin Luo: >> Joseph Myers: >>> On Tue, 11 Apr 2017, Ximin Luo wrote: >>> >>>> Copyright disclaimer >>>> >>>> >>>> I dedicate these patches to the public domain by waiving all of my rights >>>> to >>>> the work worldwide under copyright law, including all related and >>>> neighboring >>>> rights, to the extent allowed by law. >>>> >>>> See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full >>>> text. >>>> >>>> Please let me know if the above is insufficient and I will be happy to >>>> sign any >>>> relevant forms. >>> >>> I believe the FSF wants its own disclaimer forms signed as evidence code >>> is in the public domain. The process for getting disclaimer forms is to >>> complete >>> https://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-disclaim.changes >>> >>> and then you should be sent a disclaimer form for disclaiming the >>> particular set of changes you have completed (if you then make further >>> significant changes afterwards, the disclaimer form would then need >>> completing for them as well). >>> >> >> I've now done this, and the copyright clerk at the FSF has told me that this >> is complete on their side as well. >> >> Did any of you get a chance to look at the patch yet? >> > > Hi GCC patches list, > > Any progress or feedback on this patch series? > > Ximin > -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH v2] Generate reproducible output independently of the build-path
Ximin Luo: > Joseph Myers: >> On Tue, 11 Apr 2017, Ximin Luo wrote: >> >>> Copyright disclaimer >>> >>> >>> I dedicate these patches to the public domain by waiving all of my rights to >>> the work worldwide under copyright law, including all related and >>> neighboring >>> rights, to the extent allowed by law. >>> >>> See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full >>> text. >>> >>> Please let me know if the above is insufficient and I will be happy to sign >>> any >>> relevant forms. >> >> I believe the FSF wants its own disclaimer forms signed as evidence code >> is in the public domain. The process for getting disclaimer forms is to >> complete >> https://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-disclaim.changes >> >> and then you should be sent a disclaimer form for disclaiming the >> particular set of changes you have completed (if you then make further >> significant changes afterwards, the disclaimer form would then need >> completing for them as well). >> > > I've now done this, and the copyright clerk at the FSF has told me that this > is complete on their side as well. > > Did any of you get a chance to look at the patch yet? > Hi GCC patches list, Any progress or feedback on this patch series? Ximin -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH v2] Generate reproducible output independently of the build-path
Joseph Myers: > On Tue, 11 Apr 2017, Ximin Luo wrote: > >> Copyright disclaimer >> >> >> I dedicate these patches to the public domain by waiving all of my rights to >> the work worldwide under copyright law, including all related and neighboring >> rights, to the extent allowed by law. >> >> See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full >> text. >> >> Please let me know if the above is insufficient and I will be happy to sign >> any >> relevant forms. > > I believe the FSF wants its own disclaimer forms signed as evidence code > is in the public domain. The process for getting disclaimer forms is to > complete > https://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-disclaim.changes > > and then you should be sent a disclaimer form for disclaiming the > particular set of changes you have completed (if you then make further > significant changes afterwards, the disclaimer form would then need > completing for them as well). > I've now done this, and the copyright clerk at the FSF has told me that this is complete on their side as well. Did any of you get a chance to look at the patch yet? X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH v2] Generate reproducible output independently of the build-path
Ximin Luo: > [..] > > I will soon test this patch backported to Debian GCC-6 on > tests.reproducible-builds.org and will have results in a few days or weeks. > Some preliminary tests earlier gave good results (about +40 packages > reproducible over ~2 days) but we had to abort due to some misscheduling. > > [..] This has been completed and we reproduced ~1700 extra packages when building with a GCC-6 with this patch, as well as a patched dpkg that sets the environment variable appropriately. This is about 6.5% of ~26100 Debian source packages, and about 1/2 of the ones whose irreproducibility is due to build-path issues. Most of the rest are not related to GCC, such as things built by R, OCaml, Erlang, LLVM, PDF IDs, etc, etc. https://tests.reproducible-builds.org/debian/unstable/index_suite_amd64_stats.html The dip afterwards is due to reverting back to an unpatched GCC-6, but I'll be rebasing the patch continually over the next few weeks so the graph should stay up. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
[PATCH 2/3] Use BUILD_PATH_PREFIX_MAP envvar to transform __FILE__
Use the BUILD_PATH_PREFIX_MAP environment variable when expanding the __FILE__ macro, in the same way that debug-prefix-map works for debugging symbol paths. This patch follows similar lines to the earlier patch for SOURCE_DATE_EPOCH. Specifically, we read the environment variable not in libcpp but via a hook which has an implementation defined in gcc/c-family. However, to achieve this is more complex than the earlier patch: we need to share the prefix_map data structure and associated functions between libcpp and c-family. Therefore, we need to move these to libiberty. (For comparison, the SOURCE_DATE_EPOCH patch did not need this because time_t et. al. are in the standard C library.) Acknowledgements Dhole who wrote the earlier patch for SOURCE_DATE_EPOCH which saved me a lot of time on figuring out what to edit. ChangeLogs -- gcc/c-family/ChangeLog: 2017-03-27 Ximin Luo * c-common.c (cb_get_build_path_prefix_map): Define new call target. * c-common.h (cb_get_build_path_prefix_map): Declare call target. * c-lex.c (init_c_lex): Set the get_build_path_prefix_map callback. libcpp/ChangeLog: 2017-03-27 Ximin Luo * include/cpplib.h (cpp_callbacks): Add get_build_path_prefix_map callback. * init.c (cpp_create_reader): Initialise build_path_prefix_map field. * internal.h (cpp_reader): Add new field build_path_prefix_map. * macro.c (_cpp_builtin_macro_text): Set the build_path_prefix_map field if unset and apply it when expanding __FILE__ macros. gcc/testsuite/ChangeLog: 2017-03-27 Ximin Luo * gcc.dg/cpp/build_path_prefix_map-1.c: New test. * gcc.dg/cpp/build_path_prefix_map-2.c: New test. Index: gcc-7-20170402/gcc/c-family/c-common.c === --- gcc-7-20170402.orig/gcc/c-family/c-common.c +++ gcc-7-20170402/gcc/c-family/c-common.c @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. #include "config.h" #include "system.h" +#include "prefix-map.h" #include "coretypes.h" #include "target.h" #include "function.h" @@ -8012,6 +8013,25 @@ cb_get_source_date_epoch (cpp_reader *pf return (time_t) epoch; } +/* Read BUILD_PATH_PREFIX_MAP from environment to have deterministic relative + paths to replace embedded absolute paths to get reproducible results. + Returns NULL if BUILD_PATH_PREFIX_MAP is badly formed. */ + +prefix_map ** +cb_get_build_path_prefix_map (cpp_reader *pfile ATTRIBUTE_UNUSED) +{ + prefix_map **map = XCNEW (prefix_map *); + + const char *arg = getenv ("BUILD_PATH_PREFIX_MAP"); + if (!arg || prefix_map_parse (map, arg)) +return map; + + free (map); + error_at (input_location, "environment variable BUILD_PATH_PREFIX_MAP is " + "not well formed; see the GCC documentation for more details."); + return NULL; +} + /* Callback for libcpp for offering spelling suggestions for misspelled directives. GOAL is an unrecognized string; CANDIDATES is a NULL-terminated array of candidate strings. Return the closest Index: gcc-7-20170402/gcc/c-family/c-common.h === --- gcc-7-20170402.orig/gcc/c-family/c-common.h +++ gcc-7-20170402/gcc/c-family/c-common.h @@ -1085,6 +1085,11 @@ extern time_t cb_get_source_date_epoch ( __TIME__ can store. */ #define MAX_SOURCE_DATE_EPOCH HOST_WIDE_INT_C (253402300799) +/* Read BUILD_PATH_PREFIX_MAP from environment to have deterministic relative + paths to replace embedded absolute paths to get reproducible results. + Returns NULL if BUILD_PATH_PREFIX_MAP is badly formed. */ +extern prefix_map **cb_get_build_path_prefix_map (cpp_reader *pfile); + /* Callback for libcpp for offering spelling suggestions for misspelled directives. */ extern const char *cb_get_suggestion (cpp_reader *, const char *, Index: gcc-7-20170402/gcc/c-family/c-lex.c === --- gcc-7-20170402.orig/gcc/c-family/c-lex.c +++ gcc-7-20170402/gcc/c-family/c-lex.c @@ -81,6 +81,7 @@ init_c_lex (void) cb->read_pch = c_common_read_pch; cb->has_attribute = c_common_has_attribute; cb->get_source_date_epoch = cb_get_source_date_epoch; + cb->get_build_path_prefix_map = cb_get_build_path_prefix_map; cb->get_suggestion = cb_get_suggestion; /* Set the debug callbacks if we can use them. */ Index: gcc-7-20170402/libcpp/include/cpplib.h === --- gcc-7-20170402.orig/libcpp/include/cpplib.h +++ gcc-7-20170402/libcpp/include/cpplib.h @@ -607,6 +607,9 @@ struct cpp_callbacks /* Callback to parse SOURCE_DATE_EPOCH from environment. */ time_t (*get_source_date_epoch) (cpp_reader *); + /* Callback to parse BUILD_PATH_PREFIX_MAP from environment. */ + st
[PATCH v2] Generate reproducible output independently of the build-path
(Please keep me on CC, I am not subscribed) Background == Previous background is here: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00182.html Upon further discussion, we decided to add support for multiple mappings and to rename the environment variable to BUILD_PATH_PREFIX_MAP. We have also prepared a document that describes how this works in detail, so that projects can be confident that they are interoperable: https://reproducible-builds.org/specs/build-path-prefix-map/ The specification is currently in DRAFT status, awaiting some final feedback, including what the GCC maintainers think about it. If one is interested in reading about this topic in the wider context of reproducible builds, there's some more background here: https://wiki.debian.org/ReproducibleBuilds/StandardEnvironmentVariables Proposal This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When this is set, GCC will treat this as extra implicit "-fdebug-prefix-map=$value" command-line arguments that precede any explicit ones. This makes the final binary output reproducible, and also hides the unreproducible value (the source path prefixes) from CFLAGS et. al. which many build tools (understandably) embed as-is into their build output. This environment variable also acts on the __FILE__ macro, mapping it in the same way that debug-prefix-map works for debug symbols. We have seen that __FILE__ is also a very large source of unreproducibility, and is represented quite heavily in the 3k+ figure given earlier. Finally, we tweak the mapping algorithm so that it applies only to whole path components when matching prefixes. This algorithm contains fewer corner cases and is more predictable, so it is easier for users to figure out how to set the mapping appropriately, and it is better as a standardised algorithm that other build tools might like to adopt. (The original idea came from discussions with some rustc developers about this same topic.) This does technically break backwards-compatibility, but I was under the impression that this option was not seen as such a critical feature, that this would be too important. I am also happy to justify it in more detail on request. Nevertheless, for this reason our draft specification currently offers two algorithms for implementers, but I would reduce this to one if the GCC maintainers agree to accept this third patch. Testing === I've tested these patches on a Debian unstable x86_64-linux-gnu schroot running inside a Debian jessie system, on a full-bootstrap build. The output of contrib/compare_tests is as follows: gcc-7-20170409$ contrib/compare_tests ../gcc-build-0 ../gcc-build-1 # Comparing directories ## Dir1=../gcc-build-0: 8 sum files ## Dir2=../gcc-build-1: 8 sum files # Comparing 8 common sum files ## /bin/sh contrib/compare_tests /tmp/gxx-sum1.24154 /tmp/gxx-sum2.24154 New tests that PASS: gcc.dg/cpp/build_path_prefix_map-1.c (test for excess errors) gcc.dg/cpp/build_path_prefix_map-1.c execution test gcc.dg/cpp/build_path_prefix_map-2.c (test for excess errors) gcc.dg/cpp/build_path_prefix_map-2.c execution test gcc.dg/debug/dwarf2/build_path_prefix_map-1.c (test for excess errors) gcc.dg/debug/dwarf2/build_path_prefix_map-1.c scan-assembler DW_AT_comp_dir: "DWARF2TEST/gcc gcc.dg/debug/dwarf2/build_path_prefix_map-2.c (test for excess errors) gcc.dg/debug/dwarf2/build_path_prefix_map-2.c scan-assembler DW_AT_comp_dir: "/ # No differences found in 8 common sum files I can also provide the full logs on request. -- I've also fuzzed the prefix-map code using AFL with ASAN enabled. Due to how AFL works I did not fuzz this patch directly but a smaller program with just the parser and remapper, available here: https://anonscm.debian.org/cgit/reproducible/build-path-prefix-map-spec.git/tree/consume Over the course of about ~4k cycles, no crashes were found. To reproduce, you could run something like: $ echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor $ make CC=afl-gcc clean reset-fuzz-pecsplit.c fuzz-pecsplit.c -- I will soon test this patch backported to Debian GCC-6 on tests.reproducible-builds.org and will have results in a few days or weeks. Some preliminary tests earlier gave good results (about +40 packages reproducible over ~2 days) but we had to abort due to some misscheduling. Copyright disclaimer I dedicate these patches to the public domain by waiving all of my rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full text. Please let me know if the above is insufficient and I will be happy to sign any relevant forms. However, I would prefer it if the prefix-map.{h,c} remain public domain since its code is also duplicated in our "example code" repo (url above), which is meant for other projects to copy+paste.
[PATCH 1/3] Use BUILD_PATH_PREFIX_MAP envvar for debug-prefix-map
Define the BUILD_PATH_PREFIX_MAP environment variable, and treat it as implicit -fdebug-prefix-map CLI options specified before any explicit such options. Much of the generic code for applying and parsing prefix-maps is implemented in libiberty instead of the dwarf2 parts of the code, in order to make subsequent patches unrelated to debuginfo easier. Acknowledgements Daniel Kahn Gillmor who wrote the patch for r231835, which saved me a lot of time figuring out what to edit. HW42 for discussion on the details of the proposal, and for suggesting that we retain the ability to map the prefix to something other than ".". Other contributors to the BUILD_PATH_PREFIX_MAP specification, see https://reproducible-builds.org/specs/build-path-prefix-map/ ChangeLogs -- include/ChangeLog: 2017-04-09 Ximin Luo * prefix-map.h: New file implementing the BUILD_PATH_PREFIX_MAP specification; includes code from /gcc/final.c and code adapted from examples attached to the specification. libiberty/ChangeLog: 2017-04-09 Ximin Luo * prefix-map.c: New file implementing the BUILD_PATH_PREFIX_MAP specification; includes code from /gcc/final.c and code adapted from examples attached to the specification. * Makefile.in: Update for new files. gcc/ChangeLog: 2017-04-09 Ximin Luo * debug.h: Declare add_debug_prefix_map_from_envvar. * final.c: Define add_debug_prefix_map_from_envvar, and refactor prefix-map utilities to use equivalent code from libiberty instead. * opts-global.c: (handle_common_deferred_options): Call add_debug_prefix_map_from_envvar before processing options. gcc/testsuite/ChangeLog: 2017-04-09 Ximin Luo * lib/gcc-dg.exp: Allow dg-set-compiler-env-var to take only one argument in which case it unsets the given env var. * gcc.dg/debug/dwarf2/build_path_prefix_map-1.c: New test. * gcc.dg/debug/dwarf2/build_path_prefix_map-2.c: New test. Index: gcc-7-20170409/include/prefix-map.h === --- /dev/null +++ gcc-7-20170409/include/prefix-map.h @@ -0,0 +1,94 @@ +/* Declarations for manipulating filename prefixes. + Written 2017 by Ximin Luo + This code is in the public domain. */ + +#ifndef _PREFIX_MAP_H +#define _PREFIX_MAP_H + +#ifdef __cplusplus +extern "C" { +#endif + +#ifdef HAVE_STDLIB_H +#include +#endif + +/* Linked-list of mappings from old prefixes to new prefixes. */ + +struct prefix_map +{ + const char *old_prefix; + const char *new_prefix; + size_t old_len; + size_t new_len; + struct prefix_map *next; +}; + + +/* Find a mapping suitable for the given OLD_NAME in the linked list MAP.\ + + If a mapping is found, writes a pointer to the non-matching suffix part of + OLD_NAME in SUFFIX, and its length in SUF_LEN. + + Returns NULL if there was no suitable mapping. */ +struct prefix_map * +prefix_map_find (struct prefix_map *map, const char *old_name, +const char **suffix, size_t *suf_len); + +/* Prepend a prefix map before a given SUFFIX. + + The remapped name is written to NEW_NAME and returned as a const pointer. No + allocations are performed; the caller must ensure it can hold at least + MAP->NEW_LEN + SUF_LEN + 1 characters. */ +const char * +prefix_map_prepend (struct prefix_map *map, char *new_name, + const char *suffix, size_t suf_len); + +/* Remap a filename. + + Returns OLD_NAME unchanged if there was no remapping, otherwise returns a + pointer to newly-allocated memory for the remapped filename. The memory is + allocated by the given ALLOC function, which also determines who is + responsible for freeing it. */ +#define prefix_map_remap_alloc_(map_head, old_name, alloc)\ + __extension__ \ + ({ \ +const char *__suffix; \ +size_t __suf_len; \ +struct prefix_map *__map; \ +(__map = prefix_map_find ((map_head), (old_name), &__suffix, &__suf_len)) \ + ? prefix_map_prepend (__map,\ + (char *) alloc (__map->new_len + __suf_len + 1), \ + __suffix, __suf_len) \ + : (old_name); \ + }) + +/* Remap a filename. + + Returns OLD_NAME unchanged if there was no remapping, otherwise returns a + stack-allocated pointer to the newly-remapped filename. */ +#define prefix_map_remap_alloca(map_head, old_name) \ + prefix_map_remap_alloc_ (map_head, old_name,
[PATCH 3/3] When remapping paths, only match whole path components
Change the remapping algorithm so that each old_prefix only matches paths that have old_prefix as a whole path component prefix. (A whole path component is a part of a path that begins and ends at a directory separator or at either end of the path string.) This remapping algorithm is more predictable than the old algorithm, because there is no chance of mappings for one directory interfering with mappings for other directories. It contains less corner cases and is therefore nicer for clients to use. For these reasons, in our BUILD_PATH_PREFIX_MAP specification we recommend this algorithm, and it would be good for GCC to follow suit. This does technically break backwards compatibility but I don't think anyone would be reasonably depending on the corner cases of the previous algorithm, which are surprising and counterintuitive. Acknowledgements Discussions with Michael Woerister and other members of the Rust compiler team on Github, and discussions with Daniel Shahaf on the rb-general@ mailing list on lists.reproducible-builds.org. ChangeLogs -- gcc/ChangeLog: 2017-04-09 Ximin Luo * doc/invoke.texi (Environment Variables): Document form and behaviour of BUILD_PATH_PREFIX_MAP. libiberty/ChangeLog: 2017-04-09 Ximin Luo * prefix-map.c: When remapping paths, only match whole path components. Index: gcc-7-20170409/gcc/doc/invoke.texi === --- gcc-7-20170409.orig/gcc/doc/invoke.texi +++ gcc-7-20170409/gcc/doc/invoke.texi @@ -26637,6 +26637,26 @@ Recognize EUCJP characters. If @env{LANG} is not defined, or if it has some other value, then the compiler uses @code{mblen} and @code{mbtowc} as defined by the default locale to recognize and translate multibyte characters. + +@item BUILD_PATH_PREFIX_MAP +@findex BUILD_PATH_PREFIX_MAP +If this variable is set, it specifies an ordered map used to transform +filepaths output in debugging symbols and expansions of the @code{__FILE__} +macro. This may be used to achieve fully reproducible output. In the context +of running GCC within a higher-level build tool, it is typically more reliable +than setting command line arguments such as @option{-fdebug-prefix-map} or +common environment variables such as @env{CFLAGS}, since the build tool may +save these latter values into other output outside of GCC's control. + +The value is of the form +@samp{@var{dst@r{[0]}}=@var{src@r{[0]}}:@var{dst@r{[1]}}=@var{src@r{[1]}}@r{@dots{}}}. +If any @var{dst@r{[}i@r{]}} or @var{src@r{[}i@r{]}} contains @code{%}, @code{=} +or @code{:} characters, they must be replaced with @code{%#}, @code{%+}, and +@code{%.} respectively. + +Whenever GCC emits a filepath that starts with a whole path component matching +@var{src@r{[}i@r{]}} for some @var{i}, with rightmost @var{i} taking priority, +the matching part is replaced with @var{dst@r{[}i@r{]}} in the final output. @end table @noindent Index: gcc-7-20170409/libiberty/prefix-map.c === --- gcc-7-20170409.orig/libiberty/prefix-map.c +++ gcc-7-20170409/libiberty/prefix-map.c @@ -87,12 +87,22 @@ struct prefix_map * prefix_map_find (struct prefix_map *map, const char *old_name, const char **suffix, size_t *suf_len) { + size_t len; + for (; map; map = map->next) -if (filename_ncmp (old_name, map->old_prefix, map->old_len) == 0) - { - *suf_len = strlen (*suffix = old_name + map->old_len); - break; - } +{ + len = map->old_len; + /* Ignore trailing path separators at the end of old_prefix */ + while (len > 0 && IS_DIR_SEPARATOR (map->old_prefix[len-1])) len--; + /* Check if old_name matches old_prefix at a path component boundary */ + if (! filename_ncmp (old_name, map->old_prefix, len) + && (IS_DIR_SEPARATOR (old_name[len]) + || old_name[len] == '\0')) + { + *suf_len = strlen (*suffix = old_name + len); + break; + } +} return map; }
Re: [PATCH] Generate reproducible output independently of the build-path
Mike Stump: > On Nov 3, 2016, at 1:01 PM, Ximin Luo wrote: >> Log snippets attached. > > Ick. You missed the utility of contrib/compare_tests. :-) > > If you say: > > contrib/compare_tests oldbuilddir newbuilddir > > You can then sit back and see everything as you'd expect and want. The > output is priority sorted and usually around 0 lines line or so. > Oh, thanks! Here is the output: ~/reproducible/gcc-7-20161030$ contrib/compare_tests ../gcc-build-0/ ../gcc-build-1/ # Comparing directories ## Dir1=../gcc-build-0/: 8 sum files ## Dir2=../gcc-build-1/: 8 sum files # Comparing 8 common sum files ## /bin/sh contrib/compare_tests /tmp/gxx-sum1.9339 /tmp/gxx-sum2.9339 New tests that PASS: gcc.dg/cpp/source_date_epoch-4.c (test for excess errors) gcc.dg/cpp/source_date_epoch-4.c execution test gcc.dg/cpp/source_date_epoch-5.c (test for excess errors) gcc.dg/cpp/source_date_epoch-5.c execution test gcc.dg/cpp/source_prefix_map-1.c (test for excess errors) gcc.dg/cpp/source_prefix_map-1.c execution test gcc.dg/cpp/source_prefix_map-2.c (test for excess errors) gcc.dg/cpp/source_prefix_map-2.c execution test gcc.dg/debug/dwarf2/source_prefix_map-1.c (test for excess errors) gcc.dg/debug/dwarf2/source_prefix_map-1.c scan-assembler-dem DW_AT_comp_dir: "DWARF2TEST/gcc gcc.dg/debug/dwarf2/source_prefix_map-2.c (test for excess errors) gcc.dg/debug/dwarf2/source_prefix_map-2.c scan-assembler DW_AT_comp_dir: "/ # No differences found in 8 common sum files -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH] Generate reproducible output independently of the build-path
Ximin Luo: > Testing > === > > I've tested these patches on a Debian testing/unstable x86_64-linux-gnu > system. > So far I've only run the new tests that this patch adds, on a > disable-bootstrap > build. I will do a full bootstrap and run the full testsuite over the next few > days, both with and without this patch, and report back. > Tested with a full bootstrap on a Debian unstable x86_64-linux-gnu system. Log snippets attached. These were produced by running the below shell command in each of the respective build directories (0 = without, 1 = with the patches): find . -name '*.log' -a -not -name 'config.log' | sort | while read x; do echo "== $x =="; grep -a '^FAIL: \|^# of' "$x"; done > ../gcc-build-X.snippets Due to make -j22, they didn't run the tests in the same order. However in between doing this: $ diff -ru <(sort gcc-build-0.snippets) <(sort gcc-build-1.snippets) --- /dev/fd/63 2016-11-03 20:58:02.892334031 +0100 +++ /dev/fd/62 2016-11-03 20:58:02.892334031 +0100 @@ -323,7 +323,7 @@ # of expected failures 76 # of expected passes 110511 # of expected passes 11259 -# of expected passes 127561 +# of expected passes 127573 # of expected passes 2797 # of expected passes 42 # of expected passes 43915 1 and a manual review of the unsorted diffs, hopefully you can see that everything is good. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git == ./gcc/testsuite/g++/g++.log == FAIL: g++.dg/guality/pr55665.C -O2 line 23 p == 40 FAIL: g++.dg/guality/pr55665.C -O3 -g line 23 p == 40 # of expected passes110511 # of unexpected failures2 # of unexpected successes 2 # of expected failures 333 # of unsupported tests 3989 == ./gcc/testsuite/gcc/gcc.log == FAIL: gcc.dg/guality/pr54200.c -Os line 20 z == 3 FAIL: gcc.dg/guality/pr54519-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 20 y == 25 FAIL: gcc.dg/guality/pr54519-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 20 z == 6 FAIL: gcc.dg/guality/pr54519-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 23 y == 117 FAIL: gcc.dg/guality/pr54519-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 23 z == 8 FAIL: gcc.dg/guality/pr54519-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 17 y == 25 FAIL: gcc.dg/guality/pr54519-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 17 y == 25 FAIL: gcc.dg/guality/pr54519-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 20 y == 25 FAIL: gcc.dg/guality/pr54519-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 20 z == 6 FAIL: gcc.dg/guality/pr54519-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 23 y == 117 FAIL: gcc.dg/guality/pr54519-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 23 z == 8 FAIL: gcc.dg/guality/pr54519-3.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 20 y == 25 FAIL: gcc.dg/guality/pr54519-3.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 20 z == 6 FAIL: gcc.dg/guality/pr54519-3.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 23 y == 117 FAIL: gcc.dg/guality/pr54519-3.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 23 z == 8 FAIL: gcc.dg/guality/pr54519-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 17 y == 25 FAIL: gcc.dg/guality/pr54519-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 17 y == 25 FAIL: gcc.dg/guality/pr54519-5.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 17 y == 25 FAIL: gcc.dg/guality/pr54519-5.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 17 y == 25 FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions line 34 c == &a[0] FAIL: gcc.dg/guality/pr43051-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions line 36 e == &a[1] FAIL: gcc.dg/guality/pr45882.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 16 d == 112 FAIL: gcc.dg/guality/pr45882.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 16 e == 142 FAIL: gcc.dg/guality/pr68860-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 14 arg1 == 1 FAIL: gcc.dg/guality/pr68860-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 14 arg2 == 2 FAIL: gcc.dg/guality/pr68860-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 14 arg3 == 3 FAIL: gcc.dg/guality/pr68860-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 14 arg4 == 4 FAIL: gcc.dg/guality/pr68860-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none line 14 arg5 == 5 FAIL: gcc
Re: [PATCH] Generate reproducible output independently of the build-path
Ximin Luo: > Joseph Myers: >> On Wed, 2 Nov 2016, Ximin Luo wrote: >> >>> This patch series adds a new environment variable SOURCE_PREFIX_MAP. When >>> this >>> is set, GCC will treat this as an implicit "-fdebug-prefix-map=$value" >>> command-line argument. This makes the final binary output reproducible, and >> >> Only one argument? Sometimes you may want multiple -fdebug-prefix-map >> options (for both source and build directories, for example). Perhaps the >> value should be split on spaces to provide multiple such mappings? >> > > We wanted to keep this environment variable simple, and allowing this would > make it more costly for other projects to adopt. > > Whichever separator character we choose, we would have to either (a) disallow > it within the oldprefix/newprefix strings like what PATH does or (b) think of > an escaping scheme. Neither choice is great - with (a) since we want to map > *arbitrary* paths the restriction is less acceptable here than in PATH, and > with (b) it adds complexity to the spec, for uncommon use-cases. > If we picked '\n' as the separator character, we could perhaps live with the (a) restriction. I'll wait a while for others to comment some more. > In the case of an out-of-tree build, it is still possible with the current > proposal. Instead of: > > SOURCE_PREFIX_MAP="srcprefix=srcname:buildprefix=buildname" > > One could arrange the tree like: > > commonprefix/srcname > commonprefix/buildname > > then set SOURCE_PREFIX_MAP="commonprefix=." or > SOURCE_PREFIX_MAP="commonprefix/=". > -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH] Generate reproducible output independently of the build-path
Joseph Myers: > On Wed, 2 Nov 2016, Ximin Luo wrote: > >> This patch series adds a new environment variable SOURCE_PREFIX_MAP. When >> this >> is set, GCC will treat this as an implicit "-fdebug-prefix-map=$value" >> command-line argument. This makes the final binary output reproducible, and > > Only one argument? Sometimes you may want multiple -fdebug-prefix-map > options (for both source and build directories, for example). Perhaps the > value should be split on spaces to provide multiple such mappings? > We wanted to keep this environment variable simple, and allowing this would make it more costly for other projects to adopt. Whichever separator character we choose, we would have to either (a) disallow it within the oldprefix/newprefix strings like what PATH does or (b) think of an escaping scheme. Neither choice is great - with (a) since we want to map *arbitrary* paths the restriction is less acceptable here than in PATH, and with (b) it adds complexity to the spec, for uncommon use-cases. In the case of an out-of-tree build, it is still possible with the current proposal. Instead of: SOURCE_PREFIX_MAP="srcprefix=srcname:buildprefix=buildname" One could arrange the tree like: commonprefix/srcname commonprefix/buildname then set SOURCE_PREFIX_MAP="commonprefix=." or SOURCE_PREFIX_MAP="commonprefix/=". X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
[PATCH] Generate reproducible output independently of the build-path
(Please keep me on CC, I am not subscribed) Background == We are on a long journey to make build processes be able to reproduce the build outputs independently of which filesystem path the build is being executed from - e.g. if the executing user doesn't have root access to be able to execute it under a standard path such as /build. This currently is making about 2k-3k [1] packages in Debian unreproducible when build-paths are varied across builds. [1] https://tests.reproducible-builds.org/debian/issues/unstable/captures_build_path_issue.html Previous attempts have involved using -fdebug-prefix-map to strip away the prefix of an absolute path, leaving behind the part relative to the top-level directory of the source code, which is reproducible. But this flag was itself stored in DW_AT_producer, which makes the final output unreproducible. This was pointed out in bug 68848 and fixed in r231835. However, through more testing, we have found out that the fix just mentioned is not enough to achieve reproducibility in practice. The main issue is that many different packages like to store CFLAGS et. al. in arbitrary ways. So if we add an explicit -fdebug-prefix-map flag to the system-level CFLAGS etc, this will often propagate into the build result, making it again dependent on the build-path, and not reproducible. For example: Some packages embed compiler flags into their pkg-config files (or equivalent): https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/curl.html https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/perl.html https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/qt4-x11.html https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/fflas-ffpack.html https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/sip4.html Other packages embed compiler flags directly into the binary: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/singular.html https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/mutt.html etc etc. We think it's not appropriate to patch all (3k+) of these packages to strip out -fdebug-prefix-map flags. This would involve adding quite complex logic to everyone's build scripts, and we have to adapt this logic every single time to that particular package. Also, in general CFLAGS is *supposed* to affect the compiler output, and saving it unconditionally is quite a reasonable thing for packages to do. If we tried to patch all of these packages, we would be turning "reproducible builds" in to a costly opt-in feature, rather than on-by-default that everyone can easily benefit from. So, we believe it is better to patch GCC centrally. Our proposed solution is similar to (a) the SOURCE_DATE_EPOCH environment variable which was previously accepted into GCC and was used to successfully make 400+ packages reproducible, and (b) the -fdebug-prefix-map mechanism that already exists in GCC and which nearly but not quite, achieves at-scale build-path-independent reproducibility. Proposal This patch series adds a new environment variable SOURCE_PREFIX_MAP. When this is set, GCC will treat this as an implicit "-fdebug-prefix-map=$value" command-line argument. This makes the final binary output reproducible, and also hides the unreproducible value (the build path prefix) from CFLAGS et. al. which everyone is (understandably) embedding as-is into their build output. This environment variable also acts on the __FILE__ macro, mapping it in the same way that debug-prefix-map works for debug symbols. We have seen that __FILE__ is also a very large source of unreproducibility, and is represented quite heavily in the 3k+ figure given above. Finally, we tweak the __TIMESTAMP__ macro so it honours SOURCE_DATE_EPOCH, in a similar way to how __DATE__ and __TIME__ do so already. More details are given in the headers of the patch files themselves. Testing === I've tested these patches on a Debian testing/unstable x86_64-linux-gnu system. So far I've only run the new tests that this patch adds, on a disable-bootstrap build. I will do a full bootstrap and run the full testsuite over the next few days, both with and without this patch, and report back. Copyright disclaimer I dedicate these patches to the public domain by waiving all of my rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full text. Please let me know if the above is insufficient and I will be happy to sign any relevant forms.
[PATCH 3/4] Use SOURCE_PREFIX_MAP envvar to transform __FILE__
Honour the SOURCE_PREFIX_MAP environment variable when expanding the __FILE__ macro, in the same way that debug-prefix-map works for debugging symbol paths. This patch follows similar lines to the earlier patch for SOURCE_DATE_EPOCH. Specifically, we read the environment variable not in libcpp but via a hook which has an implementation defined in gcc/c-family. However, to achieve this is more complex than the earlier patch: we need to share the prefix_map data structure and associated functions between libcpp and c-family. Therefore, we need to move these to libiberty. (For comparison, the SOURCE_DATE_EPOCH patch did not need this because time_t et. al. are in the standard C library.) Acknowledgements Dhole who wrote the earlier patch for SOURCE_DATE_EPOCH which saved me a lot of time on figuring out what to edit. ChangeLogs -- include/ChangeLog: 2016-11-01 Ximin Luo * prefix-map.h: New file, mostly derived from /gcc/final.c. libiberty/ChangeLog: 2016-11-01 Ximin Luo * prefix-map.c: New file, mostly derived from /gcc/final.c. * Makefile.in: Update for new files. gcc/ChangeLog: 2016-11-01 Ximin Luo * final.c: Generalise and refactor code related to debug_prefix_map. Move some of it to /libiberty/prefix-map.c, /include/prefix-map.h and refactor the remaining code to use the moved-out things. * doc/invoke.texi (Environment Variables): Update SOURCE_PREFIX_MAP to describe how it affects __FILE__ expansion. gcc/c-family/ChangeLog: 2016-11-01 Ximin Luo * c-common.c (cb_get_source_prefix_map): Define new call target. * c-common.h (cb_get_source_prefix_map): Declare call target. * c-lex.c (init_c_lex): Set the get_source_prefix_map callback. libcpp/ChangeLog: 2016-11-01 Ximin Luo * include/cpplib.h (cpp_callbacks): Add get_source_prefix_map callback. * init.c (cpp_create_reader): Initialise source_prefix_map field. * internal.h (cpp_reader): Add new field source_prefix_map. * macro.c (_cpp_builtin_macro_text): Set the source_prefix_map field if unset and apply it to the __FILE__ macro. gcc/testsuite/ChangeLog: 2016-11-01 Ximin Luo * gcc.dg/cpp/source_prefix_map-1.c: New test. * gcc.dg/cpp/source_prefix_map-2.c: New test. Index: gcc-7-20161030/include/prefix-map.h === --- /dev/null +++ gcc-7-20161030/include/prefix-map.h @@ -0,0 +1,71 @@ +/* Declarations for manipulating filename prefixes. + + Copyright (C) 2016 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2, or (at your option) + any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software Foundation, + Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA. */ + +#ifndef _PREFIX_MAP_H +#define _PREFIX_MAP_H + +#ifdef __cplusplus +extern "C" { +#endif + +#ifdef HAVE_STDLIB_H +#include +#endif + +/* Linked-list of mappings from old prefixes to new prefixes. */ + +struct prefix_map +{ + const char *old_prefix; + const char *new_prefix; + size_t old_len; + size_t new_len; + struct prefix_map *next; +}; + +/* Parse a single prefix-map. + + The string `arg' is split at the final '=' character. The part before + it is used to set `map->old_prefix' and `map->old_len', and the part + after it is used to set `map->new_prefix' and `map->new_len'. + + If `arg' does not contain a '=' then 0 is returned. Otherwise, a + non-zero value is returned. + */ + +extern int parse_prefix_map (const char *arg, struct prefix_map *map); + +/* Perform mapping of filename prefixes. + + Return the filename corresponding to `old_name'. The return value is + equal to `old_name' if no transformation occurred, else it is equal + to `new_name' where the new filename is stored. + + On entry into this function, `new_name' must be able to hold at least + `(old_name - map->old_len + map->old_len + 1)' characters, where + `map' is the mapping that will be selected and performed. + */ + +extern const char *apply_prefix_map (const char *old_name, char *new_name, +struct prefix_map *map_head); + +#ifdef __cplusplus +} +#endif + +#endif /* _PREFIX_MAP_H */ Index: gcc-7-20161030/libiberty/prefix-map.c ===
[PATCH 1/4] Use SOURCE_PREFIX_MAP envvar as an implicit debug-prefix-map
Define the SOURCE_PREFIX_MAP environment variable, and treat it as an implicit CLI -fdebug-prefix-map option specified before any explicit such options. Acknowledgements Daniel Kahn Gillmor who wrote the patch for r231835, which saved me a lot of time figuring out what to edit. HW42 for discussion on the details of the proposal, and for suggesting that we retain the ability to map the prefix to something other than ".". ChangeLogs -- gcc/ChangeLog: 2016-11-01 Ximin Luo * opts-global.c (add_debug_prefix_map_from_envvar): Add the value of SOURCE_PREFIX_MAP as a debug_prefix_map if the former is set. (handle_common_deferred_options): Call add_debug_prefix_map_from_envvar before processing options. * final.c: (add_debug_prefix_map): Be less specific in the error message, since it can also be triggered by the SOURCE_PREFIX_MAP variable. * doc/invoke.texi (Environment Variables): Document form and behaviour of SOURCE_PREFIX_MAP. gcc/testsuite/ChangeLog: 2016-11-01 Ximin Luo * gcc.dg/debug/dwarf2/source_prefix_map-1.c: New test. * gcc.dg/debug/dwarf2/source_prefix_map-2.c: New test. Index: gcc-7-20161030/gcc/opts-global.c === --- gcc-7-20161030.orig/gcc/opts-global.c +++ gcc-7-20161030/gcc/opts-global.c @@ -310,6 +310,21 @@ decode_options (struct gcc_options *opts finish_options (opts, opts_set, loc); } +/* Add a debug-prefix-map using the SOURCE_PREFIX_MAP environment variable if + it is set. */ + +static void +add_debug_prefix_map_from_envvar () +{ + char *prefix_map; + + prefix_map = getenv ("SOURCE_PREFIX_MAP"); + if (!prefix_map) +return; + + add_debug_prefix_map (prefix_map); +} + /* Hold command-line options associated with stack limitation. */ const char *opt_fstack_limit_symbol_arg = NULL; int opt_fstack_limit_register_no = -1; @@ -335,6 +350,8 @@ handle_common_deferred_options (void) if (flag_opt_info) opt_info_switch_p (NULL); + add_debug_prefix_map_from_envvar (); + FOR_EACH_VEC_ELT (v, i, opt) { switch (opt->opt_index) Index: gcc-7-20161030/gcc/final.c === --- gcc-7-20161030.orig/gcc/final.c +++ gcc-7-20161030/gcc/final.c @@ -1531,7 +1531,7 @@ add_debug_prefix_map (const char *arg) p = strchr (arg, '='); if (!p) { - error ("invalid argument %qs to -fdebug-prefix-map", arg); + error ("invalid value %qs for debug-prefix-map", arg); return; } map = XNEW (debug_prefix_map); Index: gcc-7-20161030/gcc/doc/invoke.texi === --- gcc-7-20161030.orig/gcc/doc/invoke.texi +++ gcc-7-20161030/gcc/doc/invoke.texi @@ -26222,6 +26222,23 @@ Recognize EUCJP characters. If @env{LANG} is not defined, or if it has some other value, then the compiler uses @code{mblen} and @code{mbtowc} as defined by the default locale to recognize and translate multibyte characters. + +@item SOURCE_PREFIX_MAP If this variable is set, it specifies a mapping +that is used to transform filepaths that are output in debug symbols. +This helps the embedded paths become reproducible, without having the +unreproducible value be visible in other input sources - such as GCC +command-line flags or standardised build-time environment variables like +@code{CFLAGS} - that are commonly legitimately-embedded in the build +output by higher-level build processes. + +The form and behaviour is similar to @option{-fdebug-prefix-map}. That +is, the value of @env{SOURCE_PREFIX_MAP} must be of the form +@samp{@var{old}=@var{new}}. The split occurs on the first @code{=} +character, so that @var{old} cannot itself contain a @code{=}. + +Whenever an absolute source- or build-related filepath is to be emitted +in a final end-result output, GCC will replace @var{old} with @var{new} +if that filepath starts with @var{old}. @end table @noindent Index: gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-1.c === --- /dev/null +++ gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-1.c @@ -0,0 +1,9 @@ +/* DW_AT_comp_dir should be relative if SOURCE_PREFIX_MAP is a prefix of it. */ +/* { dg-do compile } */ +/* { dg-options "-gdwarf -dA" } */ +/* { dg-set-compiler-env-var SOURCE_PREFIX_MAP "[file dirname [pwd]]=DWARF2TEST" } */ +/* { dg-final { scan-assembler-dem "DW_AT_comp_dir: \"DWARF2TEST/gcc" } } */ + +void func (void) +{ +} Index: gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-2.c === --- /dev/null +++ gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-2.c @@ -0,0 +1,8 @@ +/* DW_AT_comp_dir
[PATCH 4/4] Use SOURCE_DATE_EPOCH in place of __TIMESTAMP__ if the latter is newer.
This brings the behaviour in line with the __DATE__ and __TIME__ macros. Note though the minor difference: __TIMESTAMP__ is defined as the file-modification date and not the "current" date or time like the other two macros. Therefore, we do a clamping behaviour (similar to tar --clamp-mtime). Acknowledgements Reiner Herrmann suggested the clamping behaviour. ChangeLogs -- libcpp/ChangeLog: 2016-11-01 Ximin Luo * macro.c (_cpp_builtin_macro_text): Use SOURCE_DATE_EPOCH in place of __TIMESTAMP__ if the latter is newer than the former. gcc/ChangeLog: 2016-11-01 Ximin Luo * doc/cppenv.texi (Environment Variables): Update SOURCE_DATE_EPOCH to describe the new effect on __TIMESTAMP__. gcc/testsuite/ChangeLog: 2016-11-01 Ximin Luo * gcc.dg/cpp/source_date_epoch-4.c: New test. * gcc.dg/cpp/source_date_epoch-5.c: New test. Index: gcc-7-20161030/libcpp/macro.c === --- gcc-7-20161030.orig/libcpp/macro.c +++ gcc-7-20161030/libcpp/macro.c @@ -264,7 +264,30 @@ _cpp_builtin_macro_text (cpp_reader *pfi struct tm *tb = NULL; struct stat *st = _cpp_get_file_stat (file); if (st) - tb = localtime (&st->st_mtime); + { + /* Set a reproducible timestamp for __DATE__ and __TIME__ macro + if SOURCE_DATE_EPOCH is defined. */ + if (pfile->source_date_epoch == (time_t) -2 + && pfile->cb.get_source_date_epoch != NULL) + pfile->source_date_epoch = pfile->cb.get_source_date_epoch (pfile); + + if (pfile->source_date_epoch >= (time_t) 0) + { + /* If SOURCE_DATE_EPOCH is set, we want reproducible + timestamps so use gmtime not localtime. */ + if (st->st_mtime >= pfile->source_date_epoch) + tb = gmtime (&pfile->source_date_epoch); + else + /* Don't use SOURCE_DATE_EPOCH if the timestamp is +older, since that means it was probably already +set in a reproducible way (e.g. via source tarball +extraction or some other method). */ + tb = gmtime (&st->st_mtime); + } + else + tb = localtime (&st->st_mtime); + } + if (tb) { char *str = asctime (tb); Index: gcc-7-20161030/gcc/doc/cppenv.texi === --- gcc-7-20161030.orig/gcc/doc/cppenv.texi +++ gcc-7-20161030/gcc/doc/cppenv.texi @@ -83,8 +83,9 @@ main input file is omitted. @item SOURCE_DATE_EPOCH If this variable is set, its value specifies a UNIX timestamp to be used in replacement of the current date and time in the @code{__DATE__} -and @code{__TIME__} macros, so that the embedded timestamps become -reproducible. +and @code{__TIME__} macros, and in replacement of the file modification +date in the @code{__TIMESTAMP__} macro if the latter is newer than the +former, so that the embedded timestamps become reproducible. The value of @env{SOURCE_DATE_EPOCH} must be a UNIX timestamp, defined as the number of seconds (excluding leap seconds) since Index: gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-4.c === --- /dev/null +++ gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-4.c @@ -0,0 +1,13 @@ +/* __TIMESTAMP__ should use SOURCE_DATE_EPOCH if the latter is older. */ +/* { dg-do run } */ +/* { dg-set-compiler-env-var TZ "UTC" } */ +/* { dg-set-compiler-env-var LC_ALL "C" } */ +/* { dg-set-compiler-env-var SOURCE_DATE_EPOCH "0" } */ + +int +main () +{ + if (__builtin_strcmp (__TIMESTAMP__, "Thu Jan 1 00:00:00 1970") != 0) +__builtin_abort (); + return 0; +} Index: gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-5.c === --- /dev/null +++ gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-5.c @@ -0,0 +1,13 @@ +/* __TIMESTAMP__ should not use SOURCE_DATE_EPOCH if the latter is newer. */ +/* { dg-do run } */ +/* { dg-set-compiler-env-var TZ "UTC" } */ +/* { dg-set-compiler-env-var LC_ALL "C" } */ +/* { dg-set-compiler-env-var SOURCE_DATE_EPOCH "253402300799" } */ + +int +main () +{ + if (__builtin_strcmp (__TIMESTAMP__, "Fri Dec 31 23:59:59 UTC ") == 0) +__builtin_abort (); + return 0; +}
[PATCH 2/4] Split prefix-map values on the last '=' character, not the first
We are planning to ask other tools to support SOURCE_PREFIX_MAP, in the same way that we have already done this for SOURCE_DATE_EPOCH. So, this will last for some time and have quite wide reach. Consequently, we believe it is better to split on the final '=', since it is much less likely to result in problems. For example, with the previous behaviour (splitting on the first '=') it would not be possible to map paths like the following: C:\Documents and Settings\Betrand Russell\Proofs of 1+1=2\Automated Proofs\Source Code\ /srv/net/distributed-hash-table/address/VaIWP8YIWDChR2O76yiufXBsbw5g2skB_kp9VP-qVLvydovdGw==/projects/gcc-6/ These are contrived examples, but hopefully you can agree that they're not *so* unrealistic - future software or users might plausibly wish to run reproducible build processes underneath paths similar to these. On the other hand, I can think of much fewer cases where the new-prefix *must* include a '=' character. I can't think of any software project that includes it, and I'd imagine that any such (or future) projects that might exist would already have standardised "ASCII-only" versions of its name. ChangeLogs -- gcc/ChangeLog: 2016-11-01 Ximin Luo * final.c: (add_debug_prefix_map): Split on the last and not first '='. * doc/invoke.texi (Environment Variables): Update SOURCE_PREFIX_MAP to describe new parsing. Index: gcc-7-20161030/gcc/final.c === --- gcc-7-20161030.orig/gcc/final.c +++ gcc-7-20161030/gcc/final.c @@ -1528,7 +1528,7 @@ add_debug_prefix_map (const char *arg) debug_prefix_map *map; const char *p; - p = strchr (arg, '='); + p = strrchr (arg, '='); if (!p) { error ("invalid value %qs for debug-prefix-map", arg); Index: gcc-7-20161030/gcc/doc/invoke.texi === --- gcc-7-20161030.orig/gcc/doc/invoke.texi +++ gcc-7-20161030/gcc/doc/invoke.texi @@ -26233,8 +26233,8 @@ output by higher-level build processes. The form and behaviour is similar to @option{-fdebug-prefix-map}. That is, the value of @env{SOURCE_PREFIX_MAP} must be of the form -@samp{@var{old}=@var{new}}. The split occurs on the first @code{=} -character, so that @var{old} cannot itself contain a @code{=}. +@samp{@var{old}=@var{new}}. The split occurs on the last @code{=} +character, so that @var{new} cannot itself contain a @code{=}. Whenever an absolute source- or build-related filepath is to be emitted in a final end-result output, GCC will replace @var{old} with @var{new}
Re: [PATCH] PR77985: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path
Richard Biener: > On Mon, Oct 24, 2016 at 12:53 PM, Ximin Luo wrote: >> Richard Biener: >>> On Fri, Oct 21, 2016 at 12:56 PM, Ximin Luo wrote: >>>> Richard Biener: >>>>> On Tue, Oct 18, 2016 at 2:35 PM, Ximin Luo wrote: >>>>>> >>>>>> Thanks, I'll add the Changelog entry. My computer isn't very powerful, >>>>>> so I didn't bootstrap it yet, I only tested it on a stage1 compiler, on >>>>>> Debian testing/unstable. I'll find some time to bootstrap it and test it >>>>>> fully over the next few days. >>>>>> >>>>>> Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into >>>>>> it a bit more, my patch basically obsoletes the need for this so I can >>>>>> delete that as well. >>>>> >>>>> That would be nice. >>>>> >>>> >>>> Hi, >>>> >>>> Attached is the ChangeLog plus updated patch, rebased against the >>>> 2016-10-16 snapshot. Also I noticed I got the wrong bug number, the >>>> correct one is 77985 not 77895. >>>> >>>> I've tested it on a Debian testing/unstable x86_64-linux-gnu system. The >>>> results are good, the same tests fail both before and after the patch, and >>>> we have 2 new expected successes. Unfortunately I don't have access (and >>>> am unlikely to get access) to a Darwin system to test it on. >>>> >>>> Snippets of the test logs are attached. The full logs are about 200MB each >>>> in size (4MB XZ-compressed, each) so I guessed I shouldn't send them via >>>> email... The snippets were grepped from the logs using the '^FAIL: \|^# >>>> of\|pr77985' pattern. You can diff them to check that the results are same >>>> in both cases. >>> >>> The patch is ok. Do you have commit privileges? >>> >> >> No, I don't. > > Ok, I included the patch in a x86_64 bootstrap & regtest and committed > it as r241473. > Great, thanks! X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH] PR77985: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path
Richard Biener: > On Fri, Oct 21, 2016 at 12:56 PM, Ximin Luo wrote: >> Richard Biener: >>> On Tue, Oct 18, 2016 at 2:35 PM, Ximin Luo wrote: >>>> >>>> Thanks, I'll add the Changelog entry. My computer isn't very powerful, so >>>> I didn't bootstrap it yet, I only tested it on a stage1 compiler, on >>>> Debian testing/unstable. I'll find some time to bootstrap it and test it >>>> fully over the next few days. >>>> >>>> Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into >>>> it a bit more, my patch basically obsoletes the need for this so I can >>>> delete that as well. >>> >>> That would be nice. >>> >> >> Hi, >> >> Attached is the ChangeLog plus updated patch, rebased against the 2016-10-16 >> snapshot. Also I noticed I got the wrong bug number, the correct one is >> 77985 not 77895. >> >> I've tested it on a Debian testing/unstable x86_64-linux-gnu system. The >> results are good, the same tests fail both before and after the patch, and >> we have 2 new expected successes. Unfortunately I don't have access (and am >> unlikely to get access) to a Darwin system to test it on. >> >> Snippets of the test logs are attached. The full logs are about 200MB each >> in size (4MB XZ-compressed, each) so I guessed I shouldn't send them via >> email... The snippets were grepped from the logs using the '^FAIL: \|^# >> of\|pr77985' pattern. You can diff them to check that the results are same >> in both cases. > > The patch is ok. Do you have commit privileges? > No, I don't. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH] PR77985: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path
Richard Biener: > On Tue, Oct 18, 2016 at 2:35 PM, Ximin Luo wrote: >> >> Thanks, I'll add the Changelog entry. My computer isn't very powerful, so I >> didn't bootstrap it yet, I only tested it on a stage1 compiler, on Debian >> testing/unstable. I'll find some time to bootstrap it and test it fully over >> the next few days. >> >> Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into it >> a bit more, my patch basically obsoletes the need for this so I can delete >> that as well. > > That would be nice. > Hi, Attached is the ChangeLog plus updated patch, rebased against the 2016-10-16 snapshot. Also I noticed I got the wrong bug number, the correct one is 77985 not 77895. I've tested it on a Debian testing/unstable x86_64-linux-gnu system. The results are good, the same tests fail both before and after the patch, and we have 2 new expected successes. Unfortunately I don't have access (and am unlikely to get access) to a Darwin system to test it on. Snippets of the test logs are attached. The full logs are about 200MB each in size (4MB XZ-compressed, each) so I guessed I shouldn't send them via email... The snippets were grepped from the logs using the '^FAIL: \|^# of\|pr77985' pattern. You can diff them to check that the results are same in both cases. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git == gcc-build/gcc/testsuite/gcc/gcc.log == FAIL: c is -1, not 6303904 FAIL: v is -1, not 13 FAIL: e is -1, not 6304000 FAIL: c is -1, not 6303904 FAIL: v is -1, not 13 FAIL: e is -1, not 6304000 FAIL: c is -1, not 6303920 FAIL: v is -1, not 13 FAIL: e is -1, not 6304000 FAIL: c is -1, not 6303920 FAIL: v is -1, not 13 FAIL: e is -1, not 6304000 FAIL: c is -1, not 6303936 FAIL: v is -1, not 13 FAIL: e is -1, not 6304000 FAIL: c is -1, not 6303936 FAIL: v is -1, not 13 FAIL: e is -1, not 6304000 FAIL: c is -1, not 6303952 FAIL: v is -1, not 13 FAIL: e is -1, not 6304000 FAIL: c is -1, not 6303952 FAIL: v is -1, not 13 FAIL: e is -1, not 6304000 FAIL: c is -1, not 6303968 FAIL: v is -1, not 13 FAIL: e is -1, not 6304000 FAIL: c is -1, not 6303968 FAIL: v is -1, not 13 FAIL: e is -1, not 6304000 FAIL: c is -1, not 6303984 FAIL: v is -1, not 13 FAIL: e is -1, not 6304000 FAIL: ret is -1, not 6303984 FAIL: c is -1, not 6303920 FAIL: o is -1, not 6303904 FAIL: w is -1, not 6303984 FAIL: c is -1, not 6303920 FAIL: o is -1, not 6303904 FAIL: w is -1, not 6303984 FAIL: c is -1, not 6303936 FAIL: o is -1, not 6303920 FAIL: w is -1, not 6303984 FAIL: c is -1, not 6303936 FAIL: o is -1, not 6303920 FAIL: w is -1, not 6303984 FAIL: c is -1, not 6303952 FAIL: o is -1, not 6303936 FAIL: w is -1, not 6303984 FAIL: c is -1, not 6303952 FAIL: o is -1, not 6303936 FAIL: w is -1, not 6303984 FAIL: c is -1, not 6303968 FAIL: o is -1, not 6303952 FAIL: w is -1, not 6303984 FAIL: c is -1, not 6303968 FAIL: o is -1, not 6303952 FAIL: w is -1, not 6303984 FAIL: c is -1, not 6303984 FAIL: o is -1, not 6303968 FAIL: w is -1, not 6303984 FAIL: ret is -1, not 6303968 FAIL: c is -1, not 6303904 FAIL: e is -1, not 6303984 FAIL: c is -1, not 6303904 FAIL: e is -1, not 6303984 FAIL: c is -1, not 6303920 FAIL: e is -1, not 6303984 FAIL: c is -1, not 6303920 FAIL: e is -1, not 6303984 FAIL: c is -1, not 6303936 FAIL: e is -1, not 6303984 FAIL: c is -1, not 6303936 FAIL: e is -1, not 6303984 FAIL: c is -1, not 6303952 FAIL: e is -1, not 6303984 FAIL: c is -1, not 6303952 FAIL: e is -1, not 6303984 FAIL: c is -1, not 6303968 FAIL: e is -1, not 6303984 FAIL: c is -1, not 6303968 FAIL: e is -1, not 6303984 FAIL: ret is -1, not 0 FAIL: c is -1, not 6303904 FAIL: n is -1, not 6303920 FAIL: t is -1, not 6303984 FAIL: c is -1, not 6303904 FAIL: n is -1, not 6303920 FAIL: t is -1, not 6303984 FAIL: c is -1, not 6303920 FAIL: n is -1, not 6303936 FAIL: t is -1, not 6303984 FAIL: c is -1, not 6303920 FAIL: n is -1, not 6303936 FAIL: t is -1, not 6303984 FAIL: c is -1, not 6303936 FAIL: n is -1, not 6303952 FAIL: t is -1, not 6303984 FAIL: c is -1, not 6303936 FAIL: n is -1, not 6303952 FAIL: t is -1, not 6303984 FAIL: c is -1, not 6303952 FAIL: n is -1, not 6303968 FAIL: t is -1, not 6303984 FAIL: c is -1, not 6303952 FAIL: n is -1, not 6303968 FAIL: t is -1, not 6303984 FAIL: c is -1, not 6303968 FAIL: n is -1, not 6303984 FAIL: t is -1, not 6303984 FAIL: c is -1, not 6303968 FAIL: n is -1, not 6303984 FAIL: t is -1, not 6303984 FAIL: ret is -1, not 0 FAIL: 5 PASS, 114 FAIL, 0 UNRESOLVED FAIL: ret is -1, not 6299888 FAIL: o is -1, not 6299808 FAIL: w is -1, not 6299888 FAIL: o is -1, not 6299808 FAIL: w is -1, not 6299888 FAIL: o is -1, not 6299824 FAIL: w is -1, not 6299888 FAIL: o is -1, not 6299824 FAIL: w is -1, not 6299888 FAIL: o is -1, not 6299840 FAIL: w is -1, not 6299888 FAIL: o is -1, not 6299840 FAIL: w is -1, not 6299888 FAIL: o is
Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path
Richard Biener: > On Mon, Oct 17, 2016 at 11:44 PM, Mike Stump wrote: >> On Oct 17, 2016, at 2:38 PM, Ximin Luo wrote: >>> >>> Mike Stump: >>>> On Oct 17, 2016, at 11:00 AM, Ximin Luo wrote: >>>>> Therefore, it is better to emit it in all circumstances, in case the >>>>> reader needs to know what the working >>>>> directory was at compile-time. >>>> >>>> I can't help but wonder if this would break ccache some? >>>> >>> >>> Could you explain this in some more detail? At the moment, GCC will already >>> emit DW_AT_name with an absolute path (in the scenario that this patch is >>> relevant to). How does ccache work around this at the moment? (Does it use >>> debug-prefix-map? In which case, this also affects DW_AT_comp_dir, so my >>> patch should be fine.) >> >> If you compile the same file, but in a different directory, I wonder if cwd >> will cause the cache entry to not be reused. >> >> I expect one of the ccache people that are around would just know if it will >> care at all. > > I believe ccache compares preprocessed source, definitely _not_ DWARF > output, so this shouldn't break anything there. > It might result in different object file output but as the reporter > figured due to a bug in dwarf2out.c we end up generating > DW_AT_comp_dir in almost all cases already. > > I think the patch is ok but it misses a ChangeLog entry. How did you > test the patch? (bootstrapped and tested on ...?) > Thanks, I'll add the Changelog entry. My computer isn't very powerful, so I didn't bootstrap it yet, I only tested it on a stage1 compiler, on Debian testing/unstable. I'll find some time to bootstrap it and test it fully over the next few days. Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into it a bit more, my patch basically obsoletes the need for this so I can delete that as well. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path
Mike Stump: > On Oct 17, 2016, at 11:00 AM, Ximin Luo wrote: >> Therefore, it is better to emit it in all circumstances, in case the reader >> needs to know what the working >> directory was at compile-time. > > I can't help but wonder if this would break ccache some? > Could you explain this in some more detail? At the moment, GCC will already emit DW_AT_name with an absolute path (in the scenario that this patch is relevant to). How does ccache work around this at the moment? (Does it use debug-prefix-map? In which case, this also affects DW_AT_comp_dir, so my patch should be fine.) X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
[PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path
(Please keep me on CC, I am not subscribed) In working on another patch for DW_AT_comp_dir behaviour, I tried to write a test for that other patch. This test did not work as expected - DW_AT_comp_dir was not even generated in the resulting output! But this had nothing to do with my patch so I dug a bit deeper. It turns out that GCC does not emit DW_AT_comp_dir if the source path given is an absolute path, and the file being compiled contains a typedef to a compiler builtin (sorry, I don't know the terminology here very well). This typedef exists in the vast majority of cases in the real world via standard library includes, so from the outside, the behaviour of GCC "looks like" it emits DW_AT_comp_dir in all/most cases. >From looking at the source code dwarf2out.c, Richard Biener determined that the current code is written to "intend to" not emit DW_AT_comp_dir if to path is absolute, regardless of the typedef. It is possible to "fix" this bug by a 1-line change, to conform to the way currently "intended" by GCC code. However, I think a much better fix is a 22-line deletion of code from dwarf2out.c, to instead emit DW_AT_comp_dir in all cases. The original aim of not emitting DW_AT_comp_dir seems to be to avoid emitting redundant information. However, this information is *not* redundant! The DWARF2 spec says: "A DW_AT_comp_dir attribute whose value is a null-terminated string containing the current working directory of the compilation command that produced this compilation unit in whatever form makes sense for the host system." Conceptually, the working directory is independent from an input file, so it would *not* be redundant to emit it in the general case. In future versions of DWARF, other information might be added (such as relative -I flags, etc) that are dependent on knowing the working directory. The logic of "don't emit DW_AT_comp_dir if DW_AT_name is absolute" would then break. In other words, the choice to avoid emitting DW_AT_comp_dir is a brittle and premature optimisation that loses information in the general case. Therefore, it is better to emit it in all circumstances, in case the reader needs to know what the working directory was at compile-time. *Sometimes*, *parts* of it might be redundant yes - and rewriting DW_AT_name to be relative to this, would remove the redundancy. Doing this is compatible with the above points, and I could amend the patch to do this - although I suggest it's not worth the effort, unless there is a function in GCC code that already does this. Some more minor advantages are: - We delete 21 lines, this reduces complexity in the already-complex code. We can also get rid of the Darwin-specific workaround described here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53453 (Not currently part of my patch, but I can add this in.) - The GCC test suite compiles tests via their absolute paths, and many many other buildsystems do this too. This was in fact my original problem - I was trying to test something involving DW_AT_comp_dir, but GCC does not emit this in the testsuite! Rather than saying "this is a bug, everyone else should spend effort fixing this", it is better to fix it in one place, i.e. the source of where it is (not) generated. Thanks for your time, Ximin -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git Index: gcc-7-20161009/gcc/dwarf2out.c === --- gcc-7-20161009.orig/gcc/dwarf2out.c +++ gcc-7-20161009/gcc/dwarf2out.c @@ -21994,7 +21994,7 @@ gen_compile_unit_die (const char *filena { add_name_attribute (die, filename); /* Don't add cwd for . */ - if (!IS_ABSOLUTE_PATH (filename) && filename[0] != '<') + if (filename[0] != '<') add_comp_dir_attribute (die); } @@ -26332,20 +26332,6 @@ prune_unused_types (void) prune_unmark_dies (ctnode->root_die); } -/* Set the parameter to true if there are any relative pathnames in - the file table. */ -int -file_table_relative_p (dwarf_file_data **slot, bool *p) -{ - struct dwarf_file_data *d = *slot; - if (!IS_ABSOLUTE_PATH (d->filename)) -{ - *p = true; - return 0; -} - return 1; -} - /* Helpers to manipulate hash table of comdat type units. */ struct comdat_type_hasher : nofree_ptr_hash @@ -28152,15 +28138,7 @@ dwarf2out_early_finish (const char *file /* Add the name for the main input file now. We delayed this from dwarf2out_init to avoid complications with PCH. */ add_name_attribute (comp_unit_die (), remap_debug_filename (filename)); - if (!IS_ABSOLUTE_PATH (filename) || targetm.force_at_comp_dir) -add_comp_dir_attribute (comp_unit_die ()); - else if (get_AT (comp_unit_die (), DW_AT_comp_dir) == NULL) -{ - bool p = false; - file_table->traverse (&p); - if (p) - add_comp_dir_attribute (comp_unit_die ()); -} + add_comp_dir_attribute (comp_unit_die ());