Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)

2018-01-22 Thread Ximin Luo
Boris Kolpackov:
> Ximin Luo  writes:
> 
>> -I to an absolute path is not that common for system / distro-built
>> stuff.
> 
> Ok, thanks for clarifying.
> 
> 
>> In the cases that it occurs, indeed it could and should be fixed
>> by the package buildsystem, e.g. by stripping a prefix when they
>> add -I flags to CFLAGS. But that's a separate issue from what
>> we're talking about here.
> 
> I believe it is the same issue: any package that blindly embeds
> information about how it was built into the result of the build
> does not care about reproducible builds.
> 

I don't think that's realistic position to take, and it seems like a fairly 
arbitrary assertion made merely to "brush this issue under the carpet". I think 
this because of the "implicit contract" of how CFLAGS is generally used:

CFLAGS (and other flags) is a way to pass information from multiple layers of 
programs down to GCC. It is necessary to "blindly embed" it if you want to save 
it *at all*, otherwise you are forced to violate the separation-of-concerns 
between layers in your buildsystem stack.

- Program A sets CFLAGS += --flags_A, then calls program B
- Program B sets CFLAGS += --flags_B, then calls program C
- Program C sets CFLAGS += --flags_C, then calls Make
- Make reads CFLAGS then calls $(CC) $(CFLAGS) [etc]

In general, you are not supposed to edit what was already in CFLAGS, you are 
supposed to append to it. That is why it's good practise to add "--no" options 
for various command-line flags in gcc (and other tools).

So each program X has its own flags_X that it is "responsible for". If all 
programs in the stack makes sure their own flags_X contain reproducible values, 
then the whole build will be reproducible even when recording CFLAGS.

However, this breaks in the case of -f*-prefix-map. If Program A sets this flag 
in flags_A, then this interferes with Program C, even though flags_C was fully 
reproducible.

Note that program B and C don't even know that they are being used with GCC, 
and so really have no business filtering -f*-prefix-map out of the CFLAGS that 
program A already set.

My proposal would allow all the flags_{A,B,C} to be reproducible, and avoid 
different programs having to embed layer-violating logic into themselves.

> If others do agree that this should be "fixed" in GCC, then I
> would suggest that you add a separate option for the environment
> variable case (e.g., -ffile-prefix-map-env) and sidestep the
> whole "how to portably distinguish a path from an environment
> variable" issue.
> 

That would also be fine for me sure.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)

2018-01-20 Thread Ximin Luo
Boris Kolpackov:
> Ximin Luo  writes:
>> Boris Kolpackov:
>>
>>> This does feel like we are trying to fix the issue in the wrong place.
>>> Also, won't such broken packages normally store all options (including
>>> -I) rather than just CFLAGS? And if the answer is no, then perhaps you
>>> could put -ffile-prefix-map into CPPFLAGS instead of CFLAGS? Kind of
>>> even feels right seeing that it affects the preprocessor.
>>>
>>
>> The broad issue affects -fdebug-prefix-map as well, not just __FILE__.
> 
> -ffile-prefix-map is the "common" option that (currently)
> takes care of both debug and macro remapping (and should we
> add any new -f*-prefix-map, those as well).
> 
>> Such packages are not "broken"; GCC itself stores command-line
>> arguments in DW_AT_producer and as mentioned there was a bug to
>> special-case filtering -fdebug-prefix-map out of that.
>>
>> Now we'll have to add exceptions for -ffile-prefix-map and all
>> the other flags you added, and everything else that might be
>> added in future.
> 
> -ffile-prefix-map and -fmacro-prefix-map are already excluded.
> And there is a note at the appropriate place in source code to
> exclude any new -f*-prefix-map should they be added.
> 

Fair enough, I didn't look at your entire patch. But don't you think stripping 
the whole value is unnecessary? One advantage of the ENV: approach is that we 
can remove this filtering, and then the =$to part can be present in 
DW_AT_producer, indicating the fact that the flag was indeed used, and what the 
$to value was. That would be helpful for debugging actually.

> You also haven't answered my question about -I: don't the
> projects that embed CFLAGS also embed CPPFLAGS (and thus
> -I's with absolute/varying paths)? And if the answer is
> no, then can't you use the same approach to make
> -ffile-prefix-map a non-issue?
> 

-I to an absolute path is not that common for system / distro-built stuff. In 
the cases that it occurs, indeed it could and should be fixed by the package 
buildsystem, e.g. by stripping a prefix when they add -I flags to CFLAGS. But 
that's a separate issue from what we're talking about here.

One major use-case for this flag, is to be set for all builds by the 
distribution. (Setting it per-package would be much much more work.) In this 
case, even if the build is using relative paths for -I and setting $PWD to an 
relative path for __FILE__ (and this behaviour is not explicitly mentioned in 
any standard), as long as they save CFLAGS/CPPFLAGS their build would not be 
reproducible.

I don't think it would be fair to impose this behaviour on packages, even if 
they are doing *everything else correctly* for reproducibility purposes.

For example this one, the unreproducibility diff exists because we temporarily 
reverted to an unpatched dpkg (i.e. it sets -fdebug-prefix-map rather than 
using the envvar supported by our patched GCC):

https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/freeradius.html

(Then there are other programs that like to put cflags/cppflags in their --help 
text for debugging purposes, etc.)

It does *everything else right* but the presence of that system-wide flag 
causes it to be unreproducible. If GCC does not support an envvar or other 
similar method to pass in this flag, then these packages are the ones that 
suffer.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)

2018-01-20 Thread Ximin Luo
Boris Kolpackov:
> Ximin Luo  writes:
> 
>> Higher-level build scripts sometimes like to save CFLAGS etc into the build 
>> output, making the overall build output unreproducible even if GCC is 
>> playing 
>> nicely. Rather than add logic to strip -f{file,debug,macro,...}-prefix-map, 
>> into all possible higher-level programs that might want to save CFLAGS, it is
>> simpler if GCC could read it from a user-specified environment variable.
> 
> This does feel like we are trying to fix the issue in the wrong place.
> Also, won't such broken packages normally store all options (including
> -I) rather than just CFLAGS? And if the answer is no, then perhaps you
> could put -ffile-prefix-map into CPPFLAGS instead of CFLAGS? Kind of
> even feels right seeing that it affects the preprocessor.
> 

The broad issue affects -fdebug-prefix-map as well, not just __FILE__.

Such packages are not "broken"; GCC itself stores command-line arguments in 
DW_AT_producer and as mentioned there was a bug to special-case filtering 
-fdebug-prefix-map out of that.

Now we'll have to add exceptions for -ffile-prefix-map and all the other flags 
you added, and everything else that might be added in future.

The same goes for all possible buildsystems / Makefile / etc that are affected 
by the issue.

Therefore, I concluded that the best and most appropriate solution was to fix 
it at the source. That is, allow it to be read from an environment variable, to 
"cancel out" the information that is already read from the filesystem - rather 
than forcing this build-specific non-deterministic information to be added to 
the command-line.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)

2018-01-18 Thread Ximin Luo
Hello, I see Boris' patch was accepted with some further changes.

Any chance my additions for reproducible builds can be included as well? They 
are described below.

It should still apply on top, minus the strchr -> strrchr change which was 
folded into Boris' accepted version.

Ximin

Ximin Luo:
> Boris Kolpackov:
>> Thanks for the review. Third revision of the patch attached.
>>
>> [..]
> Here is a follow-up patch, meant to be applied on top of Boris' patch, to 
> address some issues we found based on discussions and experiments at the 
> Reproducible Builds project.
> 
> Most of this patch is re-implementing the patch from [1] originally by Daniel 
> Kahn Gillmor. Back then it was rejected in favour of a simpler approach 
> because 
> its necessity was not realised. However, more experiments since then have 
> revealed that its basic approach is useful in more general scenarios:
> 
> Higher-level build scripts sometimes like to save CFLAGS etc into the build 
> output, making the overall build output unreproducible even if GCC is playing 
> nicely. Rather than add logic to strip -f{file,debug,macro,...}-prefix-map, 
> into all possible higher-level programs that might want to save CFLAGS, it is
> simpler if GCC could read it from a user-specified environment variable. The
> fact that the name of the environment variable is still present on the
> command-line, should make it obvious what is being done, and avoid confusion
> that might arise if an implicit "magic" environment variable is used.
> 
> In the patch thread (see [1] and follow-ups) there was concern that the 
> prefix 
> '$' would cause awkward escaping behaviour, so the prefix 'ENV:' was chosen. 
> A 
> concern about this conflicting with AmigaOS paths was voiced but not 
> explored; 
> of course it can be changed to something else if needed.
> 
> (NetBSD is already carrying this patch in their system GCC [2], presumably for
> the same reason mentioned above.)
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01168.html
> [2] 
> http://cvsweb.netbsd.org/bsdweb.cgi/src/external/gpl3/gcc/dist/gcc/final.c?rev=1.2&content-type=text/x-cvsweb-markup
> 
> The other thing that this patch does, is to split the old and new prefixs by
> the last '=' sign rather than the first. This allows arbitrary old prefixes to
> be mapped, which would be more flexible than mapping to arbitrary new 
> prefixes.
> The rust compiler is adopting this strategy too: [3].
> 
> [3] https://github.com/rust-lang/rust/issues/41555#issuecomment-321078174
> 
> ChangeLogs
> --
> 
> gcc/ChangeLog:
> 
> 2017-12-19  Ximin Luo  
> 
>   PR other/70268
>   * file-prefix-map.c (add_prefix_map): Support reading old prefix from 
> the
>   environment. Split old and new parts by the last '=' rather than the 
> first.
>   * file-prefix-map.h: Define macros for the ENV: prefix.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-12-19  Ximin Luo  
> 
>   PR other/70268
>   * c-c++-common/ffile-prefix-map-env.c: New test.
> 
> Index: gcc-8-20171217/gcc/file-prefix-map.c
> ===
> --- gcc-8-20171217.orig/gcc/file-prefix-map.c
> +++ gcc-8-20171217/gcc/file-prefix-map.c
> @@ -41,16 +41,40 @@ add_prefix_map (file_prefix_map *&maps,
>  {
>file_prefix_map *map;
>const char *p;
> +  char *env;
> +  const char *old;
> +  size_t oldlen;
>  
> -  p = strchr (arg, '=');
> +  p = strrchr (arg, '=');
>if (!p)
>  {
>error ("invalid argument %qs to %qs", arg, opt);
>return;
>  }
> +  if (0 == strncmp(FILE_PREFIX_MAP_ENV_PREFIX, arg, 
> FILE_PREFIX_MAP_ENV_PREFIX_OFFSET))
> +{
> +  const char *arg_offset = arg + FILE_PREFIX_MAP_ENV_PREFIX_OFFSET;
> +  env = xstrndup (arg_offset, p - arg_offset);
> +  old = getenv (env);
> +  if (!old)
> + {
> +   warning (0, "environment variable %qs not set in argument to %s",
> +env, opt);
> +   free (env);
> +   return;
> + }
> +  oldlen = strlen (old);
> +  free (env);
> +}
> +  else
> +{
> +  old = xstrndup (arg, p - arg);
> +  oldlen = p - arg;
> +}
> +
>map = XNEW (file_prefix_map);
> -  map->old_prefix = xstrndup (arg, p - arg);
> -  map->old_len = p - arg;
> +  map->old_prefix = old;
> +  map->old_len = oldlen;
>p++;
>map->new_prefix = xstrdup (p);
>map->new_len = strlen (p);
> Index: gcc-8-20171217/gcc/file-prefix-map.h
> ===

Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)

2017-12-19 Thread Ximin Luo
Boris Kolpackov:
> Thanks for the review. Third revision of the patch attached.
> 
> [..]
Here is a follow-up patch, meant to be applied on top of Boris' patch, to 
address some issues we found based on discussions and experiments at the 
Reproducible Builds project.

Most of this patch is re-implementing the patch from [1] originally by Daniel 
Kahn Gillmor. Back then it was rejected in favour of a simpler approach because 
its necessity was not realised. However, more experiments since then have 
revealed that its basic approach is useful in more general scenarios:

Higher-level build scripts sometimes like to save CFLAGS etc into the build 
output, making the overall build output unreproducible even if GCC is playing 
nicely. Rather than add logic to strip -f{file,debug,macro,...}-prefix-map, 
into all possible higher-level programs that might want to save CFLAGS, it is
simpler if GCC could read it from a user-specified environment variable. The
fact that the name of the environment variable is still present on the
command-line, should make it obvious what is being done, and avoid confusion
that might arise if an implicit "magic" environment variable is used.

In the patch thread (see [1] and follow-ups) there was concern that the prefix 
'$' would cause awkward escaping behaviour, so the prefix 'ENV:' was chosen. A 
concern about this conflicting with AmigaOS paths was voiced but not explored; 
of course it can be changed to something else if needed.

(NetBSD is already carrying this patch in their system GCC [2], presumably for
the same reason mentioned above.)

[1] https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01168.html
[2] 
http://cvsweb.netbsd.org/bsdweb.cgi/src/external/gpl3/gcc/dist/gcc/final.c?rev=1.2&content-type=text/x-cvsweb-markup

The other thing that this patch does, is to split the old and new prefixs by
the last '=' sign rather than the first. This allows arbitrary old prefixes to
be mapped, which would be more flexible than mapping to arbitrary new prefixes.
The rust compiler is adopting this strategy too: [3].

[3] https://github.com/rust-lang/rust/issues/41555#issuecomment-321078174

ChangeLogs
--

gcc/ChangeLog:

2017-12-19  Ximin Luo  

PR other/70268
* file-prefix-map.c (add_prefix_map): Support reading old prefix from 
the
environment. Split old and new parts by the last '=' rather than the 
first.
* file-prefix-map.h: Define macros for the ENV: prefix.

gcc/testsuite/ChangeLog:

2017-12-19  Ximin Luo  

PR other/70268
* c-c++-common/ffile-prefix-map-env.c: New test.

Index: gcc-8-20171217/gcc/file-prefix-map.c
===
--- gcc-8-20171217.orig/gcc/file-prefix-map.c
+++ gcc-8-20171217/gcc/file-prefix-map.c
@@ -41,16 +41,40 @@ add_prefix_map (file_prefix_map *&maps,
 {
   file_prefix_map *map;
   const char *p;
+  char *env;
+  const char *old;
+  size_t oldlen;
 
-  p = strchr (arg, '=');
+  p = strrchr (arg, '=');
   if (!p)
 {
   error ("invalid argument %qs to %qs", arg, opt);
   return;
 }
+  if (0 == strncmp(FILE_PREFIX_MAP_ENV_PREFIX, arg, 
FILE_PREFIX_MAP_ENV_PREFIX_OFFSET))
+{
+  const char *arg_offset = arg + FILE_PREFIX_MAP_ENV_PREFIX_OFFSET;
+  env = xstrndup (arg_offset, p - arg_offset);
+  old = getenv (env);
+  if (!old)
+   {
+ warning (0, "environment variable %qs not set in argument to %s",
+  env, opt);
+ free (env);
+ return;
+   }
+  oldlen = strlen (old);
+  free (env);
+}
+  else
+{
+  old = xstrndup (arg, p - arg);
+  oldlen = p - arg;
+}
+
   map = XNEW (file_prefix_map);
-  map->old_prefix = xstrndup (arg, p - arg);
-  map->old_len = p - arg;
+  map->old_prefix = old;
+  map->old_len = oldlen;
   p++;
   map->new_prefix = xstrdup (p);
   map->new_len = strlen (p);
Index: gcc-8-20171217/gcc/file-prefix-map.h
===
--- gcc-8-20171217.orig/gcc/file-prefix-map.h
+++ gcc-8-20171217/gcc/file-prefix-map.h
@@ -18,6 +18,9 @@
 #ifndef GCC_FILE_PREFIX_MAP_H
 #define GCC_FILE_PREFIX_MAP_H
 
+#define FILE_PREFIX_MAP_ENV_PREFIX "ENV:"
+#define FILE_PREFIX_MAP_ENV_PREFIX_OFFSET (sizeof(FILE_PREFIX_MAP_ENV_PREFIX) 
- 1)
+
 void add_macro_prefix_map (const char *);
 void add_debug_prefix_map (const char *);
 void add_file_prefix_map (const char *);
Index: gcc-8-20171217/gcc/testsuite/c-c++-common/ffile-prefix-map-env.c
===
--- /dev/null
+++ gcc-8-20171217/gcc/testsuite/c-c++-common/ffile-prefix-map-env.c
@@ -0,0 +1,13 @@
+/* Test __builtin_FILE() with ENV: prefix-map. */
+/* { dg-do run } */
+/* { dg-set-compiler-env-var SRCDIR "$srcdir" } */
+/* { dg-options "-

Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path

2017-08-10 Thread Ximin Luo
Yury Gribov:
> On Thu, Aug 3, 2017 at 12:45 PM, Ximin Luo  wrote:
>> Yury Gribov:
>>> [..]
>>>
>>> Shouldn't -fdebug-prefix-map be updated to use the same syntax as 
>>> BUILD_PATH_PREFIX_MAP?
>>>
>>
>> -fdebug-prefix-map is a CLI option and can be given multiple times, each 
>> flag given is in the form of $from=$to where $from can't contain a '='.
>>
>> BUILD_PATH_PREFIX_MAP is a single envvar that encodes a list-of-pairs of the 
>> form $to=$from:$to=$from with some escaping for flexibility and to support 
>> things like windows paths. Since it's a new envvar, Ian Jackson suggested 
>> $to=$from to emphasise the reproducible ($to) part. I liked the idea so I 
>> implemented it like that. (We did a lot of bikeshedding over on the 
>> rb-general mailing list about the exact format and this is what we settled 
>> on, I'd like to avoid getting into that again but would nevertheless do it, 
>> if it's necessary to get this patch accepted.)
>>
>> Because -fdebug-prefix-map currently only encodes one $from=$to pair, it 
>> would be a very disruptive and highly backward-incompatible change to make 
>> it use the same syntax as B_P_P_M. A slightly less disruptive but still 
>> backward-incompatible change would be to make it encode a single $to=$from 
>> pair, but I don't really see the advantage to doing so - what were your 
>> thoughts on this?
> 
> I believe it would much easier to reason about environment variable
> behavior when it boils down to "prepend some standard flag to
> command-line flags".  It would also simplify maintenance of local
> compiler patch as core functionality can be merged to mainline GCC
> whereas debatable environment variable part stays in the distro.
> 

[answered in another email together with other related points]

>> If by "first class option" you meant a command-line flag, GCC *already has* 
>> that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility in 
>> many cases we tested.
>> dpkg-buildflags actually already adds these flags to CFLAGS CXXFLAGS etc on 
>> Debian. However, with this patch using the environment variable, we are able 
>> to reproduce 1800 more packages out of 26000.
> 
> Just curious, why -fdebug-prefix-map (maybe modified to support
> multiple renames) was not enough for these packages (and why they
> can't be fixed instead)?
> 

One important reason is that some packages embed CFLAGS/CXXFLAGS in build
output such as pkg-config files or Makefiles to be installed as examples. To
fix this, we'd have to add buildsystem-specific logic to strip out
-fdebug-prefix-map when it was writing such output. This does not affect all of
these 1800 packages, but I saw enough cases that I was convinced that the use
of a new envvar was a better approach - I don't think buildsystems should be
burdened with having to know "which flag values are reproducible vs not", this
is not the case with other CFLAGS.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path

2017-08-10 Thread Ximin Luo
Jakub Jelinek:
> On Fri, Aug 04, 2017 at 08:32:33AM -0400, Matthias Klose wrote:
 GCC already supports a similar environment variable SOURCE_DATE_EPOCH, 
 which was accepted about 2 years ago in a patch written by one of our GSoC 
 students. We are not planning any more environment variables like this, 
 and are committed to fixing other sources of non-determinism by patching 
 the relevant build scripts.
>>> I would have rejected that as well :-)  One of the few times I would
>>> have disagreed with Bernd.
>>
>> You can argue that gcc has command line options to set these, but most build
>> systems can be influenced by well documented environment variables like 
>> CFLAGS,
>> CXXFLAGS, LDFLAGS, so adding one more variable like SOURCE_DATE_EPOCH makes
>> sense, considering that many tools dealing with timestamps don't even have
>> command line options to do these (and there it's not just about compilers).
> 
> Unlike SOURCE_DATE_EPOCH, the other env vars are autoconf/cmake etc.
> related, we really shouldn't be adding more of these.
> If some package has messed up build system, you can use
> CXX='g++ -fwhatever'
> or whatever other way to pass flags you want to the compiler or pick the
> compiler you prefer to use.
> 

As I said, this is the only envvar that we're planning to add, we're not
planning to add any more. Please also see my point in my other email, about
how this is really more of a "cancel-out already-existing environment" piece of
data, different from other first-class options to GCC.

It would be really nice not to have to patch 1800 packages to make them
reproducible.

X

>> [..]

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path

2017-08-10 Thread Ximin Luo
Yury Gribov:
> [..]
> 
> In general any distro provides some way to set default parameters for
> every package.  Adding hacks to compiler to work around limitations of
> particular build system does not sound right.
> 
> Here's a relevant discussion from binutils ML:
> https://sourceware.org/ml/binutils/2009-05/msg00086.html
> 

I would argue that what is being worked around, is not a limitation of a
particular buildsystem but rather a limitation of the compiler and the
language. It is GCC itself that generates the unreproducible output, and this
issues affects all buildsystems. They do not explicitly request unreproducible
output, so the fix should not be the responsibility of buildsystem.

GCC takes the input from an environment-like object (the filesystem). Part of
this input (some prefix) differs between different builds, through no fault of
any particular buildsystem but rather through the "fault" of the user running
the build - they put the source and/or build directory under that path.

Since no buildsystem is responsible for this part of the filesystem layout, it
does not seem appropriate to me, to require buildsystems to add logic to deal
with this - e.g. to pass this information down through layers of the tooling
stack, to be eventually consumed by the GCC command line. Instead, it seems
much cleaner for the "fix" to be supplied by the user (the same entity that
controls the filesystem) via some other environment-like object (here, a UNIX
envvar), and for GCC to read this directly just like it reads the abspath() /
realpath() from the filesystem environment.

CFLAGS/CXXFLAGS is also not appropriate because these flags are meant to
directly affect how GCC works. By contrast, these prefix-map options are meant
to *cancel out* already-existing sources of unreproducibility from the
filesystem environment; they do not affect the output pro-actively like other
first-class options do. Also, adding -fdebug-prefix-map to CFLAGS/CXXFLAGS
causes unreproducibility elsewhere.

(from another email)
> I believe it would much easier to reason about environment variable
> behavior when it boils down to "prepend some standard flag to
> command-line flags".  It would also simplify maintenance of local
> compiler patch as core functionality can be merged to mainline GCC
> whereas debatable environment variable part stays in the distro.

To re-iterate my point: I don't think it's right to think of these prefix maps
as the same as other command line flags. They *cancel out* already-existing
sources of unreproducibility from the filesystem environment that GCC is
*already reading*, that already makes the output hard-to-reason-about (i.e.
unreproducible). So I think it is more appropriate to pass them in an
environment-like object, rather than via command-line flags.

Normally, system-wide CFLAGS etc are a static expression of policy. What I mean
by that is, writing logic to determine CFLAGS for particular package, would
"look like" the actual wording of that policy. So for example, "all packages
should have debug" => "CFLAGS+=-g", "packages that match property X should have
flag Y" => "if X then CFLAGS += Y". OTOH if your policy is "packages should be
reproducible" and we have to add these prefix-remapping flags to CFLAGS, you're
adding a dependency from the value of CFLAGS itself, onto the build-time
filesystem. So CFLAGS is no longer a static expression of policy, it depends on
information not part of the policy.

So, prefix-map is really different from other CFLAGS, it adds extra dependency
relationships that other flags don't add.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path

2017-08-03 Thread Ximin Luo
Jeff Law:
> On 08/02/2017 08:06 PM, Ximin Luo wrote:
>> Jeff Law:
>>> On 07/21/2017 10:15 AM, Ximin Luo wrote:
>>>> (Please keep me on CC, I am not subscribed)
>>>>
>>>>
>>>> Proposal
>>>> 
>>>>
>>>> This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. 
>>>> When
>>>> this is set, GCC will treat this as extra implicit 
>>>> "-fdebug-prefix-map=$value"
>>>> command-line arguments that precede any explicit ones. This makes the final
>>>> binary output reproducible, and also hides the unreproducible value (the 
>>>> source
>>>> path prefixes) from CFLAGS et. al. which many build tools (understandably)
>>>> embed as-is into their build output.
>>> I'd *really* avoid doing this with magic environment variables.  Make it
>>> a first class option to the compiler.  Yes, it means projects that want
>>> this behavior have to arrange to pass that flag to their compiler, but
>>> IMHO that's much preferred over environment variables.
>>>
>>> Jeff
>>>
>>
>> Hi Jeff,
>>
>> If by "first class option" you meant a command-line flag, GCC *already has* 
>> that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility in 
>> many cases we tested. dpkg-buildflags actually already adds these flags to 
>> CFLAGS CXXFLAGS etc on Debian. However, with this patch using the 
>> environment variable, we are able to reproduce 1800 more packages out of 
>> 26000.
> Then take what you've done with the environment variable and instead 
> implement it on top of a switch.  An environment variable is absolutely the 
> wrong thing to do here.
> 
>>
>> GCC already supports a similar environment variable SOURCE_DATE_EPOCH, which 
>> was accepted about 2 years ago in a patch written by one of our GSoC 
>> students. We are not planning any more environment variables like this, and 
>> are committed to fixing other sources of non-determinism by patching the 
>> relevant build scripts.
> I would have rejected that as well :-)  One of the few times I would
> have disagreed with Bernd.
> 
> 

Could you go into some more detail on why you think an envvar is absolutely the 
wrong thing to do here?

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path

2017-08-03 Thread Ximin Luo
Yury Gribov:
> On 03.08.2017 3:06, Ximin Luo wrote:
>> Jeff Law:
>>> On 07/21/2017 10:15 AM, Ximin Luo wrote:
>>>> (Please keep me on CC, I am not subscribed)
>>>>
>>>>
>>>> Proposal
>>>> 
>>>>
>>>> This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. 
>>>> When
>>>> this is set, GCC will treat this as extra implicit 
>>>> "-fdebug-prefix-map=$value"
>>>> command-line arguments that precede any explicit ones. This makes the final
>>>> binary output reproducible, and also hides the unreproducible value (the 
>>>> source
>>>> path prefixes) from CFLAGS et. al. which many build tools (understandably)
>>>> embed as-is into their build output.
>>> I'd *really* avoid doing this with magic environment variables.  Make it
>>> a first class option to the compiler.  Yes, it means projects that want
>>> this behavior have to arrange to pass that flag to their compiler, but
>>> IMHO that's much preferred over environment variables.
>>>
>>> Jeff
>>>
>>
>> Hi Jeff,
>>
>> If by "first class option" you meant a command-line flag, GCC *already has* 
>> that (-fdebug-prefix-map) > and it wasn't enough to achieve reproducibility 
>> in many cases we tested.
> 
> Shouldn't -fdebug-prefix-map be updated to use the same syntax as 
> BUILD_PATH_PREFIX_MAP?
> 

-fdebug-prefix-map is a CLI option and can be given multiple times, each flag 
given is in the form of $from=$to where $from can't contain a '='.

BUILD_PATH_PREFIX_MAP is a single envvar that encodes a list-of-pairs of the 
form $to=$from:$to=$from with some escaping for flexibility and to support 
things like windows paths. Since it's a new envvar, Ian Jackson suggested 
$to=$from to emphasise the reproducible ($to) part. I liked the idea so I 
implemented it like that. (We did a lot of bikeshedding over on the rb-general 
mailing list about the exact format and this is what we settled on, I'd like to 
avoid getting into that again but would nevertheless do it, if it's necessary 
to get this patch accepted.)

Because -fdebug-prefix-map currently only encodes one $from=$to pair, it would 
be a very disruptive and highly backward-incompatible change to make it use the 
same syntax as B_P_P_M. A slightly less disruptive but still 
backward-incompatible change would be to make it encode a single $to=$from 
pair, but I don't really see the advantage to doing so - what were your 
thoughts on this?

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path

2017-08-02 Thread Ximin Luo
Jeff Law:
> On 07/21/2017 10:15 AM, Ximin Luo wrote:
>> (Please keep me on CC, I am not subscribed)
>>
>>
>> Proposal
>> 
>>
>> This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When
>> this is set, GCC will treat this as extra implicit 
>> "-fdebug-prefix-map=$value"
>> command-line arguments that precede any explicit ones. This makes the final
>> binary output reproducible, and also hides the unreproducible value (the 
>> source
>> path prefixes) from CFLAGS et. al. which many build tools (understandably)
>> embed as-is into their build output.
> I'd *really* avoid doing this with magic environment variables.  Make it
> a first class option to the compiler.  Yes, it means projects that want
> this behavior have to arrange to pass that flag to their compiler, but
> IMHO that's much preferred over environment variables.
> 
> Jeff
> 

Hi Jeff,

If by "first class option" you meant a command-line flag, GCC *already has* 
that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility in 
many cases we tested. dpkg-buildflags actually already adds these flags to 
CFLAGS CXXFLAGS etc on Debian. However, with this patch using the environment 
variable, we are able to reproduce 1800 more packages out of 26000.

GCC already supports a similar environment variable SOURCE_DATE_EPOCH, which 
was accepted about 2 years ago in a patch written by one of our GSoC students. 
We are not planning any more environment variables like this, and are committed 
to fixing other sources of non-determinism by patching the relevant build 
scripts.

However for build-paths, we think that this environment variable is the best 
practical approach here. It is even more prevalent than the timestamps issue - 
when SOURCE_DATE_EPOCH was accepted into GCC, we were able to reproduce about 
400 more packages. With BUILD_PATH_PREFIX_MAP, this number is about 1800.

I understand that environment variables have a bit of a "magical feel" to them, 
but I think that this scenario fits their use-cases well. The idea of an 
"environment" or a "context" is used in many scenarios in programming 
(including pure functional programming) when one wants to avoid passing the 
same arguments down multiple layers of function calls, especially when only 
some of these layers (and often, only the bottom one) use these arguments and 
the other layers totally ignore them (except to pass it down). I think this 
fits the situation here - patching many many build scripts simply to take a 
prefix-map and pass it downwards one layer in the "buildsystem stack", and do 
nothing else with this information, does not seem like a clean nor scalable 
solution to me. I can empathise that UNIX global environment variables is not 
the most elegant manifestation of this general idea of "an environment", but I 
think the overall end result in this case, is still cleaner than "boilerplate 
argument passing".

On top of all of this, we've seen that some build scripts will save GCC's 
command-line arguments into the build output. But this defeats the purpose of 
setting -fdebug-prefix-map for reproducibility purposes, since it contains an 
unreproducible value that is meant to be stripped out of the build output. 
Indeed, GCC itself used to do this in DW_AT_producer, until we submitted a 
patch over a year ago to specifically ignore -fdebug-prefix-map when writing 
DW_AT_producer. If we could only set -fdebug-prefix-map via the CLI and not 
this envvar, we'd have to teach this logic to all other build scripts that want 
to save compiler flags as well. By contrast, if we use a previously-unused 
envvar, and explicitly say that this should not be saved into build output, 
this problem goes away. (I think it is reasonable to save CLI flags into build 
output, because they very obviously affect what the program does. However, it 
is much less reasonable to save arbitrary envvars into build output, and we 
currently would treat this as a reproducibility bug and potential privacy leak.)

For all of these reasons, that is why I decided an envvar approach was best 
here. At least, there are even stronger arguments for this one, than for 
SOURCE_DATE_EPOCH.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


[PATCH 2/3] Use BUILD_PATH_PREFIX_MAP envvar to transform __FILE__

2017-07-21 Thread Ximin Luo
Use the BUILD_PATH_PREFIX_MAP environment variable when expanding the __FILE__
macro, in the same way that debug-prefix-map works for debugging symbol paths.

This patch follows similar lines to the earlier patch for SOURCE_DATE_EPOCH.
Specifically, we read the environment variable not in libcpp but via a hook
which has an implementation defined in gcc/c-family.  However, to achieve this
is more complex than the earlier patch: we need to share the prefix_map data
structure and associated functions between libcpp and c-family.  Therefore, we
need to move these to libiberty.  (For comparison, the SOURCE_DATE_EPOCH patch
did not need this because time_t et. al. are in the standard C library.)

Acknowledgements


Dhole  who wrote the earlier patch for SOURCE_DATE_EPOCH
which saved me a lot of time on figuring out what to edit.

ChangeLogs
--

gcc/ChangeLog:

2017-07-21  Ximin Luo  

* doc/invoke.texi (Environment Variables): Document form and behaviour
of BUILD_PATH_PREFIX_MAP.

gcc/c-family/ChangeLog:

2017-07-21  Ximin Luo  

* c-common.c (cb_get_build_path_prefix_map): Define new call target.
* c-common.h (cb_get_build_path_prefix_map): Declare call target.
* c-lex.c (init_c_lex): Set the get_build_path_prefix_map callback.

libcpp/ChangeLog:

2017-07-21  Ximin Luo  

* include/cpplib.h (cpp_callbacks): Add get_build_path_prefix_map
callback.
* init.c (cpp_create_reader): Initialise build_path_prefix_map field.
* internal.h (cpp_reader): Add new field build_path_prefix_map.
* macro.c (_cpp_builtin_macro_text): Set the build_path_prefix_map
field if unset and apply it when expanding __FILE__ macros.

gcc/testsuite/ChangeLog:

2017-07-21  Ximin Luo  

* gcc.dg/cpp/build_path_prefix_map-1.c: New test.
* gcc.dg/cpp/build_path_prefix_map-2.c: New test.

Index: gcc-8-20170716/gcc/doc/invoke.texi
===
--- gcc-8-20170716.orig/gcc/doc/invoke.texi
+++ gcc-8-20170716/gcc/doc/invoke.texi
@@ -27197,6 +27197,26 @@ Recognize EUCJP characters.
 If @env{LANG} is not defined, or if it has some other value, then the
 compiler uses @code{mblen} and @code{mbtowc} as defined by the default locale 
to
 recognize and translate multibyte characters.
+
+@item BUILD_PATH_PREFIX_MAP
+@findex BUILD_PATH_PREFIX_MAP
+If this variable is set, it specifies an ordered map used to transform
+filepaths output in debugging symbols and expansions of the @code{__FILE__}
+macro.  This may be used to achieve fully reproducible output.  In the context
+of running GCC within a higher-level build tool, it is typically more reliable
+than setting command line arguments such as @option{-fdebug-prefix-map} or
+common environment variables such as @env{CFLAGS}, since the build tool may
+save these latter values into other output outside of GCC's control.
+
+The value is of the form
+@samp{@var{dst@r{[0]}}=@var{src@r{[0]}}:@var{dst@r{[1]}}=@var{src@r{[1]}}@r{@dots{}}}.
+If any @var{dst@r{[}i@r{]}} or @var{src@r{[}i@r{]}} contains @code{%}, @code{=}
+or @code{:} characters, they must be replaced with @code{%#}, @code{%+}, and
+@code{%.} respectively.
+
+Whenever GCC emits a filepath that starts with a whole path component matching
+@var{src@r{[}i@r{]}} for some @var{i}, with rightmost @var{i} taking priority,
+the matching part is replaced with @var{dst@r{[}i@r{]}} in the final output.
 @end table
 
 @noindent
Index: gcc-8-20170716/gcc/c-family/c-common.c
===
--- gcc-8-20170716.orig/gcc/c-family/c-common.c
+++ gcc-8-20170716/gcc/c-family/c-common.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.
 
 #include "config.h"
 #include "system.h"
+#include "prefix-map.h"
 #include "coretypes.h"
 #include "target.h"
 #include "function.h"
@@ -7905,6 +7906,25 @@ cb_get_source_date_epoch (cpp_reader *pf
   return (time_t) epoch;
 }
 
+/* Read BUILD_PATH_PREFIX_MAP from environment to have deterministic relative
+   paths to replace embedded absolute paths to get reproducible results.
+   Returns NULL if BUILD_PATH_PREFIX_MAP is badly formed.  */
+
+prefix_map **
+cb_get_build_path_prefix_map (cpp_reader *pfile ATTRIBUTE_UNUSED)
+{
+  prefix_map **map = XCNEW (prefix_map *);
+
+  const char *arg = getenv ("BUILD_PATH_PREFIX_MAP");
+  if (!arg || prefix_map_parse (map, arg))
+return map;
+
+  free (map);
+  error_at (input_location, "environment variable BUILD_PATH_PREFIX_MAP is "
+   "not well formed; see the GCC documentation for more details.");
+  return NULL;
+}
+
 /* Callback for libcpp for offering spelling suggestions for misspelled
directives.  GOAL is an unrecognized string; CANDIDATES is a
NULL-terminated array of candidate strings.  Return the closest
Index: gcc-8-20170716/gcc/c-

[PATCH 1/3] Use BUILD_PATH_PREFIX_MAP envvar for debug-prefix-map

2017-07-21 Thread Ximin Luo
Define the BUILD_PATH_PREFIX_MAP environment variable, and treat it as implicit
-fdebug-prefix-map CLI options specified before any explicit such options.

Much of the generic code for applying and parsing prefix-maps is implemented in
libiberty instead of the dwarf2 parts of the code, in order to make subsequent
patches unrelated to debuginfo easier.

Acknowledgements


Daniel Kahn Gillmor who wrote the patch for r231835, which saved me a lot of
time figuring out what to edit.

HW42 for discussion on the details of the proposal, and for suggesting that we
retain the ability to map the prefix to something other than ".".

Other contributors to the BUILD_PATH_PREFIX_MAP specification, see
https://reproducible-builds.org/specs/build-path-prefix-map/

ChangeLogs
--

include/ChangeLog:

2017-07-21  Ximin Luo  

* prefix-map.h: New file implementing the BUILD_PATH_PREFIX_MAP
specification; includes code from /gcc/final.c and code adapted from
examples attached to the specification.

libiberty/ChangeLog:

2017-07-21  Ximin Luo  

* prefix-map.c: New file implementing the BUILD_PATH_PREFIX_MAP
specification; includes code from /gcc/final.c and code adapted from
examples attached to the specification.
* Makefile.in: Update for new files.

gcc/ChangeLog:

2017-07-21  Ximin Luo  

* debug.h: Declare add_debug_prefix_map_from_envvar.
* final.c: Define add_debug_prefix_map_from_envvar, and refactor
prefix-map utilities to use equivalent code from libiberty instead.
* opts-global.c: (handle_common_deferred_options): Call
add_debug_prefix_map_from_envvar before processing options.

gcc/testsuite/ChangeLog:

2017-07-21  Ximin Luo  

* lib/gcc-dg.exp: Allow dg-set-compiler-env-var to take only one
argument in which case it unsets the given env var.
* gcc.dg/debug/dwarf2/build_path_prefix_map-1.c: New test.
* gcc.dg/debug/dwarf2/build_path_prefix_map-2.c: New test.

Index: gcc-8-20170716/include/prefix-map.h
===
--- /dev/null
+++ gcc-8-20170716/include/prefix-map.h
@@ -0,0 +1,94 @@
+/* Declarations for manipulating filename prefixes.
+   Written 2017 by Ximin Luo 
+   This code is in the public domain. */
+
+#ifndef _PREFIX_MAP_H
+#define _PREFIX_MAP_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifdef HAVE_STDLIB_H
+#include 
+#endif
+
+/* Linked-list of mappings from old prefixes to new prefixes.  */
+
+struct prefix_map
+{
+  const char *old_prefix;
+  const char *new_prefix;
+  size_t old_len;
+  size_t new_len;
+  struct prefix_map *next;
+};
+
+
+/* Find a mapping suitable for the given OLD_NAME in the linked list MAP.\
+
+   If a mapping is found, writes a pointer to the non-matching suffix part of
+   OLD_NAME in SUFFIX, and its length in SUF_LEN.
+
+   Returns NULL if there was no suitable mapping.  */
+struct prefix_map *
+prefix_map_find (struct prefix_map *map, const char *old_name,
+const char **suffix, size_t *suf_len);
+
+/* Prepend a prefix map before a given SUFFIX.
+
+   The remapped name is written to NEW_NAME and returned as a const pointer. No
+   allocations are performed; the caller must ensure it can hold at least
+   MAP->NEW_LEN + SUF_LEN + 1 characters.  */
+const char *
+prefix_map_prepend (struct prefix_map *map, char *new_name,
+   const char *suffix, size_t suf_len);
+
+/* Remap a filename.
+
+   Returns OLD_NAME unchanged if there was no remapping, otherwise returns a
+   pointer to newly-allocated memory for the remapped filename.  The memory is
+   allocated by the given ALLOC function, which also determines who is
+   responsible for freeing it.  */
+#define prefix_map_remap_alloc_(map_head, old_name, alloc)\
+  __extension__
   \
+  ({  \
+const char *__suffix; \
+size_t __suf_len; \
+struct prefix_map *__map; \
+(__map = prefix_map_find ((map_head), (old_name), &__suffix, &__suf_len))  
\
+  ? prefix_map_prepend (__map,\
+   (char *) alloc (__map->new_len + __suf_len + 1),   \
+   __suffix, __suf_len)   \
+  : (old_name);   \
+  })
+
+/* Remap a filename.
+
+   Returns OLD_NAME unchanged if there was no remapping, otherwise returns a
+   stack-allocated pointer to the newly-remapped filename.  */
+#define prefix_map_remap_alloca(map_head, old_name) \
+  prefix_map_remap_alloc_ (map_head, old_name,

[PATCH 3/3] When remapping paths, only match whole path components

2017-07-21 Thread Ximin Luo
Change the remapping algorithm so that each old_prefix only matches paths that
have old_prefix as a whole path component prefix.  (A whole path component is a
part of a path that begins and ends at a directory separator or at either end
of the path string.)

This remapping algorithm is more predictable than the old algorithm, because
there is no chance of mappings for one directory interfering with mappings for
other directories.  It contains less corner cases and therefore it is easier
for users to figure out how to set the mapping appropriately.  Therefore, I
believe it is better as a standardised algorithm that other build tools might
like to adopt, and so in our BUILD_PATH_PREFIX_MAP specification we recommend
this algorithm - though we allow others, and explicitly mention GCC's current
algorithm.  But it would be good for GCC to adopt this newer and cleaner one.

(The original idea came from discussions with rustc developers on this topic.)

This does technically break backwards-compatibility, but I was under the
impression that this option was not seen as such a critical feature, that this
would be too important.  Nevertheless, this part is totally independent from
the other patches and may be included or excluded as GCC maintainers desire.

Acknowledgements


Discussions with Michael Woerister and other members of the Rust compiler team
on Github, and discussions with Daniel Shahaf on the rb-general@ mailing list
on lists.reproducible-builds.org.

ChangeLogs
--

libiberty/ChangeLog:

2017-07-21  Ximin Luo  

* prefix-map.c: When remapping paths, only match whole path components.

Index: gcc-8-20170716/libiberty/prefix-map.c
===
--- gcc-8-20170716.orig/libiberty/prefix-map.c
+++ gcc-8-20170716/libiberty/prefix-map.c
@@ -87,12 +87,22 @@ struct prefix_map *
 prefix_map_find (struct prefix_map *map, const char *old_name,
 const char **suffix, size_t *suf_len)
 {
+  size_t len;
+
   for (; map; map = map->next)
-if (filename_ncmp (old_name, map->old_prefix, map->old_len) == 0)
-  {
-   *suf_len = strlen (*suffix = old_name + map->old_len);
-   break;
-  }
+{
+  len = map->old_len;
+  /* Ignore trailing path separators at the end of old_prefix */
+  while (len > 0 && IS_DIR_SEPARATOR (map->old_prefix[len-1])) len--;
+  /* Check if old_name matches old_prefix at a path component boundary */
+  if (! filename_ncmp (old_name, map->old_prefix, len)
+ && (IS_DIR_SEPARATOR (old_name[len])
+ || old_name[len] == '\0'))
+   {
+ *suf_len = strlen (*suffix = old_name + len);
+ break;
+   }
+}
 
   return map;
 }


[PING^4][PATCH v2] Generate reproducible output independently of the build-path

2017-07-21 Thread Ximin Luo
(Please keep me on CC, I am not subscribed)


Proposal


This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When
this is set, GCC will treat this as extra implicit "-fdebug-prefix-map=$value"
command-line arguments that precede any explicit ones. This makes the final
binary output reproducible, and also hides the unreproducible value (the source
path prefixes) from CFLAGS et. al. which many build tools (understandably)
embed as-is into their build output.

This environment variable also acts on the __FILE__ macro, mapping it in the
same way that debug-prefix-map works for debug symbols. We have seen that
__FILE__ is also a very large source of unreproducibility, and is represented
quite heavily in the 3k+ figure given earlier.

Finally, we tweak the mapping algorithm so that it applies only to whole path
components when matching prefixes. This is justified in further detail in the
patch header. It is an optional part of the patch series and could be dropped
if the GCC maintainers are not convinced by our arguments there.


Background
==

We have prepared a document that describes how this works in detail, so that
projects can be confident that they are interoperable:

https://reproducible-builds.org/specs/build-path-prefix-map/

The specification is currently in DRAFT status, awaiting some final feedback,
including what the GCC maintainers think about it.

We have written up some more detailed discussions on the topic, including a
thorough justification on why we chose the mechanism of environment variables:

https://wiki.debian.org/ReproducibleBuilds/StandardEnvironmentVariables

The previous iteration of the patch series, essentially the same as the current
re-submission, is here:

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00513.html

An older version, that explains some GCC-specific background, is here:

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00182.html

The current patch series applies cleanly to GCC-8 snapshot 20170716.


Reproducibility testing
===

Over the past 3 months, we have tested this patch backported to Debian GCC-6.
Together with a patched dpkg that sets the environment variable appropriately,
it allows us to reproduce ~1800 extra packages.

This is about 6.8% of ~26400 Debian source packages, and just over 1/2 of the
ones whose irreproducibility is due to build-path issues.

https://tests.reproducible-builds.org/debian/issues/unstable/gcc_captures_build_path_issue.html
https://tests.reproducible-builds.org/debian/unstable/index_suite_amd64_stats.html

The first major increase around 2017-04 is due to us deploying this patch. The
next major increase later in 2017-04 is unrelated, due to us deploying a patch
for R. The dip during the last part of 2017-06 is due to unpatched and patched
packages getting out-of-sync partly because of extra admin work around the
Debian stretch release, and we believe that the green will soon return to their
previous high after this situation settles.


Unit testing


I've tested these patches on a Debian unstable x86_64-linux-gnu schroot running
inside a Debian jessie system, on a full-bootstrap build. The output of
contrib/compare_tests is as follows:


gcc-8-20170716$ contrib/compare_tests ../gcc-build-{0,1}
# Comparing directories
## Dir1=../gcc-build-0: 8 sum files
## Dir2=../gcc-build-1: 8 sum files

# Comparing 8 common sum files
## /bin/sh contrib/compare_tests  /tmp/gxx-sum1.13468 /tmp/gxx-sum2.13468
New tests that PASS:

gcc.dg/cpp/build_path_prefix_map-1.c (test for excess errors)
gcc.dg/cpp/build_path_prefix_map-1.c execution test
gcc.dg/cpp/build_path_prefix_map-2.c (test for excess errors)
gcc.dg/cpp/build_path_prefix_map-2.c execution test
gcc.dg/debug/dwarf2/build_path_prefix_map-1.c (test for excess errors)
gcc.dg/debug/dwarf2/build_path_prefix_map-1.c scan-assembler DW_AT_comp_dir: 
"DWARF2TEST/gcc
gcc.dg/debug/dwarf2/build_path_prefix_map-2.c (test for excess errors)
gcc.dg/debug/dwarf2/build_path_prefix_map-2.c scan-assembler DW_AT_comp_dir: "/

# No differences found in 8 common sum files


I can also provide the full logs on request.


Fuzzing
===

I've also fuzzed the prefix-map code using AFL with ASAN enabled. Due to how
AFL works I did not fuzz this patch directly but a smaller program with just
the parser and remapper, available here:

https://anonscm.debian.org/cgit/reproducible/build-path-prefix-map-spec.git/tree/consume

Over the course of about ~4k cycles, no crashes were found.

To reproduce, you could run something like:

$ echo performance | sudo tee 
/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
$ make CC=afl-gcc clean reset-fuzz-pecsplit.c fuzz-pecsplit.c


Copyright disclaimer


I've signed a copyright disclaimer and the FSF has this on record. (RT #1209764)



Re: [Ping ^3][PATCH v2] Generate reproducible output independently of the build-path

2017-06-29 Thread Ximin Luo
Dear GCC Global Reviewers,

Could any of you please review my patch series? It's about being able to 
reproducibly build things, even when the build machines are executing the build 
under different paths.

Overview:
https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00513.html

Full thread, including individual patches:
https://gcc.gnu.org/ml/gcc-patches/2017-04/threads.html#00513

Follow-up report:
https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00781.html

In summary, this patch helps ~1800/26000 packages in Debian to become 
reproducible even when the build-path is varied across builds.

I've signed a copyright disclaimer and the FSF has this on record.

X

Ximin Luo:
> Ximin Luo:
>> Joseph Myers:
>>> On Tue, 11 Apr 2017, Ximin Luo wrote:
>>>
>>>> Copyright disclaimer
>>>> 
>>>>
>>>> I dedicate these patches to the public domain by waiving all of my rights 
>>>> to
>>>> the work worldwide under copyright law, including all related and 
>>>> neighboring
>>>> rights, to the extent allowed by law.
>>>>
>>>> See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full 
>>>> text.
>>>>
>>>> Please let me know if the above is insufficient and I will be happy to 
>>>> sign any
>>>> relevant forms.
>>>
>>> I believe the FSF wants its own disclaimer forms signed as evidence code 
>>> is in the public domain.  The process for getting disclaimer forms is to 
>>> complete 
>>> https://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-disclaim.changes
>>>  
>>> and then you should be sent a disclaimer form for disclaiming the 
>>> particular set of changes you have completed (if you then make further 
>>> significant changes afterwards, the disclaimer form would then need 
>>> completing for them as well).
>>>
>>
>> I've now done this, and the copyright clerk at the FSF has told me that this 
>> is complete on their side as well.
>>
>> Did any of you get a chance to look at the patch yet?
>>
> 
> Hi GCC patches list,
> 
> Any progress or feedback on this patch series?
> 
> Ximin
> 

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH v2] Generate reproducible output independently of the build-path

2017-06-07 Thread Ximin Luo
Ximin Luo:
> Joseph Myers:
>> On Tue, 11 Apr 2017, Ximin Luo wrote:
>>
>>> Copyright disclaimer
>>> 
>>>
>>> I dedicate these patches to the public domain by waiving all of my rights to
>>> the work worldwide under copyright law, including all related and 
>>> neighboring
>>> rights, to the extent allowed by law.
>>>
>>> See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full 
>>> text.
>>>
>>> Please let me know if the above is insufficient and I will be happy to sign 
>>> any
>>> relevant forms.
>>
>> I believe the FSF wants its own disclaimer forms signed as evidence code 
>> is in the public domain.  The process for getting disclaimer forms is to 
>> complete 
>> https://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-disclaim.changes
>>  
>> and then you should be sent a disclaimer form for disclaiming the 
>> particular set of changes you have completed (if you then make further 
>> significant changes afterwards, the disclaimer form would then need 
>> completing for them as well).
>>
> 
> I've now done this, and the copyright clerk at the FSF has told me that this 
> is complete on their side as well.
> 
> Did any of you get a chance to look at the patch yet?
> 

Hi GCC patches list,

Any progress or feedback on this patch series?

Ximin

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH v2] Generate reproducible output independently of the build-path

2017-05-03 Thread Ximin Luo
Joseph Myers:
> On Tue, 11 Apr 2017, Ximin Luo wrote:
> 
>> Copyright disclaimer
>> 
>>
>> I dedicate these patches to the public domain by waiving all of my rights to
>> the work worldwide under copyright law, including all related and neighboring
>> rights, to the extent allowed by law.
>>
>> See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full 
>> text.
>>
>> Please let me know if the above is insufficient and I will be happy to sign 
>> any
>> relevant forms.
> 
> I believe the FSF wants its own disclaimer forms signed as evidence code 
> is in the public domain.  The process for getting disclaimer forms is to 
> complete 
> https://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-disclaim.changes
>  
> and then you should be sent a disclaimer form for disclaiming the 
> particular set of changes you have completed (if you then make further 
> significant changes afterwards, the disclaimer form would then need 
> completing for them as well).
> 

I've now done this, and the copyright clerk at the FSF has told me that this is 
complete on their side as well.

Did any of you get a chance to look at the patch yet?

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH v2] Generate reproducible output independently of the build-path

2017-04-18 Thread Ximin Luo
Ximin Luo:
> [..]
> 
> I will soon test this patch backported to Debian GCC-6 on
> tests.reproducible-builds.org and will have results in a few days or weeks.
> Some preliminary tests earlier gave good results (about +40 packages
> reproducible over ~2 days) but we had to abort due to some misscheduling.
> 
> [..]

This has been completed and we reproduced ~1700 extra packages when building 
with a GCC-6 with this patch, as well as a patched dpkg that sets the 
environment variable appropriately.

This is about 6.5% of ~26100 Debian source packages, and about 1/2 of the ones 
whose irreproducibility is due to build-path issues. Most of the rest are not 
related to GCC, such as things built by R, OCaml, Erlang, LLVM, PDF IDs, etc, 
etc.

https://tests.reproducible-builds.org/debian/unstable/index_suite_amd64_stats.html

The dip afterwards is due to reverting back to an unpatched GCC-6, but I'll be 
rebasing the patch continually over the next few weeks so the graph should stay 
up.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


[PATCH 2/3] Use BUILD_PATH_PREFIX_MAP envvar to transform __FILE__

2017-04-11 Thread Ximin Luo
Use the BUILD_PATH_PREFIX_MAP environment variable when expanding the __FILE__
macro, in the same way that debug-prefix-map works for debugging symbol paths.

This patch follows similar lines to the earlier patch for SOURCE_DATE_EPOCH.
Specifically, we read the environment variable not in libcpp but via a hook
which has an implementation defined in gcc/c-family.  However, to achieve this
is more complex than the earlier patch: we need to share the prefix_map data
structure and associated functions between libcpp and c-family.  Therefore, we
need to move these to libiberty.  (For comparison, the SOURCE_DATE_EPOCH patch
did not need this because time_t et. al. are in the standard C library.)

Acknowledgements


Dhole  who wrote the earlier patch for SOURCE_DATE_EPOCH
which saved me a lot of time on figuring out what to edit.

ChangeLogs
--

gcc/c-family/ChangeLog:

2017-03-27  Ximin Luo  

* c-common.c (cb_get_build_path_prefix_map): Define new call target.
* c-common.h (cb_get_build_path_prefix_map): Declare call target.
* c-lex.c (init_c_lex): Set the get_build_path_prefix_map callback.

libcpp/ChangeLog:

2017-03-27  Ximin Luo  

* include/cpplib.h (cpp_callbacks): Add get_build_path_prefix_map
callback.
* init.c (cpp_create_reader): Initialise build_path_prefix_map field.
* internal.h (cpp_reader): Add new field build_path_prefix_map.
* macro.c (_cpp_builtin_macro_text): Set the build_path_prefix_map
field if unset and apply it when expanding __FILE__ macros.

gcc/testsuite/ChangeLog:

2017-03-27  Ximin Luo  

* gcc.dg/cpp/build_path_prefix_map-1.c: New test.
* gcc.dg/cpp/build_path_prefix_map-2.c: New test.

Index: gcc-7-20170402/gcc/c-family/c-common.c
===
--- gcc-7-20170402.orig/gcc/c-family/c-common.c
+++ gcc-7-20170402/gcc/c-family/c-common.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.
 
 #include "config.h"
 #include "system.h"
+#include "prefix-map.h"
 #include "coretypes.h"
 #include "target.h"
 #include "function.h"
@@ -8012,6 +8013,25 @@ cb_get_source_date_epoch (cpp_reader *pf
   return (time_t) epoch;
 }
 
+/* Read BUILD_PATH_PREFIX_MAP from environment to have deterministic relative
+   paths to replace embedded absolute paths to get reproducible results.
+   Returns NULL if BUILD_PATH_PREFIX_MAP is badly formed.  */
+
+prefix_map **
+cb_get_build_path_prefix_map (cpp_reader *pfile ATTRIBUTE_UNUSED)
+{
+  prefix_map **map = XCNEW (prefix_map *);
+
+  const char *arg = getenv ("BUILD_PATH_PREFIX_MAP");
+  if (!arg || prefix_map_parse (map, arg))
+return map;
+
+  free (map);
+  error_at (input_location, "environment variable BUILD_PATH_PREFIX_MAP is "
+   "not well formed; see the GCC documentation for more details.");
+  return NULL;
+}
+
 /* Callback for libcpp for offering spelling suggestions for misspelled
directives.  GOAL is an unrecognized string; CANDIDATES is a
NULL-terminated array of candidate strings.  Return the closest
Index: gcc-7-20170402/gcc/c-family/c-common.h
===
--- gcc-7-20170402.orig/gcc/c-family/c-common.h
+++ gcc-7-20170402/gcc/c-family/c-common.h
@@ -1085,6 +1085,11 @@ extern time_t cb_get_source_date_epoch (
__TIME__ can store.  */
 #define MAX_SOURCE_DATE_EPOCH HOST_WIDE_INT_C (253402300799)
 
+/* Read BUILD_PATH_PREFIX_MAP from environment to have deterministic relative
+   paths to replace embedded absolute paths to get reproducible results.
+   Returns NULL if BUILD_PATH_PREFIX_MAP is badly formed.  */
+extern prefix_map **cb_get_build_path_prefix_map (cpp_reader *pfile);
+
 /* Callback for libcpp for offering spelling suggestions for misspelled
directives.  */
 extern const char *cb_get_suggestion (cpp_reader *, const char *,
Index: gcc-7-20170402/gcc/c-family/c-lex.c
===
--- gcc-7-20170402.orig/gcc/c-family/c-lex.c
+++ gcc-7-20170402/gcc/c-family/c-lex.c
@@ -81,6 +81,7 @@ init_c_lex (void)
   cb->read_pch = c_common_read_pch;
   cb->has_attribute = c_common_has_attribute;
   cb->get_source_date_epoch = cb_get_source_date_epoch;
+  cb->get_build_path_prefix_map = cb_get_build_path_prefix_map;
   cb->get_suggestion = cb_get_suggestion;
 
   /* Set the debug callbacks if we can use them.  */
Index: gcc-7-20170402/libcpp/include/cpplib.h
===
--- gcc-7-20170402.orig/libcpp/include/cpplib.h
+++ gcc-7-20170402/libcpp/include/cpplib.h
@@ -607,6 +607,9 @@ struct cpp_callbacks
   /* Callback to parse SOURCE_DATE_EPOCH from environment.  */
   time_t (*get_source_date_epoch) (cpp_reader *);
 
+  /* Callback to parse BUILD_PATH_PREFIX_MAP from environment.  */
+  st

[PATCH v2] Generate reproducible output independently of the build-path

2017-04-11 Thread Ximin Luo
(Please keep me on CC, I am not subscribed)

Background
==

Previous background is here: 
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00182.html

Upon further discussion, we decided to add support for multiple mappings and to
rename the environment variable to BUILD_PATH_PREFIX_MAP. We have also prepared
a document that describes how this works in detail, so that projects can be
confident that they are interoperable:

https://reproducible-builds.org/specs/build-path-prefix-map/

The specification is currently in DRAFT status, awaiting some final feedback,
including what the GCC maintainers think about it.

If one is interested in reading about this topic in the wider context of
reproducible builds, there's some more background here:

https://wiki.debian.org/ReproducibleBuilds/StandardEnvironmentVariables

Proposal


This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When
this is set, GCC will treat this as extra implicit "-fdebug-prefix-map=$value"
command-line arguments that precede any explicit ones. This makes the final
binary output reproducible, and also hides the unreproducible value (the source
path prefixes) from CFLAGS et. al. which many build tools (understandably)
embed as-is into their build output.

This environment variable also acts on the __FILE__ macro, mapping it in the
same way that debug-prefix-map works for debug symbols. We have seen that
__FILE__ is also a very large source of unreproducibility, and is represented
quite heavily in the 3k+ figure given earlier.

Finally, we tweak the mapping algorithm so that it applies only to whole path
components when matching prefixes. This algorithm contains fewer corner cases
and is more predictable, so it is easier for users to figure out how to set the
mapping appropriately, and it is better as a standardised algorithm that other
build tools might like to adopt. (The original idea came from discussions with
some rustc developers about this same topic.) This does technically break
backwards-compatibility, but I was under the impression that this option was
not seen as such a critical feature, that this would be too important. I am
also happy to justify it in more detail on request.

Nevertheless, for this reason our draft specification currently offers two
algorithms for implementers, but I would reduce this to one if the GCC
maintainers agree to accept this third patch.

Testing
===

I've tested these patches on a Debian unstable x86_64-linux-gnu schroot running
inside a Debian jessie system, on a full-bootstrap build. The output of
contrib/compare_tests is as follows:


gcc-7-20170409$ contrib/compare_tests ../gcc-build-0 ../gcc-build-1
# Comparing directories
## Dir1=../gcc-build-0: 8 sum files
## Dir2=../gcc-build-1: 8 sum files

# Comparing 8 common sum files
## /bin/sh contrib/compare_tests  /tmp/gxx-sum1.24154 /tmp/gxx-sum2.24154
New tests that PASS:

gcc.dg/cpp/build_path_prefix_map-1.c (test for excess errors)
gcc.dg/cpp/build_path_prefix_map-1.c execution test
gcc.dg/cpp/build_path_prefix_map-2.c (test for excess errors)
gcc.dg/cpp/build_path_prefix_map-2.c execution test
gcc.dg/debug/dwarf2/build_path_prefix_map-1.c (test for excess errors)
gcc.dg/debug/dwarf2/build_path_prefix_map-1.c scan-assembler DW_AT_comp_dir: 
"DWARF2TEST/gcc
gcc.dg/debug/dwarf2/build_path_prefix_map-2.c (test for excess errors)
gcc.dg/debug/dwarf2/build_path_prefix_map-2.c scan-assembler DW_AT_comp_dir: "/

# No differences found in 8 common sum files


I can also provide the full logs on request.

--

I've also fuzzed the prefix-map code using AFL with ASAN enabled. Due to how
AFL works I did not fuzz this patch directly but a smaller program with just
the parser and remapper, available here:

https://anonscm.debian.org/cgit/reproducible/build-path-prefix-map-spec.git/tree/consume

Over the course of about ~4k cycles, no crashes were found.

To reproduce, you could run something like:

$ echo performance | sudo tee 
/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
$ make CC=afl-gcc clean reset-fuzz-pecsplit.c fuzz-pecsplit.c

--

I will soon test this patch backported to Debian GCC-6 on
tests.reproducible-builds.org and will have results in a few days or weeks.
Some preliminary tests earlier gave good results (about +40 packages
reproducible over ~2 days) but we had to abort due to some misscheduling.

Copyright disclaimer


I dedicate these patches to the public domain by waiving all of my rights to
the work worldwide under copyright law, including all related and neighboring
rights, to the extent allowed by law.

See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full text.

Please let me know if the above is insufficient and I will be happy to sign any
relevant forms.

However, I would prefer it if the prefix-map.{h,c} remain public domain since
its code is also duplicated in our "example code" repo (url above), which is
meant for other projects to copy+paste.


[PATCH 1/3] Use BUILD_PATH_PREFIX_MAP envvar for debug-prefix-map

2017-04-11 Thread Ximin Luo
Define the BUILD_PATH_PREFIX_MAP environment variable, and treat it as implicit
-fdebug-prefix-map CLI options specified before any explicit such options.

Much of the generic code for applying and parsing prefix-maps is implemented in
libiberty instead of the dwarf2 parts of the code, in order to make subsequent
patches unrelated to debuginfo easier.

Acknowledgements


Daniel Kahn Gillmor who wrote the patch for r231835, which saved me a lot of
time figuring out what to edit.

HW42 for discussion on the details of the proposal, and for suggesting that we
retain the ability to map the prefix to something other than ".".

Other contributors to the BUILD_PATH_PREFIX_MAP specification, see
https://reproducible-builds.org/specs/build-path-prefix-map/

ChangeLogs
--

include/ChangeLog:

2017-04-09  Ximin Luo  

* prefix-map.h: New file implementing the BUILD_PATH_PREFIX_MAP
specification; includes code from /gcc/final.c and code adapted from
examples attached to the specification.

libiberty/ChangeLog:

2017-04-09  Ximin Luo  

* prefix-map.c: New file implementing the BUILD_PATH_PREFIX_MAP
specification; includes code from /gcc/final.c and code adapted from
examples attached to the specification.
* Makefile.in: Update for new files.

gcc/ChangeLog:

2017-04-09  Ximin Luo  

* debug.h: Declare add_debug_prefix_map_from_envvar.
* final.c: Define add_debug_prefix_map_from_envvar, and refactor
prefix-map utilities to use equivalent code from libiberty instead.
* opts-global.c: (handle_common_deferred_options): Call
add_debug_prefix_map_from_envvar before processing options.

gcc/testsuite/ChangeLog:

2017-04-09  Ximin Luo  

* lib/gcc-dg.exp: Allow dg-set-compiler-env-var to take only one
argument in which case it unsets the given env var.
* gcc.dg/debug/dwarf2/build_path_prefix_map-1.c: New test.
* gcc.dg/debug/dwarf2/build_path_prefix_map-2.c: New test.

Index: gcc-7-20170409/include/prefix-map.h
===
--- /dev/null
+++ gcc-7-20170409/include/prefix-map.h
@@ -0,0 +1,94 @@
+/* Declarations for manipulating filename prefixes.
+   Written 2017 by Ximin Luo 
+   This code is in the public domain. */
+
+#ifndef _PREFIX_MAP_H
+#define _PREFIX_MAP_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifdef HAVE_STDLIB_H
+#include 
+#endif
+
+/* Linked-list of mappings from old prefixes to new prefixes.  */
+
+struct prefix_map
+{
+  const char *old_prefix;
+  const char *new_prefix;
+  size_t old_len;
+  size_t new_len;
+  struct prefix_map *next;
+};
+
+
+/* Find a mapping suitable for the given OLD_NAME in the linked list MAP.\
+
+   If a mapping is found, writes a pointer to the non-matching suffix part of
+   OLD_NAME in SUFFIX, and its length in SUF_LEN.
+
+   Returns NULL if there was no suitable mapping.  */
+struct prefix_map *
+prefix_map_find (struct prefix_map *map, const char *old_name,
+const char **suffix, size_t *suf_len);
+
+/* Prepend a prefix map before a given SUFFIX.
+
+   The remapped name is written to NEW_NAME and returned as a const pointer. No
+   allocations are performed; the caller must ensure it can hold at least
+   MAP->NEW_LEN + SUF_LEN + 1 characters.  */
+const char *
+prefix_map_prepend (struct prefix_map *map, char *new_name,
+   const char *suffix, size_t suf_len);
+
+/* Remap a filename.
+
+   Returns OLD_NAME unchanged if there was no remapping, otherwise returns a
+   pointer to newly-allocated memory for the remapped filename.  The memory is
+   allocated by the given ALLOC function, which also determines who is
+   responsible for freeing it.  */
+#define prefix_map_remap_alloc_(map_head, old_name, alloc)\
+  __extension__
   \
+  ({  \
+const char *__suffix; \
+size_t __suf_len; \
+struct prefix_map *__map; \
+(__map = prefix_map_find ((map_head), (old_name), &__suffix, &__suf_len))  
\
+  ? prefix_map_prepend (__map,\
+   (char *) alloc (__map->new_len + __suf_len + 1),   \
+   __suffix, __suf_len)   \
+  : (old_name);   \
+  })
+
+/* Remap a filename.
+
+   Returns OLD_NAME unchanged if there was no remapping, otherwise returns a
+   stack-allocated pointer to the newly-remapped filename.  */
+#define prefix_map_remap_alloca(map_head, old_name) \
+  prefix_map_remap_alloc_ (map_head, old_name,

[PATCH 3/3] When remapping paths, only match whole path components

2017-04-11 Thread Ximin Luo
Change the remapping algorithm so that each old_prefix only matches paths that
have old_prefix as a whole path component prefix.  (A whole path component is a
part of a path that begins and ends at a directory separator or at either end
of the path string.)

This remapping algorithm is more predictable than the old algorithm, because
there is no chance of mappings for one directory interfering with mappings for
other directories.  It contains less corner cases and is therefore nicer for
clients to use.  For these reasons, in our BUILD_PATH_PREFIX_MAP specification
we recommend this algorithm, and it would be good for GCC to follow suit.

This does technically break backwards compatibility but I don't think anyone
would be reasonably depending on the corner cases of the previous algorithm,
which are surprising and counterintuitive.

Acknowledgements


Discussions with Michael Woerister and other members of the Rust compiler team
on Github, and discussions with Daniel Shahaf on the rb-general@ mailing list
on lists.reproducible-builds.org.

ChangeLogs
--

gcc/ChangeLog:

2017-04-09  Ximin Luo  

* doc/invoke.texi (Environment Variables): Document form and behaviour
of BUILD_PATH_PREFIX_MAP.

libiberty/ChangeLog:

2017-04-09  Ximin Luo  

* prefix-map.c: When remapping paths, only match whole path components.

Index: gcc-7-20170409/gcc/doc/invoke.texi
===
--- gcc-7-20170409.orig/gcc/doc/invoke.texi
+++ gcc-7-20170409/gcc/doc/invoke.texi
@@ -26637,6 +26637,26 @@ Recognize EUCJP characters.
 If @env{LANG} is not defined, or if it has some other value, then the
 compiler uses @code{mblen} and @code{mbtowc} as defined by the default locale 
to
 recognize and translate multibyte characters.
+
+@item BUILD_PATH_PREFIX_MAP
+@findex BUILD_PATH_PREFIX_MAP
+If this variable is set, it specifies an ordered map used to transform
+filepaths output in debugging symbols and expansions of the @code{__FILE__}
+macro.  This may be used to achieve fully reproducible output.  In the context
+of running GCC within a higher-level build tool, it is typically more reliable
+than setting command line arguments such as @option{-fdebug-prefix-map} or
+common environment variables such as @env{CFLAGS}, since the build tool may
+save these latter values into other output outside of GCC's control.
+
+The value is of the form
+@samp{@var{dst@r{[0]}}=@var{src@r{[0]}}:@var{dst@r{[1]}}=@var{src@r{[1]}}@r{@dots{}}}.
+If any @var{dst@r{[}i@r{]}} or @var{src@r{[}i@r{]}} contains @code{%}, @code{=}
+or @code{:} characters, they must be replaced with @code{%#}, @code{%+}, and
+@code{%.} respectively.
+
+Whenever GCC emits a filepath that starts with a whole path component matching
+@var{src@r{[}i@r{]}} for some @var{i}, with rightmost @var{i} taking priority,
+the matching part is replaced with @var{dst@r{[}i@r{]}} in the final output.
 @end table
 
 @noindent
Index: gcc-7-20170409/libiberty/prefix-map.c
===
--- gcc-7-20170409.orig/libiberty/prefix-map.c
+++ gcc-7-20170409/libiberty/prefix-map.c
@@ -87,12 +87,22 @@ struct prefix_map *
 prefix_map_find (struct prefix_map *map, const char *old_name,
 const char **suffix, size_t *suf_len)
 {
+  size_t len;
+
   for (; map; map = map->next)
-if (filename_ncmp (old_name, map->old_prefix, map->old_len) == 0)
-  {
-   *suf_len = strlen (*suffix = old_name + map->old_len);
-   break;
-  }
+{
+  len = map->old_len;
+  /* Ignore trailing path separators at the end of old_prefix */
+  while (len > 0 && IS_DIR_SEPARATOR (map->old_prefix[len-1])) len--;
+  /* Check if old_name matches old_prefix at a path component boundary */
+  if (! filename_ncmp (old_name, map->old_prefix, len)
+ && (IS_DIR_SEPARATOR (old_name[len])
+ || old_name[len] == '\0'))
+   {
+ *suf_len = strlen (*suffix = old_name + len);
+ break;
+   }
+}
 
   return map;
 }


Re: [PATCH] Generate reproducible output independently of the build-path

2016-11-04 Thread Ximin Luo
Mike Stump:
> On Nov 3, 2016, at 1:01 PM, Ximin Luo  wrote:
>> Log snippets attached.
> 
> Ick.  You missed the utility of contrib/compare_tests.  :-)
> 
> If you say:
> 
>   contrib/compare_tests oldbuilddir newbuilddir
> 
> You can then sit back and see everything as you'd expect and want.  The 
> output is priority sorted and usually around 0 lines line or so.
> 

Oh, thanks! Here is the output:

~/reproducible/gcc-7-20161030$ contrib/compare_tests ../gcc-build-0/ 
../gcc-build-1/
# Comparing directories
## Dir1=../gcc-build-0/: 8 sum files
## Dir2=../gcc-build-1/: 8 sum files

# Comparing 8 common sum files
## /bin/sh contrib/compare_tests  /tmp/gxx-sum1.9339 /tmp/gxx-sum2.9339
New tests that PASS:

gcc.dg/cpp/source_date_epoch-4.c (test for excess errors)
gcc.dg/cpp/source_date_epoch-4.c execution test
gcc.dg/cpp/source_date_epoch-5.c (test for excess errors)
gcc.dg/cpp/source_date_epoch-5.c execution test
gcc.dg/cpp/source_prefix_map-1.c (test for excess errors)
gcc.dg/cpp/source_prefix_map-1.c execution test
gcc.dg/cpp/source_prefix_map-2.c (test for excess errors)
gcc.dg/cpp/source_prefix_map-2.c execution test
gcc.dg/debug/dwarf2/source_prefix_map-1.c (test for excess errors)
gcc.dg/debug/dwarf2/source_prefix_map-1.c scan-assembler-dem DW_AT_comp_dir: 
"DWARF2TEST/gcc
gcc.dg/debug/dwarf2/source_prefix_map-2.c (test for excess errors)
gcc.dg/debug/dwarf2/source_prefix_map-2.c scan-assembler DW_AT_comp_dir: "/

# No differences found in 8 common sum files


-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH] Generate reproducible output independently of the build-path

2016-11-03 Thread Ximin Luo
Ximin Luo:
> Testing
> ===
> 
> I've tested these patches on a Debian testing/unstable x86_64-linux-gnu 
> system.
> So far I've only run the new tests that this patch adds, on a 
> disable-bootstrap
> build. I will do a full bootstrap and run the full testsuite over the next few
> days, both with and without this patch, and report back.
> 

Tested with a full bootstrap on a Debian unstable x86_64-linux-gnu system.

Log snippets attached. These were produced by running the below shell command 
in each of the respective build directories (0 = without, 1 = with the patches):

find . -name '*.log' -a -not -name 'config.log' | sort | while read x; do echo 
"== $x =="; grep -a '^FAIL: \|^# of' "$x"; done > ../gcc-build-X.snippets

Due to make -j22, they didn't run the tests in the same order. However in 
between doing this:

$ diff -ru <(sort gcc-build-0.snippets) <(sort gcc-build-1.snippets)
--- /dev/fd/63  2016-11-03 20:58:02.892334031 +0100
+++ /dev/fd/62  2016-11-03 20:58:02.892334031 +0100
@@ -323,7 +323,7 @@
 # of expected failures 76
 # of expected passes   110511
 # of expected passes   11259
-# of expected passes   127561
+# of expected passes   127573
 # of expected passes   2797
 # of expected passes   42
 # of expected passes   43915
1

and a manual review of the unsorted diffs, hopefully you can see that 
everything is good.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
== ./gcc/testsuite/g++/g++.log ==
FAIL: g++.dg/guality/pr55665.C   -O2  line 23 p == 40
FAIL: g++.dg/guality/pr55665.C   -O3 -g  line 23 p == 40
# of expected passes110511
# of unexpected failures2
# of unexpected successes   2
# of expected failures  333
# of unsupported tests  3989
== ./gcc/testsuite/gcc/gcc.log ==
FAIL: gcc.dg/guality/pr54200.c   -Os  line 20 z == 3
FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 17 y == 25
FAIL: gcc.dg/guality/pr54519-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 17 y == 25
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-4.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 17 y == 25
FAIL: gcc.dg/guality/pr54519-4.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 17 y == 25
FAIL: gcc.dg/guality/pr54519-5.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 17 y == 25
FAIL: gcc.dg/guality/pr54519-5.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 17 y == 25
FAIL: gcc.dg/guality/pr43051-1.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  line 34 c == &a[0]
FAIL: gcc.dg/guality/pr43051-1.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  line 36 e == &a[1]
FAIL: gcc.dg/guality/pr45882.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 16 d == 112
FAIL: gcc.dg/guality/pr45882.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 16 e == 142
FAIL: gcc.dg/guality/pr68860-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 14 arg1 == 1
FAIL: gcc.dg/guality/pr68860-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 14 arg2 == 2
FAIL: gcc.dg/guality/pr68860-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 14 arg3 == 3
FAIL: gcc.dg/guality/pr68860-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 14 arg4 == 4
FAIL: gcc.dg/guality/pr68860-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 14 arg5 == 5
FAIL: gcc

Re: [PATCH] Generate reproducible output independently of the build-path

2016-11-02 Thread Ximin Luo
Ximin Luo:
> Joseph Myers:
>> On Wed, 2 Nov 2016, Ximin Luo wrote:
>>
>>> This patch series adds a new environment variable SOURCE_PREFIX_MAP. When 
>>> this
>>> is set, GCC will treat this as an implicit "-fdebug-prefix-map=$value"
>>> command-line argument. This makes the final binary output reproducible, and
>>
>> Only one argument?  Sometimes you may want multiple -fdebug-prefix-map 
>> options (for both source and build directories, for example).  Perhaps the 
>> value should be split on spaces to provide multiple such mappings?
>>
> 
> We wanted to keep this environment variable simple, and allowing this would 
> make it more costly for other projects to adopt.
> 
> Whichever separator character we choose, we would have to either (a) disallow 
> it within the oldprefix/newprefix strings like what PATH does or (b) think of 
> an escaping scheme. Neither choice is great - with (a) since we want to map 
> *arbitrary* paths the restriction is less acceptable here than in PATH, and 
> with (b) it adds complexity to the spec, for uncommon use-cases.
> 

If we picked '\n' as the separator character, we could perhaps live with the 
(a) restriction. I'll wait a while for others to comment some more.

> In the case of an out-of-tree build, it is still possible with the current 
> proposal. Instead of:
> 
> SOURCE_PREFIX_MAP="srcprefix=srcname:buildprefix=buildname"
> 
> One could arrange the tree like:
> 
> commonprefix/srcname
> commonprefix/buildname
> 
> then set SOURCE_PREFIX_MAP="commonprefix=." or 
> SOURCE_PREFIX_MAP="commonprefix/=".
> 

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH] Generate reproducible output independently of the build-path

2016-11-02 Thread Ximin Luo
Joseph Myers:
> On Wed, 2 Nov 2016, Ximin Luo wrote:
> 
>> This patch series adds a new environment variable SOURCE_PREFIX_MAP. When 
>> this
>> is set, GCC will treat this as an implicit "-fdebug-prefix-map=$value"
>> command-line argument. This makes the final binary output reproducible, and
> 
> Only one argument?  Sometimes you may want multiple -fdebug-prefix-map 
> options (for both source and build directories, for example).  Perhaps the 
> value should be split on spaces to provide multiple such mappings?
> 

We wanted to keep this environment variable simple, and allowing this would 
make it more costly for other projects to adopt.

Whichever separator character we choose, we would have to either (a) disallow 
it within the oldprefix/newprefix strings like what PATH does or (b) think of 
an escaping scheme. Neither choice is great - with (a) since we want to map 
*arbitrary* paths the restriction is less acceptable here than in PATH, and 
with (b) it adds complexity to the spec, for uncommon use-cases.

In the case of an out-of-tree build, it is still possible with the current 
proposal. Instead of:

SOURCE_PREFIX_MAP="srcprefix=srcname:buildprefix=buildname"

One could arrange the tree like:

commonprefix/srcname
commonprefix/buildname

then set SOURCE_PREFIX_MAP="commonprefix=." or 
SOURCE_PREFIX_MAP="commonprefix/=".

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


[PATCH] Generate reproducible output independently of the build-path

2016-11-02 Thread Ximin Luo
(Please keep me on CC, I am not subscribed)

Background
==

We are on a long journey to make build processes be able to reproduce the build
outputs independently of which filesystem path the build is being executed from
- e.g. if the executing user doesn't have root access to be able to execute it
under a standard path such as /build. This currently is making about 2k-3k [1]
packages in Debian unreproducible when build-paths are varied across builds.

[1] 
https://tests.reproducible-builds.org/debian/issues/unstable/captures_build_path_issue.html

Previous attempts have involved using -fdebug-prefix-map to strip away the
prefix of an absolute path, leaving behind the part relative to the top-level
directory of the source code, which is reproducible. But this flag was itself
stored in DW_AT_producer, which makes the final output unreproducible. This was
pointed out in bug 68848 and fixed in r231835.

However, through more testing, we have found out that the fix just mentioned is
not enough to achieve reproducibility in practice. The main issue is that many
different packages like to store CFLAGS et. al. in arbitrary ways. So if we add
an explicit -fdebug-prefix-map flag to the system-level CFLAGS etc, this will
often propagate into the build result, making it again dependent on the
build-path, and not reproducible. For example:

Some packages embed compiler flags into their pkg-config files (or equivalent):
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/curl.html
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/perl.html
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/qt4-x11.html
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/fflas-ffpack.html
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/sip4.html

Other packages embed compiler flags directly into the binary:
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/singular.html
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/mutt.html

etc etc.

We think it's not appropriate to patch all (3k+) of these packages to strip out
-fdebug-prefix-map flags. This would involve adding quite complex logic to
everyone's build scripts, and we have to adapt this logic every single time to
that particular package. Also, in general CFLAGS is *supposed* to affect the
compiler output, and saving it unconditionally is quite a reasonable thing for
packages to do. If we tried to patch all of these packages, we would be turning
"reproducible builds" in to a costly opt-in feature, rather than on-by-default
that everyone can easily benefit from.

So, we believe it is better to patch GCC centrally. Our proposed solution is 
similar to (a) the SOURCE_DATE_EPOCH environment variable which was previously 
accepted into GCC and was used to successfully make 400+ packages reproducible, 
and (b) the -fdebug-prefix-map mechanism that already exists in GCC and which 
nearly but not quite, achieves at-scale build-path-independent reproducibility.

Proposal


This patch series adds a new environment variable SOURCE_PREFIX_MAP. When this
is set, GCC will treat this as an implicit "-fdebug-prefix-map=$value"
command-line argument. This makes the final binary output reproducible, and
also hides the unreproducible value (the build path prefix) from CFLAGS et. al.
which everyone is (understandably) embedding as-is into their build output.

This environment variable also acts on the __FILE__ macro, mapping it in the
same way that debug-prefix-map works for debug symbols. We have seen that
__FILE__ is also a very large source of unreproducibility, and is represented
quite heavily in the 3k+ figure given above.

Finally, we tweak the __TIMESTAMP__ macro so it honours SOURCE_DATE_EPOCH, in a
similar way to how __DATE__ and __TIME__ do so already.

More details are given in the headers of the patch files themselves.

Testing
===

I've tested these patches on a Debian testing/unstable x86_64-linux-gnu system.
So far I've only run the new tests that this patch adds, on a disable-bootstrap
build. I will do a full bootstrap and run the full testsuite over the next few
days, both with and without this patch, and report back.

Copyright disclaimer


I dedicate these patches to the public domain by waiving all of my rights to
the work worldwide under copyright law, including all related and neighboring
rights, to the extent allowed by law.

See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full text.

Please let me know if the above is insufficient and I will be happy to sign any
relevant forms.


[PATCH 3/4] Use SOURCE_PREFIX_MAP envvar to transform __FILE__

2016-11-02 Thread Ximin Luo
Honour the SOURCE_PREFIX_MAP environment variable when expanding the __FILE__
macro, in the same way that debug-prefix-map works for debugging symbol paths.

This patch follows similar lines to the earlier patch for SOURCE_DATE_EPOCH.
Specifically, we read the environment variable not in libcpp but via a hook
which has an implementation defined in gcc/c-family. However, to achieve this
is more complex than the earlier patch: we need to share the prefix_map data
structure and associated functions between libcpp and c-family. Therefore, we
need to move these to libiberty. (For comparison, the SOURCE_DATE_EPOCH patch
did not need this because time_t et. al. are in the standard C library.)

Acknowledgements


Dhole  who wrote the earlier patch for SOURCE_DATE_EPOCH
which saved me a lot of time on figuring out what to edit.

ChangeLogs
--

include/ChangeLog:

2016-11-01  Ximin Luo  

* prefix-map.h: New file, mostly derived from /gcc/final.c.

libiberty/ChangeLog:

2016-11-01  Ximin Luo  

* prefix-map.c: New file, mostly derived from /gcc/final.c.
* Makefile.in: Update for new files.

gcc/ChangeLog:

2016-11-01  Ximin Luo  

* final.c: Generalise and refactor code related to debug_prefix_map.
Move some of it to /libiberty/prefix-map.c, /include/prefix-map.h
and refactor the remaining code to use the moved-out things.
* doc/invoke.texi (Environment Variables): Update SOURCE_PREFIX_MAP to
describe how it affects __FILE__ expansion.

gcc/c-family/ChangeLog:

2016-11-01  Ximin Luo  

* c-common.c (cb_get_source_prefix_map): Define new call target.
* c-common.h (cb_get_source_prefix_map): Declare call target.
* c-lex.c (init_c_lex): Set the get_source_prefix_map callback.

libcpp/ChangeLog:

2016-11-01  Ximin Luo  

* include/cpplib.h (cpp_callbacks): Add get_source_prefix_map
callback.
* init.c (cpp_create_reader): Initialise source_prefix_map field.
* internal.h (cpp_reader): Add new field source_prefix_map.
* macro.c (_cpp_builtin_macro_text): Set the source_prefix_map field
if unset and apply it to the __FILE__ macro.

gcc/testsuite/ChangeLog:

2016-11-01  Ximin Luo  

* gcc.dg/cpp/source_prefix_map-1.c: New test.
* gcc.dg/cpp/source_prefix_map-2.c: New test.

Index: gcc-7-20161030/include/prefix-map.h
===
--- /dev/null
+++ gcc-7-20161030/include/prefix-map.h
@@ -0,0 +1,71 @@
+/* Declarations for manipulating filename prefixes.
+
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.  */
+
+#ifndef _PREFIX_MAP_H
+#define _PREFIX_MAP_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifdef HAVE_STDLIB_H
+#include 
+#endif
+
+/* Linked-list of mappings from old prefixes to new prefixes.  */
+
+struct prefix_map
+{
+  const char *old_prefix;
+  const char *new_prefix;
+  size_t old_len;
+  size_t new_len;
+  struct prefix_map *next;
+};
+
+/* Parse a single prefix-map.
+
+   The string `arg' is split at the final '=' character. The part before
+   it is used to set `map->old_prefix' and `map->old_len', and the part
+   after it is used to set `map->new_prefix' and `map->new_len'.
+
+   If `arg' does not contain a '=' then 0 is returned. Otherwise, a
+   non-zero value is returned.
+   */
+
+extern int parse_prefix_map (const char *arg, struct prefix_map *map);
+
+/* Perform mapping of filename prefixes.
+
+   Return the filename corresponding to `old_name'. The return value is
+   equal to `old_name' if no transformation occurred, else it is equal
+   to `new_name' where the new filename is stored.
+
+   On entry into this function, `new_name' must be able to hold at least
+   `(old_name - map->old_len + map->old_len + 1)' characters, where
+   `map' is the mapping that will be selected and performed.
+   */
+
+extern const char *apply_prefix_map (const char *old_name, char *new_name,
+struct prefix_map *map_head);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _PREFIX_MAP_H */
Index: gcc-7-20161030/libiberty/prefix-map.c
===

[PATCH 1/4] Use SOURCE_PREFIX_MAP envvar as an implicit debug-prefix-map

2016-11-02 Thread Ximin Luo
Define the SOURCE_PREFIX_MAP environment variable, and treat it as an implicit
CLI -fdebug-prefix-map option specified before any explicit such options.

Acknowledgements


Daniel Kahn Gillmor who wrote the patch for r231835, which saved me a lot of
time figuring out what to edit.

HW42 for discussion on the details of the proposal, and for suggesting that we
retain the ability to map the prefix to something other than ".".

ChangeLogs
--

gcc/ChangeLog:

2016-11-01  Ximin Luo  

* opts-global.c (add_debug_prefix_map_from_envvar): Add the value of
SOURCE_PREFIX_MAP as a debug_prefix_map if the former is set.
(handle_common_deferred_options): Call add_debug_prefix_map_from_envvar
before processing options.
* final.c: (add_debug_prefix_map): Be less specific in the error 
message,
since it can also be triggered by the SOURCE_PREFIX_MAP variable.
* doc/invoke.texi (Environment Variables): Document form and behaviour 
of
SOURCE_PREFIX_MAP.

gcc/testsuite/ChangeLog:

2016-11-01  Ximin Luo  

* gcc.dg/debug/dwarf2/source_prefix_map-1.c: New test.
* gcc.dg/debug/dwarf2/source_prefix_map-2.c: New test.

Index: gcc-7-20161030/gcc/opts-global.c
===
--- gcc-7-20161030.orig/gcc/opts-global.c
+++ gcc-7-20161030/gcc/opts-global.c
@@ -310,6 +310,21 @@ decode_options (struct gcc_options *opts
   finish_options (opts, opts_set, loc);
 }
 
+/* Add a debug-prefix-map using the SOURCE_PREFIX_MAP environment variable if
+   it is set.  */
+
+static void
+add_debug_prefix_map_from_envvar ()
+{
+  char *prefix_map;
+
+  prefix_map = getenv ("SOURCE_PREFIX_MAP");
+  if (!prefix_map)
+return;
+
+  add_debug_prefix_map (prefix_map);
+}
+
 /* Hold command-line options associated with stack limitation.  */
 const char *opt_fstack_limit_symbol_arg = NULL;
 int opt_fstack_limit_register_no = -1;
@@ -335,6 +350,8 @@ handle_common_deferred_options (void)
   if (flag_opt_info)
 opt_info_switch_p (NULL);
 
+  add_debug_prefix_map_from_envvar ();
+
   FOR_EACH_VEC_ELT (v, i, opt)
 {
   switch (opt->opt_index)
Index: gcc-7-20161030/gcc/final.c
===
--- gcc-7-20161030.orig/gcc/final.c
+++ gcc-7-20161030/gcc/final.c
@@ -1531,7 +1531,7 @@ add_debug_prefix_map (const char *arg)
   p = strchr (arg, '=');
   if (!p)
 {
-  error ("invalid argument %qs to -fdebug-prefix-map", arg);
+  error ("invalid value %qs for debug-prefix-map", arg);
   return;
 }
   map = XNEW (debug_prefix_map);
Index: gcc-7-20161030/gcc/doc/invoke.texi
===
--- gcc-7-20161030.orig/gcc/doc/invoke.texi
+++ gcc-7-20161030/gcc/doc/invoke.texi
@@ -26222,6 +26222,23 @@ Recognize EUCJP characters.
 If @env{LANG} is not defined, or if it has some other value, then the
 compiler uses @code{mblen} and @code{mbtowc} as defined by the default locale 
to
 recognize and translate multibyte characters.
+
+@item SOURCE_PREFIX_MAP If this variable is set, it specifies a mapping
+that is used to transform filepaths that are output in debug symbols.
+This helps the embedded paths become reproducible, without having the
+unreproducible value be visible in other input sources - such as GCC
+command-line flags or standardised build-time environment variables like
+@code{CFLAGS} - that are commonly legitimately-embedded in the build
+output by higher-level build processes.
+
+The form and behaviour is similar to @option{-fdebug-prefix-map}. That
+is, the value of @env{SOURCE_PREFIX_MAP} must be of the form
+@samp{@var{old}=@var{new}}. The split occurs on the first @code{=}
+character, so that @var{old} cannot itself contain a @code{=}.
+
+Whenever an absolute source- or build-related filepath is to be emitted
+in a final end-result output, GCC will replace @var{old} with @var{new}
+if that filepath starts with @var{old}.
 @end table
 
 @noindent
Index: gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-1.c
===
--- /dev/null
+++ gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-1.c
@@ -0,0 +1,9 @@
+/* DW_AT_comp_dir should be relative if SOURCE_PREFIX_MAP is a prefix of it. */
+/* { dg-do compile } */
+/* { dg-options "-gdwarf -dA" } */
+/* { dg-set-compiler-env-var SOURCE_PREFIX_MAP "[file dirname 
[pwd]]=DWARF2TEST" } */
+/* { dg-final { scan-assembler-dem "DW_AT_comp_dir: \"DWARF2TEST/gcc" } } */
+
+void func (void)
+{
+}
Index: gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-2.c
===
--- /dev/null
+++ gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-2.c
@@ -0,0 +1,8 @@
+/* DW_AT_comp_dir 

[PATCH 4/4] Use SOURCE_DATE_EPOCH in place of __TIMESTAMP__ if the latter is newer.

2016-11-02 Thread Ximin Luo
This brings the behaviour in line with the __DATE__ and __TIME__ macros. Note
though the minor difference: __TIMESTAMP__ is defined as the file-modification
date and not the "current" date or time like the other two macros. Therefore,
we do a clamping behaviour (similar to tar --clamp-mtime).

Acknowledgements


Reiner Herrmann suggested the clamping behaviour.

ChangeLogs
--

libcpp/ChangeLog:

2016-11-01  Ximin Luo  

* macro.c (_cpp_builtin_macro_text): Use SOURCE_DATE_EPOCH in place of
__TIMESTAMP__ if the latter is newer than the former.

gcc/ChangeLog:

2016-11-01  Ximin Luo  

* doc/cppenv.texi (Environment Variables): Update SOURCE_DATE_EPOCH to
describe the new effect on __TIMESTAMP__.

gcc/testsuite/ChangeLog:

2016-11-01  Ximin Luo  

* gcc.dg/cpp/source_date_epoch-4.c: New test.
* gcc.dg/cpp/source_date_epoch-5.c: New test.

Index: gcc-7-20161030/libcpp/macro.c
===
--- gcc-7-20161030.orig/libcpp/macro.c
+++ gcc-7-20161030/libcpp/macro.c
@@ -264,7 +264,30 @@ _cpp_builtin_macro_text (cpp_reader *pfi
struct tm *tb = NULL;
struct stat *st = _cpp_get_file_stat (file);
if (st)
- tb = localtime (&st->st_mtime);
+ {
+   /* Set a reproducible timestamp for __DATE__ and __TIME__ 
macro
+  if SOURCE_DATE_EPOCH is defined.  */
+   if (pfile->source_date_epoch == (time_t) -2
+   && pfile->cb.get_source_date_epoch != NULL)
+ pfile->source_date_epoch = 
pfile->cb.get_source_date_epoch (pfile);
+
+   if (pfile->source_date_epoch >= (time_t) 0)
+ {
+   /* If SOURCE_DATE_EPOCH is set, we want reproducible
+  timestamps so use gmtime not localtime.  */
+   if (st->st_mtime >= pfile->source_date_epoch)
+ tb = gmtime (&pfile->source_date_epoch);
+   else
+ /* Don't use SOURCE_DATE_EPOCH if the timestamp is
+older, since that means it was probably already
+set in a reproducible way (e.g. via source tarball
+extraction or some other method). */
+ tb = gmtime (&st->st_mtime);
+ }
+   else
+ tb = localtime (&st->st_mtime);
+ }
+
if (tb)
  {
char *str = asctime (tb);
Index: gcc-7-20161030/gcc/doc/cppenv.texi
===
--- gcc-7-20161030.orig/gcc/doc/cppenv.texi
+++ gcc-7-20161030/gcc/doc/cppenv.texi
@@ -83,8 +83,9 @@ main input file is omitted.
 @item SOURCE_DATE_EPOCH
 If this variable is set, its value specifies a UNIX timestamp to be
 used in replacement of the current date and time in the @code{__DATE__}
-and @code{__TIME__} macros, so that the embedded timestamps become
-reproducible.
+and @code{__TIME__} macros, and in replacement of the file modification
+date in the @code{__TIMESTAMP__} macro if the latter is newer than the
+former, so that the embedded timestamps become reproducible.
 
 The value of @env{SOURCE_DATE_EPOCH} must be a UNIX timestamp,
 defined as the number of seconds (excluding leap seconds) since
Index: gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-4.c
===
--- /dev/null
+++ gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-4.c
@@ -0,0 +1,13 @@
+/* __TIMESTAMP__ should use SOURCE_DATE_EPOCH if the latter is older. */
+/* { dg-do run } */
+/* { dg-set-compiler-env-var TZ "UTC" } */
+/* { dg-set-compiler-env-var LC_ALL "C" } */
+/* { dg-set-compiler-env-var SOURCE_DATE_EPOCH "0" } */
+
+int
+main ()
+{
+  if (__builtin_strcmp (__TIMESTAMP__, "Thu Jan  1 00:00:00 1970") != 0)
+__builtin_abort ();
+  return 0;
+}
Index: gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-5.c
===
--- /dev/null
+++ gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-5.c
@@ -0,0 +1,13 @@
+/* __TIMESTAMP__ should not use SOURCE_DATE_EPOCH if the latter is newer. */
+/* { dg-do run } */
+/* { dg-set-compiler-env-var TZ "UTC" } */
+/* { dg-set-compiler-env-var LC_ALL "C" } */
+/* { dg-set-compiler-env-var SOURCE_DATE_EPOCH "253402300799" } */
+
+int
+main ()
+{
+  if (__builtin_strcmp (__TIMESTAMP__, "Fri Dec 31 23:59:59 UTC ") == 0)
+__builtin_abort ();
+  return 0;
+}


[PATCH 2/4] Split prefix-map values on the last '=' character, not the first

2016-11-02 Thread Ximin Luo
We are planning to ask other tools to support SOURCE_PREFIX_MAP, in the same
way that we have already done this for SOURCE_DATE_EPOCH. So, this will last
for some time and have quite wide reach. Consequently, we believe it is better
to split on the final '=', since it is much less likely to result in problems.

For example, with the previous behaviour (splitting on the first '=') it would
not be possible to map paths like the following:

C:\Documents and Settings\Betrand Russell\Proofs of 1+1=2\Automated 
Proofs\Source Code\
/srv/net/distributed-hash-table/address/VaIWP8YIWDChR2O76yiufXBsbw5g2skB_kp9VP-qVLvydovdGw==/projects/gcc-6/

These are contrived examples, but hopefully you can agree that they're not *so*
unrealistic - future software or users might plausibly wish to run reproducible
build processes underneath paths similar to these.

On the other hand, I can think of much fewer cases where the new-prefix *must*
include a '=' character. I can't think of any software project that includes
it, and I'd imagine that any such (or future) projects that might exist would
already have standardised "ASCII-only" versions of its name.

ChangeLogs
--

gcc/ChangeLog:

2016-11-01  Ximin Luo  

* final.c: (add_debug_prefix_map): Split on the last and not first '='.
* doc/invoke.texi (Environment Variables): Update SOURCE_PREFIX_MAP to
describe new parsing.

Index: gcc-7-20161030/gcc/final.c
===
--- gcc-7-20161030.orig/gcc/final.c
+++ gcc-7-20161030/gcc/final.c
@@ -1528,7 +1528,7 @@ add_debug_prefix_map (const char *arg)
   debug_prefix_map *map;
   const char *p;
 
-  p = strchr (arg, '=');
+  p = strrchr (arg, '=');
   if (!p)
 {
   error ("invalid value %qs for debug-prefix-map", arg);
Index: gcc-7-20161030/gcc/doc/invoke.texi
===
--- gcc-7-20161030.orig/gcc/doc/invoke.texi
+++ gcc-7-20161030/gcc/doc/invoke.texi
@@ -26233,8 +26233,8 @@ output by higher-level build processes.
 
 The form and behaviour is similar to @option{-fdebug-prefix-map}. That
 is, the value of @env{SOURCE_PREFIX_MAP} must be of the form
-@samp{@var{old}=@var{new}}. The split occurs on the first @code{=}
-character, so that @var{old} cannot itself contain a @code{=}.
+@samp{@var{old}=@var{new}}. The split occurs on the last @code{=}
+character, so that @var{new} cannot itself contain a @code{=}.
 
 Whenever an absolute source- or build-related filepath is to be emitted
 in a final end-result output, GCC will replace @var{old} with @var{new}


Re: [PATCH] PR77985: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-24 Thread Ximin Luo
Richard Biener:
> On Mon, Oct 24, 2016 at 12:53 PM, Ximin Luo  wrote:
>> Richard Biener:
>>> On Fri, Oct 21, 2016 at 12:56 PM, Ximin Luo  wrote:
>>>> Richard Biener:
>>>>> On Tue, Oct 18, 2016 at 2:35 PM, Ximin Luo  wrote:
>>>>>>
>>>>>> Thanks, I'll add the Changelog entry. My computer isn't very powerful, 
>>>>>> so I didn't bootstrap it yet, I only tested it on a stage1 compiler, on 
>>>>>> Debian testing/unstable. I'll find some time to bootstrap it and test it 
>>>>>> fully over the next few days.
>>>>>>
>>>>>> Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into 
>>>>>> it a bit more, my patch basically obsoletes the need for this so I can 
>>>>>> delete that as well.
>>>>>
>>>>> That would be nice.
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> Attached is the ChangeLog plus updated patch, rebased against the 
>>>> 2016-10-16 snapshot. Also I noticed I got the wrong bug number, the 
>>>> correct one is 77985 not 77895.
>>>>
>>>> I've tested it on a Debian testing/unstable x86_64-linux-gnu system. The 
>>>> results are good, the same tests fail both before and after the patch, and 
>>>> we have 2 new expected successes. Unfortunately I don't have access (and 
>>>> am unlikely to get access) to a Darwin system to test it on.
>>>>
>>>> Snippets of the test logs are attached. The full logs are about 200MB each 
>>>> in size (4MB XZ-compressed, each) so I guessed I shouldn't send them via 
>>>> email... The snippets were grepped from the logs using the '^FAIL: \|^# 
>>>> of\|pr77985' pattern. You can diff them to check that the results are same 
>>>> in both cases.
>>>
>>> The patch is ok.  Do you have commit privileges?
>>>
>>
>> No, I don't.
> 
> Ok, I included the patch in a x86_64 bootstrap & regtest and committed
> it as r241473.
> 

Great, thanks!

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH] PR77985: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-24 Thread Ximin Luo
Richard Biener:
> On Fri, Oct 21, 2016 at 12:56 PM, Ximin Luo  wrote:
>> Richard Biener:
>>> On Tue, Oct 18, 2016 at 2:35 PM, Ximin Luo  wrote:
>>>>
>>>> Thanks, I'll add the Changelog entry. My computer isn't very powerful, so 
>>>> I didn't bootstrap it yet, I only tested it on a stage1 compiler, on 
>>>> Debian testing/unstable. I'll find some time to bootstrap it and test it 
>>>> fully over the next few days.
>>>>
>>>> Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into 
>>>> it a bit more, my patch basically obsoletes the need for this so I can 
>>>> delete that as well.
>>>
>>> That would be nice.
>>>
>>
>> Hi,
>>
>> Attached is the ChangeLog plus updated patch, rebased against the 2016-10-16 
>> snapshot. Also I noticed I got the wrong bug number, the correct one is 
>> 77985 not 77895.
>>
>> I've tested it on a Debian testing/unstable x86_64-linux-gnu system. The 
>> results are good, the same tests fail both before and after the patch, and 
>> we have 2 new expected successes. Unfortunately I don't have access (and am 
>> unlikely to get access) to a Darwin system to test it on.
>>
>> Snippets of the test logs are attached. The full logs are about 200MB each 
>> in size (4MB XZ-compressed, each) so I guessed I shouldn't send them via 
>> email... The snippets were grepped from the logs using the '^FAIL: \|^# 
>> of\|pr77985' pattern. You can diff them to check that the results are same 
>> in both cases.
> 
> The patch is ok.  Do you have commit privileges?
> 

No, I don't.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH] PR77985: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-21 Thread Ximin Luo
Richard Biener:
> On Tue, Oct 18, 2016 at 2:35 PM, Ximin Luo  wrote:
>>
>> Thanks, I'll add the Changelog entry. My computer isn't very powerful, so I 
>> didn't bootstrap it yet, I only tested it on a stage1 compiler, on Debian 
>> testing/unstable. I'll find some time to bootstrap it and test it fully over 
>> the next few days.
>>
>> Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into it 
>> a bit more, my patch basically obsoletes the need for this so I can delete 
>> that as well.
> 
> That would be nice.
> 

Hi,

Attached is the ChangeLog plus updated patch, rebased against the 2016-10-16 
snapshot. Also I noticed I got the wrong bug number, the correct one is 77985 
not 77895.

I've tested it on a Debian testing/unstable x86_64-linux-gnu system. The 
results are good, the same tests fail both before and after the patch, and we 
have 2 new expected successes. Unfortunately I don't have access (and am 
unlikely to get access) to a Darwin system to test it on.

Snippets of the test logs are attached. The full logs are about 200MB each in 
size (4MB XZ-compressed, each) so I guessed I shouldn't send them via email... 
The snippets were grepped from the logs using the '^FAIL: \|^# of\|pr77985' 
pattern. You can diff them to check that the results are same in both cases.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
== gcc-build/gcc/testsuite/gcc/gcc.log ==
FAIL: c is -1, not 6303904
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303904
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303920
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303920
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303936
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303936
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303952
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303952
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303968
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303968
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303984
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: ret is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: o is -1, not 6303904
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: o is -1, not 6303904
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: o is -1, not 6303920
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: o is -1, not 6303920
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: o is -1, not 6303936
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: o is -1, not 6303936
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: o is -1, not 6303952
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: o is -1, not 6303952
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303984
FAIL: o is -1, not 6303968
FAIL: w is -1, not 6303984
FAIL: ret is -1, not 6303968
FAIL: c is -1, not 6303904
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303904
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: e is -1, not 6303984
FAIL: ret is -1, not 0
FAIL: c is -1, not 6303904
FAIL: n is -1, not 6303920
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303904
FAIL: n is -1, not 6303920
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: n is -1, not 6303936
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: n is -1, not 6303936
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: n is -1, not 6303952
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: n is -1, not 6303952
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: n is -1, not 6303968
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: n is -1, not 6303968
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: n is -1, not 6303984
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: n is -1, not 6303984
FAIL: t is -1, not 6303984
FAIL: ret is -1, not 0
FAIL: 5 PASS, 114 FAIL, 0 UNRESOLVED
FAIL: ret is -1, not 6299888
FAIL: o is -1, not 6299808
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299808
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299824
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299824
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299840
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299840
FAIL: w is -1, not 6299888
FAIL: o is

Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-18 Thread Ximin Luo
Richard Biener:
> On Mon, Oct 17, 2016 at 11:44 PM, Mike Stump  wrote:
>> On Oct 17, 2016, at 2:38 PM, Ximin Luo  wrote:
>>>
>>> Mike Stump:
>>>> On Oct 17, 2016, at 11:00 AM, Ximin Luo  wrote:
>>>>> Therefore, it is better to emit it in all circumstances, in case the 
>>>>> reader needs to know what the working
>>>>> directory was at compile-time.
>>>>
>>>> I can't help but wonder if this would break ccache some?
>>>>
>>>
>>> Could you explain this in some more detail? At the moment, GCC will already 
>>> emit DW_AT_name with an absolute path (in the scenario that this patch is 
>>> relevant to). How does ccache work around this at the moment? (Does it use 
>>> debug-prefix-map? In which case, this also affects DW_AT_comp_dir, so my 
>>> patch should be fine.)
>>
>> If you compile the same file, but in a different directory, I wonder if cwd 
>> will cause the cache entry to not be reused.
>>
>> I expect one of the ccache people that are around would just know if it will 
>> care at all.
> 
> I believe ccache compares preprocessed source, definitely _not_ DWARF
> output, so this shouldn't break anything there.
> It might result in different object file output but as the reporter
> figured due to a bug in dwarf2out.c we end up generating
> DW_AT_comp_dir in almost all cases already.
> 
> I think the patch is ok but it misses a ChangeLog entry.  How did you
> test the patch? (bootstrapped and tested on ...?)
> 

Thanks, I'll add the Changelog entry. My computer isn't very powerful, so I 
didn't bootstrap it yet, I only tested it on a stage1 compiler, on Debian 
testing/unstable. I'll find some time to bootstrap it and test it fully over 
the next few days.

Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into it a 
bit more, my patch basically obsoletes the need for this so I can delete that 
as well.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-17 Thread Ximin Luo
Mike Stump:
> On Oct 17, 2016, at 11:00 AM, Ximin Luo  wrote:
>> Therefore, it is better to emit it in all circumstances, in case the reader 
>> needs to know what the working
>> directory was at compile-time.
> 
> I can't help but wonder if this would break ccache some?
> 

Could you explain this in some more detail? At the moment, GCC will already 
emit DW_AT_name with an absolute path (in the scenario that this patch is 
relevant to). How does ccache work around this at the moment? (Does it use 
debug-prefix-map? In which case, this also affects DW_AT_comp_dir, so my patch 
should be fine.)

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


[PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-17 Thread Ximin Luo
(Please keep me on CC, I am not subscribed)

In working on another patch for DW_AT_comp_dir behaviour, I tried to write a
test for that other patch. This test did not work as expected - DW_AT_comp_dir
was not even generated in the resulting output! But this had nothing to do with
my patch so I dug a bit deeper.

It turns out that GCC does not emit DW_AT_comp_dir if the source path given is
an absolute path, and the file being compiled contains a typedef to a compiler
builtin (sorry, I don't know the terminology here very well). This typedef
exists in the vast majority of cases in the real world via standard library
includes, so from the outside, the behaviour of GCC "looks like" it emits
DW_AT_comp_dir in all/most cases.

>From looking at the source code dwarf2out.c, Richard Biener determined that the
current code is written to "intend to" not emit DW_AT_comp_dir if to path is
absolute, regardless of the typedef. It is possible to "fix" this bug by a
1-line change, to conform to the way currently "intended" by GCC code. However,
I think a much better fix is a 22-line deletion of code from dwarf2out.c, to
instead emit DW_AT_comp_dir in all cases.

The original aim of not emitting DW_AT_comp_dir seems to be to avoid emitting
redundant information. However, this information is *not* redundant! The DWARF2
spec says:

"A DW_AT_comp_dir attribute whose value is a null-terminated string containing
the current working directory of the compilation command that produced this
compilation unit in whatever form makes sense for the host system."

Conceptually, the working directory is independent from an input file, so it
would *not* be redundant to emit it in the general case. In future versions of
DWARF, other information might be added (such as relative -I flags, etc) that
are dependent on knowing the working directory. The logic of "don't emit
DW_AT_comp_dir if DW_AT_name is absolute" would then break. In other words, the
choice to avoid emitting DW_AT_comp_dir is a brittle and premature optimisation
that loses information in the general case. Therefore, it is better to emit it
in all circumstances, in case the reader needs to know what the working
directory was at compile-time.

*Sometimes*, *parts* of it might be redundant yes - and rewriting DW_AT_name to
be relative to this, would remove the redundancy. Doing this is compatible with
the above points, and I could amend the patch to do this - although I suggest
it's not worth the effort, unless there is a function in GCC code that already
does this.

Some more minor advantages are:

- We delete 21 lines, this reduces complexity in the already-complex code.
  We can also get rid of the Darwin-specific workaround described here:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53453 (Not currently part of my
  patch, but I can add this in.)

- The GCC test suite compiles tests via their absolute paths, and many many
  other buildsystems do this too. This was in fact my original problem - I was
  trying to test something involving DW_AT_comp_dir, but GCC does not emit this
  in the testsuite! Rather than saying "this is a bug, everyone else should
  spend effort fixing this", it is better to fix it in one place, i.e. the
  source of where it is (not) generated.

Thanks for your time,

Ximin

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
Index: gcc-7-20161009/gcc/dwarf2out.c
===
--- gcc-7-20161009.orig/gcc/dwarf2out.c
+++ gcc-7-20161009/gcc/dwarf2out.c
@@ -21994,7 +21994,7 @@ gen_compile_unit_die (const char *filena
 {
   add_name_attribute (die, filename);
   /* Don't add cwd for .  */
-  if (!IS_ABSOLUTE_PATH (filename) && filename[0] != '<')
+  if (filename[0] != '<')
 	add_comp_dir_attribute (die);
 }
 
@@ -26332,20 +26332,6 @@ prune_unused_types (void)
 prune_unmark_dies (ctnode->root_die);
 }
 
-/* Set the parameter to true if there are any relative pathnames in
-   the file table.  */
-int
-file_table_relative_p (dwarf_file_data **slot, bool *p)
-{
-  struct dwarf_file_data *d = *slot;
-  if (!IS_ABSOLUTE_PATH (d->filename))
-{
-  *p = true;
-  return 0;
-}
-  return 1;
-}
-
 /* Helpers to manipulate hash table of comdat type units.  */
 
 struct comdat_type_hasher : nofree_ptr_hash 
@@ -28152,15 +28138,7 @@ dwarf2out_early_finish (const char *file
   /* Add the name for the main input file now.  We delayed this from
  dwarf2out_init to avoid complications with PCH.  */
   add_name_attribute (comp_unit_die (), remap_debug_filename (filename));
-  if (!IS_ABSOLUTE_PATH (filename) || targetm.force_at_comp_dir)
-add_comp_dir_attribute (comp_unit_die ());
-  else if (get_AT (comp_unit_die (), DW_AT_comp_dir) == NULL)
-{
-  bool p = false;
-  file_table->traverse (&p);
-  if (p)
-	add_comp_dir_attribute (comp_unit_die ());
-}
+  add_comp_dir_attribute (comp_unit_die ());