Re: Completely reproducible Haskell builds

2011-05-27 Thread Matthias Kilian
On Fri, May 27, 2011 at 04:35:17PM +0100, Simon Marlow wrote:
> >Next best idea is to make GHC use repeatable temporary .c & .o file
> >names for each invocation. There is already a unique temporary
> >directory where all the temporary files are created. This suggests
> >I do not need to worry about adversarial races. So GHC just needs to
> >avoid racing with itself. I see a couple of options:
> >
> >   1) newTempName should create a new subdirectory for each call and
> >  then return a fixed name inside of this (so
> >  /tmp/ghc28016_0/ghc28016_0.c
> >  above would become /tmp/ghc28016_0/0/dummy.c)
> >   2) mkExtraCObj could compute some hash function of its xs
> >  argument (C program text) and then create a file named, e.g.
> >  /tmp/ghc28016_0/38eb8d8eb0abe9c828ba60983e2a97f7a069ec41.c
> >
> >Which of these two looks better? Other ideas?
> 
> The first is easier, and would be fine with me.

An alternative could be to just strip the single `symbol' off the
object file (using something like strip -N ...). I haven't tested
this for real yet.

Ciao,
Kili

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Completely reproducible Haskell builds

2011-05-27 Thread Simon Marlow

On 26/05/2011 17:15, Greg Steuck wrote:

> I am trying to get ghc-7.0.3 build procedure down to a byte-identical
> rebuild on Linux-amd64.


This is likely to be very fragile with GHC at the moment, due to some 
non-deterministic behaviour in its internals.  I have tried to eliminate 
as much as I can, but there are still a couple of known sources, and 
possibly more unknown ones.  See the last bullet point here:


http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/RecompilationAvoidance#Interfacestability

Having said that, I think it's a good idea to fix any sources of binary 
differences, as you're doing.



> The best option for dealing with this seems to be using gcc's ability to
> accept input from a pipe. I know I could make this work on a POSIX
> system. Yet I suspect getting it to work on Windows would be overly
> onerous.
>
> Next best idea is to make GHC use repeatable temporary .c & .o file
> names for each invocation. There is already a unique temporary
> directory where all the temporary files are created. This suggests
> I do not need to worry about adversarial races. So GHC just needs to
> avoid racing with itself. I see a couple of options:
>
>    1) newTempName should create a new subdirectory for each call and
>   then return a fixed name inside of this (so /tmp/ghc28016_0/ghc28016_0.c
>   above would become /tmp/ghc28016_0/0/dummy.c)
>    2) mkExtraCObj could compute some hash function of its xs
>   argument (C program text) and then create a file named, e.g.
>   /tmp/ghc28016_0/38eb8d8eb0abe9c828ba60983e2a97f7a069ec41.c
>
> Which of these two looks better? Other ideas?


The first is easier, and would be fine with me.


> Would people be open to accepting a patch along these lines if I were
> to write one?


Sure.  But as I mentioned above, this might not be enough, or at least 
you might still get random differences from time to time.


Cheers,
Simon



Completely reproducible Haskell builds

2011-05-26 Thread Greg Steuck
I am trying to get ghc-7.0.3 build procedure down to a byte-identical
rebuild on Linux-amd64.

I solved one source of variability: ar embedding timestamps into .a
files (HOWTO at the end)

Now I am looking to eliminate variations in ELF64 executables. To make
things easy, I am going to demonstrate with the unlit binary.

I start at a point where make leaves off.

% cp utils/unlit/dist/build/tmp/unlit{,.1}
% "/usr/bin/ghc" -o utils/unlit/dist/build/tmp/unlit -O -H64m \
    -package-conf libraries/bootstrapping.conf -i -iutils/unlit/. \
    -iutils/unlit/dist/build -iutils/unlit/dist/build/autogen \
    -Iutils/unlit/dist/build -Iutils/unlit/dist/build/autogen \
    -no-user-package-conf -rtsopts -odir utils/unlit/dist/build \
    -hidir utils/unlit/dist/build -stubdir utils/unlit/dist/build \
    -hisuf hi -osuf o -hcsuf hc -no-auto-link-packages -no-hs-main \
    utils/unlit/dist/build/unlit.o
% sha1sum utils/unlit/dist/build/tmp/unlit{,.1}
6f679d9dd9a9ea84a68be99369c9f1dc72ba41f0  utils/unlit/dist/build/tmp/unlit
beed059e09c9429c3b74ea613d5be30c6c17ac3c  utils/unlit/dist/build/tmp/unlit.1
% ls -l utils/unlit/dist/build/tmp/unlit{,.1}
-rwxr-x--- 1 gnezdo eng 15112 May 25 21:53 utils/unlit/dist/build/tmp/unlit
-rwxr-x--- 1 gnezdo eng 15112 May 25 21:53 utils/unlit/dist/build/tmp/unlit.1
% for i in utils/unlit/dist/build/tmp/unlit{,.1}; do readelf -a "$i" > "$i.elf"; done
% diff utils/unlit/dist/build/tmp/unlit{,.1}.elf
250c250
< 27:  0 FILE    LOCAL  DEFAULT  ABS ghc27965_0.c
---
> 27:  0 FILE    LOCAL  DEFAULT  ABS ghc20499_0.c

Looks like there is a temporary file name baked into the ELF file.

Indeed, running with -v reveals:

*** C Compiler:
/usr/bin/gcc -c /tmp/ghc28016_0/ghc28016_0.c -o
/tmp/ghc28016_0/ghc28016_0.o -I/usr/lib/ghc-7.0.3/include
-fno-stack-protector
*** Linker:
/usr/bin/gcc -v -o utils/unlit/dist/build/tmp/unlit
-fno-stack-protector utils/unlit/dist/build/unlit.o
/tmp/ghc28016_0/ghc28016_0.o

Digging a bit into the sources reveals mkExtraCObj in
DriverPipeline.hs which calls newTempName.

The best option for dealing with this seems to be using gcc's ability to
accept input from a pipe. I know I could make this work on a POSIX
system. Yet I suspect getting it to work on Windows would be overly
onerous.

Next best idea is to make GHC use repeatable temporary .c & .o file
names for each invocation. There is already a unique temporary
directory where all the temporary files are created. This suggests
I do not need to worry about adversarial races. So GHC just needs to
avoid racing with itself. I see a couple of options:

  1) newTempName should create a new subdirectory for each call and
 then return a fixed name inside of this (so /tmp/ghc28016_0/ghc28016_0.c
 above would become /tmp/ghc28016_0/0/dummy.c)
  2) mkExtraCObj could compute some hash function of its xs
 argument (C program text) and then create a file named, e.g.
 /tmp/ghc28016_0/38eb8d8eb0abe9c828ba60983e2a97f7a069ec41.c

Which of these two looks better? Other ideas?
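
To make the two options concrete, here is a small shell model of the
naming schemes. It is only an illustration under assumptions: GHC's
newTempName and mkExtraCObj are Haskell functions and are not reproduced
here, the function names temp_name_fixed and temp_name_hashed are made
up, and SHA-1 (via sha1sum) is just one possible hash function.

```shell
# Illustration only: a shell model, not GHC code.
tmp_root="$(mktemp -d)"   # stands in for the per-process directory, e.g. /tmp/ghc28016_0
counter=0

# Option 1: a fresh numbered subdirectory per call, fixed file name inside.
# The result is returned in $temp_name rather than on stdout, because calling
# the function inside $(...) would increment the counter only in a subshell.
temp_name_fixed() {
    mkdir "$tmp_root/$counter"
    temp_name="$tmp_root/$counter/dummy.c"
    counter=$((counter + 1))
}

# Option 2: name the file after a hash of the C program text.
temp_name_hashed() {
    digest="$(printf '%s' "$1" | sha1sum | cut -d ' ' -f 1)"
    temp_name="$tmp_root/$digest.c"
}

temp_name_fixed; first="$temp_name"
temp_name_fixed; second="$temp_name"
temp_name_hashed 'int dummy_symbol = 0;'; hashed="$temp_name"

# The two option-1 paths differ, but both end in the same base name.
echo "$first"
echo "$second"
echo "$hashed"
```

If, as the readelf diff above suggests, only the base name of the source
file ends up in the FILE symbol, option 1 would always record dummy.c,
while option 2 would record a name that depends only on the C text.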

Would people be open to accepting a patch along these lines if I were
to write one?

The steps to make ar not include the timestamps were:

1) Add "AR_OPTS = qD" to build.mk. This takes care of most .a files.
2) Set AR_FLAGS=qD in the environment and dummy out ranlib (create a
   no-op script called ranlib on your PATH ahead of the real ranlib).
   This takes care of the libffi build.
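
Step 2 could be set up along these lines. This is a sketch under
assumptions: the shim directory is a throwaway created with mktemp (any
directory placed first on PATH would do), and the no-op script simply
exits successfully. In GNU ar, the D modifier selects deterministic
mode, which writes zero for timestamps and for user and group IDs.

```shell
# Hypothetical location for the shim; created fresh here for the example.
shim_dir="$(mktemp -d)"

# A ranlib that does nothing and reports success.
printf '#!/bin/sh\nexit 0\n' > "$shim_dir/ranlib"
chmod +x "$shim_dir/ranlib"

# Put the shim ahead of the real ranlib and ask ar for deterministic mode.
PATH="$shim_dir:$PATH"
AR_FLAGS=qD
export PATH AR_FLAGS

# Shows which ranlib the build would now pick up.
command -v ranlib
```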

Thanks
Greg

-- 
nest.cx is Gmail hosted, use PGP for anything private. Key:
http://tinyurl.com/ho8qg
Fingerprint: 5E2B 2D0E 1E03 2046 BEC3  4D50 0B15 42BD 8DF5 A1B0
