Re: [patch] sort 'find' output to enable deterministic builds.

Chris Demetriou Tue, 16 Mar 2010 14:02:03 -0700

On Tue, Mar 16, 2010 at 13:12, Ralf Wildenhues <ralf.wildenh...@gmx.de> wrote:
> Yes, that may be it.  However, that also means that, while the patch
> fixes things for you, it doesn't really add value to Libtool in the
> sense that we cannot guarantee an improvement of some portable kind to
> users.  That's ok per se, but of course not ideal.  ;-)


Sort-of.  It (or something similar) is necessary, but not sufficient.

In order to get a benefit from this, users definitely need a
tightly-controlled environment.  It's achievable with relatively
little effort with ELF, tho.

OTOH, without this, even with a tightly-controlled environment, the
only way to achieve the desired result is by wrapping/hacking find (to
produce a deterministic output order) or by making a similar change
locally, or by restricting the environment even more tightly (e.g.,
filesystems or host OSes used for build).
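
To illustrate what "deterministic output order" means here, a minimal
sketch in Python. It is not the Libtool patch itself, which deals with
the shell 'find' invocation. The function name sorted_walk is
hypothetical. The idea is only that directory entries come back from
the filesystem in no guaranteed order, so they are sorted at every
level before use:

```python
import os

def sorted_walk(root):
    """Return file paths under root in an order independent of the filesystem.

    Hypothetical helper for illustration; not taken from the patch.
    """
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place controls the order os.walk descends in.
        dirnames.sort()
        # Python sorts str by code point, so the result is locale-independent.
        for name in sorted(filenames):
            paths.append(os.path.join(dirpath, name))
    return paths
```

With a shell pipeline the same caveat about collation applies: 'sort'
honors the locale, so a fixed locale is part of the tightly-controlled
environment.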

It's not particularly hard for people who care to replicate this
change (rebuilding my tools with it on N host systems to verify it
took me longer than identifying the source of the problem once I
started looking)... but IMO it does add *some* value.


>> (BTW, my purpose isn't to be able to compare output files so much as
>> create bit-identical output files.  There are several reasons for
>> this, but in summary, comparison isn't quite good enough.
>
> Wanna share a couple of those reasons why comparison isn't good enough?

Philosophically, I strongly believe that bit-for-bit reproducible
builds are good.  They give me confidence in my own build processes.

Beyond that, there are several practical reasons why I prefer to
avoid comparison tools:

* Let's start off with "I'm lazy!" 8-)  If I've gotta maintain a
custom diff tool, then that's one more tool that I have to maintain.
It *will* break, and it will grow more and more special cases and hair
over time -- and IMO the very existence of a tool encourages people to
make changes which result in more and more binary divergences from
build to build.  Can do it, but bit-identical is better, and
zero-tolerance for divergence means less long-term maintenance: it's
the responsibility of people making changes to make sure they work
properly w.r.t. reproducible builds.  (Related: if I have to maintain
a custom diff tool, e.g., to compare the contents of RPM packages...
I'd rather write one that's simpler, e.g., operates on the hashes of
the files in the package, rather than extracts the package contents
and does special comparisons.)

* convincing others and/or auditing changes.  If I want to convince
you that my build processes produce consistent results from build to
build (e.g., across different host system types), or that my new
release contains *no* changes to a particular component... which would
you rather see: bit-for-bit identical or "mutated within acceptable
limits" (as verified by a special tool)?   Again, use of a custom
comparison tool is possible, but not ideal.

* revision control and build systems.  For some purposes, we check
toolchains (including related libraries) into revision control, in
which case diffs -> extra revisions.  For build or revision control
systems which cache or which are content-addressable (e.g., use file
hashes to look up files), any bit different == "completely different,"
with follow-on inefficiencies.  Again, could be addressed via a
special tool (just like I know people have written 'ar' file timestamp
mungers, before I added 'D' support to 'ar')... but just better to get
it right during the build.
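
As a sketch of the "simpler" hash-based comparison mentioned in the
first point, and of why content-addressed systems treat any changed
bit as a different file: the Python below hashes every file in two
trees and reports the paths whose hashes differ. It assumes the
packages have already been unpacked into directories and uses SHA-256.
Both choices, and all function names, are assumptions made for
illustration, not a description of any existing tool.

```python
import hashlib
import os

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def tree_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest

def differing_files(root_a, root_b):
    """Return sorted relative paths that are missing or differ between trees."""
    a, b = tree_manifest(root_a), tree_manifest(root_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

With bit-identical builds, differing_files() on two build outputs
returns an empty list and no special-case logic is needed. A single
differing byte changes the digest, which is the "completely different"
case for a cache keyed on file hashes.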


Those are my thoughts, anyway.


chris
