Hi hackers

As a followup to the "Timestamps in static libraries" thread, which resulted in
a '-D' option to ar(1), I'd like to discuss whether deterministic builds are a
worthy goal for the Project. By that I mean that for two runs of make
build+install world+kernel+distribution from the same sources, every installed
file is bitwise identical between the two runs.
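
To make the property concrete, here is a minimal sketch of how two installed
trees could be compared. It is an illustration only, not the script I actually
use; the function names are made up for this mail, and it assumes each run was
installed into its own directory.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and special files
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def differing_files(root_a, root_b):
    """Return sorted relative paths that are missing on one side or differ."""
    a, b = tree_digests(root_a), tree_digests(root_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Calling differing_files() on the two install directories returns the files
that break determinism; an empty list means the two runs were bitwise
identical.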

Deterministic builds would be useful for me, since I'm creating binary diffs 
against lots of FreeBSD builds, and smaller diffs are good. Also, I'd like to 
detect which files have changed between two commits. I imagine it would also be 
useful for things like IDS and freebsd-update.

Currently, this does not hold for static libraries (*.a), kernel modules (*.ko 
/ *.ko.symbols) and the following:

bthidd
cc1
cc1obj
cc1plus
clang
clang++
ctfconvert
freebsd.cf
freebsd.submit.cf
kernel
kernel.symbols
libcrypto.so.6
libufs.so.5
loader
pxeboot
sendmail.cf
submit.cf
tblgen
zfsloader

Most of the libraries become identical when built with ar -D. Some also
record the absolute OBJDIR path to header files, though (libc.a, for example).

I tried adding 'D' to ARFLAGS in share/mk/sys.mk, but that's only part of the
solution. ARFLAGS is overridden in hundreds of places in the source tree, and
in some places ARFLAGS isn't even used (or AR, for that matter). Is it
worthwhile to go through the whole tree, fixing up these calls to ar? A lot of
this is in contrib/ code.

Another option is to add a WITH_DETERMINISTIC_AR knob to the build to compile 
ar with D as default behaviour. This would make the above changes unnecessary, 
but is more intrusive.

A third option is that this is not a priority for the community, or is
outright unwanted, and that I just post-process my builds myself.
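
For the post-processing route, a rough sketch of what I have in mind is below.
It assumes the common ar member header layout (16-byte name, 12-byte mtime,
6-byte uid, 6-byte gid, 8-byte mode, 10-byte size, 2-byte terminator) and only
zeroes the mtime, uid and gid fields. It leaves the mode field alone, so it is
not claimed to reproduce ar -D exactly, and it does nothing about OBJDIR paths
embedded in the members themselves. normalize_ar is a name made up for this
mail.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60


def normalize_ar(data):
    """Return a copy of an ar archive with mtime, uid and gid zeroed."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    out = bytearray(data)
    pos = len(AR_MAGIC)
    while pos + HEADER_LEN <= len(out):
        header = out[pos:pos + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        # Assumed layout: name[16] mtime[12] uid[6] gid[6] mode[8] size[10] end[2]
        out[pos + 16:pos + 28] = b"0".ljust(12)  # mtime
        out[pos + 28:pos + 34] = b"0".ljust(6)   # uid
        out[pos + 34:pos + 40] = b"0".ljust(6)   # gid
        size = int(header[48:58].decode("ascii").strip())
        # Member data is padded to an even length.
        pos += HEADER_LEN + size + (size & 1)
    return bytes(out)
```

The member data is copied through untouched, so the output is the same size
as the input and only the header fields change.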

I don't know what causes the checksum difference in .ko files - there is no
size difference, and no difference according to strings(1). A bsdiff of the
two is typically around 160 bytes.
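
Since the two files are the same size, one way to narrow this down would be to
list the byte ranges where they differ. A minimal sketch (differing_ranges is
just an illustrative name, and it takes the file contents as bytes):

```python
def differing_ranges(a, b):
    """Return (start, end) byte ranges, end exclusive, where a and b differ."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same size")
    ranges = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i                   # a differing run begins here
        elif x == y and start is not None:
            ranges.append((start, i))   # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, len(a)))  # run extends to end of file
    return ranges
```

Matching the resulting offsets against the section headers (as printed by
readelf -S, for instance) might show which section the differing bytes live
in.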

.ko.symbols have some unique identifiers or addresses internally.

kernel, loader, zfsloader and pxeboot have a build date recorded; the kernel
also has the absolute path to GENERIC. That is OK for the kernel, I think,
although it would be easier for me if this was just stored in a separate file,
since binary diffs on large files are expensive.

clang, clang++ and tblgen store some absolute paths to .cpp files in the src 
repo internally, plus unique identifiers.

freebsd.cf, freebsd.submit.cf, sendmail.cf and submit.cf record the absolute
OBJDIR path to sendmail.

What do you think?


Thanks,
Erik
