On Mon, Dec 28, 2020 at 10:26 AM Jonas Hahnfeld <hah...@hahnjo.de> wrote:
> On Sunday, 27.12.2020 at 22:24 +0100, Werner LEMBERG wrote:
> > > Intercepting syscalls (or whatever the library does, I didn't
> > > check) doesn't sound like the right approach outside of testing
> > > reproducibility.
> >
> > Why?  It's even less intrusive than the `SOURCE_DATE_EPOCH` solution.
>
> I definitely consider intercepting various syscalls by means of
> LD_PRELOADing more intrusive than setting a single environment variable
> that was invented for the purpose of setting timestamps. Just think of
> a new shiny syscall that might add a new source of non-reproducibility.

I agree with Jonas. As a further argument, LD_PRELOAD is also platform-dependent; I think it wouldn't work on OSX, for example.

> > > I think that's a pity, but nothing we can change as a
> > > "consumer" of library functions.
> >
> > Exactly.  As long as we don't change LilyPond to produce PDFs by
> > itself – which is a huge undertaking that I certainly won't start –
> > I think we have no other choice than using something like
> > 'libfaketime' or a patched gs version.  I definitely prefer the
> > former.
>
> What I wanted to say is that we cannot change the developers' minds to
> support the environment variable. But we can (and IMHO should) use all
> available interfaces if we care about reproducibility. I see at least
> two more options:

Yes, +1 from me.

> 1) Strip non-determinism from the generated PDF. This is even mentioned
> at https://reproducible-builds.org/docs/timestamps/ - before discussing
> libfaketime which spends more than half of the paragraph mentioning
> possible issues.
>
> 2) As we control the input PS code, we don't have to worry about the
> operators that get the current time, draw a random number, etc. (as
> long as we don't use them ourselves).
> Instead the bug linked above says
> we just need to tell GS which CreationDate and ModDate to use (via
> PDFmarks) and this should be straight-forward to fill with values
> depending on SOURCE_DATE_EPOCH.
> This probably leaves the UUIDs (is that the issue you mention above?)
> which can be overridden using -sDocumentUUID and -sInstanceUUID.
> Setting a constant time using libfaketime will result in the same UUID
> for all generated PDFs, so it can't get worse; but I think it would be
> desirable to do better than that and compute a "unique" ID based on the
> input file, maybe as simple as the hash of the file path. It must be
> considered that different values will prevent reuse of the GS API
> instance, but I'd argue that a constant value should be fine in this
> case.

The manual entry for DocumentUUID says:

  Note that Ghostscript has no access to the host node ID due to a
  minimization of platform dependent modules. Therefore it uses an MD5
  hash of the document contents for generating UUIDs.

I wonder whether we'd get reproducible documents if we provide only InstanceUUID.

-- 
Han-Wen Nienhuys - hanw...@gmail.com - http://www.xs4all.nl/~hanwen
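
For illustration, option 2 quoted above could be sketched roughly as follows. This is Python used purely as a neutral notation, not LilyPond code: it formats SOURCE_DATE_EPOCH as a PDF date, builds a /DOCINFO pdfmark line carrying CreationDate and ModDate, and derives a stable ID from the input file path. The helper names (`source_date_epoch`, `pdf_date`, `docinfo_pdfmark`, `stable_uuid`) are hypothetical, and whether Ghostscript accepts a UUID string in exactly this form for -sDocumentUUID / -sInstanceUUID is an assumption that would need checking against its documentation.

```python
import os
import time
import uuid


def source_date_epoch(default: int = 0) -> int:
    """Read SOURCE_DATE_EPOCH from the environment (hypothetical helper)."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def pdf_date(epoch: int) -> str:
    """Format a Unix timestamp as a PDF date string in UTC."""
    return time.strftime("D:%Y%m%d%H%M%S", time.gmtime(epoch)) + "+00'00'"


def docinfo_pdfmark(epoch: int) -> str:
    """Build a PostScript line that sets both dates via a DOCINFO pdfmark."""
    stamp = pdf_date(epoch)
    return f"[ /CreationDate ({stamp}) /ModDate ({stamp}) /DOCINFO pdfmark"


def stable_uuid(input_path: str) -> str:
    """Derive a deterministic UUID from the input file path.

    uuid5 hashes the path with SHA-1; the value format expected by
    Ghostscript's UUID options is an assumption here.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, input_path))
```

The pdfmark line would be emitted into the generated PS code, and the derived ID passed on the gs command line. Because the ID depends only on the path, two builds of the same file get the same value, while different files still get different ones.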