[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-12-27 Thread G. Branden Robinson
Follow-up Comment #16, bug #57218 (project groff):


[comment #14:]
> My Debian-based system is running groff 1.22.4.
> 
> On the other hand, groff from Git HEAD and 1.22.4 do handle the time zone
> differently; this is easily demonstrated by generating a blank PostScript
> document.
> 
> 
> $ echo | groff | grep '%%CreationDate'
> %%CreationDate: Mon Dec 28 03:17:07 2020
> $ echo | ./build/test-groff | grep '%%CreationDate'
> %%CreationDate: Mon Dec 28 14:17:10 2020
> $ echo | SOURCE_DATE_EPOCH=1609125163 groff | grep '%%CreationDate'
> %%CreationDate: Mon Dec 28 03:12:43 2020
> $ echo | SOURCE_DATE_EPOCH=1609125163 ./build/test-groff | grep '%%CreationDate'
> %%CreationDate: Mon Dec 28 14:12:43 2020
> 
> 
> So that's the next item for research.

The above difference was the product of my repeatedly forgetting that Debian's
groff is patched to call `gmtime()` instead of `localtime()`.

So this issue now boils down to the one posed by Eli in the first place.

Should groff implicitly incorporate the system's time zone into built
artifacts?

Consensus on the mailing list appears to have roughly converged on "yes",
because the behavior is easy to override: anyone who sets SOURCE_DATE_EPOCH
can just as easily set TZ=UTC, whereas going the other direction (inferring and
reversing a time zone offset) is immensely more complicated.
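To make the TZ=UTC argument concrete, here is a minimal C++ sketch; the helper names `format_in_zone` and `format_utc` are hypothetical, not groff code. It formats the timestamp used above, 1609125163, in ctime() style. With TZ set to UTC, `localtime()` and `gmtime()` agree, which is why setting TZ alongside SOURCE_DATE_EPOCH suffices; with an 11-hour offset they do not. The POSIX TZ string `AEDT-11` is an assumption chosen to match the 03:12:43 versus 14:12:43 difference quoted above, and the sketch assumes a POSIX system for `setenv()`, `tzset()`, `localtime_r()`, and `gmtime_r()`.

```cpp
#include <cstdlib>
#include <ctime>
#include <string>

// Hypothetical helper: format `t` like ctime() (without the trailing
// newline) after switching the process time zone to `tz`.
std::string format_in_zone(std::time_t t, const char *tz)
{
    setenv("TZ", tz, 1);  // POSIX; affects subsequent localtime() calls
    tzset();              // make the C library re-read TZ
    std::tm parts{};
    localtime_r(&t, &parts);
    char buf[64];
    std::strftime(buf, sizeof buf, "%a %b %e %H:%M:%S %Y", &parts);
    return buf;
}

// Hypothetical helper: the same format, but always in UTC via gmtime_r(),
// which is the effect of Debian's gmtime() patch.
std::string format_utc(std::time_t t)
{
    std::tm parts{};
    gmtime_r(&t, &parts);
    char buf[64];
    std::strftime(buf, sizeof buf, "%a %b %e %H:%M:%S %Y", &parts);
    return buf;
}
```

`format_utc(1609125163)` yields "Mon Dec 28 03:12:43 2020", and so does `format_in_zone(1609125163, "UTC0")` (`UTC0` is the strictly POSIX spelling of TZ=UTC), while `format_in_zone(1609125163, "AEDT-11")` yields "Mon Dec 28 14:12:43 2020".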

Comments?

___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-12-27 Thread G. Branden Robinson
Follow-up Comment #15, bug #57218 (project groff):

Really attaching the man-db build diff this time.

(file #50579)
___

Additional Item Attachment:

File name: man-db-reproducible-build.diff Size:4 KB
   








[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-12-27 Thread G. Branden Robinson
Follow-up Comment #14, bug #57218 (project groff):

With groff's own build now as reproducible as we can get it without support
from other GNU projects, the challenge now is to reproduce the problem
described in the original report.

I performed two consecutive package rebuilds of man-db 2.8.5 on my
Debian-based system and cannot reproduce the problem with groff-generated
PostScript files.  The only differences I observe are in temporary file names
produced by GNU Autoconf tests run by ./configure.  I am attaching the
resulting diff.

My strategy was simple:


SOURCE_DATE_EPOCH=1609123901 debuild -us -uc


Note that there is no difference in the generated PostScript files.


$ grep -r '%%CreationDate' build.*
build.new/manual/man_db.ps:%%CreationDate: Mon Dec 28 02:51:41 2020
build.old/manual/man_db.ps:%%CreationDate: Mon Dec 28 02:51:41 2020


My Debian-based system is running groff 1.22.4.

On the other hand, groff from Git HEAD and 1.22.4 do handle the time zone
differently; this is easily demonstrated by generating a blank PostScript
document.


$ echo | groff | grep '%%CreationDate'
%%CreationDate: Mon Dec 28 03:17:07 2020
$ echo | ./build/test-groff | grep '%%CreationDate'
%%CreationDate: Mon Dec 28 14:17:10 2020
$ echo | SOURCE_DATE_EPOCH=1609125163 groff | grep '%%CreationDate'
%%CreationDate: Mon Dec 28 03:12:43 2020
$ echo | SOURCE_DATE_EPOCH=1609125163 ./build/test-groff | grep '%%CreationDate'
%%CreationDate: Mon Dec 28 14:12:43 2020


So that's the next item for research.





[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-12-25 Thread G. Branden Robinson
Follow-up Comment #13, bug #57218 (project groff):

Today's commits significantly improve the situation.


2020-12-25  G. Branden Robinson 

* doc/doc.am (.texi.dvi): Call texi2dvi with FORCE_SOURCE_DATE=1
in the environment, avoiding an embedded timestamp in the
generated groff.dvi file, which frustrated reproducible builds.
Thanks to Werner Lemberg for the suggestion.

* src/roff/groff/tests/string_case_xform_unicode_escape.sh: Fix
test to no longer use Bash process substitution, which caused
nondeterministic file descriptor numbers to appear in test logs,
frustrating reproducible builds.

* contrib/pdfmark/pdfmark.am (PDFROFF): Call pdfroff without
`--keep-temporary-files` option.  Temporary directories are
created with mktemp(1) and files with an embedded process
identifier, which frustrates reproducible builds.

See .


Attaching new, much shorter (2KiB), diff of two sequential builds using
SOURCE_DATE_EPOCH.

(file #50566)
___

Additional Item Attachment:

File name: groff-repro-build_2020-12-25.diff Size:2 KB
   








[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-12-25 Thread G. Branden Robinson
Follow-up Comment #12, bug #57218 (project groff):


[comment #11:]
> To get reproducible DVI files produced by TeX engines you have to set the
> following two environment variables:
> 
> export SOURCE_DATE_EPOCH=...
> export FORCE_SOURCE_DATE=1
> 
> However, to get reproducible PDFs produced by ghostscript, a lot of manual
> work is necessary.  See
> 
> https://bugs.ghostscript.com/show_bug.cgi?id=696765
> 
> for more information.

Thanks, Werner!

SOURCE_DATE_EPOCH + FORCE_SOURCE_DATE fixed point B (groff.dvi) but not
point C (the logs generated by pdfTeX).

I've got the diff down to points A, C, and D.  These are pretty easy to
eyeball.





[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-12-22 Thread Werner LEMBERG
Follow-up Comment #11, bug #57218 (project groff):

To get reproducible DVI files produced by TeX engines you have to set the
following two environment variables:

export SOURCE_DATE_EPOCH=...
export FORCE_SOURCE_DATE=1

However, to get reproducible PDFs produced by ghostscript, a lot of manual
work is necessary.  See

https://bugs.ghostscript.com/show_bug.cgi?id=696765

for more information.





[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-12-22 Thread G. Branden Robinson
Update of bug #57218 (project groff):

 Assigned to:  bgarrigues => gbranden   

___

Follow-up Comment #10:

I'm attaching a diff of two back-to-back builds I did today with:


SOURCE_DATE_EPOCH=1608607664 CORES=3 make-groff


...where `make-groff` is a local script I have that essentially wraps the
following:


./bootstrap && \
mkdir build && \
cd build && \
../configure && \
make -j $CORES && \
make check && \
make doc && \
make distcheck


The attached diff is 542KiB.  If, however, I delete the diffs resulting from
pdfroff's different choices of temporary filenames, it shrinks to 5.3KiB.

We can literally solve 99% of our reproducibility problem by making pdfroff
use deterministic temporary file names.

I further find the following.

A. There are nondeterministic temporary file names in config.log.  I assume
this is a problem for GNU autoconf to solve.

B. groff.dvi differs.  Of TeX DVI, it has famously been observed that its
"wiki page...feels the need to specifically point out that “DVI is not a
document encryption format”"[1].  I propose not to undertake research of
this item until the next one is resolved.  In any event, this DVI file is
produced not by groff -Tdvi but by GNU Texinfo's texi2dvi, so if resolving
(C) below doesn't fix this as a side effect, it seems likely that this is
GNU Texinfo's problem to solve.

C. pdfTeX queries the system clock and embeds a timestamp in the log files
it creates.  I assume this is a problem for pdfTeX to solve.

D. The generated distribution tar archives (gzipped) differ.  As far as I can
tell from a quick inspection, the gunzipped tar files differ only in file
metadata, not a surprise given tar's purpose. GNU tar has a feature to support
SOURCE_DATE_EPOCH[2], so I assume that this detail of archive generation is a
problem for GNU automake to solve.

E. One test case log differs, appearing two places (the individual and
combined test logs).  The difference is in the number of the file descriptor
GNU Bash opens when using process substitution, a non-POSIX shell feature. 
Ingo has encouraged me to get rid of Bashisms in the test cases, and this item
of trouble is sufficient to motivate me to do so in the instant case.

Conclusions: (1) Adjusting pdfroff's temporary file naming will pay a big
dividend.  (2) Taking a Bashism out of one of our test cases (which I wrote)
will pay a small one.  (3) The rest are going to require effort from other
projects.

[1] https://yakshav.es/the-patron-saint-of-yakshaves/
[2] https://www.gnu.org/software/tar/manual/html_section/tar_33.html

(file #50541, file #50542)
___

Additional Item Attachment:

File name: groff-repro-build_2020-12-22.diff Size:541 KB
   


File name: groff-repro-build_2020-12-22_pdfroff_trimmed.diff Size:5 KB
   








[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-12-19 Thread G. Branden Robinson
Follow-up Comment #9, bug #57218 (project groff):

See https://lists.gnu.org/archive/html/groff/2020-12/msg00039.html for further
discussion.





[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-12-16 Thread Jim Avera
Follow-up Comment #8, bug #57218 (project groff):

Commenting as an ordinary user:

Please do not make groff ignore TZ.  That breaks user functionality (for
example \*(DT will be wrong around midnight when using -mm).

The build systems should simply run everything with TZ=UTC so that displayed
dates are the same regardless of where the build executes.

* What would people think if the date(1) command were modified to ignore the
locale?   The answer is exactly the same for groff.



Also, as a user, I agree that timestamps or any other
possibly-privacy-sensitive info should not be embedded in output files
unless there is a functional necessity.  I realize there was not consensus to
get rid of %%CreationDate in generated pdfs, but I wanted to express my
opinion.



Thanks for listening





[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-11-05 Thread Ingo Schwarze
Follow-up Comment #7, bug #57218 (project groff):

To clarify:
1. I do not object to reproducible build support as far as it is easy to
implement and has no downsides.
2. In particular, i do not object to cjwatson@'s patch #50208 that gbranden@
puts forward in comment #6.  I did not test the patch, but i inspected it and
i think it makes things better.  Trusting that it has been tested, i think it
should be pushed.  While randomizing the order of functions in shared object
files definitely does provide security benefits, i don't see any security
benefit from randomizing the order of fonts in PDF files and the like...
3. I agree with gbranden@ that there is value in avoiding gratuitous, random
changes in build logs because that makes spotting build regressions easier. 
Sometimes, it's not possible without other downsides, but when it is, that's
good.
4. I do not insist that we delete %%CreationDate; Dave is right that there is
no consensus for doing that.





[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-11-04 Thread G. Branden Robinson
Follow-up Comment #6, bug #57218 (project groff):

I think we should support reproducible builds.  I acknowledge that Ingo
thinks we shouldn't, but I disagree: when I started working on groff, I
wanted to be able to diff two build trees before and after a change I made,
and there were numerous silly differences.  Colin has fixed them with patches
like the one I'm attaching.

(file #50208)
___

Additional Item Attachment:

File name: sort-perl-hash-keys.patch  Size:3 KB
   








[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-10-29 Thread Dave
Follow-up Comment #5, bug #57218 (project groff):

[comment #4:]
> I think groff should not write %%CreationDate into PostScript files it
> creates.

This would be my preferred solution.  It's simplest to implement, and in most
cases, this field is merely repeating the timestamp of the PostScript file:
it's less common for PostScript code to be directly modified after its
creation, than for the files or commands that generated it to be modified and
rerun, updating both timestamps again to the same value.  (This is probably
even more true of gropdf but much less true of grohtml.)

But there didn't seem to be consensus around this solution in the 2014 email
thread referenced in comment #0.





Re: [bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-10-22 Thread Steffen Nurpmeso
Hu-hu!

Ingo Schwarze wrote in
 <20201022-173721.sv97361.91...@savannah.gnu.org>:
 |Follow-up Comment #4, bug #57218 (project groff):
 |
 |I think groff should not write %%CreationDate into PostScript files it
 |creates.  Not because of reproducible builds, which i consider a useless and
 |even detrimental feature, but because of privacy concerns.  That would also
 |solve this so-called "problem" as a side effect.
 |
 |Basically, it amounts to something like deleting these lines from
 |src/devices/grops/ps.cpp:
 |
 |  {
 |    fputs("%%CreationDate: ", out.get_file());
 |#ifdef LONG_FOR_TIME_T
 |    long
 |#else
 |    time_t
 |#endif
 |    t = current_time();
 |    fputs(ctime(&t), out.get_file());
 |  }

I think it would be better to check getenv("SOURCE_DATE_EPOCH")
and use a fixed constant if it is set.  (For the MUA i maintain
i use SOURCE_DATE_EPOCH=844221007 in mx-test.sh, i have forgotten
why, hmm.)
It is a pity that reproducible-builds.org only goes for that
single constant, and does not offer finer control for tests etc.
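A hedged C++ sketch of the getenv() check suggested above: use SOURCE_DATE_EPOCH when it holds a plain non-negative decimal integer, and the system clock otherwise. The function name `build_time` is hypothetical (groff's own `current_time()` is not reproduced here), and silently falling back to the clock on a malformed value is a choice of this sketch; a stricter tool following the reproducible-builds.org specification would more likely report an error instead.

```cpp
#include <cerrno>
#include <cstdlib>
#include <ctime>

// Hypothetical helper: the timestamp to embed in generated output.
// Accepts SOURCE_DATE_EPOCH only if it is entirely decimal digits
// (no sign, no leading whitespace, no trailing junk, no overflow);
// otherwise returns the current system time.
std::time_t build_time()
{
    const char *s = std::getenv("SOURCE_DATE_EPOCH");
    if (s != nullptr && *s >= '0' && *s <= '9') {
        char *end = nullptr;
        errno = 0;
        long long v = std::strtoll(s, &end, 10);
        if (errno == 0 && *end == '\0')
            return static_cast<std::time_t>(v);
    }
    return std::time(nullptr);  // unset or malformed: use the clock
}
```

With SOURCE_DATE_EPOCH=1609125163 in the environment, `build_time()` returns 1609125163; a value such as `12abc` is rejected and the clock is used.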

Ciao and good night,

--steffen



[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-10-22 Thread Ingo Schwarze
Follow-up Comment #4, bug #57218 (project groff):

I think groff should not write %%CreationDate into PostScript files it
creates.  Not because of reproducible builds, which i consider a useless and
even detrimental feature, but because of privacy concerns.  That would also
solve this so-called "problem" as a side effect.

Basically, it amounts to something like deleting these lines from
src/devices/grops/ps.cpp:

  {
    fputs("%%CreationDate: ", out.get_file());
#ifdef LONG_FOR_TIME_T
    long
#else
    time_t
#endif
    t = current_time();
    fputs(ctime(&t), out.get_file());
  }





[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-10-21 Thread G. Branden Robinson
Follow-up Comment #3, bug #57218 (project groff):

> I guess we could have a register \n[.utc] which is normally zero but is set
> to 1 if SOURCE_DATE_EPOCH is in the environment.

After thinking about that for 60 more seconds, I retract it.  It would mean
having to insert a layer on top of all those C library time calls in the patch
to choose which ones to use.  Gross.

> \n[dm]

I misspoke here.  The day-of-the-month register is \n[dy].

I think it'd be better if we just decreed that groff 1.23 runs on UTC
henceforth, even in compatibility mode.  If people want to reproduce unaltered
legacy documents with legacy dates, they already have to contrive a fake time
in the system clock anyway.

And if they're doing that, they can finagle that clock by up to 14[*] hours
more to correct for the desired local time.

Also, the registers are writable, so people can just -r their way around the
problem.

And as far as I can reason it out, any document that hard-codes these
registers doesn't have a problem (and it'll certainly be reproducible!).

[*] Aloha from the Line Islands.





[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-10-21 Thread G. Branden Robinson
Update of bug #57218 (project groff):

    Severity: 3 - Normal => 5 - Blocker
      Status: None => Need Info

___

Follow-up Comment #2:

I nominate this as a blocker because the groff 1.23 release should be
reproducible.

> This patch should be incorporated upstream if possible. Was it ever proposed
> here? Is it suitable for inclusion as-is?

No and no, not as far as I know.  Colin Watson is active on the groff mailing
list (and has committed to its repository), so I reckon he'd have brought it
forward if he thought it was ready.

The tricky part would seem to be this:

> (Note that this changes the semantics of \n[hours] etc., so may need further
> work.)

If anything the above is understated; it affects the semantics of every
date-related register coarser than the second, I reckon.  

There are local time offsets that are not a whole number of hours; Australia
perversely does this, and as the great Christopher Hitchens pointed out:

"The time-zone difference between India and Pakistan, for example, is half an
hour.  That's a nicely irrational and arbitrary slice out of daily life.  In
Cyprus, the difference between the clocks in the Greek and Turkish sectors is
an hour--but it's the only in-country north-south time change that I am aware
of, and it operates on two sides of the same capital city."

The closer to New Year's Day you generate your document, the worse the
situation becomes, because the day, month, and year can then all differ
between UTC and local time.

That includes the traditional AT&T troff registers: \n[dw], \n[dm], \n[mo],
and \n[yr], plus the groff extensions \n[hours], \n[minutes], and \n[year].

(\n[yr] had a Y2K bug.)
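To show how many registers move at once near a year boundary, here is a hedged C++ sketch that fills hypothetical stand-ins for the date registers from one timestamp in a given time zone. It assumes the semantics described in the groff manual (`dw` counts Sunday as 1, `dy` is the day of the month, `yr` is the year minus 1900, `year` is the full year) and uses the POSIX TZ strings `UTC0`, `AEDT-11`, and `IST-5:30` as stand-ins for UTC, an 11-hour offset, and a half-hour offset. `DateRegisters` and `registers_in_zone` are illustrative names, not groff code.

```cpp
#include <cstdlib>
#include <ctime>

// Hypothetical mirror of groff's date-related registers.
struct DateRegisters {
    int dw;       // day of week, 1 = Sunday
    int dy;       // day of month
    int mo;       // month, 1-12
    int yr;       // year minus 1900 (the source of the Y2K trouble)
    int year;     // full year
    int hours;
    int minutes;
};

// Fill the registers from `t`, interpreted in the POSIX time zone `tz`.
DateRegisters registers_in_zone(std::time_t t, const char *tz)
{
    setenv("TZ", tz, 1);  // POSIX; affects subsequent localtime() calls
    tzset();
    std::tm parts{};
    localtime_r(&t, &parts);
    return DateRegisters{parts.tm_wday + 1, parts.tm_mday, parts.tm_mon + 1,
                         parts.tm_year, parts.tm_year + 1900,
                         parts.tm_hour, parts.tm_min};
}
```

For 1609444800 (20:00 UTC on Thursday, 31 December 2020), UTC gives `dy` 31 and `yr` 120, while `AEDT-11` already reports Friday, 1 January 2021 (`yr` 121), and `IST-5:30` lands at 01:30 on 1 January, so even `minutes` differs.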

I guess we could have a register \n[.utc] which is normally zero but is set to
1 if SOURCE_DATE_EPOCH is in the environment.

Thoughts?





[bug #57218] [PATCH] Reproducible builds support is broken and embeds timezone

2020-10-13 Thread Dave
Update of bug #57218 (project groff):

 Summary: Reproducible builds support is broken and embeds timezone => [PATCH] Reproducible builds support is broken and embeds timezone

