Michael -

On Thu, Apr 06, 2023 at 12:11:38PM +0200, Michael Schierl wrote:
> Am 06.04.2023 um 10:28 schrieb Larry Doolittle:
> > I'm trying to make a process to generate byte-for-byte reproducible zip
> > files.
> > I got the contents identical, including timestamps and permissions.
> > But three bytes at the 98.08% mark (bytes 5543078 to 5543081,
> > out of a file size 5651451) differ between my run and a friend's run.
> > Looking at the zip entry starting at 0x00549481:
> [...]
> Let's dissect the fields:
> ID 0x5455 ("UT") Length 0x0009 Data 03 68 ca 2c 64 XX XX XX 64
> ID 0x7875 ("ux") Length 0x000b Data 01 04 e8 03 00 00 04 e8 03 00 00
> 0x5455 is Info-Zip's "extended timestamp" field:
> [...]
> As the flags are 03, mod time and access time are present, and the
> different bits are within access time.

Thanks!  That helps a lot.  If I'm careful, I can even see the difference
between the two zip files by unpacking and

$ diff <(ls --full-time -u fab-ea2bb52c-ld) <(ls --full-time -u fab-ea2bb52c-mb)
22c22
< -rw-r--r-- 1 redacted redacted 644661 2023-04-04 18:10:00.000000000 -0700 marble-ipc-d-356.txt
---
> -rw-r--r-- 1 redacted redacted 644661 2023-04-06 00:25:03.000000000 -0700 marble-ipc-d-356.txt

Do you know of any tooling that can help decode zip file contents
in general?  Ideally something that could be absorbed into diffoscope?
Maybe that one-liner above would be a useful addition to diffoscope.

I took a quick look for the documentation you quoted.  That's
proginfo/extrafld.txt in Debian's zip source package, right?
It sure looks reverse-engineered.  I guess I shouldn't expect
anything different for a package where upstream source ends in 2008.  :-/

> I have no experience with the various zip tools used on Unix/Linux, but
> probably you can avoid including those extra fields by using the -X option.

Good: smaller file
Good: less to go wrong with reproducibility
Bad: the only time stamps left in the file are DOS-style
  implied-local-timezone.  So a zip file prepared with TZ=UTC
  (as needed for reproducibility) will unpack to files with future
  timestamps (if unpacked shortly after being created) for non-expert
  users in half the globe.  The correct unpacking instruction on *nix
  to avoid that becomes
    TZ=UTC unzip foo.zip

Again, thanks for your prompt and constructive response!

  - Larry
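
P.S.  For reference, the field dissection quoted above can be sketched in
a few lines of Python.  This is only a minimal sketch, assuming the "UT"
(0x5455) and "ux" (0x7875) layouts as described in proginfo/extrafld.txt.
The names parse_extra, decode_ut and decode_ux are made up for
illustration and are not part of any existing tool.  The access-time
bytes are elided (XX XX XX) in the quoted message, so the sample below
uses a placeholder that simply repeats the mod time.

```python
import struct
from datetime import datetime, timezone


def parse_extra(data: bytes):
    """Split a zip 'extra' blob into (id, payload) pairs.

    Each record is: ID (u16 LE), length (u16 LE), then 'length' payload bytes.
    """
    fields = []
    pos = 0
    while pos + 4 <= len(data):
        tag, size = struct.unpack_from("<HH", data, pos)
        payload = data[pos + 4 : pos + 4 + size]
        if len(payload) < size:
            break  # truncated record, stop rather than guess
        fields.append((tag, payload))
        pos += 4 + size
    return fields


def decode_ut(payload: bytes):
    """Decode an extended-timestamp payload: flags byte, then 32-bit times."""
    flags = payload[0]
    times = {}
    pos = 1
    for bit, name in ((1, "mtime"), (2, "atime"), (4, "ctime")):
        if flags & bit and pos + 4 <= len(payload):
            (secs,) = struct.unpack_from("<I", payload, pos)
            times[name] = datetime.fromtimestamp(secs, tz=timezone.utc)
            pos += 4
    return times


def decode_ux(payload: bytes):
    """Decode a 'ux' payload: version, uid size, uid, gid size, gid."""
    version = payload[0]
    uid_size = payload[1]
    uid = int.from_bytes(payload[2 : 2 + uid_size], "little")
    gid_size = payload[2 + uid_size]
    gid_start = 3 + uid_size
    gid = int.from_bytes(payload[gid_start : gid_start + gid_size], "little")
    return {"version": version, "uid": uid, "gid": gid}


# Sample built from the bytes quoted in the message.  ASSUMPTION: the second
# timestamp (atime) is a placeholder equal to mtime; the real bytes are elided.
sample = bytes.fromhex(
    "5554 0900 03 68ca2c64 68ca2c64"
    "7578 0b00 01 04 e8030000 04 e8030000"
)

for tag, payload in parse_extra(sample):
    if tag == 0x5455:
        print("UT", decode_ut(payload))
    elif tag == 0x7875:
        print("ux", decode_ux(payload))
    else:
        print(hex(tag), payload.hex())
```

Python's zipfile module exposes the raw extra bytes of each entry as
ZipInfo.extra, so parse_extra could be fed from there.  As far as I can
tell that is the central-directory copy, which may not match the
local-header copy byte for byte, so treat any comparison based on it
with care.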