Re: Three bytes in a zip file

2024-04-08 Thread Chris Lamb
Larry Doolittle wrote:

> Yes.  The -X isn't needed, sometimes, and then when you least expect it, it is.
> Classic reproducible-builds gotcha.
>
>> I'm happy to update this document myself if need be. :)
>
> Go for it.

Finally got around to doing this. Many thanks. :)


Best wishes,

-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 💠
⬊   ⬋
  o


Re: Three bytes in a zip file

2024-03-12 Thread Larry Doolittle
Chris -

On Wed, Mar 13, 2024 at 01:01:47AM +, Chris Lamb wrote:
> >   TZ=UTC zip -X --latest-time "$zipfile" fab/*
> >   # Note the -X flag; to be pedantic about timestamps,
> >   # that means you should unpack with TZ=UTC unzip "$zipfile".  See
> >   # https://lists.reproducible-builds.org/pipermail/rb-general/2023-April/002927.html
> Ah, interesting! Does that -X mean that
>   https://reproducible-builds.org/docs/archives/
> ... is incomplete?

Yes.  The -X isn't needed, sometimes, and then when you least expect it, it is.
Classic reproducible-builds gotcha.

> I'm happy to update this document myself if need be. :)

Go for it.

  - Larry


Re: Three bytes in a zip file

2024-03-12 Thread Chris Lamb
Hey,

>   echo "Forcing timestamp $SOURCE_DATE_EPOCH"
>   touch --date="@$SOURCE_DATE_EPOCH" fab/*
>   TZ=UTC zip -X --latest-time "$zipfile" fab/*
>   # Note the -X flag; to be pedantic about timestamps,
>   # that means you should unpack with TZ=UTC unzip "$zipfile".  See
>   # https://lists.reproducible-builds.org/pipermail/rb-general/2023-April/002927.html

Ah, interesting! Does that -X mean that

  https://reproducible-builds.org/docs/archives/

... is incomplete? I'm happy to update this document myself if need be. :)


Best wishes,

-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 💠
⬊   ⬋
  o


Re: Three bytes in a zip file

2024-03-09 Thread Larry Doolittle via rb-general
Friends -

On Fri, Mar 08, 2024 at 05:21:43PM +0100, Fay Stegerman wrote:
> * Chris Lamb  [2024-03-08 12:16]:
> > Oh this is great work! So, using your tool, did you manage to solve the
> > underlying non-determinism? :)
> The original reproducibility issue this thread started with was traced back to
> the atime back then, my tool just hopefully makes doing that a bit easier :)
> I don't know how the original issue was fixed, but I can, eh, reproduce (and get
> rid of) such an atime difference easily [with zip -X]:

For completeness, here is the production (working, well-tested) stanza of
shell scripting that emerged from that discussion.

# Create the final zip file
rm -f "$zipfile"
if test -n "$SOURCE_DATE_EPOCH"; then
  echo "Forcing timestamp $SOURCE_DATE_EPOCH"
  touch --date="@$SOURCE_DATE_EPOCH" fab/*
  TZ=UTC zip -X --latest-time "$zipfile" fab/*
  # Note the -X flag; to be pedantic about timestamps,
  # that means you should unpack with TZ=UTC unzip "$zipfile".  See
  # https://lists.reproducible-builds.org/pipermail/rb-general/2023-April/002927.html
else
  zip "$zipfile" fab/*
fi

As found on https://github.com/BerkeleyLab/Marble
in design/scripts/manufacturing.sh .

  - Larry


Re: Three bytes in a zip file

2024-03-09 Thread Chris Lamb
Hey Fay,

> The original reproducibility issue this thread started with was traced back to
> the atime back then, my tool just hopefully makes doing that a bit easier :)

Oh! I somehow missed that this was an atime-related issue at the time…
in addition to missing your PoC to call out to repro-apk from diffoscope
as well. Sorry about both of those.

Learning that it was an atime issue was especially interesting as it
connected a few things in my head, including the fact that I could
never reproduce a few .zip-related issues in the past — which I now
realise is because I mount my filesystems with noatime.

I'll implement zipdetails support shortly. As you say, it will be
quicker/straightforward to integrate. :)

Chris




> I almost forgot: zipdetails (that comes with perl) can also show this difference
> (and quite a lot of other things, though its output is not usually so easy to
> diff which is why I tend to prefer my own tools -- diff-zip-meta, zipinfo.py,
> apksigtool -- but it might be easier to use that for diffoscope, at least for
> now):
>
> $ diff -Naur <( zipdetails atime1.zip ) <( zipdetails atime2.zip )
> @@ -15,7 +15,7 @@
>  0023   Length      0009
>  0025   Flags       '03 mod access'
>  0026   Mod Time    65EB87EA 'Fri Mar  8 22:49:30 2024'
> -002A   Access Time 65EB87EA 'Fri Mar  8 22:49:30 2024'
> +002A   Access Time 65EB87EE 'Fri Mar  8 22:49:34 2024'
>  002E Extra ID #0002 7875 'ux: Unix Extra Type 3'
>  0030   Length      000B
>  0032   Version     01

-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 💠
⬊   ⬋
  o


Re: Three bytes in a zip file

2024-03-08 Thread Fay Stegerman
* Fay Stegerman  [2024-03-08 02:37]:
> * Larry Doolittle  [2023-04-06 23:59]:
> > Do you know of any tooling that can help decode zip file contents in general?
> 
> I know this thread is almost a year old now, but I finally got to my backlog
> working on diff-zip-meta.py [1], which is a tool specifically intended to
> elucidate differences in ZIP/APK metadata.  And as of today, the master branch
> supports showing the kind of timestamp differences you reported in
> human-readable form, not just as a difference in the raw data:
> 
> $ diff-zip-meta foo.zip bar.zip
> --- foo.zip
> +++ bar.zip
> entry foo:
>   extra (entry):
> -   55540900035164ea655164ea6575780b000104e80304e803
> +   55540900035164ea655464ea6575780b000104e80304e803
> - extra (entry) atime=2024-03-08 01:05:21
> + extra (entry) atime=2024-03-08 01:05:24
> 
> - Fay
> 
> [1] https://github.com/obfusk/reproducible-apk-tools#diff-zip-metapy

I almost forgot: zipdetails (that comes with perl) can also show this difference
(and quite a lot of other things, though its output is not usually so easy to
diff which is why I tend to prefer my own tools -- diff-zip-meta, zipinfo.py,
apksigtool -- but it might be easier to use that for diffoscope, at least for
now):

$ diff -Naur <( zipdetails atime1.zip ) <( zipdetails atime2.zip )
@@ -15,7 +15,7 @@
 0023   Length      0009
 0025   Flags       '03 mod access'
 0026   Mod Time    65EB87EA 'Fri Mar  8 22:49:30 2024'
-002A   Access Time 65EB87EA 'Fri Mar  8 22:49:30 2024'
+002A   Access Time 65EB87EE 'Fri Mar  8 22:49:34 2024'
 002E Extra ID #0002 7875 'ux: Unix Extra Type 3'
 0030   Length      000B
 0032   Version     01

- Fay


Re: Three bytes in a zip file

2024-03-08 Thread Fay Stegerman
* Chris Lamb  [2024-03-08 12:16]:
> Oh this is great work! So, using your tool, did you manage to solve the
> underlying non-determinism? :)
> 
> Based on the output (which labels the field as an 'extra atime' or
> similar), it seems like you've managed to work out which part of your
> toolchain is making the build unreproducible, or am I being too
> optimistic?

The original reproducibility issue this thread started with was traced back to
the atime back then, my tool just hopefully makes doing that a bit easier :)

I don't know how the original issue was fixed, but I can, eh, reproduce (and get
rid of) such an atime difference easily:

$ touch foo
$ zip foo.zip foo   # this modifies the atime
$ zip bar.zip foo   # so this sees a different atime
$ diff-zip-meta foo.zip bar.zip
--- foo.zip
+++ bar.zip
entry foo:
  extra (entry):
-   5554090003fb32eb65fb32eb6575780b000104e80304e803
+   5554090003fb32eb650133eb6575780b000104e80304e803
- extra (entry) atime=2024-03-08 15:47:07
+ extra (entry) atime=2024-03-08 15:47:13
$ rm foo.zip bar.zip
$ zip -X foo.zip foo
$ zip -X bar.zip foo
$ diff-zip-meta foo.zip bar.zip
--- foo.zip
+++ bar.zip
$ rm foo.zip bar.zip
$ touch -a --date @0 foo
$ zip foo.zip foo
$ touch -a --date @0 foo
$ zip bar.zip foo
$ diff-zip-meta foo.zip bar.zip
--- foo.zip
+++ bar.zip

> ps. Separate to that, how amenable would you be to working with me getting
> this extra .ZIP metadata support built directly into diffoscope at
> some point…?

I haven't had time to work on packaging repro-apk for Debian, or to make an MR
for integrating it w/ diffoscope (or work on diffoscope at all really), but I
did make a quick PoC for the latter (though only for APK files, not regular ZIP
files) a while ago [1], FWIW.

- Fay

[1] 
https://salsa.debian.org/obfusk/diffoscope/-/commit/50a3830a7d433d968a92f24911dc85846d843bae

> Fay Stegerman wrote:
> > * Larry Doolittle  [2023-04-06 23:59]:
> >> Do you know of any tooling that can help decode zip file contents in general?
> >
> > I know this thread is almost a year old now, but I finally got to my backlog
> > working on diff-zip-meta.py [1], which is a tool specifically intended to
> > elucidate differences in ZIP/APK metadata.  And as of today, the master branch
> > supports showing the kind of timestamp differences you reported in
> > human-readable form, not just as a difference in the raw data:
> >
> > $ diff-zip-meta foo.zip bar.zip
> > --- foo.zip
> > +++ bar.zip
> > entry foo:
> >   extra (entry):
> > -   55540900035164ea655164ea6575780b000104e80304e803
> > +   55540900035164ea655464ea6575780b000104e80304e803
> > - extra (entry) atime=2024-03-08 01:05:21
> > + extra (entry) atime=2024-03-08 01:05:24
> >
> > - Fay
> >
> > [1] https://github.com/obfusk/reproducible-apk-tools#diff-zip-metapy


Re: Three bytes in a zip file

2024-03-08 Thread Larry Doolittle
Chris -

On Fri, Mar 08, 2024 at 11:16:06AM +, Chris Lamb wrote:
> Oh this is great work!

As the (year-old) thread originator, I agree!  This fills a real gap!

> So, using your tool, did you manage to solve the
> underlying non-determinism? :)

We did figure it out, very tediously, a year ago.

  - Larry


Re: Three bytes in a zip file

2024-03-08 Thread Chris Lamb
Hey Fay,

Oh this is great work! So, using your tool, did you manage to solve the
underlying non-determinism? :)

Based on the output (which labels the field as an 'extra atime' or
similar), it seems like you've managed to work out which part of your
toolchain is making the build unreproducible, or am I being too
optimistic?


Best wishes,

Chris


ps. Separate to that, how amenable would you be to working with me getting
this extra .ZIP metadata support built directly into diffoscope at
some point…?



Fay Stegerman wrote:

> * Larry Doolittle  [2023-04-06 23:59]:
>> Do you know of any tooling that can help decode zip file contents in general?
>
> I know this thread is almost a year old now, but I finally got to my backlog
> working on diff-zip-meta.py [1], which is a tool specifically intended to
> elucidate differences in ZIP/APK metadata.  And as of today, the master branch
> supports showing the kind of timestamp differences you reported in
> human-readable form, not just as a difference in the raw data:
>
> $ diff-zip-meta foo.zip bar.zip
> --- foo.zip
> +++ bar.zip
> entry foo:
>   extra (entry):
> -   55540900035164ea655164ea6575780b000104e80304e803
> +   55540900035164ea655464ea6575780b000104e80304e803
> - extra (entry) atime=2024-03-08 01:05:21
> + extra (entry) atime=2024-03-08 01:05:24
>
> - Fay
>
> [1] https://github.com/obfusk/reproducible-apk-tools#diff-zip-metapy


-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 💠
⬊   ⬋
  o



Re: Three bytes in a zip file

2024-03-07 Thread Fay Stegerman
* Larry Doolittle  [2023-04-06 23:59]:
> Do you know of any tooling that can help decode zip file contents in general?

I know this thread is almost a year old now, but I finally got to my backlog
working on diff-zip-meta.py [1], which is a tool specifically intended to
elucidate differences in ZIP/APK metadata.  And as of today, the master branch
supports showing the kind of timestamp differences you reported in
human-readable form, not just as a difference in the raw data:

$ diff-zip-meta foo.zip bar.zip
--- foo.zip
+++ bar.zip
entry foo:
  extra (entry):
-   55540900035164ea655164ea6575780b000104e80304e803
+   55540900035164ea655464ea6575780b000104e80304e803
- extra (entry) atime=2024-03-08 01:05:21
+ extra (entry) atime=2024-03-08 01:05:24

- Fay

[1] https://github.com/obfusk/reproducible-apk-tools#diff-zip-metapy


Re: Three bytes in a zip file

2023-04-07 Thread Bernhard M. Wiedemann via rb-general



On 06/04/2023 10.28, Larry Doolittle wrote:
> I'm trying to make a process to generate byte-for-byte reproducible zip files.


Try adding the -X option to the zip call.
It will suppress adding of extended attributes (atime/ctime).
And with
https://github.com/distropatches/zip/commit/501ae4e93fd6fa2f7d20d00d1b011f9006802eae
it will also normalize mtime.


Ciao
Bernhard M.




Re: Three bytes in a zip file

2023-04-07 Thread Larry Doolittle
Michael -

On Fri, Apr 07, 2023 at 01:31:24PM +0200, Michael Schierl wrote:
> Larry's script already called touch immediately before zip. But I assume
> the nature of atime can mean that any other process may have "won the
> race" and accessed the file just in between these two lines.

That's my working assumption.  Also, the irreproducibility
did not reproduce.  Maybe it _is_ velocity-dependent.  ;-)

> The DOS timestamps encode only mtime, and not ctime or atime.

It does seem simpler and more reliable to keep atime out of it.

On Fri, Apr 07, 2023 at 01:25:04PM +0200, Michael Schierl wrote:
> When I distribute
> ZIP files, I often touch all files to UNIX epoch anyway as I don't want
> to leak the exact time I have built/compiled them.

Right.  Except the time I set them to here is the time of
the source git commit (SOURCE_DATE_EPOCH).

> Another option would be to use a UTC-12:00 timezone like TZ=Etc/GMT+12
> for building the .zip file to ensure the files are "old enough" for
> every place in the world.

That sounds too unexpected to me.  I'll stick to UTC.
In practice, I bet nobody will notice, and if they do, it's easy to explain.

  - Larry


Re: Three bytes in a zip file

2023-04-07 Thread Michael Schierl

Hello John,


On 07.04.2023 at 03:56, John Gilmore wrote:
> Larry Doolittle wrote:
> > $ diff <(ls --full-time -u fab-ea2bb52c-ld) <(ls --full-time -u fab-ea2bb52c-mb)
> > 22c22
> > < -rw-r--r-- 1 redacted redacted  644661 2023-04-04 18:10:00.0 -0700 marble-ipc-d-356.txt
> > ---
> > > -rw-r--r-- 1 redacted redacted  644661 2023-04-06 00:25:03.0 -0700 marble-ipc-d-356.txt
>
> So I'm guessing that even before the zip file is re-created, the rebuild
> process is leaking the rebuild timestamp into the last-modified metadata
> of the generated marble-ipc-d-356.txt file?

atime is not the same as mtime. The -u switch shows atime.

> That seems like it should
> be handled by the build process explicitly setting its timestamp to
> something related to the last-source-code-checkin time (with "touch
> --date=XXX") rather than to current time.

Larry's script already called touch immediately before zip. But I assume
the nature of atime can mean that any other process may have "won the
race" and accessed the file just in between these two lines.

> Truncating the timestamps to DOS timestamps wouldn't work to eliminate
> this difference anyway, since the date in the two files is two days
> different; DOS timestamps are accurate to 2 seconds, as I recall.


The DOS timestamps encode only mtime, and not ctime or atime.
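
To make the DOS-timestamp point concrete: the classic ZIP header stores a
single modification time per entry as two 16-bit fields, with seconds kept
in two-second units and no timezone or atime at all. Below is a minimal
sketch of that packing. The helper names dos_time and dos_date are made up
for this example and are not part of zip or any Info-ZIP tool.

```shell
#!/usr/bin/env bash
# Pack a wall-clock time the way the classic (MS-DOS style) ZIP header does.
# Function names are hypothetical; only the bit layout is the standard one.
# Pass plain decimal arguments without leading zeros (bash reads 08 as octal).

# dos_time HOUR MIN SEC -> 16-bit value: 5 bits hour, 6 bits minute,
# 5 bits for seconds divided by two.
dos_time() { echo $(( ($1 << 11) | ($2 << 5) | ($3 / 2) )); }

# dos_date YEAR MONTH DAY -> 16-bit value: 7 bits (year - 1980),
# 4 bits month, 5 bits day.
dos_date() { echo $(( (($1 - 1980) << 9) | ($2 << 5) | $3 )); }

dos_time 0 25 2     # 801
dos_time 0 25 3     # 801 again: the odd second cannot be represented
dos_date 2023 4 6   # 22150
```

Nothing in these two fields records an access time, which is why atime can
only differ through the extra fields.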


Regards,


Michael



Re: Three bytes in a zip file

2023-04-07 Thread Michael Schierl

Hello,


On 06.04.2023 at 23:59, Larry Doolittle wrote:
> Do you know of any tooling that can help decode zip file contents in general?
> Ideally something that could be absorbed into diffoscope?
> Maybe that one-liner above would be a useful addition to diffoscope.

I don't know.

I would assume that the usual commercial reverse engineering or forensic
applications would also include a dissector for .zip files, but those
could probably not be included in diffoscope anyway.

> I took a quick look for the documentation you quoted.
> That's proginfo/extrafld.txt in Debian's zip source package, right?

I used 
(yes I am oldschool and still have that old reference documentation on
my hard disk :-D).

But the documentation you quoted looks more recent and contains way more
extra fields than the "old" Info-Zip document. Probably I'll refer to it
in the future :-)

> It sure looks reverse-engineered.  I guess I shouldn't expect anything
> different for a package where upstream source ends in 2008.  :-/

... implementing a previously proprietary file format from the '80s.

> Bad: the only time stamps left in the file are DOS-style
> implied-local-timezone.  So a zip file prepared with TZ=UTC (as needed for
> reproducibility) will unpack to files with future timestamps (if unpacked
> shortly after being created) for non-expert users in half the globe.


Assuming you have "real" timestamps in your ZIP files. When I distribute
ZIP files, I often touch all files to UNIX epoch anyway as I don't want
to leak the exact time I have built/compiled them. But YMMV.

Another option would be to use a UTC-12:00 timezone like TZ=Etc/GMT+12
for building the .zip file to ensure the files are "old enough" for
every place in the world.


Regards,


Michael


Re: Three bytes in a zip file

2023-04-06 Thread John Gilmore
Larry Doolittle  wrote:
> $ diff <(ls --full-time -u fab-ea2bb52c-ld) <(ls --full-time -u fab-ea2bb52c-mb)
> 22c22
> < -rw-r--r-- 1 redacted redacted  644661 2023-04-04 18:10:00.0 -0700 marble-ipc-d-356.txt
> ---
> > -rw-r--r-- 1 redacted redacted  644661 2023-04-06 00:25:03.0 -0700 marble-ipc-d-356.txt

So I'm guessing that even before the zip file is re-created, the rebuild
process is leaking the rebuild timestamp into the last-modified metadata
of the generated marble-ipc-d-356.txt file?  That seems like it should
be handled by the build process explicitly setting its timestamp to
something related to the last-source-code-checkin time (with "touch
--date=XXX") rather than to current time.

Truncating the timestamps to DOS timestamps wouldn't work to eliminate
this difference anyway, since the date in the two files is two days
different; DOS timestamps are accurate to 2 seconds, as I recall.

John



Re: Three bytes in a zip file

2023-04-06 Thread Larry Doolittle
Michael -

On Thu, Apr 06, 2023 at 12:11:38PM +0200, Michael Schierl wrote:
> Am 06.04.2023 um 10:28 schrieb Larry Doolittle:
> > I'm trying to make a process to generate byte-for-byte reproducible zip files.
> > I got the contents identical, including timestamps and permissions.
> > But three bytes at the 98.08% mark (bytes 5543078 to 5543081,
> > out of a file size 5651451) differ between my run and a friend's run.
> 
> Looking at the zip entry starting at 0x00549481:
> [...] 
> Let's dissect the fields:
> ID 0x5455 ("UT") Length 0x0009 Data 03 68 ca 2c 64 XX XX XX 64
> ID 0x7875 ("ux") Length 0x000b Data 01 04 e8 03 00 00 04 e8 03 00 00
> 0x5455 is Info-Zip's "extended timestamp" field:
> [...]
> As the flags are 03, mod time and access time are present, and the
> different bits are within access time.

Thanks!  That helps a lot.

If I'm careful, I can even see the difference between the two zip files
by unpacking and
$ diff <(ls --full-time -u fab-ea2bb52c-ld) <(ls --full-time -u fab-ea2bb52c-mb)
22c22
< -rw-r--r-- 1 redacted redacted  644661 2023-04-04 18:10:00.0 -0700 marble-ipc-d-356.txt
---
> -rw-r--r-- 1 redacted redacted  644661 2023-04-06 00:25:03.0 -0700 marble-ipc-d-356.txt

Do you know of any tooling that can help decode zip file contents in general?
Ideally something that could be absorbed into diffoscope?
Maybe that one-liner above would be a useful addition to diffoscope.

I took a quick look for the documentation you quoted.
That's proginfo/extrafld.txt in Debian's zip source package, right?
It sure looks reverse-engineered.  I guess I shouldn't expect anything
different for a package where upstream source ends in 2008.  :-/

> I have no experience with the various zip tools used on Unix/Linux, but
> probably you can avoid including those extra fields by using the -X option.

Good: smaller file

Good: less to go wrong with reproducibility

Bad: the only time stamps left in the file are DOS-style
implied-local-timezone.  So a zip file prepared with TZ=UTC (as needed for
reproducibility) will unpack to files with future timestamps (if unpacked
shortly after being created) for non-expert users in half the globe.
The correct unpacking instruction on *nix to avoid that becomes
  TZ=UTC unzip foo.zip

Again, thanks for your prompt and constructive response!

  - Larry


Re: Three bytes in a zip file

2023-04-06 Thread Michael Schierl

Hello,


On 06.04.2023 at 10:28, Larry Doolittle wrote:
> Friends -
>
> I'm trying to make a process to generate byte-for-byte reproducible zip files.
>
> I got the contents identical, including timestamps and permissions.
> But three bytes at the 98.08% mark (bytes 5543078 to 5543081,
> out of a file size 5651451) differ between my run and a friend's run.
> Velocity-dependent?  His was done on a train.  ;-)
>
> Any zip file format experts here, who can explain where this comes from?
> And more importantly, can suggest how to fix the environment to prevent it?



Looking at the zip entry starting at 0x00549481:


| 00549481     18 00 1c 00 66 61 62  2f 6d 61 72 62 6c 65 2d  | ....fab/marble-|
| 00549490  69 70 63 2d 64 2d 33 35  36 2e 74 78 74 55 54 09  |ipc-d-356.txtUT.|
| 005494a0  00 03 68 ca 2c 64 XX XX  XX 64 75 78 0b 00 01 04  |..h.,dXXXdux....|
| 005494b0  e8 03 00 00 04 e8 03 00  00 d4 fd 4b 73 5d bb ae  |...........Ks]..|


18 00  File name length (0x0018)
1c 00  File extra data length (0x001c)

File name: fab/marble-ipc-d-356.txt

File extra data:


| 0054949d                                          55 54 09  |             UT.|
| 005494a0  00 03 68 ca 2c 64 XX XX  XX 64 75 78 0b 00 01 04  |..h.,dXXXdux....|
| 005494b0  e8 03 00 00 04 e8 03 00  00                       |.........       |


What follows is the compressed data.

Extra data consists of multiple fields. Each field starts with a 2-byte
ID, followed by a 2-byte length, and ends with the data.

Let's dissect the fields:

ID 0x5455 ("UT") Length 0x0009 Data 03 68 ca 2c 64 XX XX XX 64
ID 0x7875 ("ux") Length 0x000b Data 01 04 e8 03 00 00 04 e8 03 00 00

0x5455 is Info-Zip's "extended timestamp" field:


Extended Timestamp Extra Field:
===============================

The following is the layout of the extended-timestamp extra block.
(Last Revision 970118)

Local-header version:

Value       Size    Description
-----       ----    -----------
0x5455      Short   tag for this extra block type
TSize       Short   total data size for this block
Flags       Byte    info bits
(ModTime)   Long    time of last modification (UTC/GMT)
(AcTime)    Long    time of last access (UTC/GMT)
(CrTime)    Long    time of original creation (UTC/GMT)


If "Flags" indicates that Modtime is present in the local header
field, it MUST be present in the central header field, too!
This correspondence is required because the modification time
value may be used to support trans-timezone freshening and
updating operations with zip archives.

The time values are in standard Unix signed-long format, indicating
the number of seconds since 1 January 1970 00:00:00.  The times
are relative to Coordinated Universal Time (UTC), also sometimes
referred to as Greenwich Mean Time (GMT).  To convert to local time,
the software must know the local timezone offset from UTC/GMT.

The lower three bits of Flags in both headers indicate which time-
stamps are present in the LOCAL extra field:

bit 0       if set, modification time is present
bit 1       if set, access time is present
bit 2       if set, creation time is present
bits 3-7    reserved for additional timestamps; not set

Those times that are present will appear in the order indicated, but
any combination of times may be omitted.  (Creation time may be
present without access time, for example.)  TSize should equal
(1 + 4*(number of set bits in Flags)), as the block is currently
defined.  Other timestamps may be added in the future.


As the flags are 03, mod time and access time are present, and the
different bits are within access time.
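
The layout above can be walked mechanically. Here is a minimal sketch in
bash that decodes a "UT" field given as a hex string. The helper names
(le16, le32, decode_ut) are invented for this example. The two sample
strings are the field from each build, with the XX bytes filled in from
the hexdump in Larry's original post (assuming that dump came from a
little-endian machine). Timestamps are treated as unsigned here, which is
fine for dates before 2038.

```shell
#!/usr/bin/env bash
# Sketch: decode an Info-ZIP "UT" (0x5455) extra field from a hex string.
# Helper names are hypothetical; the field layout follows the spec text.

# le16 / le32: little-endian hex digits -> decimal
le16() { echo $(( 16#${1:2:2}${1:0:2} )); }
le32() { echo $(( 16#${1:6:2}${1:4:2}${1:2:2}${1:0:2} )); }

decode_ut() {
  local hex=$1 tsize flags pos=10 bit=1 count=0 name
  # Bytes 55 54 are the tag 0x5455 stored little-endian.
  [ "${hex:0:4}" = "5554" ] || { echo "not a UT field" >&2; return 1; }
  tsize=$(le16 "${hex:4:4}")
  flags=$(( 16#${hex:8:2} ))
  echo "flags=$flags"
  # Times appear in the fixed order mod, access, creation, each only if
  # its flag bit is set.
  for name in mtime atime ctime; do
    if (( flags & bit )); then
      echo "$name=$(le32 "${hex:pos:8}")"
      pos=$(( pos + 8 )); count=$(( count + 1 ))
    fi
    bit=$(( bit << 1 ))
  done
  # Local-header rule: TSize = 1 + 4 * (number of set bits in Flags)
  (( tsize == 1 + 4 * count )) || echo "unexpected TSize $tsize" >&2
}

decode_ut 555409000368ca2c6468ca2c64   # "ld" build: atime equals mtime
decode_ut 555409000368ca2c64cf732e64   # "mb" build: atime is later
```

With GNU date, `date -u -d @1680765903` turns the later atime into a
readable UTC time.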


I have no experience with the various zip tools used on Unix/Linux, but
probably you can avoid including those extra fields by using the -X option.


Regards,


Michael



Re: Three bytes in a zip file

2023-04-06 Thread Chris Lamb
Hi Larry,

> TZ=UTC zip --latest-time "$zipfile" fab/*
                                          ^

Just as a quick glance, this may be expanding the '*' shell glob in
a different order between your two systems due to the underlying
filesystem order. Perhaps try "zip […] -r […] fab/"?
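
As far as I know, POSIX shells sort the result of a glob by the current
locale's collation rather than handing back raw directory order, so with
`fab/*` any ordering difference is more likely to come from the locale
than from the filesystem. A small sketch showing the order under the C
locale; the function name, directory and file names are made up for the
demonstration:

```shell
#!/usr/bin/env bash
# Sketch: show the order in which a glob expands under the C locale.
# glob_order_c and the file names are invented for this example.
glob_order_c() {
  # Expand the glob in a child shell whose locale is forced to C.
  ( cd "$1" && LC_ALL=C bash -c 'printf "%s\n" *' )
}

tmp=$(mktemp -d)
touch "$tmp/b.txt" "$tmp/C.txt" "$tmp/a.txt"
glob_order_c "$tmp"   # C.txt, a.txt, b.txt: uppercase sorts first in C
rm -r "$tmp"
```

Other locales may interleave upper and lower case, so pinning the
collation is one way to keep the member order stable across machines.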

> The diff is so small, it seems silly to post both files, but I'll
> do that anyway.

(Assuming I'm parsing this right, I think you forgot to attach or
link them.)


Regards,

-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 💠
⬊   ⬋
  o



Three bytes in a zip file

2023-04-06 Thread Larry Doolittle
Friends -

I'm trying to make a process to generate byte-for-byte reproducible zip files.

I got the contents identical, including timestamps and permissions.
But three bytes at the 98.08% mark (bytes 5543078 to 5543081,
out of a file size 5651451) differ between my run and a friend's run.
Velocity-dependent?  His was done on a train.  ;-)
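
One way to pin down offsets like these without reading a full hexdump is
`cmp -l`, which lists every differing byte. A minimal sketch follows, with
diff_offsets as a made-up helper name and two tiny stand-in files rather
than the real archives:

```shell
#!/usr/bin/env bash
# diff_offsets A B: print the 0-based offset of each byte that differs.
# cmp -l reports 1-based positions, hence the subtraction.
diff_offsets() {
  cmp -l "$1" "$2" | awk '{ print $1 - 1 }'
}

tmp=$(mktemp -d)
# Eight bytes each (octal escapes), identical except at offsets 4, 5 and 6.
printf '\150\312\054\144\150\312\054\144' > "$tmp/ld.bin"
printf '\150\312\054\144\317\163\056\144' > "$tmp/mb.bin"
diff_offsets "$tmp/ld.bin" "$tmp/mb.bin"   # prints 4, 5 and 6
rm -r "$tmp"
```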

try.diffoscope.org is no help.
"Format-specific differences are supported for ZIP archives but no
file-specific differences were detected; falling back to a binary diff."

I can get the same info as provided by diffoscope with
$ diff <(hexdump marble-ea2bb52c-mb-fab.zip) <(hexdump marble-ea2bb52c-ld-fab.zip)
346443c346443
< 05494a0 0300 ca68 642c 73cf 642e 7875 000b 0401
---
> 05494a0 0300 ca68 642c ca68 642c 7875 000b 0401

That is, 73cf642e becomes ca68642c.
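
A note on reading that output: plain `hexdump` prints 16-bit words in host
byte order, so on a little-endian machine the words `73cf 642e` are the
bytes cf 73 2e 64 on disk, which as a little-endian 32-bit number is
0x642e73cf. A small sketch of that conversion, assuming a little-endian
host and using a hypothetical helper name:

```shell
#!/usr/bin/env bash
# words_to_u32 LOW HIGH: combine two consecutive 16-bit words, as printed by
# default hexdump on a little-endian host, into the 32-bit value they encode.
# The helper name is made up for this example.
words_to_u32() { echo $(( (16#$2 << 16) | 16#$1 )); }

words_to_u32 73cf 642e   # 1680765903
words_to_u32 ca68 642c   # 1680657000
```

Both results fall in early April 2023 when read as seconds since the Unix
epoch, which hints that the differing bytes belong to a timestamp.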

The diff is so small, it seems silly to post both files, but I'll
do that anyway.
7cbdcc8b2fed002ed73017ff55e574b654fb82d061658534b4287de22339df64  marble-ea2bb52c-ld-fab.zip
573fe7e8cb662fb3e22e16c1ab4d3520f8275a0ab3dd2064df841e108a08af0e  marble-ea2bb52c-mb-fab.zip
http://recycle.lbl.gov/~ldoolitt/marble-ea2bb52c-ld-fab.zip
http://recycle.lbl.gov/~ldoolitt/marble-ea2bb52c-mb-fab.zip

Any zip file format experts here, who can explain where this comes from?
And more importantly, can suggest how to fix the environment to prevent it?

The script making this file is at
https://github.com/BerkeleyLab/Marble/blob/main/design/scripts/manufacturing.sh
but because I got the _contents_ to match already, I assert
the only important lines for the purposes of this question are

export LC_COLLATE=C
umask 0022
touch --date="@$SOURCE_DATE_EPOCH" fab/*
TZ=UTC zip --latest-time "$zipfile" fab/*

Side note, the "ea2bb52c" in the file names above refers
to the commit ID in the github repo.

  - Larry