Re: timestamps for sqlports-compact (was: Re: CVS: cvs.openbsd.org: src)

2015-03-28 Thread Marc Espie
On Mon, Mar 23, 2015 at 07:42:02PM -0700, Constantine A. Murenin wrote:
> On 23 March 2015 at 15:59, Stuart Henderson  wrote:
> > On 2015/03/23 10:14, Constantine A. Murenin wrote:
> >> May I also ask why is it necessary to remove the timestamp information
> >> from the tar archives themselves?
> >
> > To improve rsyncability.
> 
> Could you elaborate?
> 
> 0. Doesn't rsync ignore timestamps by default anyways?
> 
> 1. Doesn't src/usr.sbin/pkg_add/OpenBSD/ArcCheck.pm#rev1.29 wipe out
> the timestamps only directly from the tar archives, still leaving them
> intact otherwise?
> 
> C.

Dude.  Not having timestamps means that when files don't change, the archive
chunk doesn't change.

We do gzip them by chunks as well, so when a big package like texlive gets
updated, a lot of the time most of the actual .tgz package file *doesn't change
at all*.

Put back timestamps in the tarball, and gzip will compress things differently,
thus destroying rsyncability completely.
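To make this concrete, here is a minimal Python sketch (not pkg_create's actual code) that builds the same one-file tarball twice. With a real per-file mtime the two .tgz outputs differ even though the content is identical; with the mtime forced to 0, as the ArcCheck.pm change does when a @ts annotation is present, they come out byte-identical. The file name and contents are made up for illustration, and the gzip header's own timestamp is pinned as well, since gzip stores one too.

```python
import gzip
import io
import tarfile

def build_tgz(files, mtime):
    """Pack `files` (name -> bytes) into a gzipped tar, stamping every
    member with the same `mtime`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime  # the tar header field that gets wiped
            tar.addfile(info, io.BytesIO(data))
    # gzip records its own timestamp in its header; pin it so only the
    # tar headers can introduce differences.
    return gzip.compress(buf.getvalue(), mtime=0)

# Hypothetical file, identical in both "builds".
files = {"share/doc/README": b"unchanged documentation\n" * 100}

with_ts_1 = build_tgz(files, mtime=1410680000)  # built one day...
with_ts_2 = build_tgz(files, mtime=1411030040)  # ...rebuilt later
no_ts_1 = build_tgz(files, mtime=0)
no_ts_2 = build_tgz(files, mtime=0)

print(with_ts_1 == with_ts_2)  # False: only the timestamps differ
print(no_ts_1 == no_ts_2)      # True: byte-identical archives
```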



Re: timestamps for sqlports-compact (was: Re: CVS: cvs.openbsd.org: src)

2015-03-24 Thread Stuart Henderson
On 2015-03-24, Constantine A. Murenin  wrote:
> On 23 March 2015 at 15:59, Stuart Henderson  wrote:
>> On 2015/03/23 10:14, Constantine A. Murenin wrote:
>>> May I also ask why is it necessary to remove the timestamp information
>>> from the tar archives themselves?
>>
>> To improve rsyncability.
>
> Could you elaborate?
>
> 0. Doesn't rsync ignore timestamps by default anyways?
>
> 1. Doesn't src/usr.sbin/pkg_add/OpenBSD/ArcCheck.pm#rev1.29 wipe out
> the timestamps only directly from the tar archives, still leaving them
> intact otherwise?

You're thinking of timestamps of the tgz here. This is about something
else: file *contents*.

Packages these days do some smart ordering of files (so that the
transfer can be stopped before downloading everything, in the case of
an update where binaries change but docs/data files stay the same), to
reduce pkg_add time. They also restart the compression stream at various
points (i.e. a new dictionary) to improve the chance of rsync finding
common parts between package files. Avoiding timestamps in the tar
removes one thing that might change between one build and the next. The
packing list (+CONTENTS file), which has the real timestamps (and the
signature, etc.), is in a different compression stream than the main files
in the package.
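Restarting the compression stream can be sketched with independent gzip members, each begun with a fresh dictionary and simply concatenated. This models the mechanism for illustration only: the chunk contents and boundaries below are invented, and real packages choose their own split points.

```python
import gzip

def compress_in_chunks(chunks):
    """Compress each chunk as its own gzip member (fresh dictionary per
    chunk) and concatenate the members into one .gz byte stream."""
    members = [gzip.compress(chunk, mtime=0) for chunk in chunks]
    return members, b"".join(members)

# Hypothetical package payload: binaries change, docs and data do not.
old = [b"binary v1 " * 500, b"docs unchanged " * 500, b"data unchanged " * 500]
new = [b"binary v2 " * 500, b"docs unchanged " * 500, b"data unchanged " * 500]

old_members, old_gz = compress_in_chunks(old)
new_members, new_gz = compress_in_chunks(new)

# A multi-member .gz still decompresses to the whole payload.
assert gzip.decompress(new_gz) == b"".join(new)

# Only the member for the changed chunk differs; the others are
# byte-identical, which gives rsync long runs it can match.
print([a == b for a, b in zip(old_members, new_members)])  # [False, True, True]
```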



Re: timestamps for sqlports-compact (was: Re: CVS: cvs.openbsd.org: src)

2015-03-23 Thread Constantine A. Murenin
On 23 March 2015 at 15:59, Stuart Henderson  wrote:
> On 2015/03/23 10:14, Constantine A. Murenin wrote:
>> May I also ask why is it necessary to remove the timestamp information
>> from the tar archives themselves?
>
> To improve rsyncability.

Could you elaborate?

0. Doesn't rsync ignore timestamps by default anyways?

1. Doesn't src/usr.sbin/pkg_add/OpenBSD/ArcCheck.pm#rev1.29 wipe out
the timestamps only directly from the tar archives, still leaving them
intact otherwise?

C.



Re: timestamps for sqlports-compact (was: Re: CVS: cvs.openbsd.org: src)

2015-03-23 Thread Stuart Henderson
On 2015/03/23 10:14, Constantine A. Murenin wrote:
> May I also ask why is it necessary to remove the timestamp information
> from the tar archives themselves?

To improve rsyncability.



Re: timestamps for sqlports-compact (was: Re: CVS: cvs.openbsd.org: src)

2015-03-23 Thread Constantine A. Murenin
On 14 March 2015 at 02:28, Marc Espie  wrote:
> On Fri, Mar 13, 2015 at 01:13:39PM -0700, Constantine A. Murenin wrote:
>> Hello,
>>
>> The commit below from 2014-09-16 must have broken the snapshot time
>> detection on http://ports.su/ , which must have been broken since
>> 2014-09-21.
>>
>> Is there a cross-platform way to best get it back from the package file?
> Yes, parse the +CONTENTS file and restore the associated @ts.
>
> Any tool that reads files can do so easily. In perl,
> perl -ne '$seen = 1 if m/sqlports-compact/; if ($seen == 1 && m/\@ts\s+(.*)/) 
> { system("date -r $1"); exit 0; }' /var/db/pkg/sqlports-compact-*/+CONTENTS
> will retrieve and display the timestamp.

May I also ask why is it necessary to remove the timestamp information
from the tar archives themselves?

I can think of at least one other scenario where the files from a
package might be of interest without installing it with pkg_add(1)
(http://mdoc.su/o/pkg_add.1): harvesting the man pages from the packages.

Perhaps the change of deleting mtime from tar could be undone going
forward?  Especially since mtime already appears to be a required
field of the tar header (http://mdoc.su/f/tar.5), so keeping it
wouldn't occupy any extra space at all, apart from possible
compression considerations.

What do we gain by losing the ability to use the normal and
interoperable UNIX tools to examine the archive files of the packages?

>
> Something similar can be easily cobbled using any kind of sed/awk/grep
>
>
>> I guess the easiest way would be to fix the date with the snippet as
>> follows; or is there a better way?
>>
>> % sh -c 'touch -d $(env TZ=GMT date -r $(stat -f"%m" +CONTENTS) +%FT%TZ) share/sqlports-compact'
>
> You assume sqlports-compact should/can have the snapshots of +CONTENTS...
> this is not quite true.

Would any difference be material here?  It seems that it's basically
the same date, as far as estimating the time of the snapshot is
concerned, which in reality is more of an undefined period of multiple
hours anyways.

Cheers,
Constantine.



Re: timestamps for sqlports-compact (was: Re: CVS: cvs.openbsd.org: src)

2015-03-14 Thread Marc Espie
On Fri, Mar 13, 2015 at 01:13:39PM -0700, Constantine A. Murenin wrote:
> Hello,
> 
> The commit below from 2014-09-16 must have broken the snapshot time
> detection on http://ports.su/ , which must have been broken since
> 2014-09-21.
> 
> Is there a cross-platform way to best get it back from the package file?
Yes, parse the +CONTENTS file and restore the associated @ts.

Any tool that reads files can do so easily. In perl,
  perl -ne '$seen = 1 if m/sqlports-compact/;
    if ($seen == 1 && m/\@ts\s+(.*)/) { system("date -r $1"); exit 0; }' \
    /var/db/pkg/sqlports-compact-*/+CONTENTS
will retrieve and display the timestamp.

Something similar can easily be cobbled together with any kind of sed/awk/grep.
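A rough Python equivalent of the perl one-liner, as a sketch. It formats the result itself instead of shelling out to `date -r`, which on GNU systems prints a file's mtime rather than converting epoch seconds. The function name `find_ts` and the sample text (trimmed from the sqlports-compact +CONTENTS shown further down this thread) are illustrative; unlike the one-liner, it only accepts `@ts` at the start of a line, followed by digits.

```python
import re
from datetime import datetime, timezone

def find_ts(contents_text, marker="sqlports-compact"):
    """Like the perl one-liner: after the first line mentioning `marker`,
    return the first '@ts <seconds>' value as a UTC datetime, else None."""
    seen = False
    for line in contents_text.splitlines():
        if marker in line:
            seen = True
        m = re.match(r"@ts\s+(\d+)", line)  # anchored, digits only
        if seen and m:
            return datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    return None

# Trimmed from the sqlports-compact-4.2 +CONTENTS quoted in this thread.
SAMPLE = """\
@name sqlports-compact-4.2
@arch amd64
@cwd /usr/local
share/sqlports-compact
@size 37057536
@ts 1411030040
"""

print(find_ts(SAMPLE).isoformat())  # 2014-09-18T08:47:20+00:00
```

On an installed system the text would come from /var/db/pkg/sqlports-compact-*/+CONTENTS, the path used by the one-liner.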


> I guess the easiest way would be to fix the date with the snippet as
> follows; or is there a better way?
> 
> % sh -c 'touch -d $(env TZ=GMT date -r $(stat -f"%m" +CONTENTS) +%FT%TZ) share/sqlports-compact'

You assume sqlports-compact should/can have the snapshots of +CONTENTS...
this is not quite true.
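Restoring the associated @ts, rather than borrowing the mtime of +CONTENTS, can be sketched in Python for an unpacked package tree. It assumes a simplified packing-list grammar (spelled out in the docstring) and relative paths, as in the unpacked 2014-09-21/ listing further down; `ts_by_path` and `restore_mtimes` are hypothetical helpers, not part of pkg_add.

```python
import os
import tempfile

def ts_by_path(contents_text):
    """Map each file named in a packing list to its @ts value.

    Assumed, simplified grammar: any non-empty line not starting with '@'
    names a file (relative, as stored in the package tarball), and an
    '@ts <seconds>' line applies to the most recently named file."""
    current, result = None, {}
    for line in contents_text.splitlines():
        if line.startswith("@ts "):
            if current is not None:
                result[current] = int(line.split()[1])
        elif line and not line.startswith("@"):
            current = line
    return result

def restore_mtimes(contents_text, root):
    """Set atime and mtime of every file under `root` that has a @ts."""
    restored = []
    for rel, ts in ts_by_path(contents_text).items():
        target = os.path.join(root, rel)
        if os.path.isfile(target):
            os.utime(target, (ts, ts))
            restored.append(rel)
    return restored

PLIST = """\
@name sqlports-compact-4.2
+DESC
@size 3936
@cwd /usr/local
share/sqlports-compact
@size 37057536
@ts 1411030040
"""

# Demo on a throwaway tree that mimics an unpacked package.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "share"))
    path = os.path.join(root, "share", "sqlports-compact")
    open(path, "wb").close()  # fresh file with the current mtime
    restored = restore_mtimes(PLIST, root)
    mtime = int(os.stat(path).st_mtime)

print(restored, mtime)  # ['share/sqlports-compact'] 1411030040
```

Against the unpacked snapshot, something like `restore_mtimes(open("2014-09-21/+CONTENTS").read(), "2014-09-21")` would set share/sqlports-compact to its @ts value instead of the +CONTENTS mtime.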



timestamps for sqlports-compact (was: Re: CVS: cvs.openbsd.org: src)

2015-03-13 Thread Constantine A. Murenin
Hello,

The commit below from 2014-09-16 must have broken the snapshot time
detection on http://ports.su/ , which must have been broken since
2014-09-21.

Is there a cross-platform way to best get it back from the package file?

I see that the timestamps are now embedded within "+CONTENTS", and
both "+CONTENTS" and "+DESC" themselves have the correct timestamps,
too, just not "share/sqlports-compact".

From reverse-engineering the files below, it seems there are four sources
for two timestamps:

2014-09-18T08:47Z
+DESC (file mtime)
+CONTENTS, @ts line

2014-09-19T12:17Z
+CONTENTS (file mtime)
+CONTENTS, signify: date in @digital-signature

I guess the easiest way would be to fix the date with the snippet as
follows; or is there a better way?

% sh -c 'touch -d $(env TZ=GMT date -r $(stat -f"%m" +CONTENTS) +%FT%TZ) share/sqlports-compact'

Cheers,
Constantine.



     538 Sep 15 12:29 2014-09-20/+CONTENTS
    3936 Sep 14 07:49 2014-09-20/+DESC
36918272 Sep 14 07:49 2014-09-20/share/sqlports-compact

     553 Sep 19 05:17 2014-09-21/+CONTENTS
    3936 Sep 18 01:47 2014-09-21/+DESC
37057536 Dec 31  1969 2014-09-21/share/sqlports-compact
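The `Dec 31  1969` date is simply a zeroed mtime rendered in local time. A short Python check, assuming a fixed UTC-8 offset as a stand-in for US Pacific standard time (the `date` output further down shows this machine is on Pacific time):

```python
from datetime import datetime, timedelta, timezone

# A tar member whose mtime was wiped to 0 is dated at the Unix epoch...
epoch_utc = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch_utc.isoformat())  # 1970-01-01T00:00:00+00:00

# ...and ls(1) shows it in local time. At UTC-8 that is the evening before.
pst = timezone(timedelta(hours=-8), "PST")
print(datetime.fromtimestamp(0, tz=pst).strftime("%b %d %Y %H:%M"))  # Dec 31 1969 16:00
```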



--- 2014-09-20/+CONTENTS	Mon Sep 15 12:29:20 2014
+++ 2014-09-21/+CONTENTS	Fri Sep 19 05:17:22 2014
@@ -1,7 +1,7 @@
 @comment $OpenBSD: PLIST-compact,v 1.2 2009/12/01 18:27:46 espie Exp $
 @name sqlports-compact-4.2
 @signer openbsd-56-pkg
-@digital-signature signify:2014-09-15T19:29:20Z:RWSPEf7Vpp2j0NzTldW+gbzVKDKUDBNV8yr4qbGsLA1j4qi41qVwdNxzzy5+d4Z6ocIuxOXLGAZkL1rZzO9bIXt8vpm6ia6vxg8=
+@digital-signature signify:2014-09-19T12:17:22Z:RWSPEf7Vpp2j0KKBEBZR4Q/ZNEDFC509xrP2LO6eVXgEzctauBidBAZl4/pMWVqfoG9JXABPiI+wlfZobwoZTxZikYLrX5HXRg0=
 @option always-update
 @comment pkgpath=databases/sqlports,-compact cdrom=yes ftp=yes
 @arch amd64
@@ -10,5 +10,6 @@
 @size 3936
 @cwd /usr/local
 share/sqlports-compact
-@sha E+dRAH5CB2qQd2ZPSRUgGL4uUKYd9WiQDjWkt9lpkP8=
-@size 36918272
+@sha QJrtFZ+Kp8m9as8c6iYSqUjytHlIvopDki/vTw7enlE=
+@size 37057536
+@ts 1411030040



% cat 2014-09-21/+CONTENTS
@comment $OpenBSD: PLIST-compact,v 1.2 2009/12/01 18:27:46 espie Exp $
@name sqlports-compact-4.2
@signer openbsd-56-pkg
@digital-signature signify:2014-09-19T12:17:22Z:RWSPEf7Vpp2j0KKBEBZR4Q/ZNEDFC509xrP2LO6eVXgEzctauBidBAZl4/pMWVqfoG9JXABPiI+wlfZobwoZTxZikYLrX5HXRg0=
@option always-update
@comment pkgpath=databases/sqlports,-compact cdrom=yes ftp=yes
@arch amd64
+DESC
@sha NdJMzcyumfz+I9nbW0P/qfHc1CT7KcZcGIaCiolyRZo=
@size 3936
@cwd /usr/local
share/sqlports-compact
@sha QJrtFZ+Kp8m9as8c6iYSqUjytHlIvopDki/vTw7enlE=
@size 37057536
@ts 1411030040



% date -r 1411030040
Thu Sep 18 01:47:20 PDT 2014

% env TZ=GMT date -r 1411030040
Thu Sep 18 08:47:20 GMT 2014
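The two timestamps listed earlier in this message can be cross-checked in Python. This sketch uses a fixed UTC-7 offset as a stand-in for PDT (no DST rules), and `signify_date` is a hypothetical helper that assumes the `@digital-signature signify:<date>:<signature>` layout seen in the +CONTENTS diff above.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")  # fixed offset, no DST rules
FMT = "%a %b %d %H:%M:%S %Z %Y"             # same shape as date(1) output

ts = 1411030040                             # the @ts value from +CONTENTS
print(datetime.fromtimestamp(ts, tz=PDT).strftime(FMT))           # Thu Sep 18 01:47:20 PDT 2014
print(datetime.fromtimestamp(ts, tz=timezone.utc).strftime(FMT))  # Thu Sep 18 08:47:20 UTC 2014

def signify_date(sig_line):
    """Return the signing time from '@digital-signature signify:<date>:<sig>'."""
    stamp = sig_line.split("signify:", 1)[1].split("Z:", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)

# Signature shortened here; only the date field matters.
signed = signify_date("@digital-signature signify:2014-09-19T12:17:22Z:RWSPEf7Vpp2j0KKB")
print(signed - datetime.fromtimestamp(ts, tz=timezone.utc))       # 1 day, 3:30:02
```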



% sh -c 'touch -d $(env TZ=GMT date -r $(stat -f"%m" +CONTENTS) +%FT%TZ) share/sqlports-compact'

% stat share/sqlports-compact
... 37057536 "Sep 19 05:17:22 2014" "Sep 19 05:17:22 2014" "Mar 13 12:49:33 2015" 16384 72416 0 share/sqlports-compact



On 16 September 2014 at 01:51, Marc Espie  wrote:
> CVSROOT:    /cvs
> Module name:    src
> Changes by: es...@cvs.openbsd.org   2014/09/16 02:51:38
>
> Modified files:
> usr.sbin/pkg_add/OpenBSD: ArcCheck.pm
>
> Log message:
> if a @ts annotation is detected, wipe tarball timestamp. Archives will
> look as if all files were born in 1970. The useful side-effect is that
> meta-data for content-identical files WILL be identical as well.
>