Re: [bug] gzip archives created with pkg_create have wrong data sizes

2016-04-25 Thread Todd C. Miller
On Mon, 25 Apr 2016 18:24:53 -0400, dan mclaughlin wrote:

> is it? one of the reasons i brought it up is that in gzip(1) it seems pretty
> clear that it is supposed to be the file size 

The original file size (and thus compression ratio) is not stored
in the gzip archive, it is computed by gzip(1) on the fly by
decompressing the file and keeping track of how many bytes were
read vs. how many would have been written.
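
A rough sketch of that on-the-fly counting, in Python rather than the C of
gzip(1). The name count_gzip_bytes and the sample data are made up for
illustration and are not taken from the OpenBSD tree. It assumes the input
is one or more well-formed gzip members with no trailing garbage, and it
keeps counting across member boundaries instead of stopping after the
first one.

```python
import gzip
import io
import zlib


def count_gzip_bytes(stream, bufsize=65536):
    """Return (bytes_in, bytes_out) for a gzip stream, following every member."""
    bytes_in = 0
    bytes_out = 0
    decomp = zlib.decompressobj(31)  # 31 = expect a gzip header and trailer
    while True:
        buf = stream.read(bufsize)
        if not buf:
            break
        bytes_in += len(buf)
        while buf:
            bytes_out += len(decomp.decompress(buf))
            if not decomp.eof:
                break  # current member needs more input
            # End of one member: leftover bytes may start the next one.
            buf = decomp.unused_data
            decomp = zlib.decompressobj(31)
    return bytes_in, bytes_out


# Usage: two members back to back (hypothetical sample data).
sample = gzip.compress(b"plist\n" * 10) + gzip.compress(b"payload\n" * 1000)
bytes_in, bytes_out = count_gzip_bytes(io.BytesIO(sample))
print(f"{bytes_in} bytes in, {bytes_out} bytes out")
```

Nothing here reads a stored size field; both totals come purely from
what was read and what was produced.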

 - todd



Re: [bug] gzip archives created with pkg_create have wrong data sizes

2016-04-25 Thread dan mclaughlin
On Sun, 24 Apr 2016 12:57:46 +0200 Marc Espie  wrote:
> On Sun, Apr 24, 2016 at 01:47:24AM -0400, dan mclaughlin wrote:
> > the sizes of the compressed/uncompressed data are wrong. i have tested gzip
> > and 'tar zcf' and the values are right, but using pkg_create fails.
> 
> gzip -l will just give you the first chunk, that's a limitation of the gzip
> tool itself.
> 

is it? one of the reasons i brought it up is that in gzip(1) it seems pretty
clear that it is supposed to be the file size 

     -l      List information for the specified compressed files.  The
             following information is listed:

             compressed size    Size of the compressed file.

             uncompressed size  Size of the file when uncompressed.

             compression ratio  Ratio of the difference between the compressed
                                and uncompressed sizes to the uncompressed
                                size.

so perhaps the manual needs to be adjusted.

On Mon, 25 Apr 2016 11:19:45 -0400 "Ted Unangst"  wrote:
> Or we rename the files .pkg and nobody pokes them with the wrong tool. :)

how about setting the final values to 0 (or some other number), so that if
someone did look they would see that it is obviously not a standard gzip, and
that the values cannot be trusted. as it is, it has values that at least
conflict with what gzip(1) says they should be.

those values are also printed out with the assumption that gzip(1) is right.

$ gzip -vd bzip2-1.0.6p7.tgz
bzip2-1.0.6p7.tgz:  96.0% -- replaced with bzip2-1.0.6p7.tar
322 bytes in, 7680 bytes out



Re: [bug] gzip archives created with pkg_create have wrong data sizes

2016-04-25 Thread Marc Espie
On Mon, Apr 25, 2016 at 11:19:45AM -0400, Ted Unangst wrote:
> Marc Espie wrote:
> > On Sun, Apr 24, 2016 at 12:57:46PM +0200, Marc Espie wrote:
> > > On Sun, Apr 24, 2016 at 01:47:24AM -0400, dan mclaughlin wrote:
> > > > the sizes of the compressed/uncompressed data are wrong. i have tested gzip
> > > > and 'tar zcf' and the values are right, but using pkg_create fails.
> > > 
> > > gzip -l will just give you the first chunk, that's a limitation of the gzip
> > > tool itself.
> > 
> > I've had a slightly closer look at gzip...
> > 
> > making gzip -l able to recognize multiple chunks archive should be doable,
> > but it would require a lot of code churn.
> 
> Or we rename the files .pkg and nobody pokes them with the wrong tool. :)

1/ we're not debian.
2/ they're perfectly agreeable gzip files. It's truly a limitation of
gzip(1).




Re: [bug] gzip archives created with pkg_create have wrong data sizes

2016-04-25 Thread Ted Unangst
Marc Espie wrote:
> On Sun, Apr 24, 2016 at 12:57:46PM +0200, Marc Espie wrote:
> > On Sun, Apr 24, 2016 at 01:47:24AM -0400, dan mclaughlin wrote:
> > > the sizes of the compressed/uncompressed data are wrong. i have tested gzip
> > > and 'tar zcf' and the values are right, but using pkg_create fails.
> > 
> > gzip -l will just give you the first chunk, that's a limitation of the gzip
> > tool itself.
> 
> I've had a slightly closer look at gzip...
> 
> making gzip -l able to recognize multiple chunks archive should be doable,
> but it would require a lot of code churn.

Or we rename the files .pkg and nobody pokes them with the wrong tool. :)



Re: [bug] gzip archives created with pkg_create have wrong data sizes

2016-04-25 Thread Marc Espie
On Sun, Apr 24, 2016 at 12:57:46PM +0200, Marc Espie wrote:
> On Sun, Apr 24, 2016 at 01:47:24AM -0400, dan mclaughlin wrote:
> > the sizes of the compressed/uncompressed data are wrong. i have tested gzip
> > and 'tar zcf' and the values are right, but using pkg_create fails.
> 
> gzip -l will just give you the first chunk, that's a limitation of the gzip
> tool itself.

I've had a slightly closer look at gzip...

making gzip -l able to recognize multiple chunks archive should be doable,
but it would require a lot of code churn.

More precisely, the gz_read code has a check that we arrived at the end,
it tries to read a new header, and it keeps going if it can.

So this would require seeking on the input file, trying to read a new header
and displaying it.

I'm pretty sure it's not worth it.


if you need looking at chunked tarballs further, there's some code in
regress/usr.sbin/pkg_add/extract_chunks that does precisely that: look
at the actual boundaries, and uncompress each chunk separately.
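
For illustration only, here is a small Python sketch of the same idea:
find each member boundary and decompress every chunk separately. It is not
the regress code mentioned above, and list_gzip_members is a hypothetical
name. It assumes the whole file fits in memory and contains nothing but
well-formed gzip members.

```python
import gzip
import zlib


def list_gzip_members(data):
    """Return [(offset, compressed_size, uncompressed_size), ...], one per member."""
    members = []
    offset = 0
    while offset < len(data):
        decomp = zlib.decompressobj(31)  # 31 = gzip header and trailer
        out = decomp.decompress(data[offset:])
        if not decomp.eof:
            raise ValueError("truncated gzip member at offset %d" % offset)
        # Whatever the decompressor did not consume belongs to later members.
        consumed = len(data) - offset - len(decomp.unused_data)
        members.append((offset, consumed, len(out)))
        offset += consumed
    return members


# Usage with two made-up chunks.
first = gzip.compress(b"x" * 100)
second = gzip.compress(b"y" * 5000)
for off, csize, usize in list_gzip_members(first + second):
    print(f"offset {off}: {csize} compressed, {usize} uncompressed")
```

Summing the last two columns over all members gives totals for the whole
file rather than for the first chunk alone.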



Re: [bug] gzip archives created with pkg_create have wrong data sizes

2016-04-24 Thread Marc Espie
On Sun, Apr 24, 2016 at 01:47:24AM -0400, dan mclaughlin wrote:
> the sizes of the compressed/uncompressed data are wrong. i have tested gzip
> and 'tar zcf' and the values are right, but using pkg_create fails.

gzip -l will just give you the first chunk, that's a limitation of the gzip
tool itself.

That could probably get fixed, but it's not that annoying.


pkg_create files are a succession of gzip chunks, for two reasons:
1/ putting the plist in its separate chunk makes pkg_sign drastically faster,
as it doesn't have to uncompress/recompress gzip files.
2/ files are ordered from last changed to least changed, and put into chunks
of 8 files, starting at the end, making it possible for rsync to perform its
magic on compressed packages, since the ending chunks do not change at all.
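
To make the "succession of gzip chunks" concrete, here is a minimal Python
sketch. It is not how pkg_create (which is Perl) is written; the function
names and the sample chunk contents are invented for the example. It shows
that concatenated members still decompress to the full data, while a
reader that stops after the first member sees only the first chunk.

```python
import gzip
import zlib


def make_chunked_gzip(chunks):
    """Compress each chunk as its own gzip member and concatenate the results."""
    return b"".join(gzip.compress(chunk) for chunk in chunks)


def first_member_size(data):
    """Uncompressed size of the first member only, ignoring anything after it."""
    decomp = zlib.decompressobj(31)  # 31 = gzip header and trailer
    return len(decomp.decompress(data))


# Hypothetical stand-ins for a packing list chunk and a chunk of files.
plist_chunk = b"@name example-1.0\n"
files_chunk = b"\0" * 20000
archive = make_chunked_gzip([plist_chunk, files_chunk])

full = gzip.decompress(archive)  # handles multi-member input
print(len(full), first_member_size(archive))
```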

The actual uncompressed size of each package can be obtained with
pkg_info -s.



Re: [bug] gzip archives created with pkg_create have wrong data sizes

2016-04-24 Thread Stuart Henderson
On 2016/04/24 01:47, dan mclaughlin wrote:
> the sizes of the compressed/uncompressed data are wrong. i have tested gzip
> and 'tar zcf' and the values are right, but using pkg_create fails.

The gzip stream is broken into chunks for more efficient package
signing and to improve rsync-friendliness. See e.g.
http://anoncvs.spacehopper.org/openbsd-src/commit/?id=86ace4402e1421117708700d6f0ef008e0bee8b6



[bug] gzip archives created with pkg_create have wrong data sizes

2016-04-23 Thread dan mclaughlin
the sizes of the compressed/uncompressed data are wrong. i have tested gzip
and 'tar zcf' and the values are right, but using pkg_create fails.

$ sysctl hw.machine kern.version
hw.machine=i386
kern.version=OpenBSD 5.9-current (GENERIC) #0: Thu Apr  7 17:24:30 EDT 2016
build@node04:/usr/src/sys/arch/i386/compile/GENERIC

it's not just i386 specific though, since i tested the amd64 packages as well.


$ ftp ftp://ftp3.usa.openbsd.org/pub/OpenBSD/snapshots/packages/i386/bzip2...
...
Retrieving pub/OpenBSD/snapshots/packages/i386/bzip2-1.0.6p7.tgz
...
$ ls -l bzip2-1.0.6p7.tgz
-rw-r--r--  1 user  user  125979 Apr 23 12:19 bzip2-1.0.6p7.tgz
$ gzip -l bzip2-1.0.6p7.tgz
compressed  uncompressed  ratio  uncompressed_name
       322          7680  96.0%  bzip2-1.0.6p7.tar
$ gzip -vd bzip2-1.0.6p7.tgz
bzip2-1.0.6p7.tgz:  96.0% -- replaced with bzip2-1.0.6p7.tar
322 bytes in, 7680 bytes out
$ ls -l bzip2-1.0.6p7.tar
-rw-r--r--  1 user  user  375808 Apr 23 12:19 bzip2-1.0.6p7.tar
$ gzip -v bzip2-1.0.6p7.tar
bzip2-1.0.6p7.tar:  66.6% -- replaced with bzip2-1.0.6p7.tar.gz
375808 bytes in, 125704 bytes out
$ gzip -l bzip2-1.0.6p7.tar.gz
compressed  uncompressed  ratio  uncompressed_name
    125704        375808  66.6%  bzip2-1.0.6p7.tar

$ pkg_create -f /var/db/pkg/bzip2-1.0.6p7/+CONTENTS
$ ls -l bzip2-1.0.6p7.tgz
-rw-r--r--  1 user  user  125891 Apr 24 00:36 bzip2-1.0.6p7.tgz
$ gzip -l bzip2-1.0.6p7.tgz
compressed  uncompressed  ratio  uncompressed_name
       319          7680  96.1%  bzip2-1.0.6p7.tar

$ tar zcf test.tgz -I list
and
$ tar cf - -I list | gzip -c >test.tgz

give the expected correct results.

i tried to track it a bit further

$ grep gzip /usr/src/usr.sbin/pkg_add/OpenBSD/*
/usr/src/usr.sbin/pkg_add/OpenBSD/Paths.pm:sub gzip() { '/usr/bin/gzip' }
/usr/src/usr.sbin/pkg_add/OpenBSD/PkgCreate.pm: $state->say("Creating gzip'd tar ball in '#1'", $wname)

but perl's not really my thing and so i don't really know where to go from here.