Re: Change to tarball generation?

2012-06-21 Thread Nicolás Alvarez
2012/5/23 Michael Pyne:
> As an example, try:
>
> $ tar cf kdefoo-x.y.z.tar kdefoo-x.y.z/
> $ pixz kdefoo-x.y.z.tar
> # resulting in kdefoo-x.y.z.tar.xz
>
> Because pixz is parallelized, it works on whole blocks of data at a time,
> and as far as I can tell it makes no special provision for the last bit of
> compressed data being smaller than the block size.
>
> With a normal tar file the decompressed data you get is:
>
> 0*  (where * is end of data and end of file)
>
> With a pixz-encoded tar file the decompressed data you get is:
>
> 0*x$  (* is end of data, $ is end of file)
>
> When you run a command like "tar xfJ kdefoo-x.y.z.tar.xz" everything will
> still work fine: tar knows exactly where the data should really end and will
> stop decompressing when it needs to.
>
> When you run a pipeline like "xz --decompress --stdout kdefoo-x.y.z.tar.xz |
> tar xf -", though, there's no way to tell xz to stop decompressing early. It
> tries to write all the decompressed data to the pipe. tar still knows exactly
> where to stop, does so at the '*' rather than the '$', and closes its input
> (a pipe!) early.
>
> When xz then tries to write the 'x$' (garbled data) part of the decompressed
> output, the write goes to a now-broken pipe, which kills xz with SIGPIPE.
>
> Scripts driving automated extraction through such a pipeline just see that
> an error occurred, and will therefore abort. This has affected a couple of
> source-based distributions, and it is annoying even for people extracting
> manually, who have to work out that their tarball actually did extract
> correctly.
>
> So the problem affects only parallelizing compressors that take advantage of
> the allowance to write garbled data past the end of the archive and rely on
> the decompressor to "figure it out". It seems implausible to me that a
> parallelizing compressor would always do this; perhaps it only occurs when
> the compressor is run by tar (e.g. tar cJf) instead of as a separate step?

The "garbled data" has nothing to do with parallelization. pixz stands
for "parallel and indexed xz". Apart from being parallel, it stores a
custom-formatted index at the end of the tarball, apparently to allow
random access.
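
For anyone who wants to check what a given .xz file actually contains, xz itself can list the container's structure; no pixz is needed for the inspection. A minimal sketch (the kdefoo-x.y.z name just follows the thread's examples):

```shell
# Build a tiny tarball, compress it with plain xz, and inspect the
# .xz container. A pixz-produced file inspected the same way shows
# multiple blocks (plus its index); plain xz typically emits one block.
mkdir -p kdefoo-x.y.z
echo "sample" > kdefoo-x.y.z/README
tar cf kdefoo-x.y.z.tar kdefoo-x.y.z/
xz --keep --force kdefoo-x.y.z.tar      # writes kdefoo-x.y.z.tar.xz
xz --list --verbose kdefoo-x.y.z.tar.xz
```

The `--list --verbose` output includes the stream and block counts, which is enough to tell a single-block xz file from a multi-block parallel one.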

I also noticed that pixz produces larger results than standard xz,
even when ignoring the extra index data. See:
http://article.gmane.org/gmane.comp.kde.releases/

Please do not use pixz for KDE tarballs again...

-- 
Nicolás
___
release-team mailing list
release-team@kde.org
https://mail.kde.org/mailman/listinfo/release-team


Re: Change to tarball generation?

2012-05-23 Thread Michael Pyne
On Wednesday, May 23, 2012 19:40:52 Allen Winter wrote:
> This whole thread is confusing me.
> 
> Maybe a command line would help?
> 
> Is this correct?
> % tar cvf kdefoo-x.y.z.tar 
> % xz kdefoo-x.y.z.tar
> => resulting in kdefoo-x.y.z.tar.xz

That's fine.

> if not, please tell us what a command line should be
> 
> I take it from mpyne's original posting that:
> % tar Jcvf kdefoo-x.y.z.tar.xz 
> isn't the way to go??

That's actually fine too, as it turns out.

As an example, try:

$ tar cf kdefoo-x.y.z.tar kdefoo-x.y.z/
$ pixz kdefoo-x.y.z.tar
# resulting in kdefoo-x.y.z.tar.xz

Because pixz is parallelized, it works on whole blocks of data at a time,
and as far as I can tell it makes no special provision for the last bit of
compressed data being smaller than the block size.

With a normal tar file the decompressed data you get is:

0*  (where * is end of data and end of file)

With a pixz-encoded tar file the decompressed data you get is:

0*x$  (* is end of data, $ is end of file)

When you run a command like "tar xfJ kdefoo-x.y.z.tar.xz" everything will 
still work fine: tar knows exactly where the data should really end and will 
stop decompressing when it needs to.

When you run a pipeline like "xz --decompress --stdout kdefoo-x.y.z.tar.xz |
tar xf -", though, there's no way to tell xz to stop decompressing early. It
tries to write all the decompressed data to the pipe. tar still knows exactly
where to stop, does so at the '*' rather than the '$', and closes its input
(a pipe!) early.

When xz then tries to write the 'x$' (garbled data) part of the decompressed
output, the write goes to a now-broken pipe, which kills xz with SIGPIPE.
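
The broken-pipe mechanics are easy to reproduce without pixz at all: any writer whose reader exits early is killed the same way. A generic bash sketch, with seq standing in for xz and head standing in for tar:

```shell
#!/usr/bin/env bash
# The reader (head) stops after one line and closes the pipe; the
# writer (seq) keeps writing and is killed by SIGPIPE, just as xz is
# when tar stops reading before the trailing data arrives.
seq 1 1000000 | head -n 1 > /dev/null
status=("${PIPESTATUS[@]}")
echo "writer exit status: ${status[0]}"   # 141 = 128 + SIGPIPE (signal 13)
echo "reader exit status: ${status[1]}"   # 0: head finished normally
```

PIPESTATUS is bash-specific; in plain sh only the last command's status is readily visible, which is exactly why pipelines report this failure so confusingly.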

Scripts driving automated extraction through such a pipeline just see that
an error occurred, and will therefore abort. This has affected a couple of
source-based distributions, and it is annoying even for people extracting
manually, who have to work out that their tarball actually did extract
correctly.
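
For scripts that must keep extracting through a pipeline, one defensive pattern (bash-specific, sketched here with a throwaway placeholder archive rather than a real KDE tarball) is to judge success by tar's exit status alone, since tar is the component that actually knows where the archive ends:

```shell
#!/usr/bin/env bash
# Build a throwaway archive with plain xz so the sketch is runnable
# (sample-1.0 is a placeholder, not a real KDE module).
mkdir -p sample-1.0 && echo "hello" > sample-1.0/README
tar cf sample-1.0.tar sample-1.0/ && xz -f sample-1.0.tar
rm -r sample-1.0

# Extract through a pipeline, but check tar's status (PIPESTATUS[1])
# rather than the whole pipeline's: tar's status stays meaningful even
# when xz is killed by SIGPIPE while writing trailing data.
xz --decompress --stdout sample-1.0.tar.xz | tar xf -
status=("${PIPESTATUS[@]}")
if [ "${status[1]}" -eq 0 ]; then
    echo "extracted OK"
else
    echo "extraction failed" >&2
    exit 1
fi
```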

So the problem affects only parallelizing compressors that take advantage of
the allowance to write garbled data past the end of the archive and rely on
the decompressor to "figure it out". It seems implausible to me that a
parallelizing compressor would always do this; perhaps it only occurs when
the compressor is run by tar (e.g. tar cJf) instead of as a separate step?

I hope this makes more sense.

Regards,
 - Michael Pyne



Re: Change to tarball generation?

2012-05-23 Thread Allen Winter
On Wednesday 23 May 2012 1:28:04 PM Wulf C. Krueger wrote:
> Hello Albert et al,
> 
> On 23.05.2012 18:58, Albert Astals Cid wrote:
> > The machine I'm generating the tarballs on doesn't have pixz, so
> > I'll use xz.
> 
> Even though it's somewhat coincidental ;): Thank you.
> 
> Personally, I think it would be an excellent idea to stick to the
> standard xz format until the non-standard extensions that were used
> for the last batch of tarballs get standardised themselves.
> 
> (Those extensions caused user-visible breakage at least on Exherbo
> Linux and Gentoo Linux.)
> 

This whole thread is confusing me.

Maybe a command line would help?

Is this correct?
% tar cvf kdefoo-x.y.z.tar 
% xz kdefoo-x.y.z.tar
=> resulting in kdefoo-x.y.z.tar.xz

if not, please tell us what a command line should be

I take it from mpyne's original posting that:
% tar Jcvf kdefoo-x.y.z.tar.xz 
isn't the way to go??





Re: Change to tarball generation?

2012-05-23 Thread Wulf C. Krueger
Hello Albert et al,

On 23.05.2012 18:58, Albert Astals Cid wrote:
> The machine I'm generating the tarballs on doesn't have pixz, so I'll
> use xz.

Even though it's somewhat coincidental ;): Thank you.

Personally, I think it would be an excellent idea to stick to the
standard xz format until the non-standard extensions that were used
for the last batch of tarballs get standardised themselves.

(Those extensions caused user-visible breakage at least on Exherbo
Linux and Gentoo Linux.)

-- 
Best regards, Wulf


Re: Change to tarball generation?

2012-05-23 Thread Albert Astals Cid
On Tuesday, 22 May 2012 at 23:48:25, Michael Pyne wrote:
> Hi all,
> 
> I noticed something while we were on the topic of tagging the beta tomorrow
> that I wanted to bring up, which is a concern with tarball generation.
> Specifically, the various parallelizable tarball compressors (pixz, pbzip2)
> seem to generate extraneous data.
> 
> tar is smart enough to ignore this extra data, but this can affect
> decompressing our tarballs in a pipeline (i.e. xz --decompress --stdout
> kdelibs-4.foo.tar.xz | tar xf -), as tar closing its STDIN causes xz to
> write its excess data to a broken pipe.
> 
> This probably doesn't annoy a ton of different people (except for the
> obvious problem with source-based distros like Gentoo, e.g.
> https://bugs.gentoo.org/show_bug.cgi?id=410861) but if the speedup is not
> very substantial it would be better to use xz or bzip2 to avoid the problem
> entirely.
> 
> (This is done by adjusting the value of "compressors" in the pack release
> script in case you're wondering).

The machine I'm generating the tarballs on doesn't have pixz, so I'll use xz.

Albert

> 
> It might be possible to still get some concurrency benefit by batching up
> modules to "pack" and then running 4 or 8 (or however many CPUs are around)
> separate pack scripts at once, or fire off a pack while starting on tagging
> the next module, etc.
> 
> Thoughts? To be clear, I don't think this should affect creating the beta
> tarballs at all, but if we choose to avoid the parallelizing compressors,
> hopefully that could happen in time for the release candidates.
> 
> Regards,
>  - Michael Pyne


Change to tarball generation?

2012-05-22 Thread Michael Pyne
Hi all,

I noticed something while we were on the topic of tagging the beta tomorrow 
that I wanted to bring up, which is a concern with tarball generation. 
Specifically, the various parallelizable tarball compressors (pixz, pbzip2)
seem to generate extraneous data.

tar is smart enough to ignore this extra data, but this can affect 
decompressing our tarballs in a pipeline (i.e. xz --decompress --stdout
kdelibs-4.foo.tar.xz | tar xf -), as tar closing its STDIN causes xz to write
its excess data to a broken pipe.

This probably doesn't annoy a ton of different people (except for the obvious 
problem with source-based distros like Gentoo, e.g. 
https://bugs.gentoo.org/show_bug.cgi?id=410861) but if the speedup is not very 
substantial it would be better to use xz or bzip2 to avoid the problem 
entirely.

(This is done by adjusting the value of "compressors" in the pack release 
script in case you're wondering).

It might be possible to still get some concurrency benefit by batching up 
modules to "pack" and then running 4 or 8 (or however many CPUs are around) 
separate pack scripts at once, or fire off a pack while starting on tagging 
the next module, etc.
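
The batching idea can be sketched with xargs; the module names and the echo stand-in for the per-module pack invocation below are hypothetical placeholders, not the real pack release script:

```shell
# Run up to 4 stand-in "pack" jobs concurrently. In real use, the
# sh -c 'echo ...' would be replaced by the actual per-module pack
# command; -P bounds the number of concurrent jobs.
printf '%s\n' kdelibs kdebase-runtime kdepim kde-workspace |
    xargs -P 4 -I MODULE sh -c 'echo "packing MODULE"'
```

Output order varies from run to run, since the jobs genuinely run concurrently.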

Thoughts? To be clear, I don't think this should affect creating the beta
tarballs at all, but if we choose to avoid the parallelizing compressors,
hopefully that could happen in time for the release candidates.

Regards,
 - Michael Pyne
