Re: Change to tarball generation?
2012/5/23 Michael Pyne:
> As an example, try:
>
> $ tar cf kdefoo-x.y.z.tar kdefoo-x.y.z/
> $ pixz kdefoo-x.y.z.tar
> # resulting in kdefoo-x.y.z.tar.xz
>
> Because pixz is parallelized it works on whole blocks of data at a time and as
> far as I can tell makes no special provision for the last bits of compressed
> data being smaller than the block size.
>
> With a normal tar file the decompressed data you get is:
>
> 0* (where * is end of data and end of file)
>
> With a pixz-encoded tar file the decompressed data you get is:
>
> 0*x$ (* is end of data, $ is end of file)
>
> When you run a command like "tar xfJ kdefoo-x.y.z.tar.xz" everything will
> still work fine: tar knows exactly where the data should really end and will
> stop decompressing when it needs to.
>
> When you run a pipeline like "xz --decompress kdefoo-x.y.z.tar.xz | tar xf -"
> though, there's no way to tell xz to stop decompressing early. It tries to
> write all the decompressed data to the pipe. tar still knows exactly where to
> stop, and does so at the '*', not the '$', and closes its input (a pipe!)
> early.
>
> When xz tries to write the 'x$' (garbled data) of the decompressed output it
> gets sent to a now-broken pipe, which kills xz on SIGPIPE.
>
> Scripts trying to drive automated extraction of that data using a pipeline
> just see that an error occurred, and will therefore abort. This has affected a
> couple of distributions that are source-based, but is annoying even for those
> manually extracting to have to figure out that their tarball actually
> extracted correctly.
>
> So the problem is only parallelizing compressors that take advantage of the
> allowance to write garbled data past the end of a file and still have the
> decompressor "figure it out". It seems pretty implausible to me that a
> parallelizing compressor would always do this, perhaps this only occurs when
> the compressor is run with tar (e.g. tar cJf) instead of as a separate step?
The "garbled data" has nothing to do with parallelization.

pixz stands for "parallel and indexed xz". Apart from being parallel, it stores a custom-formatted index at the end of the tarball, apparently to allow random access.

I also noticed that pixz produces larger results than standard xz, even when ignoring the extra index data. See:
http://article.gmane.org/gmane.comp.kde.releases/

Please do not use pixz for KDE tarballs again...

--
Nicolás
___
release-team mailing list
release-team@kde.org
https://mail.kde.org/mailman/listinfo/release-team
Re: Change to tarball generation?
On Wednesday, May 23, 2012 19:40:52 Allen Winter wrote:
> This whole thread is confusing me.
>
> Maybe a command line would help?
>
> Is this correct?
> % tar cvf kdefoo-x.y.z.tar
> % xz kdefoo-x.y.z.tar
> => resulting in kdefoo-x.y.z.tar.xz

That's fine.

> if not, please tell us what a command line should be
>
> I take it from mpyne's original posting that:
> % tar Jcvf kdefoo-x.y.z.tar.xz
> isn't the way to go??

That's actually fine too, as it turns out.

As an example of the problematic case, try:

$ tar cf kdefoo-x.y.z.tar kdefoo-x.y.z/
$ pixz kdefoo-x.y.z.tar
# resulting in kdefoo-x.y.z.tar.xz

Because pixz is parallelized it works on whole blocks of data at a time and, as far as I can tell, makes no special provision for the last bits of compressed data being smaller than the block size.

With a normal tar file the decompressed data you get is:

    0* (where * is end of data and end of file)

With a pixz-encoded tar file the decompressed data you get is:

    0*x$ (* is end of data, $ is end of file)

When you run a command like "tar xfJ kdefoo-x.y.z.tar.xz" everything will still work fine: tar knows exactly where the data should really end and will stop decompressing when it needs to.

When you run a pipeline like "xz --decompress kdefoo-x.y.z.tar.xz | tar xf -" though, there's no way to tell xz to stop decompressing early. It tries to write all the decompressed data to the pipe. tar still knows exactly where to stop, and does so at the '*', not the '$', and closes its input (a pipe!) early.

When xz then tries to write the 'x$' (the garbled data) part of the decompressed output, it is writing to a now-broken pipe, which kills xz with SIGPIPE.

Scripts trying to drive automated extraction of that data using a pipeline just see that an error occurred, and will therefore abort. This has affected a couple of source-based distributions, but it is annoying even for those extracting manually, who have to figure out that their tarball actually extracted correctly.

So the problem only affects parallelizing compressors that take advantage of the allowance to write garbled data past the end of a file and still have the decompressor "figure it out". It seems pretty implausible to me that a parallelizing compressor would always do this; perhaps this only occurs when the compressor is run with tar (e.g. tar cJf) instead of as a separate step?

I hope this makes more sense.

Regards,
 - Michael Pyne
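The broken-pipe mechanism described in this message can be reproduced without pixz or a real tarball. The following is only a minimal sketch, assuming a POSIX-style shell with mktemp and head -c available. Here dd is a hypothetical stand-in for the decompressor (xz) and head for tar; neither is what the release scripts or the affected distributions actually run. The second pipeline, which drains the leftover output, was not proposed in the thread and is included only to show that the error comes from the unread trailing data.

```shell
# Hypothetical stand-ins: dd plays the decompressor (xz), head plays tar.
status_file=$(mktemp)

# Case 1: the reader stops after one byte and closes the pipe, much as tar
# stops at the end-of-archive marker. The writer still has data to send.
{ dd if=/dev/zero bs=1024 count=1024 2>/dev/null; echo $? >"$status_file"; } |
    head -c 1 >/dev/null
early_close_status=$(cat "$status_file")

# Case 2: same writer, but whatever the reader did not consume is drained
# by cat, so the writer never sees a closed pipe.
{ dd if=/dev/zero bs=1024 count=1024 2>/dev/null; echo $? >"$status_file"; } |
    { head -c 1 >/dev/null; cat >/dev/null; }
drained_status=$(cat "$status_file")

rm -f "$status_file"
echo "writer status, reader closes early: $early_close_status"
echo "writer status, leftover drained:    $drained_status"
```

On a typical Linux system the first status is 141 (128 plus the SIGPIPE signal number); if SIGPIPE is ignored in the calling environment the writer instead fails with a write error. Either way it is non-zero, which is what an extraction script checking the pipeline's result would treat as a failure.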
Re: Change to tarball generation?
On Wednesday 23 May 2012 1:28:04 PM Wulf C. Krueger wrote:
> Hello Albert et al,
>
> On 23.05.2012 18:58, Albert Astals Cid wrote:
> > The machine I'm generating the tarballs on doesn't have pixz so I'll
> > use xz.
>
> Even though it's somewhat coincidental ;): Thank you.
>
> Personally, I think it would be an excellent idea to stick to the
> standard xz format until the non-standard extensions that were used
> for the last batch of tarballs get standardised themselves.
>
> (Those extensions caused user-visible breakage at least on Exherbo
> Linux and Gentoo Linux.)
>

This whole thread is confusing me.

Maybe a command line would help?

Is this correct?
% tar cvf kdefoo-x.y.z.tar
% xz kdefoo-x.y.z.tar
=> resulting in kdefoo-x.y.z.tar.xz

If not, please tell us what a command line should be.

I take it from mpyne's original posting that:
% tar Jcvf kdefoo-x.y.z.tar.xz
isn't the way to go??
Re: Change to tarball generation?
Hello Albert et al,

On 23.05.2012 18:58, Albert Astals Cid wrote:
> The machine I'm generating the tarballs on doesn't have pixz so I'll
> use xz.

Even though it's somewhat coincidental ;): Thank you.

Personally, I think it would be an excellent idea to stick to the standard xz format until the non-standard extensions that were used for the last batch of tarballs get standardised themselves.

(Those extensions caused user-visible breakage at least on Exherbo Linux and Gentoo Linux.)

--
Best regards, Wulf
Re: Change to tarball generation?
On Tuesday, 22 May 2012, at 23:48:25, Michael Pyne wrote:
> Hi all,
>
> I noticed something while we were on the topic of tagging the beta tomorrow
> that I wanted to bring up, which is a concern with tarball generation.
> Specifically, the various parallelizable tarball generators (pixz, pbzip2)
> seem to generate extraneous data.
>
> tar is smart enough to ignore this extra data, but this can affect
> decompressing our tarballs in a pipeline (i.e. xz --decompress
> kdelibs-4.foo.tar.xz | tar xf -), as tar closing its STDIN causes xz to
> write its excess data to a broken pipe.
>
> This probably doesn't annoy a ton of different people (except for the
> obvious problem with source-based distros like Gentoo, e.g.
> https://bugs.gentoo.org/show_bug.cgi?id=410861) but if the speedup is not
> very substantial it would be better to use xz or bzip2 to avoid the problem
> entirely.
>
> (This is done by adjusting the value of "compressors" in the pack release
> script in case you're wondering).

The machine I'm generating the tarballs on doesn't have pixz, so I'll use xz.

Albert

>
> It might be possible to still get some concurrency benefit by batching up
> modules to "pack" and then running 4 or 8 (or however many CPUs are around)
> separate pack scripts at once, or fire off a pack while starting on tagging
> the next module, etc.
>
> Thoughts? I'll be very clear that I don't think this should affect creating
> the beta tarballs at all but if we choose to avoid the parallelizing
> compressors hopefully that would be in time for the release candidates.
>
> Regards,
> - Michael Pyne
Change to tarball generation?
Hi all,

I noticed something while we were on the topic of tagging the beta tomorrow that I wanted to bring up, which is a concern with tarball generation. Specifically, the various parallelizable tarball generators (pixz, pbzip2) seem to generate extraneous data.

tar is smart enough to ignore this extra data, but this can affect decompressing our tarballs in a pipeline (i.e. xz --decompress kdelibs-4.foo.tar.xz | tar xf -), as tar closing its STDIN causes xz to write its excess data to a broken pipe.

This probably doesn't annoy a ton of different people (except for the obvious problem with source-based distros like Gentoo, e.g. https://bugs.gentoo.org/show_bug.cgi?id=410861) but if the speedup is not very substantial it would be better to use xz or bzip2 to avoid the problem entirely.

(This is done by adjusting the value of "compressors" in the pack release script in case you're wondering).

It might be possible to still get some concurrency benefit by batching up modules to "pack" and then running 4 or 8 (or however many CPUs are around) separate pack scripts at once, or fire off a pack while starting on tagging the next module, etc.

Thoughts? I'll be very clear that I don't think this should affect creating the beta tarballs at all, but if we choose to avoid the parallelizing compressors, hopefully that would be in time for the release candidates.

Regards,
 - Michael Pyne
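The batching idea from this message (several ordinary single-threaded compressions running side by side, instead of one parallel compressor) could look roughly like the sketch below. It is an illustration under assumptions, not the actual pack release script: pack_all, its arguments and the COMPRESSOR environment variable are hypothetical names, and it relies on xargs supporting -P, which GNU and BSD xargs do but POSIX does not require.

```shell
# pack_all NJOBS DIR...
# Packs each directory into DIR.tar.xz, running at most NJOBS packs at once.
# Each tarball is compressed by plain xz (or whatever $COMPRESSOR names),
# so the output is a standard stream with no trailing index.
pack_all() {
    njobs=$1
    shift
    printf '%s\n' "$@" |
        xargs -P "$njobs" -I{} \
            sh -c 'tar cf - "$1" | ${COMPRESSOR:-xz} > "$1.tar.xz"' sh {}
}

# Example (module names are placeholders):
#   pack_all 4 kdefoo-x.y.z kdebar-x.y.z
```

Because every module still goes through the standard compressor, the concurrency comes only from packing several modules at once, so the speedup depends on how many modules there are and how evenly sized they are.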