Re: XZ embedded bug unpacking linux-3.8.tar.xz
On 03/01/2013 11:33:46 AM, Antonio Diaz Diaz wrote: Dear Denys. The mistake here would be to reject lzip... You deny the busybox maintainer's reality, and substitute your own! The current situation looks pretty simple: lzip and xz are roughly the same feature-wise, The only feature for which lzip and xz are roughly the same is compression speed/size. Sadly it seems the only feature ever tested/cared for by most users. Gee, I wonder why? Stop and think, what is any compression code in busybox _for_? The only reason to have xz in busybox at all is because there are a lot of existing tar.xz files out there. It is an existing, deployed file format which busybox wants to be compatible with. You're saying that you've got a new super compression format called arj or zoo or stuffit or binhex or whatever it is, and you'd very much like to shoehorn it into busybox in hopes of getting it wider adoption. Denys said no. You're getting huffy about it. I await the flounce. Therefore, in their real-world use, Busybox users will need to unpack *xz* files. Such as kernel tarballs from kernel.org, distribution .rpms with internally-xz'ed cpio archives, and many other things. This sees users as consumers. What about the users who want to create their own compressed files? They might want to do so in a format that people they send it to would previously have heard of. Given how bad an ambassador you are for your preferred choice, I'm guessing lzma ain't ever gonna be it. Not counting that any Busybox user wanting to check the integrity of files will avoid xz files anyway. Kernel tarballs are also distributed in bzip2 format. Great, so we've got this compression thing covered. So we don't need your new format, ever, for any reason, at all. Good to know. You still have a way in, though. You have prepared _compression_ support too. That is something xz embedded doesn't provide. Anyone who wants to _create_ a .xz file using bbox is potentially your client. 
I think there is a misunderstanding here. I am not seeking clients. I am trying to be the change I wish to see in the world. No, you're trying to make busybox be the change you see in the world, by leveraging the installed base of an established project to promote your agenda, and doing so _OVER_ the maintainer's objections. If the change you wish to make in the world is annoying people, you're doing great. Hijacking a mailing list thread about a bug to promote an alternate _incompatible_ implementation is not even potentially the same as addressing the bug. It's not look, this other code has a bug, I win! That's not how it works. I've been working to replace busybox with toybox for years and I still occasionally submit bug reports (and fixes!) here. Rob ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz
On 03/01/2013 02:41:39 PM, Matias A. Fonzo wrote:

There are people who like to have a full compressor/decompressor in Busybox, performing better than gzip/bzip2.

xz compressor then.

Precisely, adding the compressor, doesn't it imply adding more code? More code than expected, I guess...

Adding more code is the price you pay for adding more functionality. Busybox can't have _no_ code and do anything useful. So the question is whether you get functionality worth the size and complexity penalty of the code you choose to include.

If you've got support for one end of an existing format, there's an argument for supporting the other end because that code has to exist somewhere for your code to be useful, and busybox tends to avoid external dependencies where possible. This is similar to: if we're going to have patch, it should handle applying hunks at offsets, because otherwise people have to rip it out and replace it with a real patch to do anything useful. And if we have patch, we should have a diff that can generate those...

Arguing that our find should do -xdev is this class of argument: we already have code in this space and this is part of the feature set that people actually use with that code, and it's not too much code or complexity to be worth adding.

That argument isn't there for adding support for a _new_ format. When you open a new can of worms you can make arguments based on use cases or user bases, but Denys's argument was about where you draw the line based on what busybox has already got.

Rob
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Hi Rob !

Hijacking a mailing list thread about a bug to promote an alternate _incompatible_ implementation is not even potentially the same as addressing the bug. It's not look, this other code has a bug, I win! That's not how it works. I've been working to replace busybox with toybox for years and I still occasionally submit bug reports (and fixes!) here.

Now you are going too far! Nobody hijacked the thread, and if anyone did, it was my mistake. Because of the running thread about a decompressor bug, I hooked in and tried to push a somewhat older announcement from Antonio about lzip. This started this horrible discussion, as far as I know. I do not want to go further here.

Denys said no. I would still like to (quickly) have a better compressor statically linked into the Busybox binary, at least until an xz compressor for Busybox is available. I accept Denys's decision, but would like to ask Antonio to keep providing patches that add the lzip compressor/decompressor to Busybox. That way, those people who like may add lzip to their Busybox.

The only thing I want to request from Denys is a link from the Busybox web site to Antonio's site where he provides the patches (tiny utilities section?), so that there is a known starting location in case someone loses the link to Antonio's site.

-- Harald
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Hello Roy. Roy wrote: I wonder, why not adding .lzma support in lunzip(like what pdlzip does) for deprecating decompress_unlzma? If .lzma support is going to be added to some other applet, I guess it should be to unxz, because it is the full xz the one supporting .lzma, not the full lzip. Pdlzip is a hack whose main purpose is providing a public domain implementation of lzip to those who can't distribute GPL software. It also decompresses .lzma files just as a side effect of using the LZMA SDK from Igor Pavlov. Regards, Antonio. ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Hi Antonio,

On Thu, Feb 28, 2013 at 11:20 PM, Antonio Diaz Diaz ant_d...@teleline.es wrote:

text+data text+rodata rwdata bss filename
     2928        2928      0   0 archival/libarchive/decompress_lunzip.o

So basically, lzip lost the race wrt adoption. xz is used more widely. Kernel tarballs are .xz, not .lz.

It depends on where you get your kernels from: http://linux-libre.fsfla.org/pub/linux-libre/releases/3.8-gnu/

https://www.kernel.org/pub/linux/kernel/v3.x/

Please understand my position. It's not about preferring xz over lzip, or the other way around. Maintaining three copies of LZMA (de)compressors with virtually identical performance would be a mistake.

The current situation looks pretty simple: lzip and xz are roughly the same feature-wise, but xz (fairly or not) managed to get much more widely adopted in current Linux distributions.

Therefore, in their real-world use, Busybox users will need to unpack *xz* files. Such as kernel tarballs from kernel.org, distribution .rpms with internally-xz'ed cpio archives, and many other things.

Therefore I don't see sufficient reason to add .lzip decompression support to bbox.

You still have a way in, though. You have prepared _compression_ support too. That is something xz embedded doesn't provide. Anyone who wants to _create_ a .xz file using bbox is potentially your client.

Unfortunately, there won't be many people interested in creating .lzip files.

If you can change your code so that it produces valid .xz files (even if they are stupid in the sense that they are merely LZMA chunks w/o LZMA2 improvements), then I will take it.

-- vda
Re: XZ embedded bug unpacking linux-3.8.tar.xz
On Fri, Mar 1, 2013 at 4:18 PM, Harald Becker ra...@gmx.de wrote: On 01-03-2013 15:51 Denys Vlasenko vda.li...@googlemail.com wrote: Please understand my position. Maintaining three copies of LZMA (de)compressors with virtually identical performance would be a mistake. You are still right, but there is one big difference: lzip has a compressor in Busybox not only a decompressor. lzma and xz are only decompressors and it is handy to have same available in cases where you hit one of those files and you do not have access to the full package. What percentage of bbox users would want to produce .lzip files? It isn't a widely used format. bbox didn't have even bzip2 compressor for a long time. Beside this I prefer lzip due to its full implementation in Busybox ... or are you going to add an xz compressor in Busybox? Yes, adding xz compressor is a good idea. And in addition, if you do not like to have all those decompressors in your Busybox binary, you can disable your dislikes in the config. This does not remove the need to maintain the code. More code = more bugs. Rarely used code = bugs stay unfixed for a longer time. There are people who like to have a full compressor/decompressor in Busybox, performing better than gzip/bzip2. xz compressor then. -- vda ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Hi Denys !

On 01-03-2013 18:03 Denys Vlasenko vda.li...@googlemail.com wrote:

Yes, adding xz compressor is a good idea.

xz compressor then.

Fine! When will it be available? Is anyone actively working on it? lzip is there and works. SCNR

-- Harald
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Dear Denys. Denys Vlasenko wrote: Maintaining three copies of LZMA (de)compressors with virtually identical performance would be a mistake. This can be easily solved by removing the deprecated one, as others are doing. The mistake here would be to reject lzip just to keep the deprecated lzma-alone. The current situation looks pretty simple: lzip and xz are roughly the same feature-wise, The only feature for which lzip and xz are roughly the same is compression speed/size. Sadly it seems the only feature ever tested/cared for by most users. Therefore, in their real-world use, Busybox users will need to unpack *xz* files. Such as kernel tarballs from kernel.org, distribution .rpms with internally-xz'ed cpio archives, and many other things. This sees users as consumers. What about the users who want to create their own compressed files? Not counting that any Busybox user wanting to check the integrity of files will avoid xz files anyway. Kernel tarballs are also distributed in bzip2 format. Therefore I don't see sufficient reason to add .lzip decompression support to bbox. All right. If you ever change your mind, just ask me for an updated patch. :-) You still have a way in, though. You have prepared _compression_ support too. That is something xz embedded doesn't provide. Anyone who wants to _create_ a .xz file using bbox is potentially your client. I think there is a misunderstanding here. I am not seeking clients. I am trying to be the change I wish to see in the world. I prefer to see my work rejected better than causing harm to humankind by working on a project I consider a mistake. The lzip applet already produces full lzip files, not dumbed-down files like the ones an hypothetical xz applet could produce. Regards, Antonio. ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz
On Fri, Mar 1, 2013 at 6:33 PM, Antonio Diaz Diaz ant_d...@teleline.es wrote: You still have a way in, though. You have prepared _compression_ support too. That is something xz embedded doesn't provide. Anyone who wants to _create_ a .xz file using bbox is potentially your client. I think there is a misunderstanding here. I am not seeking clients. I am trying to be the change I wish to see in the world. (1) Why do you want the world to stop using .xz and start using .lzip? Apart from xz has a fatal flaw - it is not designed/written by me. (2) What are the chances of this happening? I prefer to see my work rejected better than causing harm to humankind by working on a project I consider a mistake. A possibly suboptimal choice of the prevalent LZMA compressor is way down on the list of dangers for the humankind. -- vda ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Why talk about rejected work, or mistakes ? Alternatives are a good thing. Also, as useful and widespread as Busybox is, it doesn't have to be the be-all, end-all of embedded software; it doesn't have to package *everything* a user might need. As I see it, Busybox exists to provide low-resource-consuming (be it disk space or RAM) implementations of existing utilities - especially GNU utilities, that are traditionally feature-oriented instead of embedded-friendly. But if some software is already small and easy to use in restricted environments, why would Busybox have to integrate it ? If the original lzip utility doesn't require a nuclear plant to run, then making a Busybox version seems redundant - embedded users who want to use lzip can simply install the original ! Getting *everything* into Busybox - one binary to rule them all - smells a bit too much like systemd. Do we want to go there ? Instead, Denys, or whoever maintains the busybox.net website, there is a tinyutils.html page that is way out of date, and that seems precisely made to list utilities that might benefit embedded users, *additionally* to Busybox. I would very much like my own execline (in the scripting language section) and s6, and even s6-linux-utils (mainly for the s6-devd netlink utility), to appear there. If the original lzip qualifies, it could certainly be listed there too, as well as other utilities I'm not thinking of atm. Less work for busybox, same benefits for the community. If I can help concretely, I will be happy to. -- Laurent ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Denys Vlasenko wrote: I think there is a misunderstanding here. I am not seeking clients. I am trying to be the change I wish to see in the world. (1) Why do you want the world to stop using .xz and start using .lzip? Apart from xz has a fatal flaw - it is not designed/written by me. This is the most gratuitous insult I have ever received. Even more so given that the motivation of this thread was a real flaw in xz. BTW, it was Gandhi the one who said You must be the change you wish to see in the world. (2) What are the chances of this happening? Unless a better algorithm is discovered, 100%. You can not fool all of the people all of the time. A possibly suboptimal choice of the prevalent LZMA compressor is way down on the list of dangers for the humankind. Certainly, but it was also Gandhi who said, Whatever you do will be insignificant, but it is very important that you do it. Regards, Antonio. ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz
On Fri, Mar 1, 2013 at 9:09 PM, Antonio Diaz Diaz ant_d...@teleline.es wrote: Denys Vlasenko wrote: I think there is a misunderstanding here. I am not seeking clients. I am trying to be the change I wish to see in the world. (1) Why do you want the world to stop using .xz and start using .lzip? Apart from xz has a fatal flaw - it is not designed/written by me. This is the most gratuitous insult I have ever received. In fact, many coders (including me) are susceptible to this effect: they like their code. I guess it's a human nature. Even more so given that the motivation of this thread was a real flaw in xz. I do not see any flaws in xz. I just reread its specification and it doesn't sound bad (although I'd do a few things differently). -- vda ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz
On Fri, 1 Mar 2013 18:03:55 +0100, Denys Vlasenko vda.li...@googlemail.com wrote:

On Fri, Mar 1, 2013 at 4:18 PM, Harald Becker ra...@gmx.de wrote:

On 01-03-2013 15:51 Denys Vlasenko vda.li...@googlemail.com wrote:

Please understand my position. Maintaining three copies of LZMA (de)compressors with virtually identical performance would be a mistake.

You are still right, but there is one big difference: lzip has a compressor in Busybox not only a decompressor. lzma and xz are only decompressors and it is handy to have same available in cases where you hit one of those files and you do not have access to the full package.

What percentage of bbox users would want to produce .lzip files?

How can we know?

It isn't a widely used format.

With this thought (nothing personal), what chance do the good alternatives out there have? (xz is not more popular (or widely used) than gzip or bzip2).

bbox didn't have even bzip2 compressor for a long time.

Hmm... what about the memory usage?

Beside this I prefer lzip due to its full implementation in Busybox ... or are you going to add an xz compressor in Busybox?

Yes, adding xz compressor is a good idea.

And in addition, if you do not like to have all those decompressors in your Busybox binary, you can disable your dislikes in the config.

This does not remove the need to maintain the code. More code = more bugs. Rarely used code = bugs stay unfixed for a longer time.

There are people who like to have a full compressor/decompressor in Busybox, performing better than gzip/bzip2.

xz compressor then.

Precisely, adding the compressor, doesn't it imply adding more code? More code than expected, I guess...
Re: XZ embedded bug unpacking linux-3.8.tar.xz
On Fri, 1 Mar 2013 21:50:44 +0100, Denys Vlasenko vda.li...@googlemail.com wrote:

On Fri, Mar 1, 2013 at 9:41 PM, Matias A. Fonzo s...@dragora.org wrote:

What percentage of bbox users would want to produce .lzip files?

How can we know?

It isn't a widely used format.

With this thought (nothing personal), what chance do the good alternatives out there have? (xz is not more popular (or widely used) than gzip or bzip2).

LZMA-based compressors give a better, and slower, compression than bzip2. It is not unexpected that with faster processors, we reached the point when people can use it without excessive time penalty. Kernel is released in .xz tarballs (in addition to .bz2). Distributions are using xz-compressed .rpms.

I prefer to download tarballs in bzip2 format (if there's no other option between xz or bzip2). At least, bzip2 provides a recovery tool. ;-)

By the way -- RPM has lzip support[1]:

[1] http://www.rpm.org/ticket/839

These are cold hard facts. I don't invent them. Try googling for kernel tarballs in .lzip. Or any tarballs in .lzip for that matter.

Sure, I found them... *eventually*.

Busybox has no xz compression support, but it inevitably will be requested. (As it has happened with bzip2). And if by that time it has lzip, it will end up having *two* LZMA compressors, one widely used and another much less known. I don't think having that extra baggage would be useful.

Were these criteria applied to sysvinit vs. runit, too? :-) One can choose.

Regards, Matias
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Hi Harald, 1) I like to create small self contained initramfs systems. Systems included in a single kernel image. So all you need to run the system is that single kernel image and a boot loader to start. In those initramfs systems i like to have only a minimum of statically linked binaries with a maximum of flexibility. At best there is only one binary containing all the utilities and the special application binaries. Oh, I do not deny there are good points in favour of having one single binary; your use case is one of them. I personally build my systems from bits and pieces, busybox being a tool among others, and agree that the more pieces, the more work for me. You have to weigh the costs: my approach undeniably means more work for the administrator, but is also more flexible. The day you need a utility that is not included in Busybox, you will have to pack it by hand too, so my point is that Busybox including stuff is *convenient* more than *required*. 2) Comparing a utility collection box like Busybox with a tool like systemd really smells bad. They are two completely different things. The comparison was obviously over-the-top and provocative, and I'm glad it elicited a reaction from you - but I wanted to poke at Denys, who shares my dislike of systemd for the same reasons ;) However, I think the underlying question about Busybox's policy needs to be addressed. If Busybox starts including things that are already small and embeddable to begin with (and I think it has already started going down this path with runit), then it becomes a one-stop-shop, a kind of Linux distribution, and like every distribution, sooner or later it will have to include the whole world. 
I would much rather have it stick to providing replacements for standard utilities that really need rewriting, along with a collection of links to other small, high-quality utilities - do one job and do it well, as the Unix philosophy says; be a part of the community instead of trying to be the whole community, which is exactly the same kind of hubris systemd (as well as most distributions, really) is suffering from. -- Laurent ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz (was: Re: tar: short read on linux-3.8.tar.xz)
Hi Denys !

What I'm saying is that bbox project would like to have is (ideally) _one_ LZMA decoder. Unpacking the compressed stream from two formats isn't a terribly difficult thing.

You are right and I don't want to start another format war. In addition to this, there is one issue which lets me hop on lzip: it is not only an LZMA decompressor, it also has a compressor counterpart in Busybox. A compressor which gives better results than bzip2. And there is no small xz compressor available for Busybox, or is there any light on the horizon?

And what about lzop? It is another compressor not very widely used. Why can't we have the lzip compressor in Busybox? Those who dislike that format may disable it in the configuration.

... but once again: You are right, it would be very nice to have a single LZMA decompressor to uncompress lzma, lzip and xz streams. If this is possible.

... just as it would be nice to have a single zcat able to detect the format and decompress ANY compressed stream (falling back to the operation of cat if uncompressed data is given to zcat). With a generalized uncompress to do zcat file temp; then rename temp and mangle name extensions (like gunzip).

-- Harald
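The "single zcat" idea above can be illustrated with a small sketch. It is not Busybox code and not the zutils implementation; it only assumes the well-known leading magic bytes of gzip, bzip2, xz and lzip, and it uses Python's standard library, which has no lzip decoder. The function names detect_format and zcat_bytes are made up for this illustration.

```python
import bz2
import gzip
import lzma

# Leading magic bytes of each container format. The legacy .lzma
# (lzma-alone) format has no magic number, so it cannot be detected
# this way and is treated as raw data here.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"LZIP": "lzip",
}


def detect_format(data: bytes) -> str:
    """Return the format name for data, or 'raw' if no magic matches."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "raw"


def zcat_bytes(data: bytes) -> bytes:
    """Decompress data if its format is recognized, else behave like cat."""
    fmt = detect_format(data)
    if fmt == "gzip":
        return gzip.decompress(data)
    if fmt == "bzip2":
        return bz2.decompress(data)
    if fmt == "xz":
        return lzma.decompress(data, format=lzma.FORMAT_XZ)
    if fmt == "lzip":
        # Detected, but the Python standard library cannot decode it.
        raise NotImplementedError("no lzip decoder in the standard library")
    return data  # fall back to plain cat
```

Detection by magic bytes only works on input whose first bytes can be inspected; a real applet would have to peek at the start of the stream without consuming it.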
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Harald Becker wrote: ... just as it would be nice to have a single zcat able to detect the format and decompress ANY compressed stream (falling back to operation of cat, if uncompressed data given to zcat). I guess you don't know zutils. http://www.nongnu.org/zutils/zutils.html Regards, Antonio. ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz
On Thu, Feb 28, 2013 at 8:54 AM, Michael Tokarev m...@tls.msk.ru wrote: For some reason I haven't heard of lzip at all until now. Yes. That's the problem, maybe the main one: xz people won on this front hands down, even if technically lzip is better. -- vda ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Hi Antonio !

On 28-02-2013 14:34 Antonio Diaz Diaz ant_d...@teleline.es wrote:

Harald Becker wrote: ... just as it would be nice to have a single zcat able to detect the format and decompress ANY compressed stream (falling back to operation of cat, if uncompressed data given to zcat).

Please read as: ... have a single zcat ... AS A BUSYBOX APPLET ...

I guess you don't know zutils. http://www.nongnu.org/zutils/zutils.html

On my systems there is a script in /usr/local/zcat. It works well on regular files, which is the 99.9% case I need. A zcat applet as part of Busybox would simplify things, especially on small systems and on rescue images.

IMO a widespread general zcat (maybe as part of GNU) would be better than separate decompressors for every compression format ... but as a little fly it is difficult to move the whole world.

-- Harald
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Hello Harald.

Harald Becker wrote:

Please read as: ... have a single zcat ... AS A BUSYBOX APPLET ...

I see. Well, perhaps the zcat from zutils could be adapted to Busybox. :-)

IMO a widespread general zcat (maybe as part of GNU) would be better than separate decompressors for every compression format ... but as a little fly it is difficult to move the whole world.

Zutils was not accepted in GNU because the names conflict with those in gzip. Tell me about how difficult it is for a little fly to move the whole world. :-)

Regards, Antonio.
Re: XZ embedded bug unpacking linux-3.8.tar.xz
On Thu, Feb 28, 2013 at 5:53 PM, Antonio Diaz Diaz ant_d...@teleline.es wrote:

You didn't try lzip but plzip, which is beta software. And of course, parallel versions of lzip or xz compress less than standard versions because they split data in blocks before compressing it. But even so there is something wrong with your test. Maybe your C++ compiler produces slower executables than the C compiler, or you used an old version of plzip or lzlib...

I have just retried to compress gcc-4.7.2.tar (just in case) and in my single-processor machine, plzip (using the default compression level) is faster (6:16) than both lzip (6:37) and xz (7:32), just as expected. Why is this expected? Because both lzip and plzip use a default value for --match-length smaller than the equivalent option in xz (36 vs 64), and plzip sees a smaller effective dictionary size because it splits the input data in blocks.

Parallel compression benchmark isn't interesting: the task is trivially parallelizable, so the speedup will be nearly exactly proportional to the number of CPUs. Compare one-thread xz against one-thread lzip.
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Denys Vlasenko wrote:

On Thu, Feb 28, 2013 at 8:54 AM, Michael Tokarev wrote:

For some reason I haven't heard of lzip at all until now.

Yes. That's the problem, maybe the main one: xz people won on this front hands down, even if technically lzip is better.

Lzip is not going to disappear. Maybe a small group of influential people already familiar with lzma-utils helped xz to gain a head start, but lzip is much more in line with what is expected from a compressor in unix-like systems.

Think about it, and remember the problem that began this thread. Which LZMA compressor do you think is better for Busybox users: one that behaves essentially like gzip and bzip2, or one that users of small systems will probably never be able to make full use of?

Regards, Antonio.
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Denys Vlasenko wrote:

Why? *.lzma are deprecated some time ago

Because someone submitted the code:

I also submitted the code[1]. :-)

[1] http://lists.busybox.net/pipermail/busybox/2012-December/078750.html

In fact, it is surprisingly small:

archival/libarchive:
   text    data     bss     dec     hex filename
   2827       0       0    2827     b0b decompress_unlzma.o
   7277       0       0    7277    1c6d decompress_unxz.o
   2743       0       0    2743     ab7 decompress_bunzip2.o
   5270       0       0    5270    1496 decompress_gunzip.o

Just the same size as lunzip minus the header and integrity checks:

text+data text+rodata rwdata bss filename
     2928        2928      0   0 archival/libarchive/decompress_lunzip.o

So basically, lzip lost the race wrt adoption. xz is used more widely. Kernel tarballs are .xz, not .lz.

It depends on where you get your kernels from: http://linux-libre.fsfla.org/pub/linux-libre/releases/3.8-gnu/

BTW, xz files are a bit smaller than lz files in the above directory, but you need twice the memory to decompress them. Of course lzip can achieve the same (or better) compression if you add -s 64MiB to the command line.

What I'm saying is that bbox project would like to have is (ideally) _one_ LZMA decoder. Unpacking the compressed stream from two formats isn't a terribly difficult thing.

But xz is not (only) an LZMA encoder, therefore no LZMA decoder will ever be able to decode xz streams.

But can lzip decompress unxz *stream*?

If we have learned something from this thread, it is that not even all unxz's can decompress all xz streams. Much less verify their integrity.

Regards, Antonio.
Re: XZ embedded bug unpacking linux-3.8.tar.xz
On Mon, Feb 25, 2013 at 12:05 PM, Lasse Collin lasse.col...@tukaani.org wrote: liblzma in XZ Utils has a flag to decode concatenated streams to make it a bit easier to handle such files. I would prefer to not include such a flag in XZ Embedded, since I think in most embedded situations (boot loaders, kernels etc.) such a flag is useless. Busybox is an exception to this. Below is a patch to add support for concatenated .xz streams. It also handles possible padding (sequence of zero-bytes) between the streams. It probably has room for improvement, but it should be a useful starting point. Applied, thanks! -- vda ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
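The behaviour the patch adds, decoding several .xz streams written back to back with optional zero-byte padding between them, can be sketched as follows. This is an illustration with Python's lzma module, not the XZ Embedded patch itself; the function name decompress_concatenated is made up here, and the sketch is looser than the .xz specification, which as far as I know requires stream padding to be a multiple of four bytes.

```python
import lzma


def decompress_concatenated(data: bytes) -> bytes:
    """Decode one or more concatenated .xz streams, skipping zero padding."""
    chunks = []
    while True:
        # Stream padding is a run of zero bytes between (or after) streams.
        data = data.lstrip(b"\x00")
        if not data:
            break
        decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
        chunks.append(decoder.decompress(data))
        if not decoder.eof:
            # Input ended before the end-of-stream marker was reached.
            raise ValueError("truncated .xz stream")
        # Whatever follows the finished stream is the next candidate stream.
        data = decoder.unused_data
    return b"".join(chunks)
```

A decoder that stops after the first stream returns only part of the data for such files, which matches the "short read" symptom named in the original thread subject.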
Re: XZ embedded bug unpacking linux-3.8.tar.xz
On Mon, Feb 25, 2013 at 12:05 PM, Lasse Collin lasse.col...@tukaani.org wrote: By the way, since Busybox' copy of XZ Embedded hasn't been updated since unxz was added, this bug fix is missing from Busybox: http://git.tukaani.org/?p=xz-embedded.git;a=commitdiff;h=4cec51e1be4797a4bd8b266a1d34cabd7fdb79fd There is also the following bug fix but I think it doesn't affect Busybox' unxz: http://git.tukaani.org/?p=xz-embedded.git;a=commitdiff;h=9690fe69dc97eb2e7fe2804e4448a5278cde5411 I incorporated these and a few other changes, thanks! -- vda ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: XZ embedded bug unpacking linux-3.8.tar.xz
Hello Denys et all. Denys Vlasenko wrote: On Mon, Feb 25, 2013 at 7:20 PM, Matias A. Fonzo s...@dragora.org wrote: Can be lzip considered for inclusion in busybox?: [...] Matias, sure, this can be done. But bbox already has *two* LZMA decompressors. Feels wrong, isn't it? It certainly feels wrong, but those two are in reality the same one, which suffered a radical and bumpy transformation. (Do you remember lzma-4.42, which had a format incompatible with both lzma-4.32 and xz?) Some people think that adding lzma support to GNU tools was a mistake. I think that adding xz support was simply the continuation of the same mistake. As lzma is legacy software, I guess it will be eventually removed from Busybox, just as it is being removed from GNU packages: The deprecated 'lzma' compression format for distribution archives has been removed, in favor of 'xz' and 'lzip'[1]. [1] http://lists.gnu.org/archive/html/automake/2012-04/msg00060.html The xz decompressor included in Busybox is not able to decompress all valid xz files because it only understands the xz-embedded subset of the xz format. Therefore, any user wanting to decompress or check the integrity of real xz files needs to install the full xz! None of the other formats (bzip2, gzip, lzip) have this problem (the lunzip proposed for Busybox is able to decompress and check any .lz file, even those produced by the parallel version of lzip, plzip). And it can only get worse for xz, because It is possible and even somewhat likely that new features will be added in the future which old programs won't support[2]. [2] http://www.mail-archive.com/xz-devel@tukaani.org/msg00059.html In the long run it would be a nightmare to have two or more LZMA (de)compressors in common use on Linux. Agreed. What happened between lzip and xz? Are they incompatible? On what level? File format, or compression stream format too? 
The history in a nutshell: "In 2008, Antonio Diaz released lzip, which uses a proper container format with checksums and magic numbers instead of the raw LZMA data stream, providing a complete Unix-style solution for using LZMA. Nevertheless, LZMA Utils was extended to have similar features and then renamed to XZ Utils"[3].

[3] http://en.wikipedia.org/wiki/Lzip

Lzip and xz are totally incompatible. Lzip uses the same stream format as .lzma files, just with a proper header and trailer. Xz is a complex container format derived from 7-zip (or at least inspired by it) and without any resemblance to the old .lzma format.

Lzip is a compressor, just like gzip and bzip2. Xz is much more complex than that. Even the stripped-down version of unxz included in Busybox is already larger than any of the other decompressors.

IMHO all this leaves lzip as the LZMA compressor most suitable for Unix-like systems in general, and for Busybox in particular.

Best regards,
Antonio.
Re: XZ embedded bug unpacking linux-3.8.tar.xz
28.02.2013 04:22, Antonio Diaz Diaz wrote:
[]
> The history in a nutshell: "In 2008, Antonio Diaz released lzip, which uses a proper container format with checksums and magic numbers instead of the raw LZMA data stream, providing a complete Unix-style solution for using LZMA. Nevertheless, LZMA Utils was extended to have similar features and then renamed to XZ Utils"[3].

Oh. I remember that in 2008 (or a bit before) the kernel folks discussed which format to use for kernel.org archives and leaned towards lzma, and I pointed out that it does not have any checksums. I guess that was a starting point for xz and lzip.

For some reason I hadn't heard of lzip at all until now. I remember that when xz came out, I looked at it and noticed its complexity and lack of a stable format, exactly as you describe, but that didn't ring any bells for me, and eventually it became a widely known and accepted format.

So I became curious how lzip behaves, and I immediately gave it a very quick try.

CPU:  AMD AthlonII X2 260, 3.2GHz (2 cores)
file: 1 GiB (1073741824 bytes), an image of a small linux virtual machine.

.lz: 273684804 bytes, real 11m53.112s, user 20m30.563s
.xz: 266670056 bytes, real 11m8.190s, user 10m45.835s

This is the default compression level.

WOW. So 2-thread plzip is about TWO TIMES slower than single-threaded xz in CPU time when compressing, which makes parallel plzip on 2 cores only about as fast as xz in wall-clock time. lz also produces a slightly larger result.

Are you sure the stream and compression algorithm are the same? :)

Thanks,

/mjt
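To put numbers on the comparison above, here is a minimal C sketch that recomputes the ratios from the figures quoted in the message. The helper names (to_seconds, ratio, print_comparison) are made up for this illustration, and the sketch only does arithmetic on the posted timings; it does not rerun the benchmark.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a time(1)-style reading such as "11m53.112s" into seconds. */
double to_seconds(int minutes, double seconds)
{
	assert(minutes >= 0 && seconds >= 0.0 && seconds < 60.0);
	return minutes * 60.0 + seconds;
}

/* How many times larger a is than b. */
double ratio(double a, double b)
{
	assert(b > 0.0);
	return a / b;
}

/* Print the lz/xz ratios for the figures quoted in the message. */
void print_comparison(void)
{
	/* plzip (.lz) run, as posted */
	const double lz_real = to_seconds(11, 53.112);
	const double lz_user = to_seconds(20, 30.563);
	const double lz_size = 273684804.0;

	/* xz run, as posted */
	const double xz_real = to_seconds(11, 8.190);
	const double xz_user = to_seconds(10, 45.835);
	const double xz_size = 266670056.0;

	printf("wall-clock time, lz/xz: %.2f\n", ratio(lz_real, xz_real));
	printf("CPU (user) time, lz/xz: %.2f\n", ratio(lz_user, xz_user));
	printf("output size,     lz/xz: %.4f\n", ratio(lz_size, xz_size));
}
```

Calling print_comparison() from a main() gives roughly 1.07 for wall-clock time, 1.91 for CPU time and 1.0263 for output size, which matches the reading in the message: about twice the CPU time, nearly the same elapsed time on two cores, and an output about 2.6% larger.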
Re: XZ embedded bug unpacking linux-3.8.tar.xz
On 02/26/2013 07:43 AM, Michael Tokarev wrote:
> 26.02.2013 03:21, John Spencer wrote:
>> [ quoting the full mail of lasse since it didn't make its way into the bb maillist yet ]
>
> Additionally there has been a discussion and attempts to cook up a patch in Debian, see http://bugs.debian.org/686502 , which I submitted as a bug to busybox bugzilla -- https://bugs.busybox.net/show_bug.cgi?id=5804 .
>
> Cc'ing the Debian bugreport.
>
> I like the below patch better :)

the patches for busybox 1.20.2 are available in this commit
https://github.com/rofl0r/sabotage/commit/c03ddd39878473939bda6b574bc8854c533b4b00
(so that you don't have to backport them yourselves again)

i.e.
https://raw.github.com/rofl0r/sabotage/c03ddd39878473939bda6b574bc8854c533b4b00/KEEP/busybox-xz-bugfix1.patch
https://raw.github.com/rofl0r/sabotage/c03ddd39878473939bda6b574bc8854c533b4b00/KEEP/busybox-xz-bugfix2.patch
https://raw.github.com/rofl0r/sabotage/c03ddd39878473939bda6b574bc8854c533b4b00/KEEP/busybox-xz-bugfix3.patch

> /mjt

--JS
Re: XZ embedded bug unpacking linux-3.8.tar.xz (was: Re: tar: short read on linux-3.8.tar.xz)
On Mon, 25 Feb 2013 07:14:28 +0100, Denys Vlasenko vda.li...@googlemail.com wrote:
> [CC'ing XZ embedded author]
>
> On Sunday 24 February 2013 22:37, John Spencer wrote:
>> http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.8.tar.xz
>>
>> using busybox 1.20.2 and xz 5.0.3 or xz 5.0.4:
>>
>> $ tar xf linux-3.8.tar.xz
>>
>> i get: short read and exit status 1. however the data seems to be there (at least partial).
>>
>> the culprit is the file linux-3.8/drivers/media/tuners/mt2063.c
>>
>> after doing xzcat linux-3.8.tar.xz > linux-3.8.tar , that file is truncated after 4096*2+512 bytes. xzcat is from busybox (not from xz, as i assumed earlier)
>>
>> the .tar file is truncated at this point as well, it is only 200 MB, but with xzcat from xz package, it is 500 MB.
>
> Apparently XZ embedded has a bug :(
>
> Not only our in-tree one, but the latest git of it is buggy too:
>
> $ git clone http://git.tukaani.org/xz-embedded.git
> $ cd xz-embedded/userspace
> $ make
> $ ./xzminidec < /tmp/linux-3.8.tar.xz | wc -c
> ./xzminidec: Unsupported check; not verifying file integrity
> ..working for some time...
> 201330688
>
> (xzminidec doesn't crash: exit code is zero).
>
> The peculiar thing is that 201330688 is exactly 0x0c001000.

Lack of integrity checking, (de)compressor.

Can lzip be considered for inclusion in busybox?:

[1] http://lzip.nongnu.org
[2] http://en.wikipedia.org/wiki/Lzip
[3] http://lists.busybox.net/pipermail/busybox/2012-December/078750.html
[4] http://ur1.ca/810mp
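The byte count in the quoted report is easy to check by hand. As a small sketch (the function names whole_mib, leftover_bytes, print_size and check_reported_size are invented for this note), the following C code confirms that 201330688 equals 0x0c001000 and splits it into whole MiB plus a remainder. It only restates the arithmetic; the cause of the truncation is discussed in Lasse Collin's reply elsewhere in this thread.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MIB (UINT64_C(1024) * UINT64_C(1024))

/* Number of whole MiB contained in a byte count. */
uint64_t whole_mib(uint64_t bytes)
{
	return bytes / MIB;
}

/* Bytes left over after removing the whole MiB. */
uint64_t leftover_bytes(uint64_t bytes)
{
	return bytes % MIB;
}

/* Print a byte count in decimal, in hex, and as MiB plus remainder. */
void print_size(uint64_t bytes)
{
	printf("%" PRIu64 " = 0x%08" PRIx64 " = %" PRIu64 " MiB + %" PRIu64 " bytes\n",
	       bytes, bytes, whole_mib(bytes), leftover_bytes(bytes));
}

/* Check the figure reported by "wc -c" in the quoted message. */
void check_reported_size(void)
{
	const uint64_t reported = UINT64_C(201330688);

	assert(reported == UINT64_C(0x0c001000));
	print_size(reported);
}
```

check_reported_size() prints "201330688 = 0x0c001000 = 192 MiB + 4096 bytes", so the truncated output ends exactly 4 KiB past a 192 MiB boundary.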
Re: XZ embedded bug unpacking linux-3.8.tar.xz (was: Re: tar: short read on linux-3.8.tar.xz)
Hi!

> Can lzip be considered for inclusion in busybox?:

I'm also interested in getting lzip into Busybox. I used the provided patch on the current snapshot and it works fine for me.

--
Harald
Re: XZ embedded bug unpacking linux-3.8.tar.xz
[ quoting the full mail of lasse since it didn't make its way into the bb maillist yet ]

On 02/25/2013 12:05 PM, Lasse Collin wrote:
> On 2013-02-25 Denys Vlasenko wrote:
>> On Sunday 24 February 2013 22:37, John Spencer wrote:
>>> http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.8.tar.xz
>>>
>>> using busybox 1.20.2 and xz 5.0.3 or xz 5.0.4:
>>>
>>> $ tar xf linux-3.8.tar.xz
>>>
>>> i get: short read and exit status 1. however the data seems to be there (at least partial).
>>>
>>> the culprit is the file linux-3.8/drivers/media/tuners/mt2063.c
>>>
>>> after doing xzcat linux-3.8.tar.xz > linux-3.8.tar , that file is truncated after 4096*2+512 bytes. xzcat is from busybox (not from xz, as i assumed earlier)
>>>
>>> the .tar file is truncated at this point as well, it is only 200 MB, but with xzcat from xz package, it is 500 MB.
>>
>> Apparently XZ embedded has a bug :(
>>
>> Not only our in-tree one, but the latest git of it is buggy too:
>>
>> $ git clone http://git.tukaani.org/xz-embedded.git
>> $ cd xz-embedded/userspace
>> $ make
>> $ ./xzminidec < /tmp/linux-3.8.tar.xz | wc -c
>> ./xzminidec: Unsupported check; not verifying file integrity
>> ..working for some time...
>> 201330688
>>
>> (xzminidec doesn't crash: exit code is zero).
>>
>> The peculiar thing is that 201330688 is exactly 0x0c001000.
>
> linux-3.8.tar.xz from kernel.org has three concatenated .xz streams. You can see this with e.g. xz -l or xz -lvv. At least pxz creates such .xz files. Such files are valid and fine.
>
> xzminidec is a limited example program. It doesn't support concatenated streams. This is mentioned in the comment in the beginning of xzminidec.c. One may argue if it is a bug or a feature, but at least the limitation has been documented.
>
> Busybox' xzcat lacks support for concatenated .xz streams. For comparison, Busybox' zcat and bzcat do support concatenated streams.
>
> $ echo foo | gzip > test.gz
> $ echo bar | gzip >> test.gz
> $ busybox zcat test.gz
> foo
> bar
> $ echo foo | xz > test.xz
> $ echo bar | xz >> test.xz
> $ busybox xzcat test.xz
> foo
>
> liblzma in XZ Utils has a flag to decode concatenated streams to make it a bit easier to handle such files. I would prefer to not include such a flag in XZ Embedded, since I think in most embedded situations (boot loaders, kernels etc.) such a flag is useless. Busybox is an exception to this.
>
> Below is a patch to add support for concatenated .xz streams. It also handles possible padding (sequence of zero-bytes) between the streams. It probably has room for improvement, but it should be a useful starting point.
>
> diff --git a/archival/libarchive/decompress_unxz.c b/archival/libarchive/decompress_unxz.c
> index 79b48a1..5ebbd28 100644
> --- a/archival/libarchive/decompress_unxz.c
> +++ b/archival/libarchive/decompress_unxz.c
> @@ -86,8 +86,40 @@ unpack_xz_stream(transformer_aux_data_t *aux, int src_fd, int dst_fd)
>  			IF_DESKTOP(total += iobuf.out_pos;)
>  			iobuf.out_pos = 0;
>  		}
> -		if (r == XZ_STREAM_END) {
> -			break;
> +		while (r == XZ_STREAM_END) {
> +			/* Handle concatenated .xz Streams including possible
> +			 * Stream Padding.
> +			 */
> +			if (iobuf.in_pos == iobuf.in_size) {
> +				int rd = safe_read(src_fd, membuf, BUFSIZ);
> +				if (rd < 0) {
> +					bb_error_msg(bb_msg_read_error);
> +					total = -1;
> +					goto out;
> +				}
> +				if (rd == 0)
> +					goto out;
> +
> +				iobuf.in_size = rd;
> +				iobuf.in_pos = 0;
> +			}
> +
> +			/* Stream Padding must always be a multiple of four
> +			 * bytes to preserve four-byte alignment. To keep the
> +			 * code slightly smaller, we aren't as strict here as
> +			 * the .xz spec requires. We just skip all zero-bytes
> +			 * without checking the alignment and thus can accept
> +			 * files that aren't valid, e.g. the XZ Utils test
> +			 * files bad-0pad-empty.xz and bad-0catpad-empty.xz.
> +			 */
> +			while (iobuf.in_pos < iobuf.in_size) {
> +				if (membuf[iobuf.in_pos] != 0) {
> +					xz_dec_reset(state);
> +					r = XZ_OK;
> +					break;
> +				}
> +				++iobuf.in_pos;
> +			}
>  		}
>  		if (r != XZ_OK && r != XZ_UNSUPPORTED_CHECK) {
>  			bb_error_msg("corrupted data");
> @@ -95,6 +127,8 @@ unpack_xz_stream(transformer_aux_data_t *aux,
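The comment in the patch above says that Stream Padding must be a multiple of four bytes and that the patch deliberately skips that check to save code. As a rough sketch of what the stricter variant could look like, here is the padding scan pulled out into a standalone C function. Everything in it (skip_stream_padding, enum pad_result, the strict flag) is hypothetical and is not part of Busybox or XZ Embedded; it only mirrors the inner while loop of the patch and adds the alignment check the comment describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result of scanning the bytes that follow the end of one .xz Stream. */
enum pad_result {
	PAD_NEED_MORE,   /* only zero bytes seen so far; caller should read more */
	PAD_NEXT_STREAM, /* a non-zero byte was found: another Stream starts at *pos */
	PAD_MISALIGNED   /* strict mode only: padding is not a multiple of 4 bytes */
};

/*
 * Skip zero bytes in buf[*pos .. size-1].
 *
 * *pad_total counts the padding bytes seen since the last Stream ended and
 * is carried across calls, because padding may span several input buffers.
 * With strict == false this behaves like the loop in the patch: any run of
 * zero bytes is accepted. With strict == true a following Stream is only
 * accepted if the padding length is a multiple of four.
 */
enum pad_result skip_stream_padding(const uint8_t *buf, size_t size,
				    size_t *pos, size_t *pad_total, bool strict)
{
	assert(buf != NULL && pos != NULL && pad_total != NULL);
	assert(*pos <= size);

	while (*pos < size) {
		if (buf[*pos] != 0) {
			if (strict && (*pad_total % 4) != 0)
				return PAD_MISALIGNED;
			return PAD_NEXT_STREAM;
		}
		++*pos;
		++*pad_total;
	}
	return PAD_NEED_MORE;
}
```

In a decoder loop like the one in the patch, PAD_NEXT_STREAM would correspond to the xz_dec_reset(state) branch and PAD_NEED_MORE to refilling the buffer with safe_read(). A strict caller would also have to check *pad_total % 4 when the input ends while still in padding, which this sketch leaves to the caller.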
Re: XZ embedded bug unpacking linux-3.8.tar.xz
26.02.2013 03:21, John Spencer wrote:
> [ quoting the full mail of lasse since it didn't make its way into the bb maillist yet ]

Additionally there has been a discussion and attempts to cook up a patch in Debian, see http://bugs.debian.org/686502 , which I submitted as a bug to busybox bugzilla -- https://bugs.busybox.net/show_bug.cgi?id=5804 .

Cc'ing the Debian bugreport.

I like the below patch better :)

/mjt

> On 02/25/2013 12:05 PM, Lasse Collin wrote:
>> On 2013-02-25 Denys Vlasenko wrote:
>>> On Sunday 24 February 2013 22:37, John Spencer wrote:
>>>> http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.8.tar.xz
>>>>
>>>> using busybox 1.20.2 and xz 5.0.3 or xz 5.0.4:
>>>>
>>>> $ tar xf linux-3.8.tar.xz
>>>>
>>>> i get: short read and exit status 1. however the data seems to be there (at least partial).
>>>>
>>>> the culprit is the file linux-3.8/drivers/media/tuners/mt2063.c
>>>>
>>>> after doing xzcat linux-3.8.tar.xz > linux-3.8.tar , that file is truncated after 4096*2+512 bytes. xzcat is from busybox (not from xz, as i assumed earlier)
>>>>
>>>> the .tar file is truncated at this point as well, it is only 200 MB, but with xzcat from xz package, it is 500 MB.
>>>
>>> Apparently XZ embedded has a bug :(
>>>
>>> Not only our in-tree one, but the latest git of it is buggy too:
>>>
>>> $ git clone http://git.tukaani.org/xz-embedded.git
>>> $ cd xz-embedded/userspace
>>> $ make
>>> $ ./xzminidec < /tmp/linux-3.8.tar.xz | wc -c
>>> ./xzminidec: Unsupported check; not verifying file integrity
>>> ..working for some time...
>>> 201330688
>>>
>>> (xzminidec doesn't crash: exit code is zero).
>>>
>>> The peculiar thing is that 201330688 is exactly 0x0c001000.
>>
>> linux-3.8.tar.xz from kernel.org has three concatenated .xz streams. You can see this with e.g. xz -l or xz -lvv. At least pxz creates such .xz files. Such files are valid and fine.
>>
>> xzminidec is a limited example program. It doesn't support concatenated streams. This is mentioned in the comment in the beginning of xzminidec.c. One may argue if it is a bug or a feature, but at least the limitation has been documented.
>>
>> Busybox' xzcat lacks support for concatenated .xz streams. For comparison, Busybox' zcat and bzcat do support concatenated streams.
>>
>> $ echo foo | gzip > test.gz
>> $ echo bar | gzip >> test.gz
>> $ busybox zcat test.gz
>> foo
>> bar
>> $ echo foo | xz > test.xz
>> $ echo bar | xz >> test.xz
>> $ busybox xzcat test.xz
>> foo
>>
>> liblzma in XZ Utils has a flag to decode concatenated streams to make it a bit easier to handle such files. I would prefer to not include such a flag in XZ Embedded, since I think in most embedded situations (boot loaders, kernels etc.) such a flag is useless. Busybox is an exception to this.
>>
>> Below is a patch to add support for concatenated .xz streams. It also handles possible padding (sequence of zero-bytes) between the streams. It probably has room for improvement, but it should be a useful starting point.
>>
>> diff --git a/archival/libarchive/decompress_unxz.c b/archival/libarchive/decompress_unxz.c
>> index 79b48a1..5ebbd28 100644
>> --- a/archival/libarchive/decompress_unxz.c
>> +++ b/archival/libarchive/decompress_unxz.c
>> @@ -86,8 +86,40 @@ unpack_xz_stream(transformer_aux_data_t *aux, int src_fd, int dst_fd)
>>  			IF_DESKTOP(total += iobuf.out_pos;)
>>  			iobuf.out_pos = 0;
>>  		}
>> -		if (r == XZ_STREAM_END) {
>> -			break;
>> +		while (r == XZ_STREAM_END) {
>> +			/* Handle concatenated .xz Streams including possible
>> +			 * Stream Padding.
>> +			 */
>> +			if (iobuf.in_pos == iobuf.in_size) {
>> +				int rd = safe_read(src_fd, membuf, BUFSIZ);
>> +				if (rd < 0) {
>> +					bb_error_msg(bb_msg_read_error);
>> +					total = -1;
>> +					goto out;
>> +				}
>> +				if (rd == 0)
>> +					goto out;
>> +
>> +				iobuf.in_size = rd;
>> +				iobuf.in_pos = 0;
>> +			}
>> +
>> +			/* Stream Padding must always be a multiple of four
>> +			 * bytes to preserve four-byte alignment. To keep the
>> +			 * code slightly smaller, we aren't as strict here as
>> +			 * the .xz spec requires. We just skip all zero-bytes
>> +			 * without checking the alignment and thus can accept
>> +			 * files that aren't valid, e.g. the XZ Utils test
>> +			 * files bad-0pad-empty.xz and bad-0catpad-empty.xz.
>> +			 */
>> +			while (iobuf.in_pos < iobuf.in_size) {
>> +				if (membuf[iobuf.in_pos] != 0) {
>> +					xz_dec_reset(state);
>> +					r = XZ_OK;
>> +					break;
>> +				}
>> +				++iobuf.in_pos;
>> +			}
>>  		}
>>  		if (r != XZ_OK && r != XZ_UNSUPPORTED_CHECK) {
>>  			bb_error_msg("corrupted data");
>> @@ -95,6 +127,8 @@ unpack_xz_stream(transformer_aux_data_t *aux, int src_fd, int dst_fd)
>>  			break;
>>  		}
>>  	}
>> +
>> +out: