Bug#42158: a FreeBSD reference in disagreement with pax's behavior
> > My documentation for newc is based primarily on studying the implementation of GNU cpio. I've not found any good references for the history of this format.

> OK, this is good to know. I'm not saying one or the other program is wrong, but having a piece of documentation describing an implementation is of course not the same as a standard.

POSIX considers cpio to be deprecated, so there's no chance that POSIX will ever formally standardize any cpio format variant other than the odc variant documented under pax.

LSB documents this format since it's used by RPM. That's the only de jure standard I've found that discusses this particular cpio variant. Unfortunately, the LSB documentation for this format is pretty incomplete. It certainly doesn't discuss hardlink handling.

The de facto standard for this format would be the implementation of cpio that originally shipped with SVr4. I don't know if SVr4 includes any documentation for the format apart from the implementation itself. I don't have access to SVr4 source code.

Cheers,
Tim Kientzle

-- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#42158: a FreeBSD reference in disagreement with pax's behavior
On Fri, Aug 15, 2008 at 01:59:41PM -0700, Tim Kientzle wrote:

> My documentation for newc is based primarily on studying the implementation of GNU cpio. I've not found any good references for the history of this format.

OK, this is good to know. I'm not saying one or the other program is wrong, but having a piece of documentation describing an implementation is of course not the same as a standard.

> I'm a little unclear what pax implementation you're discussing. Based on the description below, I would

The discussion started with the OpenBSD pax implementation, which also does cpio. OpenBSD pax has the same roots as the FreeBSD one, so I suspect some of the problems are shared.

> suggest you test whether this program duplicates bodies for each hardlink it stores. This is easy to test: Make two hardlinks to the same large file, archive them and see if the resulting archive is twice as big as the file. The odc (POSIX-1988) format should duplicate bodies for hardlinks. GNU cpio's implementation of newc format does not. Tar formats (including the POSIX-2001 pax extended format) do not as a rule, though the pax extended format does permit it as an option.

OpenBSD pax does store multiple copies of hard linked files (at least when using the sv4cpio (SVR4 hex cpio) format). As you may have noticed, the divergence between GNU cpio and BSD cpio already starts with the names of the formats :-(

> My sympathies for the maintainers of the pax you're discussing; it is surprisingly difficult to correctly handle all three common approaches for hardlink management within a single program.

yep, thanks ;-)

> Tim Kientzle
>
> Daniel Kahn Gillmor wrote:
> > Tim Kientzle of FreeBSD (author of libarchive, attempting to CC here) describes the cpio format here: http://people.freebsd.org/~kientzle/libarchive/man/cpio.5.txt
> >
> > This document states about the SVR4 (newc) format (magic 070701, which is what we're dealing with):
> >
> >   In this format, hardlinked files are handled by setting the filesize to zero for each entry except the last one that appears in the archive.
> >
> > So this interpretation is shared by at least GNU and FreeBSD, afaict.

I am not sure if libarchive and pax on FreeBSD share this. Which implementation of cpio is used on FreeBSD by default?

> > pax appears to be in disagreement with these systems as far as its creation of SVR4/newc archives goes, since it stores a non-zero filesize for each entry of a hardlinked file. It's in dangerous disagreement with GNU and FreeBSD during the unpacking stage, because it re-creates hardlinked files as 0 bytes in length if it encounters archives created by the other utilities.
> >
> > Hope this is a useful reference,
> > --dkg
> >
> > For Tim's reference: we're discussing pax here: http://bugs.debian.org/42158

I think it would be good to compare to OpenSolaris cpio, being a third independent implementation of cpio. At the moment I do not have access to one, but I'll try to set up something today.

-Otto
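The two hardlink conventions discussed in this message can be told apart by reading the size field of each entry that shares an inode. The following is a minimal sketch, not taken from any of the implementations named in the thread. It assumes the commonly documented newc layout (the 6-byte magic 070701 followed by thirteen 8-digit hex fields, with name and body each padded to 4 bytes). The helper names make_entry, read_newc and hardlink_body_sizes are hypothetical, and the two sample archives are built in memory purely for illustration.

```python
NEWC_MAGIC = b"070701"
# The thirteen 8-digit hex fields that follow the magic (assumed layout).
FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
          "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize", "check")


def pad4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def make_entry(name, ino, nlink, body=b"", mode=0o100644):
    """Build one newc entry in memory (illustration only, most fields zero)."""
    nm = name.encode() + b"\0"
    vals = (ino, mode, 0, 0, nlink, 0, len(body), 0, 0, 0, 0, len(nm), 0)
    out = NEWC_MAGIC + b"".join(b"%08X" % v for v in vals) + nm
    out += b"\0" * (pad4(len(out)) - len(out))           # pad header + name
    return out + body + b"\0" * (pad4(len(body)) - len(body))  # pad body


def read_newc(data):
    """Yield (name, header_dict) for every entry before the trailer."""
    pos = 0
    while True:
        if data[pos:pos + 6] != NEWC_MAGIC:
            raise ValueError("no newc header at offset %d" % pos)
        raw = data[pos + 6:pos + 110]
        hdr = {f: int(raw[i * 8:i * 8 + 8], 16) for i, f in enumerate(FIELDS)}
        name_end = pos + 110 + hdr["namesize"]
        name = data[pos + 110:name_end - 1].decode()     # drop trailing NUL
        if name == "TRAILER!!!":
            return
        yield name, hdr
        pos = pad4(name_end) + pad4(hdr["filesize"])


def hardlink_body_sizes(data):
    """Group regular files with nlink > 1 by inode; list (name, filesize)."""
    groups = {}
    for name, hdr in read_newc(data):
        is_regular = (hdr["mode"] & 0o170000) == 0o100000
        if is_regular and hdr["nlink"] > 1:
            key = (hdr["devmajor"], hdr["devminor"], hdr["ino"])
            groups.setdefault(key, []).append((name, hdr["filesize"]))
    return groups


TRAILER = make_entry("TRAILER!!!", 0, 1, mode=0)
# Convention described in cpio.5.txt: only the last link carries the body.
svr4_style = make_entry("a", 7, 2) + make_entry("b", 7, 2, b"hello") + TRAILER
# Convention reported for pax: every link carries a full copy of the body.
dup_style = (make_entry("a", 7, 2, b"hello")
             + make_entry("b", 7, 2, b"hello") + TRAILER)

print(hardlink_body_sizes(svr4_style))
print(hardlink_body_sizes(dup_style))
```

Running the same function over an archive written by the program under test would show which convention that program follows, without having to compare archive sizes by eye.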
Bug#42158: a FreeBSD reference in disagreement with pax's behavior
> The discussion started with the OpenBSD pax implementation, which also does cpio. OpenBSD pax has the same roots as the FreeBSD one, so I suspect some of the problems are shared.

This would be Keith Muller's old combined implementation of pax/cpio/tar. Here's the situation as I understand it:

NetBSD and OpenBSD both use Keith Muller's old implementation for pax, cpio, and tar. I understand that both projects have done a lot of work on it over the years.

FreeBSD's situation is in transition:
* Uses my libarchive-based bsdtar implementation since FreeBSD 6.0. (Used GNU tar prior to that.)
* Uses GNU cpio today, but might switch to my libarchive-based bsdcpio in FreeBSD 8.0
* Uses Keith Muller's pax implementation. (A libarchive-based pax is still a year or two out.)

I should test this bug against the FreeBSD pax (another divergent tree based on Keith Muller's work).

> I think it would be good to compare to OpenSolaris cpio, being a third independent implementation of cpio. At the moment I do not have access to one, but I'll try to set up something today.

Let us know what you find.

Cheers,
Tim Kientzle
Bug#42158: a FreeBSD reference in disagreement with pax's behavior
> For Tim's reference: we're discussing pax here: http://bugs.debian.org/42158

> I think it would be good to compare to OpenSolaris cpio, being a third independent implementation of cpio. At the moment I do not have access to one, but I'll try to set up something today.

Oh, yeah. Gunnar Ritter's Heirloom toolchest (based on open-sourced AT&T code) is also a good comparison point: http://heirloom.sourceforge.net/tools.html
Bug#42158: a FreeBSD reference in disagreement with pax's behavior
Tim Kientzle wrote:

> > For Tim's reference: we're discussing pax here: http://bugs.debian.org/42158
>
> > I think it would be good to compare to OpenSolaris cpio, being a third independent implementation of cpio. At the moment I do not have access to one, but I'll try to set up something today.
>
> Oh, yeah. Gunnar Ritter's Heirloom toolchest (based on open-sourced AT&T code) is also a good comparison point: http://heirloom.sourceforge.net/tools.html

From Gunnar Ritter's cpio.1 manpage:

  The -c format was introduced with System V Release 4. Except for the file size, it imposes no practical limitations on files archived. The original SVR4 implementation stores the contents of hard linked files only once and with the last archived link. This cpio ensures compatibility with SVR4. With archives created by implementations that employ other methods for storing hard linked files, each file is extracted as a single link, and some of these files may be empty.

I'm not sure what exactly this last sentence is supposed to mean.

Tim
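The SVR4 convention quoted above (contents stored once, with the last archived link) only works if the reader knows about it. The sketch below is a deliberately simplified in-memory model, not code from Heirloom, pax or any other tool in this thread, and it does not attempt to interpret the manpage's last sentence. Entries are hypothetical (name, ino, nlink, body) tuples, and only file contents are modelled, not real links on disk. It contrasts a reader that treats every entry as an independent file with one that holds back empty link entries until a later entry for the same inode supplies the data.

```python
def extract_naive(entries):
    """Treat every entry as an independent file: name -> contents."""
    return {name: body for name, ino, nlink, body in entries}


def extract_svr4_aware(entries):
    """Give every name of a hard-linked inode the body that arrives later."""
    files = {}
    pending = {}  # ino -> names seen so far that still wait for a body
    for name, ino, nlink, body in entries:
        if nlink > 1:
            pending.setdefault(ino, []).append(name)
            if body:
                # The body has arrived: it belongs to all names collected so far.
                for linked in pending.pop(ino):
                    files[linked] = body
        else:
            files[name] = body
    # Names whose body never arrived (for example genuinely empty files).
    for names in pending.values():
        for linked in names:
            files[linked] = b""
    return files


# SVR4-style stream: the first link has size zero, the last carries the data.
svr4_entries = [("a", 7, 2, b""), ("b", 7, 2, b"payload")]

print(extract_naive(svr4_entries))
print(extract_svr4_aware(svr4_entries))
```

In this model the naive reader leaves "a" empty, which matches the 0-byte files reported earlier in the thread for archives written by the other utilities.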
Bug#42158: a FreeBSD reference in disagreement with pax's behavior
Tim Kientzle of FreeBSD (author of libarchive, attempting to CC here) describes the cpio format here: http://people.freebsd.org/~kientzle/libarchive/man/cpio.5.txt

This document states about the SVR4 (newc) format (magic 070701, which is what we're dealing with):

  In this format, hardlinked files are handled by setting the filesize to zero for each entry except the last one that appears in the archive.

So this interpretation is shared by at least GNU and FreeBSD, afaict.

pax appears to be in disagreement with these systems as far as its creation of SVR4/newc archives goes, since it stores a non-zero filesize for each entry of a hardlinked file. It's in dangerous disagreement with GNU and FreeBSD during the unpacking stage, because it re-creates hardlinked files as 0 bytes in length if it encounters archives created by the other utilities.

Hope this is a useful reference,
--dkg

For Tim's reference: we're discussing pax here: http://bugs.debian.org/42158
Bug#42158: a FreeBSD reference in disagreement with pax's behavior
My documentation for newc is based primarily on studying the implementation of GNU cpio. I've not found any good references for the history of this format.

I'm a little unclear what pax implementation you're discussing. Based on the description below, I would suggest you test whether this program duplicates bodies for each hardlink it stores. This is easy to test: Make two hardlinks to the same large file, archive them and see if the resulting archive is twice as big as the file.

The odc (POSIX-1988) format should duplicate bodies for hardlinks. GNU cpio's implementation of newc format does not. Tar formats (including the POSIX-2001 pax extended format) do not as a rule, though the pax extended format does permit it as an option.

My sympathies for the maintainers of the pax you're discussing; it is surprisingly difficult to correctly handle all three common approaches for hardlink management within a single program.

Tim Kientzle

Daniel Kahn Gillmor wrote:

> Tim Kientzle of FreeBSD (author of libarchive, attempting to CC here) describes the cpio format here: http://people.freebsd.org/~kientzle/libarchive/man/cpio.5.txt
>
> This document states about the SVR4 (newc) format (magic 070701, which is what we're dealing with):
>
>   In this format, hardlinked files are handled by setting the filesize to zero for each entry except the last one that appears in the archive.
>
> So this interpretation is shared by at least GNU and FreeBSD, afaict.
>
> pax appears to be in disagreement with these systems as far as its creation of SVR4/newc archives goes, since it stores a non-zero filesize for each entry of a hardlinked file. It's in dangerous disagreement with GNU and FreeBSD during the unpacking stage, because it re-creates hardlinked files as 0 bytes in length if it encounters archives created by the other utilities.
>
> Hope this is a useful reference,
> --dkg
>
> For Tim's reference: we're discussing pax here: http://bugs.debian.org/42158
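The size test suggested in this message (two hardlinks to one large file, then compare archive size with file size) can be scripted. The sketch below is only an illustration of the procedure: it uses Python's standard tarfile module as a stand-in archiver, so the result says something about that tar writer and nothing about pax or cpio. The function name and the 1 MiB payload size are arbitrary choices, and the filesystem holding the temporary directory is assumed to support hard links.

```python
import os
import tarfile
import tempfile


def archive_size_for_two_links(payload_size=1 << 20):
    """Archive two hard links to one file; return (payload_size, archive_size)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.mkdir(src)
        first = os.path.join(src, "first")
        second = os.path.join(src, "second")
        with open(first, "wb") as fh:
            fh.write(os.urandom(payload_size))  # incompressible test payload
        os.link(first, second)                  # second name for the same inode

        archive = os.path.join(tmp, "out.tar")
        with tarfile.open(archive, "w") as tar:  # uncompressed tar
            tar.add(first, arcname="first")
            tar.add(second, arcname="second")
        return payload_size, os.path.getsize(archive)


payload, archived = archive_size_for_two_links()
# A ratio close to 1 means the body was stored once; close to 2 means twice.
print("archive/payload ratio: %.2f" % (archived / payload))
```

To apply the same check to another program, the tarfile step would be replaced by a call to that program, and the resulting archive size compared with the payload size in the same way.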