Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-24 Thread Tim Kientzle

My documentation for newc is based primarily on studying the
implementation of GNU cpio.  I've not found any good
references for the history of this format.


OK, this is good to know. I'm not saying one or the other program is
wrong, but having a piece of documentation describing an
implementation is of course not the same as a standard. 


POSIX considers cpio to be deprecated, so there's
no chance that POSIX will ever formally standardize
any cpio format variant other than the odc variant
documented under pax.

LSB documents this format since it's used by RPM.
That's the only de jure standard I've found that
discusses this particular cpio variant.  Unfortunately,
the LSB documentation for this format is pretty
incomplete.  It certainly doesn't discuss hardlink
handling.

The de facto standard for this format would be
the implementation of cpio that originally shipped
with SVr4.  I don't know if SVr4 includes any
documentation for the format apart from the implementation
itself.  I don't have access to SVr4 source code.

Cheers,

Tim Kientzle




--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-18 Thread Otto Moerbeek
On Fri, Aug 15, 2008 at 01:59:41PM -0700, Tim Kientzle wrote:

 My documentation for newc is based primarily on studying the
 implementation of GNU cpio.  I've not found any good
 references for the history of this format.

OK, this is good to know. I'm not saying one or the other program is
wrong, but having a piece of documentation describing an
implementation is of course not the same as a standard. 


 I'm a little unclear what pax implementation you're
 discussing.   Based on the description below, I would

The discussion started with the OpenBSD pax implementation, which also
does cpio. OpenBSD pax has the same roots as the FreeBSD one, so I
suspect some of the problems are shared. 

 suggest you test whether this program duplicates bodies
 for each hardlink it stores.  This is easy to test:  Make
 two hardlinks to the same large file, archive them
 and see if the resulting archive is twice as big
 as the file.  The odc (POSIX-1988) format should
 duplicate bodies for hardlinks.  GNU cpio's implementation
 of newc format does not.  Tar formats (including
 the POSIX-2001 pax extended format) do not as a rule,
 though the pax extended format does permit it as an
 option.

OpenBSD pax does store multiple copies of hard linked files (at least
when using the sv4cpio (SVR4 hex cpio) format). As you may have
noticed, the divergence between GNU cpio and BSD cpio already starts
with the names of the formats :-(


 My sympathies for the maintainers of the pax you're
 discussing; it is surprisingly difficult to correctly
 handle all three common approaches for hardlink management
 within a single program.

yep, thanks ;-)


 Tim Kientzle


 Daniel Kahn Gillmor wrote:
 Tim Kientzle of FreeBSD (author of libarchive, attempting to CC here)
 describes the cpio format here:

  http://people.freebsd.org/~kientzle/libarchive/man/cpio.5.txt

 This document states about the SVR4 (newc) format (magic 070701, which
 is what we're dealing with):

  In this format, hardlinked files are handled by setting the
  filesize to zero for each entry except the last one that appears
  in the archive.

 So this interpretation is shared by at least GNU and FreeBSD,
 afaict.

I am not sure if libarchive and pax on FreeBSD share this. Which
implementation of cpio is used on FreeBSD by default?


 pax appears to be in disagreement with these systems as far as its
 creation of SVR4/newc archives goes, since it stores a non-zero
 filesize for each entry of a hardlinked file.  It's in dangerous
 disagreement with GNU and FreeBSD during the unpacking stage, because
 it re-creates hardlinked files as 0 bytes in length if it encounters
 archives created by the other utilities.

 Hope this is a useful reference,

 --dkg

 For Tim's reference: we're discussing pax here:
 http://bugs.debian.org/42158

I think it would be good to compare to OpenSolaris cpio, being a third
independent implementation of cpio. At the moment I do not have access
to one, but I'll try to set up something today.

-Otto







Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-18 Thread Tim Kientzle

The discussion started with the OpenBSD pax implementation, which also
does cpio. OpenBSD pax has the same roots as the FreeBSD one, so I
suspect some of the problems are shared. 


This would be Keith Muller's old combined implementation
of pax/cpio/tar.  Here's the situation as I understand it:

NetBSD and OpenBSD both use Keith Muller's old implementation
for pax, cpio, and tar.  I understand that both projects
have done a lot of work on it over the years.

FreeBSD's situation is in transition:
  * Uses my libarchive-based bsdtar implementation since
FreeBSD 6.0.  (Used GNU tar prior to that.)
  * Uses GNU cpio today, but might switch to my libarchive-based
bsdcpio in FreeBSD 8.0
  * Uses Keith Muller's pax implementation.  (A libarchive-based
pax is still a year or two out.)

I should test this bug against the FreeBSD pax (another divergent
tree based on Keith Muller's work).


I think it would be good to compare to OpenSolaris cpio, being a third
independent implementation of cpio. At the moment I do not have access
to one, but I'll try to set up something today.


Let us know what you find.

Cheers,

Tim Kientzle






Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-18 Thread Tim Kientzle

For Tim's reference: we're discussing pax here:
http://bugs.debian.org/42158


I think it would be good to compare to OpenSolaris cpio, being a third
independent implementation of cpio. At the moment I do not have access
to one, but I'll try to set up something today.


Oh, yeah.  Gunnar Ritter's Heirloom toolchest
(based on open-sourced AT&T code) is also a good
comparison point:

 http://heirloom.sourceforge.net/tools.html







Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-18 Thread Tim Kientzle

Tim Kientzle wrote:

For Tim's reference: we're discussing pax here:
http://bugs.debian.org/42158


I think it would be good to compare to OpenSolaris cpio, being a third
independent implementation of cpio. At the moment I do not have access
to one, but I'll try to set up something today.


Oh, yeah.  Gunnar Ritter's Heirloom toolchest
(based on open-sourced AT&T code) is also a good
comparison point:

 http://heirloom.sourceforge.net/tools.html


From Gunnar Ritter's cpio.1 manpage:

The -c format was introduced with System V Release 4. Except
for the file size, it imposes no practical limitations on
files archived. The original SVR4 implementation stores the
contents of hard linked files only once and with the last
archived link. This cpio ensures compatibility with SVR4.
With archives created by implementations that employ other
methods for storing hard linked files, each file is extracted
as a single link, and some of these files may be empty.

I'm not sure what exactly this last sentence is supposed to
mean.

Tim






Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-15 Thread Daniel Kahn Gillmor
Tim Kientzle of FreeBSD (author of libarchive, attempting to CC here)
describes the cpio format here:

 http://people.freebsd.org/~kientzle/libarchive/man/cpio.5.txt

This document states about the SVR4 (newc) format (magic 070701, which
is what we're dealing with):

 In this format, hardlinked files are handled by setting the
 filesize to zero for each entry except the last one that appears
 in the archive.

So this interpretation is shared by at least GNU and FreeBSD,
afaict.
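
As an illustration of the rule quoted above, here is a minimal Python
sketch (not taken from libarchive, GNU cpio, or pax) that walks a newc
archive held in memory and reports the filesize field of each entry.
The names list_newc and _pad4 are made up for this example. The layout
assumed is the usual one for magic 070701: a 6-byte magic followed by
thirteen 8-digit hex fields, with the name and the body each padded to
a 4-byte boundary. It also assumes the archive starts at offset 0 of
the buffer.

```python
NEWC_MAGIC = b"070701"
# Header fields in archive order; each is 8 ASCII hex digits.
FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
          "devmajor", "devminor", "rdevmajor", "rdevminor",
          "namesize", "check")
HEADER_LEN = 6 + 8 * len(FIELDS)  # 110 bytes


def _pad4(offset):
    """Round offset up to the next multiple of 4."""
    return offset + (4 - offset % 4) % 4


def list_newc(archive):
    """Yield (name, ino, nlink, filesize) for each entry in a newc archive.

    `archive` is a bytes object. Iteration stops at the TRAILER!!! entry.
    """
    pos = 0
    while True:
        if archive[pos:pos + 6] != NEWC_MAGIC:
            raise ValueError("no newc header at offset %d" % pos)
        raw = archive[pos + 6:pos + HEADER_LEN].decode("ascii")
        hdr = {f: int(raw[i * 8:(i + 1) * 8], 16)
               for i, f in enumerate(FIELDS)}
        name_start = pos + HEADER_LEN
        # namesize counts the trailing NUL, which is dropped here.
        name = archive[name_start:name_start + hdr["namesize"] - 1].decode()
        if name == "TRAILER!!!":
            return
        yield name, hdr["ino"], hdr["nlink"], hdr["filesize"]
        data_start = _pad4(name_start + hdr["namesize"])
        pos = _pad4(data_start + hdr["filesize"])
```

For an archive written under the convention quoted above, every link to
the same inode except the last should show a filesize of 0. An archive
that duplicates bodies would show the full size on each link.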

pax appears to be in disagreement with these systems as far as its
creation of SVR4/newc archives goes, since it stores a non-zero
filesize for each entry of a hardlinked file.  It's in dangerous
disagreement with GNU and FreeBSD during the unpacking stage, because
it re-creates hardlinked files as 0 bytes in length if it encounters
archives created by the other utilities.
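
One way a reader could tolerate both conventions is sketched below in
Python. This is only an illustration under stated assumptions, not how
pax, GNU cpio, or libarchive actually do it: entries are assumed to be
already parsed into (name, dev, ino, nlink, data) tuples, and
resolve_hardlinks is a hypothetical helper name. Links are grouped by
(dev, ino), and whichever entry in a group carries a non-empty body
supplies the contents for all of them.

```python
def resolve_hardlinks(entries):
    """Map each archived name to its file contents.

    `entries` is an iterable of (name, dev, ino, nlink, data) tuples in
    archive order. Entries with nlink > 1 that share (dev, ino) are
    treated as hardlinks to one file, regardless of which of them
    carries the body.
    """
    bodies = {}  # link-group key -> body chosen for that group
    keys = {}    # name -> link-group key
    for name, dev, ino, nlink, data in entries:
        # Files with a single link get a key of their own.
        key = (dev, ino) if nlink > 1 else (name,)
        keys[name] = key
        # A non-empty body always wins; an empty one is only a fallback.
        if data or key not in bodies:
            bodies[key] = data
    return {name: bodies[key] for name, key in keys.items()}
```

With this approach an archive that stores the body only on the last
link and one that stores it on every link resolve to the same result,
and no link is left at 0 bytes unless the file really was empty.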

Hope this is a useful reference,

--dkg

For Tim's reference: we're discussing pax here:
http://bugs.debian.org/42158




Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-15 Thread Tim Kientzle

My documentation for newc is based primarily on studying the
implementation of GNU cpio.  I've not found any good
references for the history of this format.

I'm a little unclear what pax implementation you're
discussing.   Based on the description below, I would
suggest you test whether this program duplicates bodies
for each hardlink it stores.  This is easy to test:  Make
two hardlinks to the same large file, archive them
and see if the resulting archive is twice as big
as the file.  The odc (POSIX-1988) format should
duplicate bodies for hardlinks.  GNU cpio's implementation
of newc format does not.  Tar formats (including
the POSIX-2001 pax extended format) do not as a rule,
though the pax extended format does permit it as an
option.
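
The size test described above can be mimicked without any real
archiver. The Python sketch below is an assumption-laden illustration,
not code from GNU cpio or pax: newc_entry and build_archive are
hypothetical helpers that write a bare newc stream (no block padding
at the end, fixed mode and inode number) for several hardlinks to one
file, either duplicating the body on every link or storing it only
with the last link.

```python
def _hex8(*values):
    """Encode each value as 8 ASCII hex digits, as newc headers do."""
    return "".join("%08x" % v for v in values).encode("ascii")


def _pad4(buf):
    """Pad buf with NUL bytes to a multiple of 4 bytes."""
    return buf + b"\0" * ((4 - len(buf) % 4) % 4)


def newc_entry(name, ino, nlink, data):
    """Return one newc entry (header, name, body) as bytes."""
    encoded = name.encode() + b"\0"
    header = b"070701" + _hex8(
        ino, 0o100644, 0, 0, nlink, 0,   # ino, mode, uid, gid, nlink, mtime
        len(data), 0, 0, 0, 0,           # filesize, dev/rdev major/minor
        len(encoded), 0)                 # namesize, check
    return _pad4(header + encoded) + _pad4(data)


def build_archive(links, data, duplicate_bodies):
    """Archive several hardlinks to the same file contents.

    duplicate_bodies=True stores the body with every link;
    duplicate_bodies=False stores it only with the last link and
    writes filesize 0 for the others.
    """
    out = b""
    for i, name in enumerate(links):
        is_last = i == len(links) - 1
        body = data if (duplicate_bodies or is_last) else b""
        out += newc_entry(name, ino=42, nlink=len(links), data=body)
    return out + newc_entry("TRAILER!!!", 0, 1, b"")


payload = b"x" * 1_000_000
dup = build_archive(["big", "big.link"], payload, duplicate_bodies=True)
once = build_archive(["big", "big.link"], payload, duplicate_bodies=False)
# Ratio of archive size to file size for each strategy.
print(len(dup) / len(payload), len(once) / len(payload))
```

The first ratio should come out close to 2 and the second close to 1,
which is the difference the test is meant to reveal when run against a
real program.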

My sympathies for the maintainers of the pax you're
discussing; it is surprisingly difficult to correctly
handle all three common approaches for hardlink management
within a single program.

Tim Kientzle


Daniel Kahn Gillmor wrote:

Tim Kientzle of FreeBSD (author of libarchive, attempting to CC here)
describes the cpio format here:

 http://people.freebsd.org/~kientzle/libarchive/man/cpio.5.txt

This document states about the SVR4 (newc) format (magic 070701, which
is what we're dealing with):

 In this format, hardlinked files are handled by setting the
 filesize to zero for each entry except the last one that appears
 in the archive.

So this interpretation is shared by at least GNU and FreeBSD,
afaict.

pax appears to be in disagreement with these systems as far as its
creation of SVR4/newc archives goes, since it stores a non-zero
filesize for each entry of a hardlinked file.  It's in dangerous
disagreement with GNU and FreeBSD during the unpacking stage, because
it re-creates hardlinked files as 0 bytes in length if it encounters
archives created by the other utilities.

Hope this is a useful reference,

--dkg

For Tim's reference: we're discussing pax here:
http://bugs.debian.org/42158




