>Date: Sun, 23 Nov 2008 15:01:41 +0100
>From: Joerg.Schilling at fokus.fraunhofer.de (Joerg Schilling)
>
>Don Cragun <don.cragun at sun.com> wrote:
>
>> I'm sponsoring this case for Cynthia Eastham.
>>
>> Since this case follows the same general practices used when sparse
>> file support was added to the pax archiving utility, I'm marking this
>
>???
>
>AFAIK, the "pax" implementation that comes with Solaris does not support
>sparse files.
It does.
>
>> case as close approved automatic. If any members believe this needs to
>> be promoted to a fast track let me know.
>
>I see many small problems that need to be addressed before an implementation
>starts.
>
>> Template Version: @(#)sac_nextcase %I% %G% SMI
>> This information is Copyright 2008 Sun Microsystems
>> 1. Introduction
>> 1.1. Project/Component Working Name:
>> Add sparse file support to cpio
>> 1.2. Name of Document Author/Supplier:
>> Author: Cynthia Eastham
>> 1.3 Date of This Document:
>> 21 November, 2008
>> 4. Technical Description
>> 4.1 Details
>>
>> PSARC case 2006/331 (Add holey file support to pax) created
>
>I did never see such a case and the Sun pax man page does neither
>include "hole" nor "sparse".
The case was approved before PSARC cases were reviewed in forums open to
people who are not Sun employees. But all of the important
information is included on Sun's current pax(1) man page. The pax
utility added an extended header to ustar and pax archive format
archives as described in the USAGE section, where it says:
"When using the -x xustar and -x pax archive formats, if the
underlying file system reports that the file being archived
contains holes, the Solaris pax utility records the presence
of holes in an extended header record when the file is
archived. If this extended header record is associated with
a file in the archive, those holes are recreated whenever
that file is extracted from the archive. See the SEEK_DATA
and SEEK_HOLE whence values in lseek(2). In all other cases,
any NUL (\0) characters found in the archive is written to
the file when it is extracted."
and in the EXTENDED DESCRIPTION, where it says:
"SUN.holesdata A Solaris extension to pax extended header
keywords. Specifies the data and hole pairs
for a sparse file.
"In write or copy modes and when the xustar
or pax format (see -x format) is specified,
pax includes a SUN.holesdata extended
header record if the underlying file system
supports the detection of files with holes
(see fpathconf(2)) and reports that there
is at least one hole in the file being
archived. value consists of two or more
consecutive entries of the following form:
SPACEdata_offsetSPACEhole_offset
"where the data and hole offsets are the
long values returned by passing SEEK_DATA
and SEEK_HOLE to lseek(2), respectively.
For example, the following entry is an
example of the SUN.holesdata entry in the
extended header for a file with data
offsets at bytes 0, 24576, and 49152, and
hole offsets at bytes 8192, 32768, and
49159:
49 SUN.holesdata= 0 8192 24576 32768 49152 49159
"When extracting a file from an archive in
read or copy modes, if a SUN.holesdata =
pair is found in the extended header for
the file, then the file is restored with
the holes identified using this data. For
example, for the SUN.holesdata provided in
the example above, bytes from 0 to 8192 are
restored as data, a hole is created up to
the next data position (24576), bytes 24576
to 32768 is restored as data, and so forth."
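
For readers who want to see how such a value decodes, here is a minimal
Python sketch. It is not taken from the pax sources; the function names
parse_holesdata and regions are made up for illustration, and the only
thing assumed about the format is what the man page text above states:
whitespace-separated data_offset/hole_offset pairs.

```python
def parse_holesdata(value):
    """Split a SUN.holesdata value into (data_offset, hole_offset) pairs."""
    numbers = [int(token) for token in value.split()]
    if len(numbers) < 2 or len(numbers) % 2 != 0:
        raise ValueError("expected one or more data/hole offset pairs")
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    for data_offset, hole_offset in pairs:
        if hole_offset < data_offset:
            raise ValueError("hole offset precedes its data offset")
    return pairs


def regions(pairs):
    """Turn the pairs into a list of ("data" | "hole", start, end) tuples."""
    result = []
    position = 0
    for data_offset, hole_offset in pairs:
        if data_offset > position:
            # The gap before this data segment is a hole.
            result.append(("hole", position, data_offset))
        result.append(("data", data_offset, hole_offset))
        position = hole_offset
    return result


# The value from the man page example above.
example = " 0 8192 24576 32768 49152 49159"
layout = regions(parse_holesdata(example))
# layout holds: data 0-8192, hole 8192-24576, data 24576-32768,
# hole 32768-49152, data 49152-49159
```

The leading "49" in the man page example is consistent with the usual pax
extended header convention, where the record length counts the whole
record including its trailing newline.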
>
>> This case adds similar sparse file support to the cpio utility.
>
>Similar to what?
Similar to pax.
>
>
>> In pass mode, (cpio -p), sparse files will be recreated at the
>> destination with the same holes that were present in the source
>> file, as long as the source file system supports reporting
>> holes (as described by PSARC case 2004/770) and the destination
>> file is seekable. Otherwise, holes in sparse files will be
>> filled with '\0' bytes in corresponding destination files as
>> they are now.
>
>How do you intend to switch between the sparse support mode and the non-sparse
>mode in "copy mode"?
There is no switch in copy mode. If the source filesystem reports
holes in a file, the holes will be duplicated in the destination file
as long as the destination file is seekable.
>
>> In copy out mode (-o) the following new option arguments to the
>> cpio -H option will be added to provide sparse file support:
>> ascii_sparse - assumes -c is specified. Only available
>> in copy out (-o) mode.
>> odc_sparse - assumes -H odc is specified. Only available
>> in copy out (-o) mode.
>
>Adding sparse file support does not introduce a new archive format unless you
>create a new archive format that may be detected by reading the first archive
>header from a random archive.
Correct. When using -H ascii_sparse and -H odc_sparse, cpio uses ascii
and odc format archives, respectively; but it uses a different file
type when adding a sparse file to the archive. If an archiver
understands cpio ascii and odc format archives, it will understand the
archives. If an archiver doesn't recognize the extended file types,
the standards require that it extract the file data as a regular file
(which will contain the data needed to recreate the file contents with
holes in the proper positions).
>
>If you like to avoid introducing a new option, you would need to document
>this as a dirty hack. BTW: Where is the new man page?
Quoting from the references section of this case:
5.4 PSARC/2008/727/materials/cpio.1: Updated cpio.1 man page
>
>
>...
>
>> The following will apply when either '-H ascii_sparse' or
>> '-H odc_sparse' is specified with -o:
>> - The c_mode field in the archive header will
>> indicate that the file is a sparse file. In the old
>> stat structure, the mode field is an unsigned short
>> (16 bit) field. To avoid conflicts with other file
>> types, a high order bit (17) in the c_mode field of
>> the header will be set.
>
>This is beyond the cpio specs. How do you plan to mark the archives
>as "Sun cpio" specific to allow to avoid incorrect behavior for non-Sun
>archives?
It is indicated by the file type.
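
As a rough illustration of detection by file type, here is a small Python
sketch. The numeric value is an assumption: the case text only says "a
high order bit (17)", and the sketch reads that as the first bit above a
16-bit mode, octal 0200000. The names SPARSE_BIT, mark_sparse, is_sparse
and legacy_type are hypothetical and not taken from the cpio sources.

```python
import stat

# Assumption: "bit 17" means the first bit above the 16-bit st_mode.
SPARSE_BIT = 0o200000
# The traditional 16-bit mode, and the file type bits within it.
MODE_16_MASK = 0o177777
FILE_TYPE_MASK = 0o170000


def mark_sparse(st_mode):
    """Return the c_mode value to archive for a sparse regular file."""
    return (st_mode & MODE_16_MASK) | SPARSE_BIT


def is_sparse(c_mode):
    """True if the archived c_mode carries the (assumed) sparse bit."""
    return bool(c_mode & SPARSE_BIT)


def legacy_type(c_mode):
    """File type as seen by a reader that only looks at the low 16 bits."""
    return c_mode & FILE_TYPE_MASK


c_mode = mark_sparse(0o100644)  # a regular file, rw-r--r--
```

Under that assumption the marked mode still fits in the six octal digits
of an odc header, and a reader that masks off the extra bit sees an
ordinary regular file (stat.S_IFREG).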
>
>> - the file size field of the header will be the size of
>> the compressed sparse file (i.e., the size of the
>> header below plus the size of the file contents after
>> removing the holes).
>
>OK
>
>> - A string of the following format will be prepended to
>> the compressed file data:
>> "%lu %llu%s", prepended_info_size,
>> expanded_file_size, data/hole_offsets
>
>Is this data _inside_ the file data area or is it in conflict with the
>cpio extensions from David Korn and Glenn Fowler?
It is inside the file data area as indicated above. (The file size
field is the size of this header plus the size of the file contents
after removing the holes.)
>
>
>> where data/hole_offsets contains 2 or more entries of the
>> following format:
>> " %llu %llu", data_offset, hole_offset
>
>If you ever like to debug this, I would recommend to use:
>
> " %llu,%llu", data_offset, hole_offset
>
>to make the data parsable by human eyes..
Maybe to European human eyes. In the U.S., some possible data offset,
hole offset pairs could look like a single number, with the "," read as
a thousands separator rather than as a pair separator. Besides, the
proposed format matches the string given as the data in a ustar/pax
SUN.holesdata extended header.
>
>But why don't you follow existing other implementations that use
>offset/numbytes pairs for data chunks? This results in a lower archive size.
I'm not going to argue decisions that were agreed upon for PSARC
2006/331. But, it follows naturally from the data provided by the
lseek(2) SEEK_HOLE and SEEK_DATA operations.
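
For comparison, the two encodings carry the same information and convert
mechanically. The following is a minimal Python sketch with hypothetical
helper names (to_extents, from_extents, encode); it does not describe any
particular archiver's on-disk format beyond the " %llu %llu" pair layout
quoted above.

```python
def to_extents(pairs):
    """(data_offset, hole_offset) pairs -> (offset, numbytes) pairs."""
    return [(data, hole - data) for data, hole in pairs]


def from_extents(extents):
    """(offset, numbytes) pairs -> (data_offset, hole_offset) pairs."""
    return [(offset, offset + count) for offset, count in extents]


def encode(pairs):
    """Format number pairs the way the case describes: " %llu %llu"."""
    return "".join(" %d %d" % pair for pair in pairs)


# The offsets from the pax man page example.
seek_pairs = [(0, 8192), (24576, 32768), (49152, 49159)]
extents = to_extents(seek_pairs)
```

For this example the offset/numbytes form encodes in 26 characters
against 31 for the data/hole form, so the saving is real but small.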
>
>
>> When the c_mode field is set, cpio will detect the sparse file
>> upon file extraction, and use the prepended sparse file
>> information to restore the holes in the file if the
>> destination file is seekable. If the destination file is not
>> seekable, the sparse file information will be used to fill the
>> holes with '\0' bytes. Archivers that do not recognize the
>> sparse file mode bit will restore the compressed file and its
>> prepended data as a regular file.
>
>As it is unlikely that the first file in an archive is a sparse file, how
>do you intend to detect an archive that contain this Sun specific cpio
>extension?
By the file type.
>
>How do you intend to switch between the sparse support mode and the non-sparse
>mode in "extract mode"?
There is no switching. If a file is archived as a sparse file, it will
be extracted as a sparse file.
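
To illustrate the extraction side, here is a minimal Python sketch of
rebuilding a file image from the compressed data and the data/hole pairs.
It shows the non-seekable case, where holes are filled with '\0' bytes; a
seekable destination would seek over the holes instead. The function
name expand is hypothetical, and parsing of the prepended string is left
out because the case text does not pin down exactly what
prepended_info_size counts.

```python
def expand(packed, pairs, expanded_size):
    """Rebuild the full file contents from hole-free data.

    packed        -- the data segments, concatenated with holes removed
    pairs         -- (data_offset, hole_offset) pairs, in file order
    expanded_size -- size of the original file, holes included
    """
    # A new bytearray is zero-filled, so every hole already holds NULs.
    image = bytearray(expanded_size)
    consumed = 0
    for data_offset, hole_offset in pairs:
        length = hole_offset - data_offset
        if length < 0 or hole_offset > expanded_size:
            raise ValueError("offset pair outside the expanded file")
        chunk = packed[consumed:consumed + length]
        if len(chunk) != length:
            raise ValueError("compressed data is shorter than the pairs imply")
        image[data_offset:hole_offset] = chunk
        consumed += length
    if consumed != len(packed):
        raise ValueError("compressed data is longer than the pairs imply")
    return bytes(image)


# Two data segments, "abc" at offset 0 and "de" at offset 8, in a
# 12-byte file.
restored = expand(b"abcde", [(0, 3), (8, 10)], 12)
```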
- Don
>
>
>Jörg