hello,

my goal is to eliminate (or split out) the last file from a tar
archive, and i'm having trouble understanding the extra bytes and
zero padding that tar adds to its archives.  let me illustrate what
i'm doing.

first, i'll create some sample files

$ dd if=/dev/urandom of=file1 count=100 bs=1000
$ dd if=/dev/urandom of=file2 count=100 bs=1000
$ dd if=/dev/urandom of=file3 count=100 bs=1000
$ dd if=/dev/urandom of=file4 count=100 bs=1000

and add them to a tar file

$ tar cf test.tar file1 file2 file3 file4
$ tar tvf test.tar
-rw-r--r-- a/a 100000 2007-06-12 23:07 file1
-rw-r--r-- a/a 100000 2007-06-12 23:07 file2
-rw-r--r-- a/a 100000 2007-06-12 23:07 file3
-rw-r--r-- a/a 100000 2007-06-12 23:06 file4

i can now use a hex editor to find that file2 starts at byte 100864
(864 bytes more than file1 originally occupied), so file4 should start
at 3*100864 = 302592, and if i want to eliminate it, i can use gnu split

$ split -b 302592 test.tar
$ tar tvf xaa
-rw-r--r-- a/a 100000 2007-06-12 23:07 file1
-rw-r--r-- a/a 100000 2007-06-12 23:07 file2
-rw-r--r-- a/a 100000 2007-06-12 23:07 file3

which is exactly what i want.  the last file is no longer a part of
the tar archive.  as a check, if i extract these files and diff them
with the originals, i see that they are the same, so that is good.

my question is, how can i automate this?  at first i assumed that all
files would have 864 extra bytes, but that isn't true.  if i start
with 1000 byte files, the extra size is instead 536 bytes

$ dd if=/dev/urandom of=file1 count=1 bs=1000
$ dd if=/dev/urandom of=file2 count=1 bs=1000
$ dd if=/dev/urandom of=file3 count=1 bs=1000
$ dd if=/dev/urandom of=file4 count=1 bs=1000

$ tar cf test.tar file1 file2 file3 file4

to get the first three files, i split at 1536*3 = 4608

$ split -b 4608 test.tar
$ tar tvf xaa
-rw-r--r-- a/a  1000 2007-06-12 23:42 file1
-rw-r--r-- a/a  1000 2007-06-12 23:42 file2
-rw-r--r-- a/a  1000 2007-06-12 23:42 file3

again, i can extract and diff these files and find that they match.
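
both numbers would fit the usual tar layout, in which each member gets
one 512-byte header block and its data is padded up to a multiple of
512 bytes.  a small sketch of that arithmetic, assuming short file names
(so no extra header blocks); member_span is just a name made up for this
example:

```shell
#!/bin/sh
# sketch: bytes one member occupies in a tar archive, assuming a single
# 512-byte header plus data padded up to the next multiple of 512.
# member_span is a hypothetical helper name, not part of tar.
member_span() {
    size=$1
    padded=$(( (size + 511) / 512 * 512 ))   # round data up to whole 512-byte blocks
    echo $(( 512 + padded ))                 # add the header block
}

member_span 100000   # prints 100864 (100000 + 864)
member_span 1000     # prints 1536   (1000 + 536)
```

if that is right, the overhead is not linear in the file size: it is
512 plus whatever is needed to reach the next multiple of 512.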

another strange observation is that test.tar should only be
1536*4 = 6144 bytes in size, but instead it is zero padded to 10240 bytes.

$ ls -l test.tar
-rw-r--r-- 1  a   a   10240 2007-06-12 23:07 test.tar

maybe the tar block size is 10240 and hence the total size must
always be a multiple of that?  the larger archive is consistent with
this -- it ended up being 409600 bytes, which is a multiple of 10240,
even though it only needed to be 100864*4 = 403456 bytes.
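
both sizes can be reproduced with a small sketch, assuming 512-byte
blocks, an end-of-archive marker of two zero blocks (1024 bytes), and
output written in records of 20 blocks = 10240 bytes (gnu tar's default
blocking factor).  archive_size is a name made up for this example:

```shell
#!/bin/sh
# sketch: expected archive size for a number of equal-sized members,
# assuming a 512-byte header per member, data padded to 512 bytes,
# a 1024-byte end-of-archive marker, and 10240-byte records.
# archive_size is a hypothetical helper name, not part of tar.
archive_size() {
    count=$1
    size=$2
    member=$(( 512 + (size + 511) / 512 * 512 ))   # header + padded data
    raw=$(( count * member + 1024 ))               # members + end marker
    echo $(( (raw + 10239) / 10240 * 10240 ))      # round up to a full record
}

archive_size 4 100000   # prints 409600
archive_size 4 1000     # prints 10240
```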

i don't understand what is going on here with the zero padding.  why
are there extra zeros padded to the end of the tar file?  are they
necessary?

but the bigger concern is why do 1k files have 536 extra bytes, but
100k files have 864 extra bytes?  is there a linear relation?  and how
can i automate the process of splitting out the last file in the tar
archive?
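
one possible way to automate it without computing offsets at all,
assuming gnu tar, whose --delete option removes a member from an archive
file in place.  this is only a sketch: drop_last_member is a made-up
name, and it assumes the last member's name is unique in the archive and
contains no newlines or other characters that tar would escape in its
listing:

```shell
#!/bin/sh
# sketch: drop the last member of a tar archive using gnu tar.
# assumes gnu tar (for --delete), a unique name for the last member,
# and member names that tar lists without escaping.
# drop_last_member is a hypothetical helper name.
drop_last_member() {
    archive=$1
    last=$(tar -tf "$archive" | tail -n 1)   # name of the last member listed
    [ -n "$last" ] || return 1               # nothing listed: nothing to delete
    tar --delete -f "$archive" "$last"       # rewrite the archive without it
}
```

it would be called as `drop_last_member test.tar`.  it modifies the
archive in place, so work on a copy if the original matters.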

i know that this isn't really debian-specific, but any ideas or
suggestions would be greatly appreciated.  thanks.

mike

