Re: [Dorset] Use of cat to concatenate .gz files
On 17/01/2011 12:30, d-...@hadrian-way.co.uk wrote:
> I just spotted this http://en.wikipedia.org/wiki/Cat_%28Unix%29#Binary_use. Is this working because the .gz file is treated as a binary?

Cat cares not one jot about the file format; it just sticks the bits together, AFAIK. As I said earlier, gzip files can be concatenated because they are a sequential list of members. If you stitch two together, gzip, when processing the result, just happily finds the next member header after the end of the original file and continues processing.

--
Next meeting: Bournemouth, Tuesday 2011-02-01 20:00
Meets, Mailing list, IRC, LinkedIn, ... http://dorset.lug.org.uk/
How to Report Bugs Effectively: http://goo.gl/4Xue
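The behaviour described above, where gzip finishes one member and then looks for the next member header, can be sketched in Python with the standard `zlib` and `gzip` modules. This is an illustrative sketch, not how gzip(1) itself is written, and the helper name `split_gzip_members` is hypothetical. Each member is checked for the RFC 1952 magic bytes (0x1f, 0x8b), decompressed with a fresh decompressor, and whatever bytes follow its trailer are treated as the start of the next member.

```python
import gzip
import zlib
from typing import List

GZIP_MAGIC = b"\x1f\x8b"  # ID1, ID2 from RFC 1952: every gzip member starts with these


def split_gzip_members(data: bytes) -> List[bytes]:
    """Hypothetical helper: decompress each gzip member in `data` separately, in order."""
    members = []
    while data:
        if data[:2] != GZIP_MAGIC:
            raise ValueError("expected a gzip member header")
        # wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=31)
        members.append(d.decompress(data))
        if not d.eof:
            raise ValueError("truncated gzip member")
        # Bytes after this member's trailer are left in unused_data;
        # if anything is there, it should be the next member.
        data = d.unused_data
    return members


# `cat a.gz b.gz > both.gz` just puts the bytes of b.gz after those of a.gz.
a = gzip.compress(b"hello\n")
b = gzip.compress(b"world!\n")
both = a + b

print(split_gzip_members(both))  # [b'hello\n', b'world!\n']
print(gzip.decompress(both))     # b'hello\nworld!\n'
```

Python's own `gzip.decompress` already handles multi-member input, as the last line shows; the explicit loop only makes the "find the next header and carry on" step visible.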
Re: [Dorset] Use of cat to concatenate .gz files
Hi Terry,

> Now my understanding of cat goes back to a noddy Unix course about 15-20 years ago, but I always thought the 'cat' stood for 'catalogue' and was used to list the content of a text file. Having looked at http://unixhelp.ed.ac.uk/CGI/man-cgi?cat though, I now know that it stands for concatenate

That man page is wrong! It doesn't stand for concatenate, else it would be con(1), not cat(1). :-) It stands for catenate, always has done. Here's the man page from the 7th Edition of Unix from Bell Labs.

    wget -qO- http://www.cs.bell-labs.com/7thEdMan/vol1/man1.bun |
    sed -n '/^-\.TH CAT/,/GO\.SYSIN DD/s/-//p' | nroff -man

> So now I've got that out of the way, can someone explain what cat actually does with a compressed archive? I assume it doesn't understand the content, so is it simply stitching the two together in dumb fashion?

Yes. cat simply catenates all the files specified to its standard output, or reads standard input if no files are given. *With no options* it doesn't care about, look at, or interpret the files' content.

> If so, how would it be used to 'overlay' the tinycore.gz contents as is being suggested?

Simon P Smith wrote:
> AFAIK gzip can have additional bits added since it is a collection of members which have a header and trailer which are sequentially processed. ... Concatenating gz files should result in a valid file.

Simon's right. See ADVANCED USAGE in gzip(1); it's documented.

    $ (echo hello | gzip; printf 'world!\n' | gzip) | gunzip
    hello
    world!
    $

Tim Waugh wrote:
> The secret to this is that the gzip format is clever enough to work correctly when two gzip files are simply concatenated. I don't know the details of why it works, but I believe it's something to do with the streaming nature of gzip, compared with e.g. block-based compression such as bzip2.

http://www.ietf.org/rfc/rfc1952.txt gives the file format; see section 2.2.
It's simply that a decompressor can tell when it has got to the end of the current compressed file, realises it's not yet at the end of the input, reads a little more, and expects that to be the header of another whole compressed file. There is no carry-over of the dictionary from the first compression to the second, so the nature of the compression method compared to bzip2 isn't the reason this is possible. In fact, bzip2(1) does it too.

    $ (echo hello | bzip2; printf 'world!\n' | bzip2) | bunzip2
    hello
    world!
    $

That lack of carry-over is why catenating compressed files gives worse overall compression than giving them all to one compressor, preferably with similar files sorted to be near one another.

    $ f=/etc/passwd
    $ (gzip <$f; gzip <$f) | wc -c
    1578
    $ cat $f $f | gzip | wc -c
    817
    $

Back to Terry:

> My flawed experience told me that cat was used with text files and the man page I found earlier certainly didn't make it clear that any files could be cat'd together. Whether that makes any sense clearly depends on what those files are, but having understood that fact I was able to get to the next step; gzip files wouldn't be broken because of the way they are structured.

Unix doesn't fundamentally distinguish between text and binary files at the kernel level as other OSes do. They're just a sequence of zero or more bytes. Unless stated otherwise, assume a command doesn't care whether the bytes could be considered as an LF-terminated sequence of zero or more lines of printable bytes. Not having the text/binary distinction is quite an advantage compared to, e.g., DOS, which also has its ASCII SUB (Ctrl-Z) file-terminating byte; awful, mixing data and metadata.

Cheers, Ralph.
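Ralph's size comparison can be reproduced in Python as a rough sketch. Because /etc/passwd differs from machine to machine, the sample text below is made up (hypothetical passwd-style lines), so the byte counts will not match the 1578 and 817 in his session; only the ordering matters. Deflate, the method gzip uses, can refer back up to 32 KiB within the same stream, so a second copy inside one stream costs only a few bytes, whereas a separate member starts with no history. The last lines show Python's `bz2.decompress` reading catenated bzip2 streams, as `bunzip2` does in the session above.

```python
import bz2
import gzip

# Hypothetical stand-in for /etc/passwd: 50 similar, line-oriented records.
text = b"".join(
    b"user%d:x:%d:%d::/home/user%d:/bin/sh\n" % (i, 1000 + i, 1000 + i, i)
    for i in range(50)
)

# Like `(gzip <$f; gzip <$f) | wc -c`: two independent members, no shared history.
separate = len(gzip.compress(text) + gzip.compress(text))

# Like `cat $f $f | gzip | wc -c`: one stream, so the second copy can be encoded
# as back-references into the first (both copies fit in deflate's 32 KiB window).
combined = len(gzip.compress(text + text))

print("separate members:", separate, "bytes")
print("single stream:   ", combined, "bytes")

# Catenated bzip2 streams decompress to the catenated data as well (Python 3.3+).
both = bz2.compress(b"hello\n") + bz2.compress(b"world!\n")
print(bz2.decompress(both))  # b'hello\nworld!\n'
```

The 32 KiB window is also why sorting similar files next to each other helps: a back-reference can only reach content that is still within that distance in the same stream.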
Re: [Dorset] Use of cat to concatenate .gz files
On Monday 17 Jan 2011, Ralph Corderoy wrote:
> That man page is wrong! It doesn't stand for concatenate, else it would be con(1), not cat(1). :-) It stands for catenate, always has done.

I never even knew that catenate was a word ;-)

> Unix doesn't fundamentally distinguish between text and binary files at the kernel level as other OSes do. They're just a sequence of zero or more bytes. ...

I knew that really :-) The trouble is that I had only ever used cat with text files, so it never even occurred to me, in this context, that a file is a file and it doesn't matter what is in it.

--
Terry Coles
64 bit computing with Kubuntu Linux
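Terry's point that a file is a file can be made concrete with a minimal cat-like function in Python. This is a sketch, not cat(1) itself, and the function and file names are hypothetical. It opens every file in binary mode, so no newline translation or character decoding happens, copies the bytes to the output in order, and falls back to standard input when no files are given. The demo feeds it a CRLF line ending, a DOS Ctrl-Z (0x1A) byte, a NUL and the gzip magic bytes, all of which come through untouched.

```python
import io
import shutil
import sys
import tempfile
from pathlib import Path
from typing import BinaryIO, Sequence


def cat(paths: Sequence[str], out: BinaryIO) -> None:
    """Hypothetical minimal cat: copy each file's bytes to `out`, in order."""
    if not paths:
        # No files given: copy standard input, as cat(1) does.
        shutil.copyfileobj(sys.stdin.buffer, out)
        return
    for path in paths:
        # Binary mode: bytes are copied as-is, never decoded or translated.
        with open(path, "rb") as f:
            shutil.copyfileobj(f, out)


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "first.bin")
    second = Path(tmp, "second.bin")
    first.write_bytes(b"DOS text\r\n\x1a")      # CRLF and a Ctrl-Z (ASCII SUB) byte
    second.write_bytes(b"\x00\x1f\x8b binary")  # a NUL and the gzip magic bytes
    buf = io.BytesIO()
    cat([str(first), str(second)], buf)

result = buf.getvalue()
print(result)
```

Nothing in `cat` inspects the data, which is exactly why catenating two .gz files, or any other files, just works at this level; whether the result means anything is up to the program that reads it.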
Re: [Dorset] Use of cat to concatenate .gz files
On 17/01/11 17:22, Terry Coles wrote:
> I never even knew that catenate was a word ;-)

It means the same as concatenate :)

Tim
Re: [Dorset] Use of cat to concatenate .gz files
On 17 January 2011 18:01, Tim Allen t...@ls83.eclipse.co.uk wrote:
> On 17/01/11 17:22, Terry Coles wrote:
>> I never even knew that catenate was a word ;-)
>
> It means the same as concatenate :)
>
> Tim

Me neither. I had always assumed that 'cat' was a perverted short form of 'concatenate' and that calling it that was just 'humour' or laziness on the part of the early Unix designers.

(BTW, regarding the use of cat on non-text files: the fastest way to clone a disk partition is cat /dev/hda2 > /dev/hdb2 )

--
best regards,
Victor Churchill, Bournemouth
Re: [Dorset] Use of cat to concatenate .gz files
Hi Tim,

>> I never even knew that catenate was a word ;-)
>
> It means the same as concatenate :)

But it might make more sense to people, and be more mnemonic, if its proper name of catenate were used. Then we wouldn't have things like "I thought it stood for catalogue". :-)

dict(1) gives L. catenatus for catenate and L. concatenatus for concatenate. Perhaps there's more of a distinction in Latin?

Anyway, given I often hear people complain about the difficulty of remembering command names and options, I think it's worth being pedantic for the mnemonic benefit. :-)

Cheers, Ralph.
Re: [Dorset] Use of cat to concatenate .gz files
cat works with mpg files too:

    cat file1.mpg file2.mpg > file3.mpg

file3.mpg is a valid mpg file!