add another wow from here :o 2009/7/1 jeremy jozwik <jerjoz.for...@gmail.com>: > wow > > On Wed, Jul 1, 2009 at 2:20 PM, Laszlo > KREKACS<laszlo.krekacs.l...@gmail.com> wrote: >>> I dont want to compress at all. The 118MB for me is perfect. I only >>> want to pack the directory into a file. But not compressing. >>> Im thinking about tar or ar. >> >> Hi! >> >> I have studied all the available archive and compression options. >> Most notably tar[1][2][4][6] and zip file format [3]. >> They are the most common archive types. I read also ar (dpkg >> and ipkg uses it) and cpio format. So I did my homework, and >> made some researches. >> >> Our requirements: >> - no compression (no wasted cpu time) >> - random access (no slow waiting time and memory issue) >> - readily available module/library for easy of integrating >> (best: no additional package is required to install on the phone) >> >> Tar completely fail at random access, simply it lacks the >> table of content, so accessing the last file in the archive >> requires reading the whole content before it. >> >> Zip support accessing each files in the archive, although >> it compress the file by default. >> >> There are dar[5] and xar[7], which meets our random access >> criteria. However dar needs to be ported to the device, and >> xar is still in development (that means limited python support >> for example). >> >> So I wrote down the most dumb archive fileformat ever;) >> When I wrote the specification, I only had one goal: >> make it so simple, that everybody can implement it, >> so no need to wait for ready-made library. >> >> It is called KISS fileformat (keep it simple and stupid), >> the preferred extension would be filename.kiss >> >> You can read it here, I also included it (at the end of mail) >> for reference: >> http://pastebin.com/m608acaeb >> >> I think it is suitable for our map tile usage. >> >> What do you think? >> >> Best regards, >> Laszlo >> >> [1]: http://en.wikipedia.org/wiki/Tar_(file_format) >> [2]: http://www.python.org/doc/2.5.2/lib/module-tarfile.html >> [3]: http://www.python.org/doc/2.5.2/lib/module-zipfile.html >> [4]: http://en.wikipedia.org/wiki/Comparison_of_file_archivers >> [5]: http://en.wikipedia.org/wiki/DAR_(Disk_Archiver) >> [6]: http://en.wikipedia.org/wiki/Archive_formats >> [7]: http://code.google.com/p/xar/ >> >> KISS archive fileformat specification: >> >> # KISS archive format (Keep It Simple and Stupid) >> >> ## General properties >> - blocksize: 512 bytes >> - only store filename (and directory if any) and content >> - first file contains the filenames >> - header: start block, end block, position of last block >> >> ## Overall file structure >> [header][filenames][1. file][2. file][3. file] >> >> ## [header] >> [SB][EB][POS] [SB][EB][POS] [SB][EB][POS] etc.. >> [ 4][ 4][ 2] [ 4][ 4][ 2] [ 4][ 4][ 2] etc.. >> [ header ] [ filenames ] [1. file ] etc.. >> >> SB (start block): 4 byte >> EB (end block): 4 byte >> POS (position of last block): 2 byte >> >> All numbers are stored big-endian. That means most significant bit first. >> Example: >> 613 dec = 265 hex = \00 \00 \02 \65 (4 bytes) >> 130411 dec = 1FD6B hex = \00 \01 \FD \6B (4 bytes) >> >> Note: >> The remaining part of the header block MUST be filled with zero bytes. >> You will always have remaining part in the block, simply each file >> takes 10 bytes. (512/10 = 51 and 2 bytes left) >> >> ## [filenames] >> UTF-8 text for each filename, delimited with '\n' byte. >> The directory structure is preserved too. >> [name of 1. file]['\n'][name of 2. file]['\n'][name of 3. file] etc.. >> >> Some examples: >> this is a file.txt >> this2.tar.gz >> this3.html >> images/loller.html >> weird_dir/this\/files contains\/several\\ slashes.txt >> >> Special characters: >> '\n': You cant have '\n' character in the filename. It is preserved. >> (it is not supported in most filesystems anyway) >> '/': directory delimiter. To save directory structure. >> '\/': if the filename itself contains an / character >> '\\': if the filename itself contains a \ character >> >> >> ## [X. file] >> The file content as is. >> >> >> ## FAQ: >> Q: Why another archive format? >> A: Because it is the most dumb format ever;) >> >> Q: Why not tar, ar, zip, [name archive type here]? >> A: Short answer: widely used archive format are not suited for random access >> with no compression. >> Long answer: tar: there is no index, reading the last file of the archive >> requires reading the whole file before it. >> zip: individual files are compressed, which means: >> processortime >> xar: it would fit the requirements, but it is not widely >> supported, and not in every language. >> >> Q: I use X language does KISS supported there? >> A: The fileformat is so simple, it is intented, every programmer >> could implement it in "no time". >> >> Q: Does compression supported? >> A: No. But you can compress the whole file, >> just like in tar case: filename.kiss.bz2. Use it for file sharing. >> >> Q: Do advanced features (rights, symlinks, hardlinks, user/group/other) are >> preserved? >> A: No. It was not the goal of this archive. Although you can implement it, >> just >> write those informations in the first file. It is not recommended. >> >> Q: If the original file is not multiple of 512 bytes, how it will look in the >> archive, how many bytes will it take? >> A: Lets have an example. We have three files: >> 768bytes file, 1024 bytes, 2047 bytes >> First file (768 bytes) will take two blocks: 2*512 = 1024 bytes >> Second file (1024 bytes) will take two blocks too: 2*512 = 1024 bytes >> Third file (2047 bytes) will take four blocks: 4*512 = 2048 bytes >> Lets name the files: >> - "first filename.extension", (24 bytes) >> - "second try", (10 bytes) >> - "I want a sexy name.txt", (22 bytes) >> The [filenames] section: >> [24 bytes][1 byte][10 bytes][1 byte][22 bytes] = 58bytes = 1 block >> >> The header ([SB][EB][POS]): >> [00 00 00 00][00 00 00 00][00 31] (this is the header itself, start at the >> 0. block, ends at 0. block, and >> the header is 50 bytes long. That means >> start at the 0. byte and >> ends at the at the 49. byte. >> (0..49 = 50 bytes, which is 0x31)) >> [00 00 00 01][00 00 00 01][00 39] (58 byte long, that means 0..57 bytes >> and 57 dec = 0x39 hexa) >> [00 00 00 02][00 00 00 03][00 FF] (768-512 = 256, so 0..255 bytes. >> 255dec = FF hexa) >> [00 00 00 04][00 00 00 05][01 FF] (1024 bytes = 2 blocks, the >> second is full) >> [00 00 00 06][00 00 00 09][01 FE] ( 2047-(3*512) = 511. 510dec = 1FE hexa) >> >> >> The overall filesize: >> [header][filenames][1. file][2. file][3. file] >> [ 1 ][ 1 ][ 2 ][ 2 ][ 4 ] = 10 blocks = 5120B = 5kB >> >> Q: How is it filled the unused part of the block (if the file is >> not multiple of 512 bytes) ? >> A: It can be random bytes. But should be zero bytes. Or checksum if there is >> enough space left (see section "An insane idea for checksums"). >> >> ## Implementation advices >> >> 1. Count the files what you want to archive -> you know how much >> space is required by the header. 1-49 files requires one block for the >> header >> (10 bytes for the header, 10 bytes for the filenames section, x*10 bytes >> for the files itself. Maximum x is 49 for one block) >> 2. dump 0xFF for the header (look at the "tape archiving" to understand why >> FF) >> 3. Generate the filenames section and write the filenames section. >> 4. Attache each file to the archive, and generate the header >> on-the-fly (in memory) >> 4. Overwrite the header with valid data. >> >> ## An insane idea for checksums (integrity checking) >> >> Here is the idea, write the checksum at the remaining space, if the >> file is multiple of 512 byte, write two checksums at the next end of file, >> if there is no enough space, write it at the next file, and so on. >> If each file was multiple of 512 bytes, or there are not enough space >> at the end of each files. There will be no checksum for some of the last >> files >> (but it is always better then having no checksums at all). >> Which should be rare, but if you are worried about it, you can always add >> a new file with all the necessary informations. >> >> This section is not mandatory for the fileformat. So if you are brave enough, >> implement it! If you dont care, no worries. >> >> CRC32 is 4 bytes (32 bits) long. >> >> I think 4bytes should be safe enough;) >> A little example code in python how to calculate it: >> import binascii >> >> def crc2hex(crc): >> res='' >> for i in range(4): >> t=crc & 0xFF >> crc >>= 8 >> res='%02X%s' % (t, res) >> return res >> >> if __name__=='__main__': >> test='hello world! and Python too ;)' >> crc=binascii.crc32(test) >> print 'CRC:', crc >> hex_str = crc2hex(crc) >> print 'CRC in hex:', hex_str >> print 'in byte representation: ', hex_str.decode("hex") >> >> MD5sum is 16bytes long. >> >> The CRC32 (4bytes) is recommended. It is enough to detect inconsistencies. >> >> >> ## An another insane idea for tape archiving >> >> Who the hell use tapes these days?;) >> So if the first block is filled with FF hexa, it means the header is at the >> end of the archive file, the tailer is more right term;) >> So when you archive the tape, you cant reverse and >> write the header at the beginning of file. >> In that case, the header (at the end of file) is in REVERSE order. >> So the last 4 bytes tells where the header begins. So no need to search for >> it. >> Simply read the last 4 bytes, determine where the header >> begins(reverse order!), >> and read those blocks. Reverse the byte orders, and thats way you can process >> it normally. >> >> _______________________________________________ >> Openmoko community mailing list >> community@lists.openmoko.org >> http://lists.openmoko.org/mailman/listinfo/community >> > > _______________________________________________ > Openmoko community mailing list > community@lists.openmoko.org > http://lists.openmoko.org/mailman/listinfo/community >
-- David Reyes Samblas Martinez http://www.tuxbrain.com Open ultraportable & embedded solutions Openmoko, Openpandora, Arduino Hey, watch out!!! There's a linux in your pocket!!! _______________________________________________ Openmoko community mailing list community@lists.openmoko.org http://lists.openmoko.org/mailman/listinfo/community