Re: New archive file format (was: [omgps] collect feature requests)
Hi BillK:

Don't worry :) I've said that I'm afraid of the corruption too, so this feature will be configurable if it can be integrated.

2009/7/2 William Kenworthy (via Nabble):
> I hope not - I have over 2 million tiles stored on SD card - if file
> corruption or disaster occurs, it may affect only one tile if it's being
> accessed at the time - imagine the effect of file system corruption on
> one large archive ... you will most likely lose the lot.
>
> Then there is the extra overhead needed - I've gotta ask "why"? - if you
> can justify the extra CPU needed for this, why not do vector maps?
>
> BillK

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community
Re: New archive file format (was: [omgps] collect feature requests)
On Thu, Jul 2, 2009 at 10:24 AM, Laszlo KREKACS wrote:
> Need a serious benchmark here, if the extra overhead is true or not.

OK, I have written the Python implementation of the file archive maker. I still need to finish it (i.e. write the unpacking part).

I ran a few benchmarks. I packed the whole set of OSM map tiles on my laptop (I repeated it 10 times):

l...@buldergep:~/Maps/OSM$
    echo -e "\noutput.kiss";
    time python ../../Asztal/down/openmoko/paroli/data/kiss/kiss.py >> ../report.txt;
    mv output.kiss ..;
    echo -e "\noutput.tar";
    time tar -cf ../output.tar .;
    echo -e "\noutput.zip";
    time zip -0 -r output * >> ../report.txt;
    mv output.zip ..;
    echo -e "\noutput_comp.zip";
    time zip -r output_comp * >> ../report.txt;
    mv output_comp.zip ..;
    rm ../output*;
    rm ../report.txt

output.kiss
real    0m4.447s
user    0m2.748s
sys     0m1.520s

output.tar
real    0m4.039s
user    0m0.236s
sys     0m1.188s

output.zip
real    0m5.556s
user    0m1.276s
sys     0m2.632s

output_comp.zip
real    0m12.438s
user    0m8.437s
sys     0m2.620s

So the speed is about the same as in the .tar case, and it beats zip.

File sizes:
-rw-r--r-- 1 lol lol 109M 2009-07-02 19:11 output_comp.zip
-rw-r--r-- 1 lol lol 125M 2009-07-02 19:11 output.kiss
-rw-r--r-- 1 lol lol 156M 2009-07-02 19:11 output.tar
-rw-r--r-- 1 lol lol 113M 2009-07-02 19:11 output.zip
-rw-r--r-- 1 lol lol  93M 2009-07-02 19:11 output.kiss.bz2
-rw-r--r-- 1 lol lol  94M 2009-07-02 19:11 output.tar.bz2

Total size of the individual files:
l...@buldergep:~/Maps/OSM$ du -hs .
290M    .

Pretty strange: the archives take only about half of that.

I think this file format is worth the effort.

Best regards,
Laszlo
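One possible explanation for the gap between the 290M reported by du and the 125M-156M archives is block rounding: a filesystem allocates whole blocks per file, so many small tiles waste the tail of their last block. Nobody in the thread measured this, so treat it as an assumption. The sketch below only shows the arithmetic; the 4096-byte filesystem block is a guess, and `allocated_size` is a made-up helper name.

```python
def allocated_size(file_sizes, block=4096):
    """Total bytes allocated if every file is rounded up to whole blocks.

    block=4096 is an assumed filesystem block size; pass block=512 to get
    the padded payload size inside a 512-byte-block archive instead.
    """
    # -(-size // block) is ceiling division without floating point.
    return sum(-(-size // block) * block for size in file_sizes)


if __name__ == "__main__":
    sizes = [1800, 2500, 700]              # hypothetical tile sizes in bytes
    print(allocated_size(sizes))           # rounded to 4096-byte blocks
    print(allocated_size(sizes, 512))      # rounded to 512-byte blocks
```

Running the same function over the real tile sizes (for example collected with os.walk) would show whether rounding accounts for the difference.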
Re: New archive file format (was: [omgps] collect feature requests)
On Wednesday, 1 July 2009 23:20:40 Laszlo KREKACS wrote:
> ## General properties
> - blocksize: 512 bytes
> - only store filename (and directory if any) and content
> - first file contains the filenames
> - header: start block, end block, position of last block
>
> ## Overall file structure
> [header][filenames][1. file][2. file][3. file]
>
> ## [header]
> [SB][EB][POS] [SB][EB][POS] [SB][EB][POS] etc..

My first reaction to this was: why do you need this?

My points are:
1- With this format the resulting archive is nearly read-only (every few inserts the whole file has to be rewritten). One could use a loop-mounted filesystem and well-tested tools instead.
2- To make it useful with every app I think we need to mount it with FUSE.
3- Not enough metadata.

I think it could be simpler this way:

Metadata Block [0..511]
  [0..3]  Previous metadata block # (last block for the first metadata block)
  [4..7]  Next metadata block # (first block for the last metadata block)
  [8..]   Metadata_items        # list of Metadata_item

Metadata_item
  [0..1]  Metadata_size         # bytes
  [2]     Kind                  # of metadata (Name, Block, Size, Date, CRC, ...)
  [3..6]  File Id
  [7..Metadata_size-1]  Value

Block Value
  [0..3]  Start Block
  [4..7]  End Block

The example from the Q&A would have the following metadata:

00 00 00 00                                       # Previous
00 00 00 00                                       # Next
00 1F 01 00 00 00 01 "first filename.extension"   # 31 bytes, Name, id 1
00 11 01 00 00 00 02 "second try"                 # 17 bytes, Name, id 2
00 1D 01 00 00 00 03 "I want a sexy name.txt"     # 29 bytes, Name, id 3
00 0F 02 00 00 00 01 00 00 00 01 00 00 00 02      # id 1, blocks 1-2
00 0F 02 00 00 00 02 00 00 00 03 00 00 00 04      # id 2, blocks 3-4
00 0F 02 00 00 00 03 00 00 00 05 00 00 00 08      # id 3, blocks 5-8
00 0B 03 00 00 00 01 00 00 03 00                  # id 1, 768 bytes
00 0B 03 00 00 00 02 00 00 04 00                  # id 2, 1024 bytes
00 0B 03 00 00 00 03 00 00 07 FF                  # id 3, 2047 bytes
00 00                                             # end of metadata

That gives a total file size of 9 blocks, or 4608 bytes, but with the same disk usage of 8 kB.
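The metadata block proposed above can be walked with a few lines of Python. This is only a sketch of how a reader might look: the kind codes 1/2/3 for Name/Block/Size and the big-endian layout are read off the example bytes, the value is taken to start at byte 7 of each item (as the example sizes imply), and `parse_metadata_block` is a hypothetical name.

```python
import struct

# Kind codes as they appear in the example bytes (01, 02, 03).
KIND_NAME, KIND_BLOCK, KIND_SIZE = 1, 2, 3
ITEM_HEADER = 7  # 2 bytes size + 1 byte kind + 4 bytes file id


def parse_metadata_block(block):
    """Return (previous, next, files) for one 512-byte metadata block."""
    prev_block, next_block = struct.unpack_from(">II", block, 0)
    files = {}
    offset = 8
    while offset + 2 <= len(block):
        (item_size,) = struct.unpack_from(">H", block, offset)
        if item_size == 0:          # "00 00" marks the end of the metadata
            break
        if item_size < ITEM_HEADER:
            raise ValueError("corrupt metadata item at offset %d" % offset)
        kind, file_id = struct.unpack_from(">BI", block, offset + 2)
        value = block[offset + ITEM_HEADER:offset + item_size]
        entry = files.setdefault(file_id, {})
        if kind == KIND_NAME:
            entry["name"] = value.decode("utf-8")
        elif kind == KIND_BLOCK:
            entry["blocks"] = struct.unpack(">II", value)
        elif kind == KIND_SIZE:
            (entry["size"],) = struct.unpack(">I", value)
        offset += item_size         # unknown kinds are simply skipped
    return prev_block, next_block, files
```

Because every item carries its own length, a reader can skip kinds it does not understand, which is what makes adding Date or CRC items later cheap.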
Re: New archive file format (was: [omgps] collect feature requests)
On Thu, Jul 2, 2009 at 10:15, Laszlo KREKACS wrote:
> On Thu, Jul 2, 2009 at 8:42 AM, Alexander Shulgin wrote:
>> I fail to see how is this true for normal tar files (vs. data read
>> from pipe). Can you elaborate please?
>
> Yepp, of course;)
> [snip]
>
> Simplification of tar archive:
> [1. file header][1. file][2.file header][2. file][3. file header][3. file]
>
> So how you read the third file from the archive? You read the file until the
> [3. file header], your test is successfull (is it the right file?),
> and you read the file itself. You see? You have read the whole file, just
> accessing the last item inside.

Yes, but is lseek(2) banned on the Neo? This is what I was talking about when I mentioned normal files (i.e. not pipes). :)

>> Pardon my ignorance, but wouldn't zip -0 do the trick for your purpose? :)
>
> It will do more or less, however there are three main problems with it:
>
> 1. you can only obtain the whole file from the archive. So you cant
> read a part of the file. So if you packed lets say a 700MB file to zip,
> you run out of memory on neo.
> At least this is the case on standard python zipfile module.
>
> 2. There is no random access feature, at
> least not in standard python modules.
> 3. There are significant processor time wasted when accessing to a file
> (many computation required). Btw, it needs to benchmark on the neo, how
> worse is it.

OK, I see now. Thanks for the explanation.

--
Cheers,
Alex
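To illustrate the lseek(2) point: on a regular (seekable) file a reader still has to visit every header in front of the wanted member, but it can seek over the payloads instead of reading them. The sketch below assumes a plain ustar-style archive with regular files, names of at most 100 bytes and octal size fields; `find_member` is a hypothetical helper, not part of any tar library.

```python
import io

BLOCK = 512


def find_member(fileobj, wanted):
    """Return (data_offset, size) of `wanted`, reading only the headers.

    Returns None when the end-of-archive marker (a zero block) is reached.
    """
    while True:
        header = fileobj.read(BLOCK)
        if len(header) < BLOCK or header == bytes(BLOCK):
            return None
        # Name: bytes 0..99, NUL padded. Size: bytes 124..135, octal text.
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8")
        size_field = header[124:136].split(b"\0", 1)[0].strip()
        size = int(size_field.decode("ascii") or "0", 8)
        if name == wanted:
            return fileobj.tell(), size
        # Skip the payload, which is padded to a whole number of blocks.
        fileobj.seek(-(-size // BLOCK) * BLOCK, io.SEEK_CUR)
```

So the cost is one 512-byte read plus one seek per preceding member, rather than reading the whole archive; with thousands of tiles per archive that is still many more operations than a lookup in a table of contents.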
Re: New archive file format (was: [omgps] collect feature requests)
2009/7/2 Dr. H. Nikolaus Schaller
> I do not completely understand the reasons why there is a need for
> (once again) a new file format.
> As far as I understand the proposal, it is just a file system running
> in an image file. Like mounting an ISO or any other file system
> residing not on a raw disk but within a file.

Good point. You can divide the archive into 100kb blocks and use mount -o loop on them. When you run out of space, just create a new block.
Re: New archive file format (was: [omgps] collect feature requests)
I do not completely understand the reasons why there is a need for (once again) a new file format.

As far as I understand the proposal, it is just a file system running in an image file. Like mounting an ISO or any other file system residing not on a raw disk but within a file.

So what problem does it solve better than just using the existing file system hierarchy directly (/tiles/z/y/x.png), if it does not compress, has no directories, is not faster and, as William pointed out, is not more reliable?

I see only one benefit - you can copy the whole archive as a single object instead of copying a file tree.

New file formats usually create more problems than they solve...

On 02.07.2009 at 10:08, William Kenworthy wrote:
> I hope not - I have over 2 million tiles stored on SD card - if file
> corruption or disaster occurs, it may affect only one tile if it's being
> accessed at the time - imagine the effect of file system corruption on
> one large archive ... you will most likely lose the lot.
>
> Then there is the extra overhead needed - I've gotta ask "why"? - if you
> can justify the extra CPU needed for this, why not do vector maps?
>
> BillK
Re: New archive file format (was: [omgps] collect feature requests)
On Thu, Jul 2, 2009 at 10:08 AM, William Kenworthy wrote:
> I hope not - I have over 2 million tiles stored on SD card - if file
> corruption or disaster occurs, it may affect only one tile if its being
> accessed at the time

My experience differs completely from yours.

> - imagine the effect of file system corruption on
> one large archive ... you will most likely lose the lot.

I would even prefer losing my map files (backups?) to crashing the whole filesystem.

However, this is not the case. I don't intend to push "pack everything into a single file". Instead, have files of about 1MB (or pack subdirs only). So if you are going to lose something, you lose 1MB.

But this file format should be safe enough: if the header is untouched, you can recover files from the archive (and there are the checksum options too).

> Then there is the extra overhead needed - Ive gotta ask "why"? - if you
> can justify the extra cpu needed for this, why not do vector maps?

We need a serious benchmark here to see whether the extra overhead is real or not. Contrary to your opinion, I expect speed improvements. ;)

Laszlo
Re: New archive file format (was: [omgps] collect feature requests)
I hope not - I have over 2 million tiles stored on SD card - if file corruption or disaster occurs, it may affect only one tile if it's being accessed at the time - imagine the effect of file system corruption on one large archive ... you will most likely lose the lot.

Then there is the extra overhead needed - I've gotta ask "why"? - if you can justify the extra CPU needed for this, why not do vector maps?

BillK

On Thu, 2009-07-02 at 00:42 -0700, mqy wrote:
> x and y are tile no in tile coordinate system within range of [0.. 2^zoom).
> just do it if you have time, since proof of concept is necessary :) keep in
> mind clear APIs.
> it's likely that, the final version to be integrated into omgps is rewritten
> in C.
>
> Laszlo KREKACS wrote:
> >
> > If I understand right the OSM tiles, they have the following directory
> > ...

--
William Kenworthy
Home in Perth!
Re: New archive file format (was: [omgps] collect feature requests)
> Here is the updated specification (I added two more faq entries,
> max filesize and max filename length):
> http://pastebin.com/f51927121

I made a small mistake (header structure), here we go:
http://pastebin.com/f5feafd7a

Laszlo
Re: New archive file format (was: [omgps] collect feature requests)
On Thu, Jul 2, 2009 at 9:42 AM, mqy wrote:
> x and y are tile no in tile coordinate system within range of [0.. 2^zoom).
> just do it if you have time, since proof of concept is necessary :) keep in
> mind clear APIs.
> it's likely that, the final version to be integrated into omgps is rewritten
> in C.

OK, I'll do it.

I will put the [filenames] section at the end of the file. That way appending to the file is dead simple. The header structure will stay the same, so the [filenames] entry always sits in bytes 10-20 of the header.

I was thinking more about the metadata stuff. If we agree on a filename, like .metadata-kiss, and attach it as a simple file, it does not matter where in the archive it is placed (but I think it should be placed at the end):

[header] [1. file] [2. file] [3. file = metadata file] [filenames]

But metadata is really for future consideration (if people find this archive format useful).

I will also make some test archive files along with the kiss and unkiss programs, to ease implementation.

Here is the updated specification (I added two more FAQ entries, max filesize and max filename length):
http://pastebin.com/f51927121

Laszlo
Re: New archive file format (was: [omgps] collect feature requests)
x and y are tile numbers in the tile coordinate system, within the range [0, 2^zoom).

Just do it if you have time, since a proof of concept is necessary :) Keep clear APIs in mind. It's likely that the final version to be integrated into omgps will be rewritten in C.

Laszlo KREKACS wrote:
>
> If I understand right the OSM tiles, they have the following directory
> ...
>
Re: New archive file format (was: [omgps] collect feature requests)
On Thu, Jul 2, 2009 at 9:15 AM, Patryk Benderz wrote:
> - only store filename (and directory if any) and content"
> It might be convenient for future to store file properties like time of
> modification. This way you could implement automatic update of tiles
> that have been modified since last update or archive creation.

This is a never-ending game. You store one property, then others want another property stored. You finally end up with something overcomplicated like an XML structure. And for accessing the files inside the archive, you don't need this info at all.

But this file format is flexible enough: just attach the meta information of the files as a file in the archive! Your problem is solved, and it is future-proof.

So the file structure would be something like this in your case:
[header] [filenames] [1. file = metadata file] [2. file] [3. file] [4. file]

However, this file format is not final, I'm open to suggestions;)
I haven't decided whether the filenames section should go at the end or right after the header. And where to store the metadata, if any (start vs. end of file)?

So
[header] [filenames] [1. file] [2. file] [3. file]
vs.
[header] [1. file] [2. file] [3. file] [filenames]

The same with metadata:
[header] [filenames] [1. file = metadata file] [2. file] [3. file]
vs.
[header] [1. file] [2. file] [3. file = metadata file] [filenames]

Need a bit of thinking here.

Laszlo
Re: New archive file format (was: [omgps] collect feature requests)
On Thu, Jul 2, 2009 at 8:42 AM, Alexander Shulgin wrote:
> I fail to see how is this true for normal tar files (vs. data read
> from pipe). Can you elaborate please?

Yepp, of course;)

A tar archive does not contain the byte positions of the files inside the archive. That means accessing a file inside the archive requires reading the whole content before it and determining where each file ends (and you test whether you are at the desired file by reading its header).

It simply lacks a TOC (table of contents). So accessing the last file in the archive requires reading the whole archive.

You can read about it here:
http://en.wikipedia.org/wiki/Tar_(file_format)#Format_details

Simplification of a tar archive:
[1. file header][1. file][2. file header][2. file][3. file header][3. file]

So how do you read the third file from the archive? You read the archive up to the [3. file header], your test is successful (is it the right file?), and you read the file itself. You see? You have read the whole archive just to access the last item inside.

>> Zip support accessing each files in the archive, although
>> it compress the file by default.
>
> Pardon my ignorance, but wouldn't zip -0 do the trick for your purpose? :)

It will do, more or less; however, there are three main problems with it:

1. You can only obtain the whole file from the archive, so you can't read a part of a file. If you packed, let's say, a 700MB file into a zip, you run out of memory on the Neo. At least this is the case with the standard Python zipfile module.

2. There is no random access feature, at least not in the standard Python modules.

3. Significant processor time is wasted when accessing a file (a lot of computation is required). By the way, this needs to be benchmarked on the Neo to see how much worse it is.

Best regards,
Laszlo
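A note on point 1 for readers on newer Python versions: the Python 2.5 zipfile module referenced in this thread only offered ZipFile.read(), which returns a whole member at once. Since Python 2.6 the module also has ZipFile.open(), which returns a file-like object, so a member can be read in pieces. The sketch below shows that call on an uncompressed (stored) archive; `read_member_in_chunks` and the chunk size are illustrative choices only, and nothing here has been benchmarked on the Neo.

```python
import zipfile


def read_member_in_chunks(archive, member, chunk_size=64 * 1024):
    """Yield the content of one archive member piece by piece.

    `archive` may be a path or a seekable file object. Only one chunk is
    held in memory at a time, so a large member does not need to fit in RAM.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                yield chunk
```

The zip central directory at the end of the archive is what lets the module jump to a member directly, which is the table of contents that tar lacks.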
Re: New archive file format (was: [omgps] collect feature requests)
> You can read it here, I also included it (at the end of mail)
> for reference:
> http://pastebin.com/m608acaeb

From your reference:
"## General properties
- blocksize: 512 bytes
- only store filename (and directory if any) and content"

It might be convenient for the future to store file properties like time of modification. This way you could implement automatic update of tiles that have been modified since the last update or archive creation.

--
Kind Regards
Patryk Benderz
Re: New archive file format (was: [omgps] collect feature requests)
On Thu, Jul 2, 2009 at 7:22 AM, mqy wrote:
> XML is used as a database, elements can be easily added, modified, removed.
> xar tends to be overkill for map tile usage -- we don't need iterating,
> delete, and that much meta information. With my suggested design, we can
> even add newly downloaded tiles:
> insert record into meta database, and append tile content into "heap" file.

If I understand the OSM tiles right, they have the following directory structure:

XX/YYY/ZZZ.png
10/558/357.png

All the information (position, zoom level) can be obtained from this path, am I right?
XX: zoom level
YYY: position x? (or I don't know what to call it;)
ZZZ: position y?

So I think we should only pack the files into the KISS archive file. For a more in-depth explanation see the end of this mail.

So something like this:
XX/YY1.kiss
XX/YY2.kiss
XX/YYY.kiss

I'm willing to implement a simple kiss/unkiss program (just like tar/untar), for easy archiving. I will use Python with no non-standard modules.

Best regards,
Laszlo

ps: Some statistical data:

Number of all tiles:
# cd ~/Maps/OSM; find . -name '*.png' | wc -l
63818

Subdirs in the zoom level dirs (YYY), and total number of files:

for i in *; do echo -n $i; echo -n " "; cd $i; ls -1|wc -l; cd ..; done
for f in *; do cd $f; for i in *; do cd $i; for k in *; do echo "$i/$k" >> ~/Maps/OSM/$f.txt; done; cd ..; done; cd ..; done
cd ~/Maps/OSM
for i in *.txt; do echo -n "$i "; cat $i|wc -l ; done

 2:   4 dirs,    16 files
 3:   8 dirs,    64 files
 4:  11 dirs,    77 files
 5:  17 dirs,    83 files
 6:  22 dirs,   265 files
 7:  22 dirs,   217 files
 8:  17 dirs,    75 files
 9:  26 dirs,   152 files
10:  39 dirs,   426 files
11:  71 dirs,  1484 files
12: 100 dirs,  1046 files
13:  78 dirs,  2902 files
14: 193 dirs, 23400 files
15:  86 dirs,  1941 files
16: 119 dirs,  4033 files
17: 277 dirs, 27637 files

Count the files in the subdirs (ZZZ):

for i in *; do echo -n $i; echo -n "f "; cd $i; for j in *; do cd $j; echo -n "$i # $j @"; ls -1|wc -l; cd ..; done; cd ..; done

The number of files is in general 20-30, and the maximum was 180.
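The XX/YYY.kiss layout suggested above amounts to a small mapping from tile coordinates to an archive path plus a member name. A minimal sketch, assuming the directory order is zoom/x/y (the mail itself marks x and y with question marks) and that the member inside the archive keeps the plain "ZZZ.png" name; `tile_location` is a made-up helper, not an omgps function.

```python
def tile_location(zoom, x, y):
    """Map a tile to (archive path, member name) under the XX/YYY.kiss idea.

    Assumes x and y are tile numbers in the range [0, 2**zoom), as stated
    elsewhere in this thread.
    """
    limit = 2 ** zoom
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("tile %d/%d is outside zoom level %d" % (x, y, zoom))
    return "%d/%d.kiss" % (zoom, x), "%d.png" % y
```

With the statistics above (typically 20-30 tiles per YYY directory, at most 180), each archive would stay small, which also limits how much is lost if one archive gets corrupted.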
Re: New archive file format (was: [omgps] collect feature requests)
On Thu, Jul 2, 2009 at 00:20, Laszlo KREKACS wrote:
>> I dont want to compress at all. The 118MB for me is perfect. I only
>> want to pack the directory into a file. But not compressing.
>> Im thinking about tar or ar.
>
> Tar completely fail at random access, simply it lacks the
> table of content, so accessing the last file in the archive
> requires reading the whole content before it.

I fail to see how is this true for normal tar files (vs. data read from pipe). Can you elaborate please?

> Zip support accessing each files in the archive, although
> it compress the file by default.

Pardon my ignorance, but wouldn't zip -0 do the trick for your purpose? :)

--
Regards,
Alex
Re: New archive file format (was: [omgps] collect feature requests)
XML is used as a database: elements can be easily added, modified, removed. xar tends to be overkill for map tile usage -- we don't need iterating, delete, and that much meta information.

With my suggested design, we can even add newly downloaded tiles: insert a record into the meta database, and append the tile content to the "heap" file.

Laszlo KREKACS wrote:
>
> ...
> I have only one problem with xar: xml.
> It complicates things unnecessary.
> ...
> Laszlo
>
Re: New archive file format (was: [omgps] collect feature requests)
Hi!

Thank you for the kind words.

> I'd like to see if xar works well too.

I have only one problem with xar: XML. It complicates things unnecessarily.

Laszlo
Re: New archive file format (was: [omgps] collect feature requests)
Thumbs up for your effort :)

A simpler design choice would be:

1. int get_tile_meta(int zoom, int x, int y, TileMeta *tm)
   -- fill in the TileMeta fields; return -1 if the tile is not found
   -- implemented by collecting per-map meta info into a sqlite database

2. int get_tile_bytes(char* buf, TileMeta *tm)
   -- read the tile content described by tm into buf
   -- implemented by collecting tiles into a big file.

Where TileMeta is defined as:

struct TileMeta {
    int offset;
    int size;
    U4 crc;
    char *name; // optional
};

This kind of data source is abstracted as a tile provider, in addition to the default, standard file system based one.

I'd like to see if xar works well too.

regards,
mqy

Laszlo KREKACS wrote:
>
> Hi!
>
> I have studied all the available archive and compression options.
> ...
> Best regards,
> Laszlo
> ...
>
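The two calls above can be prototyped quickly in Python before anything is written in C. This is a rough sketch of the idea only: the table layout, the `add_tile` helper and the use of zlib.crc32 as the checksum are assumptions made for illustration, not something specified in the mail.

```python
import sqlite3
import zlib


def open_meta_db(path=":memory:"):
    """Open (or create) the per-map meta database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        "zoom INTEGER, x INTEGER, y INTEGER, "
        "offset INTEGER, size INTEGER, crc INTEGER, "
        "PRIMARY KEY (zoom, x, y))"
    )
    return db


def add_tile(db, heap, zoom, x, y, data):
    """Append tile bytes to the heap file and record where they landed."""
    heap.seek(0, 2)                      # 2 = seek relative to end of file
    offset = heap.tell()
    heap.write(data)
    db.execute(
        "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?, ?, ?)",
        (zoom, x, y, offset, len(data), zlib.crc32(data)),
    )
    db.commit()


def get_tile_meta(db, zoom, x, y):
    """Return (offset, size, crc), or None if the tile is not found."""
    row = db.execute(
        "SELECT offset, size, crc FROM tiles WHERE zoom=? AND x=? AND y=?",
        (zoom, x, y),
    ).fetchone()
    return row


def get_tile_bytes(heap, meta):
    """Read one tile from the heap file and verify its checksum."""
    offset, size, crc = meta
    heap.seek(offset)
    data = heap.read(size)
    if zlib.crc32(data) != crc:
        raise IOError("tile at offset %d is corrupt" % offset)
    return data
```

Note that re-downloading a tile with INSERT OR REPLACE leaves the old bytes as dead space in the heap file; a real implementation would need some way to compact it.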
Re: New archive file format (was: [omgps] collect feature requests)
add another wow from here :o

2009/7/1 jeremy jozwik :
> wow
Re: New archive file format (was: [omgps] collect feature requests)
wow

On Wed, Jul 1, 2009 at 2:20 PM, Laszlo KREKACS wrote:
>> I dont want to compress at all. The 118MB for me is perfect. I only
>> want to pack the directory into a file. But not compressing.
>> Im thinking about tar or ar.
>
> Hi!
>
> I have studied all the available archive and compression options.
> Most notably tar[1][2][4][6] and zip file format [3].
> They are the most common archive types. I read also ar (dpkg
> and ipkg uses it) and cpio format. So I did my homework, and
> made some researches.
>
> Our requirements:
> - no compression (no wasted cpu time)
> - random access (no slow waiting time and memory issue)
> - readily available module/library for easy of integrating
>   (best: no additional package is required to install on the phone)
>
> Tar completely fail at random access, simply it lacks the
> table of content, so accessing the last file in the archive
> requires reading the whole content before it.
>
> Zip support accessing each files in the archive, although
> it compress the file by default.
>
> There are dar[5] and xar[7], which meets our random access
> criteria. However dar needs to be ported to the device, and
> xar is still in development (that means limited python support
> for example).
>
> So I wrote down the most dumb archive fileformat ever;)
> When I wrote the specification, I only had one goal:
> make it so simple, that everybody can implement it,
> so no need to wait for ready-made library.
>
> It is called KISS fileformat (keep it simple and stupid),
> the preferred extension would be filename.kiss
>
> You can read it here, I also included it (at the end of mail)
> for reference:
> http://pastebin.com/m608acaeb
>
> I think it is suitable for our map tile usage.
>
> What do you think?
>
> Best regards,
> Laszlo
>
> [1]: http://en.wikipedia.org/wiki/Tar_(file_format)
> [2]: http://www.python.org/doc/2.5.2/lib/module-tarfile.html
> [3]: http://www.python.org/doc/2.5.2/lib/module-zipfile.html
> [4]: http://en.wikipedia.org/wiki/Comparison_of_file_archivers
> [5]: http://en.wikipedia.org/wiki/DAR_(Disk_Archiver)
> [6]: http://en.wikipedia.org/wiki/Archive_formats
> [7]: http://code.google.com/p/xar/
>
> KISS archive fileformat specification:
>
> # KISS archive format (Keep It Simple and Stupid)
>
> ## General properties
> - blocksize: 512 bytes
> - only store filename (and directory if any) and content
> - first file contains the filenames
> - header: start block, end block, position of last block
>
> ## Overall file structure
> [header][filenames][1. file][2. file][3. file]
>
> ## [header]
> [SB][EB][POS] [SB][EB][POS] [SB][EB][POS] etc..
> [ 4][ 4][  2] [ 4][ 4][  2] [ 4][ 4][  2] etc..
> [   header  ] [ filenames ] [  1. file  ] etc..
>
> SB (start block): 4 bytes
> EB (end block): 4 bytes
> POS (position of last block): 2 bytes
>
> All numbers are stored big-endian. That means most significant byte first.
> Example:
> 613 dec = 265 hex = \00 \00 \02 \65 (4 bytes)
> 130411 dec = 1FD6B hex = \00 \01 \FD \6B (4 bytes)
>
> Note:
> The remaining part of the header block MUST be filled with zero bytes.
> You will always have a remaining part in the block, simply because each
> file takes 10 bytes. (512/10 = 51 and 2 bytes left)
>
> ## [filenames]
> UTF-8 text for each filename, delimited with a '\n' byte.
> The directory structure is preserved too.
> [name of 1. file]['\n'][name of 2. file]['\n'][name of 3. file] etc..
>
> Some examples:
> this is a file.txt
> this2.tar.gz
> this3.html
> images/loller.html
> weird_dir/this\/files contains\/several\\ slashes.txt
>
> Special characters:
> '\n': You cant have a '\n' character in the filename. It is reserved.
>       (it is not supported in most filesystems anyway)
> '/': directory delimiter. To save directory structure.
> '\/': if the filename itself contains a / character
> '\\': if the filename itself contains a \ character
>
> ## [X. file]
> The file content as is.
>
> ## FAQ:
> Q: Why another archive format?
> A: Because it is the most dumb format ever;)
>
> Q: Why not tar, ar, zip, [name archive type here]?
> A: Short answer: widely used archive formats are not suited for random
>    access with no compression.
>    Long answer:
>    tar: there is no index, reading the last file of the archive
>         requires reading the whole file before it.
>    zip: individual files are compressed, which means processor time.
>    xar: it would fit the requirements, but it is not widely
>         supported, and not in every language.
>
> Q: I use X language, is KISS supported there?
> A: The fileformat is so simple, it is intended that every programmer
>    could implement it in "no time".
>
> Q: Is compression supported?
> A: No. But you can compress the whole file,
>    just like in tar case: filename.kiss.bz2. Use it for file sharing.
>
> Q: Are advanced features (rights, symlinks, hardlinks, user/group/other)
>    preserved?
> A: No. It was not the goal of this archive. Although you can im
New archive file format (was: [omgps] collect feature requests)
> I don't want to compress at all. The 118MB for me is perfect. I only
> want to pack the directory into a file, but without compressing.
> I'm thinking about tar or ar.

Hi!

I have studied all the available archive and compression options,
most notably tar [1][2][4][6] and the zip file format [3].
They are the most common archive types. I also read about ar (dpkg
and ipkg use it) and the cpio format. So I did my homework and
did some research.

Our requirements:
- no compression (no wasted CPU time)
- random access (no long waits and no memory issues)
- a readily available module/library for ease of integration
  (best: no additional package has to be installed on the phone)

Tar completely fails at random access: it simply lacks a
table of contents, so accessing the last file in the archive
requires reading the whole content before it.

Zip supports accessing each file in the archive, although
it compresses the files by default.

There are dar [5] and xar [7], which meet our random access
criteria. However, dar needs to be ported to the device, and
xar is still in development (that means limited Python support,
for example).

So I wrote down the dumbest archive file format ever ;)
When I wrote the specification, I had only one goal:
make it so simple that everybody can implement it,
so there is no need to wait for a ready-made library.

It is called the KISS file format (keep it simple and stupid);
the preferred extension would be filename.kiss

You can read it here; I also included it (at the end of this mail)
for reference:
http://pastebin.com/m608acaeb

I think it is suitable for our map tile usage.

What do you think?
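As a rough illustration of the header layout from the specification included at the end of this mail, the Python sketch below packs and unpacks the 10-byte index entries (SB, EB, POS) with the standard struct module. The names ENTRY, build_header_block and read_entries are made up for this example and are not part of the specification. The sketch also assumes that the whole index fits into a single 512-byte block (at most 51 entries), and the entry values are hypothetical.

```python
import struct

BLOCK_SIZE = 512                 # block size given in the specification
ENTRY = struct.Struct(">IIH")    # SB (4 bytes), EB (4 bytes), POS (2 bytes), big-endian
ENTRIES_PER_BLOCK = BLOCK_SIZE // ENTRY.size   # 512 // 10 = 51, leaving 2 spare bytes


def build_header_block(entries):
    """Pack (SB, EB, POS) tuples into one zero-padded 512-byte block.

    Assumption for this sketch: the index fits into a single block.
    """
    if len(entries) > ENTRIES_PER_BLOCK:
        raise ValueError("this sketch only handles up to 51 entries")
    raw = b"".join(ENTRY.pack(sb, eb, pos) for sb, eb, pos in entries)
    return raw.ljust(BLOCK_SIZE, b"\x00")   # the rest of the block is zero bytes


def read_entries(block, count):
    """Return the first `count` (SB, EB, POS) tuples stored in a header block."""
    return [ENTRY.unpack_from(block, i * ENTRY.size) for i in range(count)]


# Hypothetical values, chosen only to show the round trip.
example = [(0, 0, 30), (1, 1, 100), (2, 3, 256)]
block = build_header_block(example)
restored = read_entries(block, len(example))
```

The two worked numbers in the specification come out the same way: struct.pack(">I", 613) yields the bytes 00 00 02 65, and struct.pack(">I", 130411) yields 00 01 FD 6B.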
Best regards,
Laszlo

[1]: http://en.wikipedia.org/wiki/Tar_(file_format)
[2]: http://www.python.org/doc/2.5.2/lib/module-tarfile.html
[3]: http://www.python.org/doc/2.5.2/lib/module-zipfile.html
[4]: http://en.wikipedia.org/wiki/Comparison_of_file_archivers
[5]: http://en.wikipedia.org/wiki/DAR_(Disk_Archiver)
[6]: http://en.wikipedia.org/wiki/Archive_formats
[7]: http://code.google.com/p/xar/

KISS archive file format specification:

# KISS archive format (Keep It Simple and Stupid)

## General properties
- block size: 512 bytes
- only the filename (and directory, if any) and the content are stored
- the first file contains the filenames
- header: start block, end block, position of last block

## Overall file structure
[header][filenames][1. file][2. file][3. file]

## [header]
[SB][EB][POS] [SB][EB][POS] [SB][EB][POS] etc..
[ 4][ 4][  2] [ 4][ 4][  2] [ 4][ 4][  2] etc..
[   header  ] [ filenames ] [  1. file  ] etc..

SB (start block): 4 bytes
EB (end block): 4 bytes
POS (position of last block): 2 bytes

All numbers are stored big-endian, that is, most significant byte first.
Example:
   613 dec =   265 hex = \00 \00 \02 \65 (4 bytes)
130411 dec = 1FD6B hex = \00 \01 \FD \6B (4 bytes)

Note:
The remaining part of the header block MUST be filled with zero bytes.
There is always a remaining part in the block, simply because each file
takes 10 bytes (512/10 = 51 entries, with 2 bytes left over).

## [filenames]
UTF-8 text for each filename, delimited by a '\n' byte.
The directory structure is preserved too.
[name of 1. file]['\n'][name of 2. file]['\n'][name of 3. file] etc..

Some examples:
this is a file.txt
this2.tar.gz
this3.html
images/loller.html
weird_dir/this\/files contains\/several\\ slashes.txt

Special characters:
'\n': You can't have a '\n' character in a filename. It is reserved.
      (It is not supported on some filesystems anyway.)
'/':  directory delimiter, used to save the directory structure.
'\/': if the filename itself contains a / character
'\\': if the filename itself contains a \ character

## [X. file]
The file content as is.

## FAQ:
Q: Why another archive format?
A: Because it is the dumbest format ever ;)

Q: Why not tar, ar, zip, [name archive type here]?
A: Short answer: the widely used archive formats are not suited for random
   access with no compression.
   Long answer:
   tar: there is no index; reading the last file of the archive
        requires reading the whole file before it.
   zip: individual files are compressed, which means processor time.
   xar: it would fit the requirements, but it is not widely
        supported, and not in every language.

Q: I use language X. Is KISS supported there?
A: The file format is intentionally so simple that every programmer
   can implement it in "no time".

Q: Is compression supported?
A: No. But you can compress the whole file,
   just as with tar: filename.kiss.bz2. Use that for file sharing.

Q: Are advanced features (permissions, symlinks, hardlinks, user/group/other)
   preserved?
A: No. That was not the goal of this archive. You could implement it
   by writing that information into the first file, but it is not
   recommended.

Q: If the original file is not a multiple of 512 bytes, how will it look in
   the archive, and how many bytes will it take?
A: Let's have an example. We have three files: 768bytes file, 1024 bytes, 2047 by