Hello,

I apologize if this topic comes up often, but the current patch format is very simple and I think we could do better. (I've been bitten by the inefficiency of the current way many times.)

My proposal is to use a format similar to ar (or we could maybe use ar as is). Currently, binary files are stored ASCII-armored inside the patch file, which is then gzip'd. This can be extremely slow to process if the binary file is large. And it's a problem not only when the file needs to be extracted from the patch, but also when darcs needs information about entries in the patch file that come after the large entries.

My proposed way of dealing with this is to have some sort of length encoding, so that file system calls such as lseek can be used to jump to the next file in what amounts to essentially constant time (all the OS needs to do is change an offset associated with the file descriptor and then issue a read).

Another feature I hear people asking about from time to time is storing permission information (or, more generally, meta-data). This could also be easy to do using the ar format, which has fields for storing a small bit of meta-data.

The easiest way of taking advantage of this new format is as follows:

1) The current patch format is kept the way it is for textual patches. This gzip'd file is then put in the archive with the appropriate header. This adds a small amount of overhead (should be very small).
2) Binary files are stored as-is or compressed (usually compressing binary files doesn't help much, so why bother?) inside the ar file, taking full advantage of the ar format. This allows easy copying to disk, or seeking past the contents to look at other data in the archive.

And probably the better way to use ar:

1) Patches that are related to a given file are gzip'd and stored together as an entry in the ar file, along with permission data.
2) Binary files are stored as above.

There are other variations, such as storing each hunk as a different entry in the patch file, or using a hybrid approach where one entry in the ar file serves as a table of contents and the rest follows something like the first approach.
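To make the seeking idea concrete, here is a rough sketch in Python (chosen only for illustration; it is not darcs code). It assumes the common Unix ar layout: an 8-byte global magic string, then for each member a 60-byte ASCII header (name, mtime, uid, gid, octal mode, decimal size, two-byte terminator) followed by the raw data, padded to an even length. The names write_member, list_members and build_demo_archive are hypothetical, and member names longer than 16 characters are not handled.

```python
import io

GLOBAL_MAGIC = b"!<arch>\n"   # 8-byte signature at the start of an ar file
HEADER_LEN = 60               # fixed size of every member header


def write_member(out, name, data, mode=0o644):
    """Append one member: a 60-byte ASCII header, the raw bytes, even padding."""
    # Fields: name(16) mtime(12) uid(6) gid(6) mode(8, octal) size(10) magic(2)
    header = "{:<16}{:<12}{:<6}{:<6}{:<8o}{:<10}`\n".format(
        name, 0, 0, 0, mode, len(data)).encode("ascii")
    assert len(header) == HEADER_LEN, "name too long for this sketch"
    out.write(header)
    out.write(data)
    if len(data) % 2:          # member data is padded to an even boundary
        out.write(b"\n")


def list_members(f):
    """Yield (name, mode, size, data_offset), reading headers only."""
    if f.read(len(GLOBAL_MAGIC)) != GLOBAL_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = f.read(HEADER_LEN)
        if len(header) < HEADER_LEN:
            return             # end of archive
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        name = header[0:16].decode("ascii").rstrip()
        mode = int(header[40:48].decode("ascii"), 8)   # permission bits
        size = int(header[48:58].decode("ascii"))
        offset = f.tell()
        yield name, mode, size, offset
        # Jump over the body without reading it (lseek on a real file).
        f.seek(offset + size + (size % 2))


def build_demo_archive():
    """Build a small in-memory archive with a large binary member in the middle."""
    buf = io.BytesIO()
    buf.write(GLOBAL_MAGIC)
    write_member(buf, "patch.gz", b"textual patch bytes")
    write_member(buf, "big.bin", bytes(1_000_001), mode=0o755)
    write_member(buf, "after.txt", b"found")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, mode, size, offset in list_members(build_demo_archive()):
        print(name, oct(mode), size, offset)
```

Finding after.txt here costs three header reads and two seeks, no matter how large big.bin is, and the mode field shows where permission bits could be carried.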
And I'd hate to see this idea get shot down on the grounds that ar is not flexible enough or that it is tied to Unix, so let me just say this: we don't have to implement ar exactly, but I think it provides a good definition to start from.

As a quick reference I found this website:
http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/files/aixfiles/ar_IA64.htm

Or, if you prefer tiny-urls:
http://tinyurl.com/73wl9

Thanks,
Jason

_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
