Thomas Lord <[EMAIL PROTECTED]> wrote:
> Eric> If work were to continue on revc (and I really hope it does
> Eric> even though I can't afford to fund it), I'd like to see it
> Eric> moved to an archive format that:
>
> Eric> 1. does not rely on binary metadata formats
> Eric> 2. uses gzip instead of raw zlib compression
>
> Eric> Since most files are pretty small, I highly doubt that any
> Eric> performance gains by using raw zlib or a binary metadata format
> Eric> would have a noticeable impact on performance.
>
> Eric> I (and I suspect many Arch users) really like the transparent
> Eric> and orthogonal aspects of Arch, and would like to see Arch 2.0
> Eric> continue that tradition.
>
> I chose the binary metadata formats because:
>
> a) they cut down code size and speed performance (look ma, no
> parsing!)
> b) they elegantly solve "whitespace in _____ names" without escaping
>
> Note that although the metadata formats are partly binary, they
> internally use plain text wherever to do otherwise would make the
> formats platform-specific.
>
> Also note that, with only one exception (the format of directory
> blobs), the formats are strings that can't contain a nul character,
> separated by nul characters. There is at least a minor convention
> in some GNU tools that recognize that format as an alternative to
> newline-separated lines.
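The NUL-separated string convention Tom describes (the same one used by GNU `find -print0` and `xargs -0`) can be sketched in a few lines of Python. This is a minimal illustration, assuming UTF-8 text and that each field is NUL-terminated; the function names are hypothetical, and it does not reproduce revc's actual on-disk layout (it ignores the directory-blob exception and any padding).

```python
def split_nul_fields(blob: bytes) -> list[str]:
    """Split a NUL-separated record into its text fields.

    Hypothetical helper. Assumes UTF-8 text; a trailing NUL terminator
    does not produce an empty final field.
    """
    if not blob:
        return []
    fields = blob.split(b"\0")
    if fields[-1] == b"":
        fields.pop()  # drop the empty piece after the final terminator
    return [f.decode("utf-8") for f in fields]


def join_nul_fields(fields: list[str]) -> bytes:
    """Inverse of split_nul_fields: NUL-terminate each field (hypothetical helper)."""
    for f in fields:
        if "\0" in f:
            raise ValueError("fields may not contain a NUL character")
    return b"".join(f.encode("utf-8") + b"\0" for f in fields)


names = ["README", "my notes.txt", "tab\tname"]
blob = join_nul_fields(names)
print(split_nul_fields(blob))  # ['README', 'my notes.txt', 'tab\tname']
```

Since NUL cannot appear in a POSIX file name, names containing spaces or tabs round-trip here with no escaping at all, which is the property Tom's point b) relies on.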
From hexdumping an unzipped prereqs file, I can see that it contains a list of project-name.<commit>/<sha1>+<sha1> text entries and that it's zero-padded, but it also has a few arbitrary characters scattered inside the zero-padded area that I haven't gotten around to understanding (perhaps you could tell me what they are). Also, why the extra zero-padding for alignment in all those binary files?

URL-encoding is a fairly well-supported (by other applications and libraries) way to fix the whitespace-in-names issue. It can break shell scripts, but shell scripters generally have enough sense not to name files with spaces in them :)

Code size: The tickets already have a parser, and there's also hackerlab in the code base.

Performance: These are small files we're talking about here.

> I'm unclear what advantage you see (or even what exactly you mean)
> about gzip vs. zlib. Do you mean fork/exec the gzip program? Why
> would that be an improvement?

No, there's no need to fork/exec the gzip binary in revc; you can create gzip-readable files using the zlib library itself. The gzip format wraps the compressed data in a slightly larger header and trailer than raw zlib compression does (which is what revc and git currently use), so it's slightly less efficient, but the difference is hardly noticeable.

My apologies if you feel I'm being nitpicky, but simple plain-text data formats give me warm fuzzy feelings :)

--
Eric Wong
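Eric's point that the zlib library can write gzip-readable output directly, without running the gzip program, can be illustrated with Python's `zlib` module, which wraps the same C library. Passing `wbits = 16 + MAX_WBITS` selects the gzip wrapper instead of the zlib one (in C, the corresponding call is `deflateInit2()` with a `windowBits` of 15 + 16). This is a sketch with a made-up sample payload, not how revc or git actually write their files.

```python
import gzip
import zlib


def compress_zlib(data: bytes) -> bytes:
    # zlib format: 2-byte header + DEFLATE data + 4-byte Adler-32 trailer
    return zlib.compress(data)


def compress_gzip(data: bytes) -> bytes:
    # wbits = 16 + MAX_WBITS tells the zlib library to emit a gzip wrapper:
    # 10-byte header + DEFLATE data + 8-byte trailer (CRC-32 and input length)
    co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    return co.compress(data) + co.flush()


# Hypothetical payload, loosely shaped like NUL-separated revision names.
sample = b"project--main--1.0--patch-1\0" * 20
z = compress_zlib(sample)
g = compress_gzip(sample)

# The gzip stream is readable by the standard gzip module (and by gunzip).
assert gzip.decompress(g) == sample
assert zlib.decompress(z) == sample
print(len(g) - len(z))  # 12: gzip framing (18 bytes) minus zlib framing (6 bytes)
```

With identical compression settings the DEFLATE payload is the same in both cases, so the cost of switching is a fixed 12 bytes of framing per file, which supports Eric's "hardly noticeable" for small files.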
_______________________________________________
Gnu-arch-users mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnu-arch-users

GNU arch home page:
http://savannah.gnu.org/projects/gnu-arch/
