On Wed, 17 Feb 2010 11:21:51 +1100, Stewart Smith <stewart at flamingspork.com> wrote:
> Using fast-import is interesting. Does it update the working tree? The
> big thing I wanted to avoid was creating a working tree (another million
> inodes being created is not ever what I need)
>
> Also interesting is the mention of creating packs on the fly... this
> could save the time in first writing the object and then packing it (as
> my script does).
>
> I'm going to play with this....

and I did. Good news... on my mailstore (which, as I've previously
mentioned, takes about 10 minutes to run 'du' over, about the same time
as 'notmuch new' takes), using the (attached) evenless.pl to create a
single commit with everything in it:

$ du -sh .git
3.4G	.git

Down from a whopping 14-15GB!!!

My previous effort (git-write-object, create pack every 1000 messages,
rinse, repeat) took all night and got to 3.7GB. This took only 108
minutes. In both cases, I was creating the repository on another spindle
(USB2.0 disk attached to my laptop).

git-ls-tree and git-cat-file both work for listing and getting objects.

The next thing to think about is adding objects as they come in...
Creating a new commit with just an added file should be pretty simple
and easy, but it means we get to keep a "revision history" of the
mailstore, which is *possibly* not ideal in terms of storage efficiency
(I'll do a trial with mine of doing one message at a time and seeing
what the end size is).

However... commit per added mail (or mails) does give us the advantage
of a really well documented and tested backup system :)

Deleting could be hard... if we actually want the objects to go away in
a "permanent" way (not just no longer be referenced).
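
For anyone who wants to try the same idea without the attachment, here
is a rough sketch of what feeding a maildir to git fast-import can look
like. It is not the attached evenless.pl, just an illustration in
Python. The function names (iter_stream, read_maildir), the committer
identity and the refs/heads/master branch are all made up for the
example. fast-import writes straight into a packfile and does not touch
the index or working tree. Passing parent="refs/heads/master^0" is how a
later run could add new mail on top of the existing commit. The sketch
assumes file names contain no newline and do not start with a double
quote.

```python
import os
import sys


def read_maildir(root):
    """Yield (repo_path, raw_bytes) for each mail in cur/ and new/."""
    for sub in ("cur", "new"):
        directory = os.path.join(root, sub)
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            with open(os.path.join(directory, name), "rb") as fh:
                yield sub + "/" + name, fh.read()


def iter_stream(messages, ref="refs/heads/master", parent=None,
                committer="evenless <evenless@localhost>", when=0,
                log="import maildir"):
    """Yield chunks of a git fast-import stream: one blob per mail,
    then a single commit that references every blob by its mark.

    committer, ref and log are placeholder values (assumptions).
    """
    marks = []
    for mark, (path, body) in enumerate(messages, start=1):
        # Each blob gets a mark so the commit can refer to it later.
        yield b"blob\nmark :%d\ndata %d\n" % (mark, len(body))
        yield body
        yield b"\n"
        marks.append((mark, path))

    text = log.encode("utf-8")
    yield b"commit %s\n" % ref.encode("utf-8")
    # Raw date format: seconds since the epoch plus a UTC offset.
    yield b"committer %s %d +0000\n" % (committer.encode("utf-8"), when)
    yield b"data %d\n" % len(text)
    yield text + b"\n"
    if parent is not None:
        # Build on an existing commit instead of creating a root commit.
        yield b"from %s\n" % parent.encode("utf-8")
    for mark, path in marks:
        yield b"M 100644 :%d %s\n" % (mark, path.encode("utf-8"))
    yield b"\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Write the stream to stdout so it can be piped into git fast-import.
    for chunk in iter_stream(read_maildir(sys.argv[1])):
        sys.stdout.buffer.write(chunk)
```

Saved under a hypothetical name such as maildir2git.py, it would be run
from inside a bare repository as
"python maildir2git.py ~/Maildir/INBOX | git fast-import". Only one mail
body is held in memory at a time, but the list of marks and paths still
grows with the number of messages.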

for the stats nerds:

$ time perl /home/stewart/evenless/evenless.pl /home/stewart/Maildir/INBOX
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     785000
Total objects:       781813 (     79023 duplicates                  )
      blobs  :       781363 (     79023 duplicates     708627 deltas)
      trees  :          449 (         0 duplicates          0 deltas)
      commits:            1 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (    860386 unique    )
      atoms:         860557
Memory total:        182780 KiB
       pools:        152116 KiB
     objects:         30664 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =          1
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  388496447 /  388496447
---------------------------------------------------------------------

real    107m43.130s
user    45m25.430s
sys     2m49.440s
-------------- next part --------------
A non-text attachment was scrubbed...
Name: evenless.pl
Type: text/x-perl
Size: 1413 bytes
Desc: evenless.pl: maildir to git using fast-import
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20100217/bc1a3f34/attachment.pl>
-- 
Stewart Smith