Ludovic Courtès <[email protected]> skribis:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason.  As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
>   $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
>   6.7G	/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
>   $ du -hs .git
>   517M	.git
Unsurprisingly, GC makes a big difference:

--8<---------------cut here---------------start------------->8---
$ cp -r ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq /tmp/checkout
$ (cd /tmp/checkout/; git gc)
Enumerating objects: 717785, done.
Counting objects: 100% (717785/717785), done.
Delta compression using up to 4 threads
Compressing objects: 100% (154644/154644), done.
Writing objects: 100% (717785/717785), done.
Total 717785 (delta 569440), reused 710535 (delta 562274), pack-reused 0
Enumerating cruft objects: 103412, done.
Traversing cruft objects: 81753, done.
Counting objects: 100% (64171/64171), done.
Delta compression using up to 4 threads
Compressing objects: 100% (17379/17379), done.
Writing objects: 100% (64171/64171), done.
Total 64171 (delta 52330), reused 58296 (delta 46792), pack-reused 0
Expanding reachable commits in commit graph: 133730, done.
$ du -hs /tmp/checkout
539M	/tmp/checkout
--8<---------------cut here---------------end--------------->8---

> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.

Confirmed: <https://github.com/libgit2/libgit2/issues/3247>.

My inclination for the short term would be to work around this
limitation by (1) finding a heuristic to determine whether a checkout has
likely accumulated too much cruft, and (2) treating such checkouts as
expired (thereby forcing a re-clone) or running ‘git gc’ on them if ‘git’
is available.

I can’t think of a good heuristic for (1).
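For option (2), here is a minimal sketch of running ‘git gc’ only when a ‘git’ executable is available. `maybe-git-gc` is a hypothetical helper written for illustration and is not part of (guix git). It assumes that finding a readable file named ‘git’ in $PATH is a good enough availability test, and it uses only core Guile procedures (`parse-path`, `search-path`, `system*`).

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper (not in (guix git)): run 'git gc' in DIRECTORY if a
;; file named 'git' can be found in $PATH.  Return #t when 'git gc' exits
;; successfully, and #f when 'git' is missing or the command fails.
(define (maybe-git-gc directory)
  ;; 'parse-path' returns '() when PATH is unset, so 'search-path' then
  ;; returns #f.
  (match (search-path (parse-path (getenv "PATH")) "git")
    (#f #f)                               ;no 'git' available
    (git
     (let ((status (system* git "-C" directory "gc" "--quiet")))
       ;; 'status:exit-val' is #f if 'git' was killed by a signal.
       (eqv? 0 (status:exit-val status))))))
```

For instance, `(maybe-git-gc "/tmp/checkout")` is the programmatic counterpart of the manual experiment above.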
Birth time could serve as a heuristic for (1), but we’d need statx(2) to
obtain it:

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -4
Access: 2023-09-04 23:13:54.668279105 +0200
Modify: 2023-09-04 11:34:41.665385000 +0200
Change: 2023-09-04 11:34:41.661629102 +0200
 Birth: 2021-08-09 10:48:17.748722151 +0200
--8<---------------cut here---------------end--------------->8---

Lacking statx(2), we can approximate creation time by looking at the
modification time of ‘.git/config’, which is created when the repository
is cloned and, judging by the output below, left untouched by later
updates:

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config | tail -3
Modify: 2021-08-09 10:50:28.031760953 +0200
Change: 2021-08-09 10:50:28.031760953 +0200
 Birth: 2021-08-09 10:50:28.031760953 +0200
--8<---------------cut here---------------end--------------->8---

This strategy can be implemented like this:
diff --git a/guix/git.scm b/guix/git.scm
index ebe2600209..ed3fa56bc8 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -405,7 +405,16 @@ (define cached-checkout-expiration
 
   ;; Use the mtime rather than the atime to cope with file systems mounted
   ;; with 'noatime'.
-  (file-expiration-time (* 90 24 3600) stat:mtime))
+  (let ((ttl (* 90 24 3600))
+        (max-checkout-retention (* 9 30 24 3600)))
+    (lambda (file)
+      (match (false-if-exception (lstat file))
+        (#f 0)                    ;FILE may have been deleted in the meantime
+        (st (min (pk 'ttl (+ (stat:mtime st) ttl))
+                 (pk 'maxttl (match (false-if-exception
+                                     (lstat (in-vicinity file ".git/config")))
+                               (#f +inf.0)
+                               (st (+ (stat:mtime st) max-checkout-retention))))))))))
 
 (define %checkout-cache-cleanup-period
   ;; Period for the removal of expired cached checkouts.
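To make the arithmetic behind the patch concrete, here it is done by hand with the timestamps from the stat output above, converted to seconds since the Epoch: 2023-09-04 11:34:41 +0200 (1693820081) for the checkout directory and 2021-08-09 10:50:28 +0200 (1628499028) for ‘.git/config’. `expiration-time` is a hypothetical helper written for this example. It is not a procedure in (guix git) and only reproduces the `min` of the two deadlines.

```scheme
;; Same constants as in the patch above.
(define ttl (* 90 24 3600))                      ;90 days = 7776000 s
(define max-checkout-retention (* 9 30 24 3600)) ;270 days = 23328000 s

;; Hypothetical helper: the expiration time of a checkout whose directory
;; was last modified at CHECKOUT-MTIME and whose '.git/config' was last
;; modified at CONFIG-MTIME, whichever deadline comes first.
(define (expiration-time checkout-mtime config-mtime)
  (min (+ checkout-mtime ttl)
       (+ config-mtime max-checkout-retention)))

(display (+ 1693820081 ttl))                     ;prints 1701596081
(newline)
(display (expiration-time 1693820081 1628499028)) ;prints 1651827028
(newline)
```

Both values agree with what `cached-checkout-expiration` reports through `pk` in the REPL session below.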
With this change, a cached checkout is considered “expired” 90 days
after its last modification or 9 months (9 × 30 days) after its
creation, whichever comes first.  In my case, it gives this:

--8<---------------cut here---------------start------------->8---
scheme@(guix git)> (cached-checkout-expiration "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")
;;; (ttl 1701596081)
;;; (maxttl 1651827028)
$6 = 1651827028
--8<---------------cut here---------------end--------------->8---

That second deadline falls in May 2022, so this checkout (cloned in
August 2021) is already expired.

Of course, having to re-clone entire repositories every 9 months is
ridiculous, but storing gigabytes of packs is worse IMO (I’m specifically
thinking about the Guix repo, which every user copies via ‘guix pull’).

Thoughts?

Thanks,
Ludo’.
