bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary.

2023-10-20 Thread Ludovic Courtès
Fixes .

This fixes a bug whereby libgit2-managed checkouts would keep growing as
we fetch.

* guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
procedures.
(update-cached-checkout): Use it.
---
 guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

Hi!

This is a radical fix/workaround for the unbounded Git checkout growth
problem, shelling out to ‘git gc’ when it’s likely needed (“too many”
pack files around).

I thought we might be able to implement a ‘git gc’ approximation using
the libgit2 “packbuilder” interface, but I haven’t got around to doing
it: .

Once again, shelling out is not my favorite option, but it’s a bug we
should fix sooner rather than later, hence this compromise.

Thoughts?

Ludo’.

diff --git a/guix/git.scm b/guix/git.scm
index b7182305cf..d704b62333 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -1,6 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2017, 2020 Mathieu Othacehe 
-;;; Copyright © 2018-2022 Ludovic Courtès 
+;;; Copyright © 2018-2023 Ludovic Courtès 
 ;;; Copyright © 2021 Kyle Meyer 
 ;;; Copyright © 2021 Marius Bakke 
 ;;; Copyright © 2022 Maxime Devos 
@@ -29,15 +29,16 @@ (define-module (guix git)
   #:use-module (guix cache)
   #:use-module (gcrypt hash)
   #:use-module ((guix build utils)
-                #:select (mkdir-p delete-file-recursively))
+                #:select (mkdir-p delete-file-recursively invoke/quiet))
   #:use-module (guix store)
   #:use-module (guix utils)
   #:use-module (guix records)
   #:use-module (guix gexp)
   #:autoload   (guix git-download)
   (git-reference-url git-reference-commit git-reference-recursive?)
+  #:autoload   (guix config) (%git)
   #:use-module (guix sets)
-  #:use-module ((guix diagnostics) #:select (leave warning))
+  #:use-module ((guix diagnostics) #:select (leave warning info))
   #:use-module (guix progress)
   #:autoload   (guix swh) (swh-download commit-id?)
   #:use-module (rnrs bytevectors)
@@ -428,6 +429,35 @@ (define (delete-checkout directory)
     (rename-file directory trashed)
     (delete-file-recursively trashed)))
 
+(define (packs-in-git-repository directory)
+  "Return the number of pack files under DIRECTORY, a Git checkout."
+  (catch 'system-error
+    (lambda ()
+      (let ((directory (opendir (in-vicinity directory ".git/objects/pack"))))
+        (let loop ((count 0))
+          (match (readdir directory)
+            ((? eof-object?)
+             (closedir directory)
+             count)
+            (str
+             (loop (if (string-suffix? ".pack" str)
+                       (+ 1 count)
+                       count)))))))
+    (const 0)))
+
+(define (maybe-run-git-gc directory)
+  "Run 'git gc' in DIRECTORY if needed."
+  ;; XXX: As of libgit2 1.3.x (used by Guile-Git), there's no support for GC.
+  ;; Each time a checkout is pulled, a new pack is created, which eventually
+  ;; takes up a lot of space (lots of small, poorly-compressed packs).  As a
+  ;; workaround, shell out to 'git gc' when the number of packs in a
+  ;; repository has become "too large", potentially wasting a lot of space.
+  ;; See .
+  (when (> (packs-in-git-repository directory) 25)
+    (info (G_ "compressing cached Git repository at '~a'...~%")
+          directory)
+    (invoke/quiet %git "-C" directory "gc")))
+
 (define* (update-cached-checkout url
  #:key
  (ref '())
@@ -515,6 +545,9 @@ (define* (update-cached-checkout url
seconds seconds
nanoseconds nanoseconds
 
+   ;; Run 'git gc' if needed.
+   (maybe-run-git-gc cache-directory)
+
;; When CACHE-DIRECTORY is a sub-directory of the default cache
;; directory, remove expired checkouts that are next to it.
(let ((parent (dirname cache-directory)))

base-commit: 6b0a32196982a0a2f4dbb59d35e55833a5545ac6
-- 
2.41.0






bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary.

2023-10-23 Thread Simon Tournier
Hi Ludo,

On Fri, 20 Oct 2023 at 18:15, Ludovic Courtès  wrote:

> * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
> procedures.
> (update-cached-checkout): Use it.
> ---
>  guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 36 insertions(+), 3 deletions(-)

LGTM.  Just two colors for the bikeshed. :-)


> +  (when (> (packs-in-git-repository directory) 25)

Why 25?  And not 10 or 50 or 100?
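
One way to keep the limit open to discussion is to make it a parameter
rather than a literal.  The following is only a sketch, not part of the
patch: '%pack-count-threshold', 'count-pack-files' and 'gc-needed?' are
hypothetical names, and 25 is kept as the default purely because it is the
value under discussion.  It uses 'scandir' from (ice-9 ftw), which returns
#f when the directory cannot be read.

```scheme
(use-modules (ice-9 ftw))               ;scandir

;; Hypothetical knob (an assumption, not in the patch): the number of packs
;; above which 'git gc' would be considered necessary.
(define %pack-count-threshold
  (make-parameter 25))

(define (count-pack-files directory)
  "Return the number of '.pack' files under DIRECTORY/.git/objects/pack, or
0 if that directory cannot be read."
  (let ((packs (scandir (in-vicinity directory ".git/objects/pack")
                        (lambda (file)
                          (string-suffix? ".pack" file)))))
    (if packs (length packs) 0)))

(define (gc-needed? directory)
  "Return true if DIRECTORY holds more packs than the current threshold."
  (> (count-pack-files directory) (%pack-count-threshold)))
```

A caller could then try another limit without touching the code, for
instance (parameterize ((%pack-count-threshold 50)) (gc-needed? directory)).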


>  (define* (update-cached-checkout url
>   #:key
>   (ref '())
> @@ -515,6 +545,9 @@ (define* (update-cached-checkout url
> seconds seconds
> nanoseconds nanoseconds
>  
> +   ;; Run 'git gc' if needed.
> +   (maybe-run-git-gc cache-directory)

Why not trigger it by “guix gc”?

Well, I expect “guix gc” to take some time and I choose when.  However,
I want “guix pull” or “guix time-machine” to be as fast as possible and
here some extra time is added, and I cannot control exactly when.
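
To make the "I choose when" idea concrete, here is a rough sketch of how an
explicitly invoked command could find the cached checkouts worth
compacting.  Everything in it is an assumption for illustration:
'checkouts-needing-gc' and 'count-pack-files' are hypothetical names,
CACHE-ROOT stands for whatever directory holds the cached checkouts, and
which command would call this is left open.

```scheme
(use-modules (ice-9 ftw)                ;scandir
             (srfi srfi-1))             ;filter

(define (count-pack-files directory)
  "Return the number of '.pack' files under DIRECTORY/.git/objects/pack, or
0 if that directory cannot be read."
  (let ((packs (scandir (in-vicinity directory ".git/objects/pack")
                        (lambda (file)
                          (string-suffix? ".pack" file)))))
    (if packs (length packs) 0)))

(define (checkouts-needing-gc cache-root threshold)
  "Return the sub-directories of CACHE-ROOT that hold more than THRESHOLD
pack files.  Return the empty list if CACHE-ROOT cannot be read."
  (let ((entries (or (scandir cache-root
                              (lambda (name)
                                (not (member name '("." "..")))))
                     '())))
    (filter (lambda (checkout)
              (> (count-pack-files checkout) threshold))
            (map (lambda (name)
                   (in-vicinity cache-root name))
                 entries))))
```

The caller would then run 'git gc' on each returned directory at a moment
the user picked, instead of during a fetch.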

Cheers,
simon





bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary.

2023-10-23 Thread Tobias Geerinckx-Rice via Bug reports for GNU Guix
>Why not trigger it by “guix gc”?

Unless there's a new option I missed, guix gc doesn't handle this.

>Well, I expect “guix gc” to take some time and I choose when.  However,
>I want “guix pull” or “guix time-machine” to be as fast as possible

I don't think that things should be pushed into guix gc merely because they are 
slow.

This is not a great post (I'd look at the git code if I were at a computer) but 
I remember git printing something like 'optimising repository in the 
background'.  Maybe something similar would be appropriate here, to better hide 
such housekeeping from the user.
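
For reference, git's automatic mode works roughly this way: 'git gc --auto'
first checks whether housekeeping is needed and, with the default
'gc.autoDetach' setting, carries on in the background where the system
supports it.  Below is a minimal Guile sketch of relying on that instead of
a blocking 'git gc'.  It is an illustration only, not part of the patch:
'git-gc-auto-command' and 'run-git-gc-auto' are hypothetical names, and
plain "git" from PATH with 'system*' stands in for '%git' and
'invoke/quiet' so that the example is self-contained.

```scheme
(define (git-gc-auto-command directory)
  "Return the command line, as a list of strings, that asks git to collect
garbage in DIRECTORY only if git itself deems it necessary."
  (list "git" "-C" directory "gc" "--auto"))

(define (run-git-gc-auto directory)
  "Run 'git gc --auto' in DIRECTORY.  Return #t if git exited with status 0,
#f otherwise."
  (eqv? 0 (status:exit-val
           (apply system* (git-gc-auto-command directory)))))
```

With '--auto', git applies its own limits ('gc.auto' and
'gc.autoPackLimit') rather than a pack count chosen by the caller, so the
trigger would differ from the one in the patch.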


Kind regards,

T G-R

Sent on the go.  Excuse or enjoy my brevity.