On Sat, Sep 01, 2018 at 02:53:28PM +0200, Ævar Arnfjörð Bjarmason wrote:

> With this we'll get output like:
> 
>     $ ~/g/git/git  -C ~/g/2015-04-03-1M-git/  --exec-path=$PWD fsck
>     Checking object directories: 100% (256/256), done.
>     Hashing: 100% (452634108/452634108), done.
>     Hashing: 100% (1073741824/1073741824), done.
>     Hashing: 100% (1073741824/1073741824), done.
>     Hashing: 100% (1008001572/1008001572), done.
>     Checking objects:   2% (262144/13064614)
>     ^C
> 
> All tests pass with this. Isn't it awesome? Except it's of course a
> massive hack, we wouldn't want to just hook into SHA1DC like this.

I still consider that output so-so; the byte counts are big and there's
no indication how many "hashing" lines we're going to see. It's also
broken up in a weird way (it's not one per file; it's one per giant
chunk we fed to sha1).

> The problem comes down to us needing to call git_hash_sha1_update() with
> some really huge input, that function is going to take a *long* time,
> and the only way we're getting incremental progress is:
> 
>  1) If we ourselves split the input into N chunks
>  2) If we hack into the SHA1 library itself
> 
> This patch does #2, but for this to be acceptable we'd need to do
> something like #1.

I think we could just do the chunking in verify_packfile(), couldn't we?
(And the .idx hash, if we really want to cover that case, but IMHO
that's way less interesting).

Something like this, which chunks it there, uses a per-packfile meter
(though still does not give any clue how many packfiles there are), and
shows a throughput meter.

diff --git a/pack-check.c b/pack-check.c
index d3a57df34f..c94223664f 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -62,10 +62,25 @@ static int verify_packfile(struct packed_git *p,
        uint32_t nr_objects, i;
        int err = 0;
        struct idx_entry *entries;
+       struct progress *hashing_progress = NULL;
+       char *title = NULL;
+       off_t total_hashed = 0;
 
        if (!is_pack_valid(p))
                return error("packfile %s cannot be accessed", p->pack_name);
 
+       if (progress) {
+               /* Probably too long... */
+               title = xstrfmt("Hashing %s", p->pack_name);
+
+               /*
+                * I don't think it actually works to have two progresses going
+                * at the same time, because when the first one ends, we'll
+                * cancel the alarm. But hey, this is a hacky proof of concept.
+                */
+               hashing_progress = start_progress(title, 0);
+       }
+
        the_hash_algo->init_fn(&ctx);
        do {
                unsigned long remaining;
@@ -75,9 +90,25 @@ static int verify_packfile(struct packed_git *p,
                        pack_sig_ofs = p->pack_size - the_hash_algo->rawsz;
                if (offset > pack_sig_ofs)
                        remaining -= (unsigned int)(offset - pack_sig_ofs);
-               the_hash_algo->update_fn(&ctx, in, remaining);
+               while (remaining) {
+                       unsigned long chunk = remaining < 4096 ? remaining : 4096;
+                       the_hash_algo->update_fn(&ctx, in, chunk);
+                       in += chunk;
+                       remaining -= chunk;
+                       total_hashed += chunk;
+                       /*
+                        * The progress code needs tweaking to show throughputs
+                        * better for open-ended meters.
+                        */
+                       display_throughput(hashing_progress, total_hashed);
+                       display_progress(hashing_progress, 0);
+               }
        } while (offset < pack_sig_ofs);
+
        the_hash_algo->final_fn(hash, &ctx);
+       stop_progress(&hashing_progress);
+       free(title);
+
        pack_sig = use_pack(p, w_curs, pack_sig_ofs, NULL);
        if (hashcmp(hash, pack_sig))
                err = error("%s pack checksum mismatch",
