Re: [PATCH 4/9] get_short_oid: sort ambiguous objects by type, then SHA-1

Derrick Stolee Tue, 01 May 2018 05:27:08 -0700

On 5/1/2018 7:27 AM, Ævar Arnfjörð Bjarmason wrote:

On Tue, May 01 2018, Derrick Stolee wrote:

On 4/30/2018 6:07 PM, Ævar Arnfjörð Bjarmason wrote:

Since we show the commit data in the output that's nicely aligned once
we sort by object type. The decision to show tags before commits is
pretty arbitrary, but it's much less likely that we'll display a tag,
so if there is one it makes sense to show it first.

Here's a non-arbitrary reason: the object types are ordered
topologically (ignoring self-references):

tag -> commit, tree, blob
commit -> tree
tree -> blob

Thanks. I'll add a patch with that comment to v2.

@@ -421,7 +451,12 @@ static int get_short_oid(const char *name, int len, struct 
object_id *oid,
                        ds.fn = NULL;
                advise(_("The candidates are:"));
-               for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
+               for_each_abbrev(ds.hex_pfx, collect_ambiguous, &collect);
+               QSORT(collect.oid, collect.nr, sort_ambiguous);

I was wondering how the old code sorted by SHA even when the ambiguous
objects were loaded from different sources (multiple pack-files, loose
objects). Turns out that for_each_abbrev() does its own sort after
collecting the SHAs and then calls the given function pointer only
once per distinct object. This avoids multiple instances of the same
object, which may appear multiple times across pack-files.

I only ask because now we are doing two sorts. I wonder if it would be
more elegant to provide your sorting algorithm to for_each_abbrev()
and let it call show_ambiguous_object as before.

Another question is if we should use this sort generally for all calls
to for_each_abbrev(). The only other case I see is in
builtin/revparse.c.

When preparing v2 I realized how confusing this was, so I'd added this
to the commit message of my WIP re-roll which should explain this:

     A note on the implementation: I started out with something much
     simpler which just replaced oid_array_sort() in sha1-array.c with a
     custom sort function before calling oid_array_for_each_unique(). But
     then dumbly noticed that it doesn't work because the output function
     was tangled up with the code added in fad6b9e590 ("for_each_abbrev:
     drop duplicate objects", 2016-09-26) to ensure we don't display
     duplicate objects.

That's why we're doing two passes here, first we need to sort the list

     and de-duplicate the objects, then sort them in our custom order, and
     finally output them without re-sorting them. I suppose we could also
     make oid_array_for_each_unique() maintain a hashmap of emitted
     objects, but that would increase its memory profile and wouldn't be
     worth the complexity for this one-off use-case,
     oid_array_for_each_unique() is used in many other places.

How would sorting in our custom order before de-duplicating fail thede-duplication? We will still pair identical OIDs as consecutiveelements and oid_array_for_each_unique only cares about consecutiveelements having distinct OIDs, not lex-ordered OIDs.

Perhaps the noise is because we rely on oid_array_sort() to mark thearray as sorted inside oid_array_for_each_unique(), but that could beremedied by calling our QSORT() inside for_each_abbrev() and marking thearray as sorted before calling oid_array_for_each_unique().


(Again, my comments are not meant to block this series.)

Thanks,
-Stolee

Re: [PATCH 4/9] get_short_oid: sort ambiguous objects by type, then SHA-1

Reply via email to