On Thu, Sep 29, 2016 at 11:37 AM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> I'm playing with an early patch to make the default more dynamic.
> Let's see how well it works in practice, but it looks fairly
> promising. Let me test a bit more and send out an RFC patch..

Ok, this is *very* rough, and it doesn't actuall pass all the tests,
and I didn't even try to look at why. But it passes the trivial
smell-test, and in particular it actually makes mathematical sense...

I think the patch can speak for itself, but the basic core is this
section in get_short_sha1():

  +       if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
  +               unsigned int expect_collision = 1 << (len * 2);
  +               if (ds.nrobjects > expect_collision)
  +                       return SHORT_NAME_AMBIGUOUS;
  +       }

basically, what it says is that we will consider a sha1 ambiguous even
if it was *technically* unique (that's the '!status' part of the test)
if:

 - the length was 15 or less

*and*

 - the number of objects we have is larger than the expected point
where statistically we should start to expect to get one collision.

That "expect_collision" math is actually very simple: each hex
character adds four bits of range, but since we expect collisions at
the square root of the maximum number of objects, we shift by just two
bits per hex digits instead.

The rest of the patch is a trivial change to just initialize the
default short size to -1, and consider that to mean "enable the
automatic size checking with a minimum of 7". And the trivial code to
estimate the number of objects (which ignores duplicates between packs
etc _entirely_).

For the kernel, just the *math* right now actually gives 12
characters. For current git it actually seems to say that 8 is the
correct number. For small projects, you'll still see 7.

ANYWAY. This patch is on top of Jeff's patches in 'pu' (I think those
are great regardless of this patch!), and as mentioned, it fails some
tests. I suspect that the failures might be due to the abbrev_default
being -1, and some other code finds that surprising now. But as
mentioned, I didn't really even look at it.

What do you think? It's actually a fairly simple patch and I really do
think it makes sense and it seems to just DTRT automatically.

              Linus
 cache.h       |  1 +
 environment.c |  2 +-
 sha1_name.c   | 21 ++++++++++++++++++++-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 6e33f2f..d2da6d1 100644
--- a/cache.h
+++ b/cache.h
@@ -1207,6 +1207,7 @@ struct object_context {
 #define GET_SHA1_TREEISH          020
 #define GET_SHA1_BLOB             040
 #define GET_SHA1_FOLLOW_SYMLINKS 0100
+#define GET_SHA1_AUTOMATIC      0200
 #define GET_SHA1_ONLY_TO_DIE    04000
 
 #define GET_SHA1_DISAMBIGUATORS \
diff --git a/environment.c b/environment.c
index c1442df..fd6681e 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_name.c b/sha1_name.c
index 3b647fd..8791ff3 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -15,6 +15,7 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, 
void *);
 
 struct disambiguate_state {
        int len; /* length of prefix in hex chars */
+       unsigned int nrobjects;
        char hex_pfx[GIT_SHA1_HEXSZ + 1];
        unsigned char bin_pfx[GIT_SHA1_RAWSZ];
 
@@ -118,6 +119,12 @@ static void find_short_object_filename(struct 
disambiguate_state *ds)
 
                        if (strlen(de->d_name) != 38)
                                continue;
+
+                       // We only look at the one subdirectory, and we assume
+                       // each subdirectory is roughly similar, so each object
+                       // we find probably has 255 other objects in the other
+                       // fan-out directories
+                       ds->nrobjects += 256;
                        if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
                                continue;
                        memcpy(hex + 2, de->d_name, 38);
@@ -151,6 +158,7 @@ static void unique_in_pack(struct packed_git *p,
 
        open_pack_index(p);
        num = p->num_objects;
+       ds->nrobjects += num;
        last = num;
        while (first < last) {
                uint32_t mid = (first + last) / 2;
@@ -426,6 +434,12 @@ static int get_short_sha1(const char *name, int len, 
unsigned char *sha1,
                for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
        }
 
+       if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
+               unsigned int expect_collision = 1 << (len * 2);
+               if (ds.nrobjects > expect_collision)
+                       return SHORT_NAME_AMBIGUOUS;
+       }
+
        return status;
 }
 
@@ -458,14 +472,19 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn 
fn, void *cb_data)
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
        int status, exists;
+       int flags = GET_SHA1_QUIETLY;
 
+       if (len < 0) {
+               flags |= GET_SHA1_AUTOMATIC;
+               len = 7;
+       }
        sha1_to_hex_r(hex, sha1);
        if (len == 40 || !len)
                return 40;
        exists = has_sha1_file(sha1);
        while (len < 40) {
                unsigned char sha1_ret[20];
-               status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY);
+               status = get_short_sha1(hex, len, sha1_ret, flags);
                if (exists
                    ? !status
                    : status == SHORT_NAME_NOT_FOUND) {

Reply via email to