On Thu, Sep 29, 2016 at 12:06:23PM -0700, Linus Torvalds wrote:
> On Thu, Sep 29, 2016 at 11:55 AM, Linus Torvalds
> <torva...@linux-foundation.org> wrote:
> >
> > For the kernel, just the *math* right now actually gives 12
> > characters. For current git it actually seems to say that 8 is the
> > correct number. For small projects, you'll still see 7.
> 
> Sorry, the git number is 9, not 8. The reason is that git has roughly
> 212k objects, and 9 hex digits gets expected collisions at about 256k
> objects.
> 
> So the logic means that we'll see 7 hex digits for projects with less
> than 16k objects, 8 hex digits if there are less than 64k objects, and
> 9 hex digits for projects like git that currently have fewer than 256k
> objects.
> 
> But git itself might not be *that* far from going to 10 hex digits
> with my patch.
> 
> The kernel uses 12 hex digits because the collision math says that's
> the right thing for a project with between 4M and 16M objects (with
> the kernel being at 5M).
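
The thresholds quoted above (16k, 64k, 256k, 16M) are all of the form
2^(2n) for n hex digits, so the rule can be sketched as below. This is
only my reading of the numbers in the quoted mail, not the code from the
patch; the function name and the minimum of 7 are assumptions made for
illustration.

```python
def abbrev_len(num_objects, minimum=7):
    """Smallest hex length n (>= minimum) such that num_objects < 2**(2*n).

    Hypothetical helper: n hex digits carry 4*n bits, and collisions are
    expected once the object count reaches about sqrt(2**(4*n)) = 4**n.
    """
    n = minimum
    while num_objects >= 4 ** n:
        n += 1
    return n


if __name__ == "__main__":
    # Object counts are the rough figures quoted in the thread.
    for name, count in [("small", 10_000), ("git", 212_000), ("kernel", 5_000_000)]:
        print(name, abbrev_len(count))
```

With the rough object counts from the thread, this yields 7 for a small
project, 9 for git and 12 for the kernel.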

OTOH, how often does one refer to trees or blobs with abbreviated sha1s?
Most of the time, you'd use abbreviated sha1s for commits. And the numbers
of commits in the git and kernel repositories are much lower than the
numbers of overall objects.

rev-list --all --count on the git repo gives me 46790. On the kernel, it
gives 618078.

Now, the interesting thing is looking at the *actual* collisions in
those spaces.

At 9 digits, there's only one commit collision in the kernel repo:
  45f014c5264f5e68ef0e51b36f4ef5ede3d18397
  45f014c52eef022873b19d6a20eb0ec9668f2b09

And two commit collisions at 8 digits in the git repo:
  1536dd9c1df0b7167b139f6666080cc4774ef63f
  1536dd9c61b5582cf079999057cb715dd6dc6620

  2e6e3e82ee36b3e1bec1db8db24817270080424e
  2e6e3e829f3759823d70e7af511bc04cd05ad0af
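
For anyone wanting to repeat this kind of count, here is a minimal
sketch that groups full hashes by their abbreviated prefix. It is not
necessarily how the numbers in this mail were produced; the function
name is made up, and the input is assumed to be a list of full hex
sha1s, one per commit (e.g. the output of rev-list --all).

```python
from collections import defaultdict


def prefix_collisions(hashes, digits):
    """Return {prefix: [hashes]} for every prefix shared by 2+ hashes."""
    groups = defaultdict(list)
    for h in hashes:
        groups[h[:digits]].append(h)
    return {p: hs for p, hs in groups.items() if len(hs) > 1}


if __name__ == "__main__":
    # The four git commits listed in this mail.
    sample = [
        "1536dd9c1df0b7167b139f6666080cc4774ef63f",
        "1536dd9c61b5582cf079999057cb715dd6dc6620",
        "2e6e3e82ee36b3e1bec1db8db24817270080424e",
        "2e6e3e829f3759823d70e7af511bc04cd05ad0af",
    ]
    for prefix, members in prefix_collisions(sample, 8).items():
        print(prefix, len(members))
```

On those four hashes it reports two colliding prefixes at 8 digits and
none at 9.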

At 7 digits, there are 5 actual commit collisions in the git repo and
718 in the kernel repo. Only one of those collisions involves more than 2
commits.
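
As a sanity check on those 7-digit counts, the usual birthday
approximation gives the expected number of colliding pairs among n
uniformly distributed hashes as n*(n-1)/2 divided by 16^digits. The
sketch below assumes sha1 prefixes are uniform and counts pairs rather
than groups, which is nearly the same thing when almost all collisions
involve exactly 2 commits. The function name is made up.

```python
def expected_colliding_pairs(n, digits):
    """Birthday approximation: expected pairs sharing a `digits`-hex prefix."""
    return n * (n - 1) / 2 / 16 ** digits


if __name__ == "__main__":
    # Commit counts from rev-list --all --count, as given above.
    print(f"git:    {expected_colliding_pairs(46790, 7):.1f}")    # roughly 4.1
    print(f"kernel: {expected_colliding_pairs(618078, 7):.1f}")   # roughly 711.6
```

Those estimates are close to the 5 and 718 actual collisions.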

Mike
