Re: [PATCH] technical doc: add a design doc for hash function transition

2017-09-26 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> From: Jonathan Nieder 

I go by jrnie...@gmail.com upstream. :)

> This is "RFC v3: Another proposed hash function transition plan" from
> the git mailing list.
>
> Signed-off-by: Jonathan Nieder 
> Signed-off-by: Jonathan Tan 
> Signed-off-by: Brandon Williams 
> Signed-off-by: Stefan Beller 

I hadn't signed-off on this version, but it's not a big deal.

[...]
> ---
>
>  This takes the original Google Doc[1] and adds it to our history,
>  such that the discussion can be on list and in the commit messages.
>
>  * replaced SHA3-256 with NEWHASH, sha3 with newhash
>  * added section 'Implementation plan'
>  * added section 'Future work'
>  * added section 'Agreed-upon criteria for selecting NewHash'

Thanks for sending this out.  I had let it stall too long.

As a tiny nit, I think NewHash is easier to read than NEWHASH.  Not a
big deal.  More importantly, we need some text describing it and
saying it's a placeholder.

The implementation plan included here is out of date.  It comes from
an email where I was answering a question about what people can do to
make progress, before this design had been agreed on.  In the context
of this design there are other steps we'd want to describe (having to
do with implementing the translation table, etc).

I also planned to add a description of the translation table based on
what was discussed previously in this thread.

Jonathan


[PATCH] technical doc: add a design doc for hash function transition

2017-09-26 Thread Stefan Beller
From: Jonathan Nieder 

This is "RFC v3: Another proposed hash function transition plan" from
the git mailing list.

Signed-off-by: Jonathan Nieder 
Signed-off-by: Jonathan Tan 
Signed-off-by: Brandon Williams 
Signed-off-by: Stefan Beller 
---

 This takes the original Google Doc[1] and adds it to our history,
 such that the discussion can be on list and in the commit messages.
 
 * replaced SHA3-256 with NEWHASH, sha3 with newhash
 * added section 'Implementation plan'
 * added section 'Future work'
 * added section 'Agreed-upon criteria for selecting NewHash'
 
 As the discussion is restarting, here is our attempt to add value to
 it. We had planned to polish this more, but given the timing we might
 as well post it as-is.
  
 Thanks.

[1] https://docs.google.com/document/d/18hYAQCTsDgaFUo-VJGhT0UqyetL2LbAzkWNK1fYS8R0/edit

 Documentation/Makefile |   1 +
 .../technical/hash-function-transition.txt | 571 +
 2 files changed, 572 insertions(+)
 create mode 100644 Documentation/technical/hash-function-transition.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2415e0d657..471bb29725 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -67,6 +67,7 @@ SP_ARTICLES += howto/maintain-git
 API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technical/api-index.txt, $(wildcard technical/api-*.txt)))
 SP_ARTICLES += $(API_DOCS)
 
+TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
 TECH_DOCS += technical/pack-format
diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
new file mode 100644
index 00..0ac751d600
--- /dev/null
+++ b/Documentation/technical/hash-function-transition.txt
@@ -0,0 +1,571 @@
+Git hash function transition
+============================
+
+Objective
+---------
+Migrate Git from SHA-1 to a stronger hash function.
+
+Background
+----------
+At its core, the Git version control system is a content-addressable
+filesystem. It uses the SHA-1 hash function to name content. For
+example, files, directories, and revisions are referred to by hash
+values, unlike in other traditional version control systems where files
+or versions are referred to via sequential numbers. The use of a hash
+function to address its content delivers a few advantages:
+
+* Integrity checking is easy. Bit flips, for example, are easily
+  detected, as the hash of corrupted content does not match its name.
+* Lookup of objects is fast.
+
+Using a cryptographically secure hash function brings additional
+advantages:
+
+* Object names can be signed and third parties can trust the hash to
+  address the signed object and all objects it references.
+* Communication over the Git protocol and out-of-band communication
+  methods have a short string that can be used to reliably address
+  stored content.
+
+Over time some flaws in SHA-1 have been discovered by security
+researchers. https://shattered.io demonstrated a practical SHA-1 hash
+collision. As a result, SHA-1 cannot be considered cryptographically
+secure any more. This impacts the communication of hash values because
+we cannot trust that a given hash value represents the known good
+version of content that the speaker intended.
+
+SHA-1 still possesses the other properties, such as fast object lookup
+and safe error checking, but other hash functions that are believed to
+be cryptographically secure are equally suitable.
+
+Goals
+-----
+1. The transition to NEWHASH can be done one local repository at a time.
+   a. Requiring no action by any other party.
+   b. A NEWHASH repository can communicate with SHA-1 Git servers
+      (push/fetch).
+   c. Users can use SHA-1 and NEWHASH identifiers for objects
+      interchangeably.
+   d. New signed objects make use of a stronger hash function than
+      SHA-1 for their security guarantees.
+2. Allow a complete transition away from SHA-1.
+   a. Local metadata for SHA-1 compatibility can be removed from a
+      repository if compatibility with SHA-1 is no longer needed.
+3. Maintainability throughout the process.
+   a. The object format is kept simple and consistent.
+   b. Creation of a generalized repository conversion tool.
+
+Non-Goals
+---------
+1. Add NEWHASH support to the Git protocol. This is valuable and the
+   logical next step, but it is out of scope for this initial design.
+2. Transparently improving the security of existing SHA-1 signed
+   objects.
+3. Intermixing objects using multiple hash functions in a single
+   repository.
+4. Taking the opportunity to fix other bugs in Git's formats and
+   protocols.
+5. Shallow clones and fetches into a NEWHASH repository. (This will
+   change when we add NEWHASH support to Git