Re: [PATCH v2 2/2] doc hash-function-transition: pick SHA-256 as NewHash
Subject: doc hash-function-transition: pick SHA-256 as NewHash >From a security perspective, it seems that SHA-256, BLAKE2, SHA3-256, K12, and so on are all believed to have similar security properties. All are good options from a security point of view. SHA-256 has a number of advantages: * It has been around for a while, is widely used, and is supported by just about every single crypto library (OpenSSL, mbedTLS, CryptoNG, SecureTransport, etc). * When you compare against SHA1DC, most vectorized SHA-256 implementations are indeed faster, even without acceleration. * If we're doing signatures with OpenPGP (or even, I suppose, CMS), we're going to be using SHA-2, so it doesn't make sense to have our security depend on two separate algorithms when either one of them alone could break the security when we could just depend on one. So SHA-256 it is. Update the hash-function-transition design doc to say so. After this patch, there are no remaining instances of the string "NewHash", except for an unrelated use from 2008 as a variable name in t/t9700/test.pl. Signed-off-by: Ævar Arnfjörð Bjarmason Acked-by: Linus Torvalds Acked-by: brian m. carlson Acked-by: Johannes Schindelin Acked-by: Dan Shumow Signed-off-by: Jonathan Nieder --- Hi, Ævar Arnfjörð Bjarmason wrote: > I think it makes if you just take over 2/2 of this series (or even the > whole thing), since the meat of it is already something I copy/pasted > from you, and you've got more of an opinion / idea about how to proceed > (which is good!); it's more efficient than me trying to fix various > stuff you're pointing out at this point, I also think it makes sense > that you change the "Author" line for that, since the rest of the > changes will mainly be search-replace by me. Fair enough. Here's that updated patch 2/2. I'll try to make a more comprehensive set of proposed edits tomorrow, in a fresh thread (dealing with the cksum-trailer, etc). Brian, is your latest work in progress available somewhere (e.g. a branch on https://git.crustytoothpaste.net/git/bmc/git) so I can make sure any edits I make match well with it? Thanks, Jonathan .../technical/hash-function-transition.txt| 196 +- 1 file changed, 98 insertions(+), 98 deletions(-) diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt index 5ee4754adb..bc2ace2a6e 100644 --- a/Documentation/technical/hash-function-transition.txt +++ b/Documentation/technical/hash-function-transition.txt @@ -59,14 +59,11 @@ that are believed to be cryptographically secure. Goals - -Where NewHash is a strong 256-bit hash function to replace SHA-1 (see -"Selection of a New Hash", below): - -1. The transition to NewHash can be done one local repository at a time. +1. The transition to SHA-256 can be done one local repository at a time. a. Requiring no action by any other party. - b. A NewHash repository can communicate with SHA-1 Git servers + b. A SHA-256 repository can communicate with SHA-1 Git servers (push/fetch). - c. Users can use SHA-1 and NewHash identifiers for objects + c. Users can use SHA-1 and SHA-256 identifiers for objects interchangeably (see "Object names on the command line", below). d. New signed objects make use of a stronger hash function than SHA-1 for their security guarantees. @@ -79,7 +76,7 @@ Where NewHash is a strong 256-bit hash function to replace SHA-1 (see Non-Goals - -1. Add NewHash support to Git protocol. This is valuable and the +1. Add SHA-256 support to Git protocol. This is valuable and the logical next step but it is out of scope for this initial design. 2. Transparently improving the security of existing SHA-1 signed objects. @@ -87,26 +84,26 @@ Non-Goals repository. 4. Taking the opportunity to fix other bugs in Git's formats and protocols. -5. Shallow clones and fetches into a NewHash repository. (This will - change when we add NewHash support to Git protocol.) -6. Skip fetching some submodules of a project into a NewHash - repository. (This also depends on NewHash support in Git +5. Shallow clones and fetches into a SHA-256 repository. (This will + change when we add SHA-256 support to Git protocol.) +6. Skip fetching some submodules of a project into a SHA-256 + repository. (This also depends on SHA-256 support in Git protocol.) Overview We introduce a new repository format extension. Repositories with this -extension enabled use NewHash instead of SHA-1 to name their objects. +extension enabled use SHA-256 instead of SHA-1 to name their objects. This affects both object names and object content --- both the names of objects and all references to other objects within an object are switched to the new hash function. -NewHash repositories cannot be read by older versions of Git. +SHA-256 repositories cannot be read by older versions of Git.
Re: [PATCH v2 2/2] doc hash-function-transition: pick SHA-256 as NewHash
On Fri, Aug 03, 2018 at 12:20:14AM -0700, Jonathan Nieder wrote: > Ævar Arnfjörð Bjarmason wrote: > > Object format > > ~ > > The content as a byte sequence of a tag, commit, or tree object named > > -by sha1 and newhash differ because an object named by newhash-name refers > > to > > +by sha1 and sha256 differ because an object named by sha256-name refers to > > Not about this patch: this should say SHA-1 and SHA-256, I think. > Leaving it as is in this patch as you did is the right thing. > > [...] > > @@ -255,10 +252,10 @@ network byte order): > >up to and not including the table of CRC32 values. > > - Zero or more NUL bytes. > > - The trailer consists of the following: > > - - A copy of the 20-byte NewHash checksum at the end of the > > + - A copy of the 20-byte SHA-256 checksum at the end of the > > Not about this patch: a SHA-256 is 32 bytes. Leaving that for a > separate patch as you did is the right thing, though. > > [...] > > - - 20-byte NewHash checksum of all of the above. > > + - 20-byte SHA-256 checksum of all of the above. > > Likewise. For the record, my code for these does use 32 bytes. I'm fine with this being a separate patch, though. > [...] > > @@ -351,12 +348,12 @@ the following steps: > > (This list only contains objects reachable from the "wants". If the > > pack from the server contained additional extraneous objects, then > > they will be discarded.) > > -3. convert to newhash: open a new (newhash) packfile. Read the > > topologically > > +3. convert to sha256: open a new (sha256) packfile. Read the topologically > > Not about this patch: this one's more murky, since it's talking about > the object names instead of the hash function. I guess "sha256" > instead of "SHA-256" for this could be right, but I worry it's going > to take time for me to figure out the exact distinction. That seems > like a reason to just call it SHA-256 (but in a separate patch). My assumption has been that when we are referring to the algorithm, we'll use SHA-1 and SHA-256, and when we're referring to the input to Git (in config files or in ^{sha256} notation), we use "sha1" and "sha256". I see this as analogous to "Git" and "git". Otherwise, I'm fine with this document as it is. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: [PATCH v2 2/2] doc hash-function-transition: pick SHA-256 as NewHash
On Fri, Aug 03 2018, Jonathan Nieder wrote: > Hi again, > > Sorry for the slow review. I finally got a chance to look this over > again. > > My main nits are about the commit message: I think it still focuses > too much on the process instead of the usual "knowing what I know now, > here's the clearest explanation for why we need this patch" approach. > I can send a patch illustrating what I mean tomorrow morning. I think it makes if you just take over 2/2 of this series (or even the whole thing), since the meat of it is already something I copy/pasted from you, and you've got more of an opinion / idea about how to proceed (which is good!); it's more efficient than me trying to fix various stuff you're pointing out at this point, I also think it makes sense that you change the "Author" line for that, since the rest of the changes will mainly be search-replace by me. Perhaps it's better for readability if those search-replace changes go into their own change, i.e. make it a three-part where 2/3 does the search-replace, and promises that 3/3 has the full rationale etc. > Ævar Arnfjörð Bjarmason wrote: > >> From a security perspective, it seems that SHA-256, BLAKE2, SHA3-256, >> K12, and so on are all believed to have similar security properties. >> All are good options from a security point of view. >> >> SHA-256 has a number of advantages: >> >> * It has been around for a while, is widely used, and is supported by >> just about every single crypto library (OpenSSL, mbedTLS, CryptoNG, >> SecureTransport, etc). >> >> * When you compare against SHA1DC, most vectorized SHA-256 >> implementations are indeed faster, even without acceleration. >> >> * If we're doing signatures with OpenPGP (or even, I suppose, CMS), >> we're going to be using SHA-2, so it doesn't make sense to have our >> security depend on two separate algorithms when either one of them >> alone could break the security when we could just depend on one. >> >> So SHA-256 it is. > > The above is what I wrote, so of course I'd like it. ;-) > >>See the "Hash algorithm analysis" thread as of >> [1]. Linus has come around to this choice and suggested Junio make the >> final pick, and he's endorsed SHA-256 [3]. > > The above paragraph has the same problem as before of (1) not being > self-contained and (2) focusing on the process that led to this patch > instead of the benefit of the patch itself. I think we should omit it. > >> This follow-up change changes occurrences of "NewHash" to >> "SHA-256" (or "sha256", depending on the context). The "Selection of a >> New Hash" section has also been changed to note that historically we >> used the the "NewHash" name while we didn't know what the new hash >> function would be. > > nit: Commit messages are usually in the imperative tense. This is in > the past tense, I think because it is a continuation of that > discussion about process. > > For this part, I think we can let the patch speak for itself. > >> This leaves no use of "NewHash" anywhere in git.git except in the >> aforementioned section (and as a variable name in t/t9700/test.pl, but >> that use from 2008 has nothing to do with this transition plan). > > This part is helpful --- good. > >> 1. >> https://public-inbox.org/git/20180720215220.gb18...@genre.crustytoothpaste.net/ >> 2. >> https://public-inbox.org/git/ca+55afwse9bf8e0hlk9pp3fvd5lavy5grdsv3fbntgzekja...@mail.gmail.com/ >> 3. https://public-inbox.org/git/xmqqzhygwd5o@gitster-ct.c.googlers.com/ > > Footnotes to the historical part --- I'd recommend removing these. > >> Helped-by: Jonathan Nieder >> Helped-by: Junio C Hamano >> Signed-off-by: Ævar Arnfjörð Bjarmason > > Here I'd want to put a pile of acks, e.g.: > > Acked-by: Linus Torvalds > Acked-by: brian m. carlson > Acked-by: Johannes Schindelin > Acked-by: Dan Shumow > Acked-by: Junio C Hamano > > [...] >> --- a/Documentation/technical/hash-function-transition.txt >> +++ b/Documentation/technical/hash-function-transition.txt >> @@ -59,14 +59,11 @@ that are believed to be cryptographically secure. >> >> Goals >> - >> -Where NewHash is a strong 256-bit hash function to replace SHA-1 (see >> -"Selection of a New Hash", below): >> - >> -1. The transition to NewHash can be done one local repository at a time. >> +1. The transition to SHA-256 can be done one local repository at a time. > > Yay! > > [...] >> [extensions] >> -objectFormat = newhash >> +objectFormat = sha256 >> compatObjectFormat = sha1 > > Yes, makes sense. > > [...] >> @@ -155,36 +152,36 @@ repository extensions. >> Object names >> >> Objects can be named by their 40 hexadecimal digit sha1-name or 64 >> -hexadecimal digit newhash-name, plus names derived from those (see >> +hexadecimal digit sha256-name, plus names derived from those (see >> gitrevisions(7)). >> >> The sha1-name of an object is the SHA-1 of the concatenation of its >> type, length, a
Re: [PATCH v2 2/2] doc hash-function-transition: pick SHA-256 as NewHash
On Fri, Aug 3, 2018 at 9:40 AM Junio C Hamano wrote: > > > [...] > >> - - 20-byte NewHash checksum of all of the above. > >> + - 20-byte SHA-256 checksum of all of the above. > > > > Likewise. > > Hmph, I've always assumed since NewHash plan was written that this > part was not about tamper resistance but was about bit-flipping > detection and it was deliberate to stick to 20-byte sum, truncating > as necessary. Yeah, in fact, since this was one area where people actually had performance issues with the hash, it might be worth considering _weakening_ the hash. Things like the pack index files (and just the regular file index) are entirely local, and the SHA1 in those is really just a fancy CRC. It's really just "good protection against disk corruption" (it happens), and in fact you cannot use it as protection against active tampering, since there's no secret there and any active attacker that has access to your local filesystem could just recompute the hash anyway. And because they are local anyway and aren't really transported (modulo "shared repositories" where you use them across users or legacy rsync-like synchronization), they can be handled separately from any hashing changes. The pack and index file formats have in fact been changed before. It might make sense to either keep it as SHA1 (just to minimize any changes) or if there are still issues with index file performance it could even be made to use something fast-but-not-cryptographic like just making it use crc32(). Remember: one of the original core git design requirements was "corruption detection". For normal hashed objects, that came naturally, and the normal object store additionally has active tamper protection thanks to the interconnected nature of the hashes and the distribution of the objects. But for things like packfiles and the file index, it is just a separate checksum. There is no tamper protection anyway, since anybody who can modify them directly can just recompute the hash checksum. The fact that both of these things used SHA1 was more of a convenience issue than anything else, and the pack/index file checksum is fundamentally not cryptographic (but a cryptographic hash is obviously by definition also a very good corruption detector). Linus
Re: [PATCH v2 2/2] doc hash-function-transition: pick SHA-256 as NewHash
On Fri, Aug 3, 2018 at 12:20 AM Jonathan Nieder wrote: > > > Here I'd want to put a pile of acks, e.g.: > > Acked-by: Linus Torvalds > .. Sure, feel free to add my Ack for this. Linus
Re: [PATCH v2 2/2] doc hash-function-transition: pick SHA-256 as NewHash
Jonathan Nieder writes: > Sorry for the slow review. I finally got a chance to look this over > again. > > My main nits are about the commit message: I think it still focuses > too much on the process instead of the usual "knowing what I know now, > here's the clearest explanation for why we need this patch" approach. > I can send a patch illustrating what I mean tomorrow morning. I'll turn 'Will merge to next' to 'Hold' for now. >> Object format >> ~ >> The content as a byte sequence of a tag, commit, or tree object named >> -by sha1 and newhash differ because an object named by newhash-name refers to >> +by sha1 and sha256 differ because an object named by sha256-name refers to > > Not about this patch: this should say SHA-1 and SHA-256, I think. > Leaving it as is in this patch as you did is the right thing. Perhaps deserves a "NEEDSWORK" comment, though. > [...] >> @@ -255,10 +252,10 @@ network byte order): >>up to and not including the table of CRC32 values. >> - Zero or more NUL bytes. >> - The trailer consists of the following: >> - - A copy of the 20-byte NewHash checksum at the end of the >> + - A copy of the 20-byte SHA-256 checksum at the end of the > > Not about this patch: a SHA-256 is 32 bytes. Leaving that for a > separate patch as you did is the right thing, though. > > [...] >> - - 20-byte NewHash checksum of all of the above. >> + - 20-byte SHA-256 checksum of all of the above. > > Likewise. Hmph, I've always assumed since NewHash plan was written that this part was not about tamper resistance but was about bit-flipping detection and it was deliberate to stick to 20-byte sum, truncating as necessary. It definitely is a good idea to leave it for a separate patch to update this part. Thanks.
Re: [PATCH v2 2/2] doc hash-function-transition: pick SHA-256 as NewHash
Hi again, Sorry for the slow review. I finally got a chance to look this over again. My main nits are about the commit message: I think it still focuses too much on the process instead of the usual "knowing what I know now, here's the clearest explanation for why we need this patch" approach. I can send a patch illustrating what I mean tomorrow morning. Ævar Arnfjörð Bjarmason wrote: > From a security perspective, it seems that SHA-256, BLAKE2, SHA3-256, > K12, and so on are all believed to have similar security properties. > All are good options from a security point of view. > > SHA-256 has a number of advantages: > > * It has been around for a while, is widely used, and is supported by > just about every single crypto library (OpenSSL, mbedTLS, CryptoNG, > SecureTransport, etc). > > * When you compare against SHA1DC, most vectorized SHA-256 > implementations are indeed faster, even without acceleration. > > * If we're doing signatures with OpenPGP (or even, I suppose, CMS), > we're going to be using SHA-2, so it doesn't make sense to have our > security depend on two separate algorithms when either one of them > alone could break the security when we could just depend on one. > > So SHA-256 it is. The above is what I wrote, so of course I'd like it. ;-) >See the "Hash algorithm analysis" thread as of > [1]. Linus has come around to this choice and suggested Junio make the > final pick, and he's endorsed SHA-256 [3]. The above paragraph has the same problem as before of (1) not being self-contained and (2) focusing on the process that led to this patch instead of the benefit of the patch itself. I think we should omit it. > This follow-up change changes occurrences of "NewHash" to > "SHA-256" (or "sha256", depending on the context). The "Selection of a > New Hash" section has also been changed to note that historically we > used the the "NewHash" name while we didn't know what the new hash > function would be. nit: Commit messages are usually in the imperative tense. This is in the past tense, I think because it is a continuation of that discussion about process. For this part, I think we can let the patch speak for itself. > This leaves no use of "NewHash" anywhere in git.git except in the > aforementioned section (and as a variable name in t/t9700/test.pl, but > that use from 2008 has nothing to do with this transition plan). This part is helpful --- good. > 1. > https://public-inbox.org/git/20180720215220.gb18...@genre.crustytoothpaste.net/ > 2. > https://public-inbox.org/git/ca+55afwse9bf8e0hlk9pp3fvd5lavy5grdsv3fbntgzekja...@mail.gmail.com/ > 3. https://public-inbox.org/git/xmqqzhygwd5o@gitster-ct.c.googlers.com/ Footnotes to the historical part --- I'd recommend removing these. > Helped-by: Jonathan Nieder > Helped-by: Junio C Hamano > Signed-off-by: Ævar Arnfjörð Bjarmason Here I'd want to put a pile of acks, e.g.: Acked-by: Linus Torvalds Acked-by: brian m. carlson Acked-by: Johannes Schindelin Acked-by: Dan Shumow Acked-by: Junio C Hamano [...] > --- a/Documentation/technical/hash-function-transition.txt > +++ b/Documentation/technical/hash-function-transition.txt > @@ -59,14 +59,11 @@ that are believed to be cryptographically secure. > > Goals > - > -Where NewHash is a strong 256-bit hash function to replace SHA-1 (see > -"Selection of a New Hash", below): > - > -1. The transition to NewHash can be done one local repository at a time. > +1. The transition to SHA-256 can be done one local repository at a time. Yay! [...] > [extensions] > - objectFormat = newhash > + objectFormat = sha256 > compatObjectFormat = sha1 Yes, makes sense. [...] > @@ -155,36 +152,36 @@ repository extensions. > Object names > > Objects can be named by their 40 hexadecimal digit sha1-name or 64 > -hexadecimal digit newhash-name, plus names derived from those (see > +hexadecimal digit sha256-name, plus names derived from those (see > gitrevisions(7)). > > The sha1-name of an object is the SHA-1 of the concatenation of its > type, length, a nul byte, and the object's sha1-content. This is the > traditional used in Git to name objects. > > -The newhash-name of an object is the NewHash of the concatenation of its > -type, length, a nul byte, and the object's newhash-content. > +The sha256-name of an object is the SHA-256 of the concatenation of its > +type, length, a nul byte, and the object's sha256-content. Sensible. [...] > > Object format > ~ > The content as a byte sequence of a tag, commit, or tree object named > -by sha1 and newhash differ because an object named by newhash-name refers to > +by sha1 and sha256 differ because an object named by sha256-name refers to Not about this patch: this should say SHA-1 and SHA-256, I think. Leaving it as is in this patch as you did is the right thing. [...] > @@ -255,10 +252,10 @@ network byte order): >
[PATCH v2 2/2] doc hash-function-transition: pick SHA-256 as NewHash
>From a security perspective, it seems that SHA-256, BLAKE2, SHA3-256, K12, and so on are all believed to have similar security properties. All are good options from a security point of view. SHA-256 has a number of advantages: * It has been around for a while, is widely used, and is supported by just about every single crypto library (OpenSSL, mbedTLS, CryptoNG, SecureTransport, etc). * When you compare against SHA1DC, most vectorized SHA-256 implementations are indeed faster, even without acceleration. * If we're doing signatures with OpenPGP (or even, I suppose, CMS), we're going to be using SHA-2, so it doesn't make sense to have our security depend on two separate algorithms when either one of them alone could break the security when we could just depend on one. So SHA-256 it is. See the "Hash algorithm analysis" thread as of [1]. Linus has come around to this choice and suggested Junio make the final pick, and he's endorsed SHA-256 [3]. This follow-up change changes occurrences of "NewHash" to "SHA-256" (or "sha256", depending on the context). The "Selection of a New Hash" section has also been changed to note that historically we used the the "NewHash" name while we didn't know what the new hash function would be. This leaves no use of "NewHash" anywhere in git.git except in the aforementioned section (and as a variable name in t/t9700/test.pl, but that use from 2008 has nothing to do with this transition plan). 1. https://public-inbox.org/git/20180720215220.gb18...@genre.crustytoothpaste.net/ 2. https://public-inbox.org/git/ca+55afwse9bf8e0hlk9pp3fvd5lavy5grdsv3fbntgzekja...@mail.gmail.com/ 3. https://public-inbox.org/git/xmqqzhygwd5o@gitster-ct.c.googlers.com/ Helped-by: Jonathan Nieder Helped-by: Junio C Hamano Signed-off-by: Ævar Arnfjörð Bjarmason --- On Wed, Jul 25 2018, Junio C Hamano wrote: > Jonathan Nieder writes: > [...] > All interesting ideas and good suggestions. I'll leave 2/2 in the > mail archive and take only 1/2 for now. I'd expect the final > version, not too soon after mulling over the suggestions raised > here, but not in too distant future to prevent us from forgetting > ;-) Here's a v2 which uses the suggestions for both the commit message & documentation from Jonathan and yourself. Thanks! .../technical/hash-function-transition.txt| 196 +- 1 file changed, 99 insertions(+), 97 deletions(-) diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt index 5ee4754adb..5041e57273 100644 --- a/Documentation/technical/hash-function-transition.txt +++ b/Documentation/technical/hash-function-transition.txt @@ -59,14 +59,11 @@ that are believed to be cryptographically secure. Goals - -Where NewHash is a strong 256-bit hash function to replace SHA-1 (see -"Selection of a New Hash", below): - -1. The transition to NewHash can be done one local repository at a time. +1. The transition to SHA-256 can be done one local repository at a time. a. Requiring no action by any other party. - b. A NewHash repository can communicate with SHA-1 Git servers + b. A SHA-256 repository can communicate with SHA-1 Git servers (push/fetch). - c. Users can use SHA-1 and NewHash identifiers for objects + c. Users can use SHA-1 and SHA-256 identifiers for objects interchangeably (see "Object names on the command line", below). d. New signed objects make use of a stronger hash function than SHA-1 for their security guarantees. @@ -79,7 +76,7 @@ Where NewHash is a strong 256-bit hash function to replace SHA-1 (see Non-Goals - -1. Add NewHash support to Git protocol. This is valuable and the +1. Add SHA-256 support to Git protocol. This is valuable and the logical next step but it is out of scope for this initial design. 2. Transparently improving the security of existing SHA-1 signed objects. @@ -87,26 +84,26 @@ Non-Goals repository. 4. Taking the opportunity to fix other bugs in Git's formats and protocols. -5. Shallow clones and fetches into a NewHash repository. (This will - change when we add NewHash support to Git protocol.) -6. Skip fetching some submodules of a project into a NewHash - repository. (This also depends on NewHash support in Git +5. Shallow clones and fetches into a SHA-256 repository. (This will + change when we add SHA-256 support to Git protocol.) +6. Skip fetching some submodules of a project into a SHA-256 + repository. (This also depends on SHA-256 support in Git protocol.) Overview We introduce a new repository format extension. Repositories with this -extension enabled use NewHash instead of SHA-1 to name their objects. +extension enabled use SHA-256 instead of SHA-1 to name their objects. This affects both object names and object content --- both the names of objects and all references to other objects within an object are switched to the new hash function.