Re: [RFC PATCH 09/12] Add a base implementation of SHA-256 support
On Wed, Aug 29, 2018 at 11:32:08AM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Aug 29 2018, brian m. carlson wrote: > > > SHA-1 is weak and we need to transition to a new hash function. For > > some time, we have referred to this new function as NewHash. > > > > The selection criteria for NewHash specify that it should (a) be 256 > > bits in length, (b) have high quality implementations available, (c) > > should match Git's needs in terms of security, and (d) ideally, be fast > > to compute. > > > > SHA-256 has a variety of high quality implementations across various > > libraries. It is implemented by every cryptographic library we support > > and is available on every platform and in almost every programming > > language. It is often highly optimized, since it is commonly used in > > TLS and elsewhere. Additionally, there are various command line > > utilities that implement it, which is useful for educational and testing > > purposes. > > > > SHA-256 is presently considered secure and has received a reasonable > > amount of cryptanalysis in the literature. It is, admittedly, not > > resistant to length extension attacks, but Git object storage is immune > > to those due to the length field at the beginning. > > > > SHA-256 is somewhat slower to compute than SHA-1 in software. However, > > since our default SHA-1 implementation is collision-detecting, a > > reasonable cryptographic library implementation of SHA-256 will actually > > be faster than SHA-256. In addition, modern ARM and AMD processors (and > > some Intel processors) contain instructions for implementing SHA-256 in > > hardware, making it the fastest possible option. > > > > There are other reasons to select SHA-256. With signed commits and > > tags, it's possible to use SHA-256 for signatures and therefore have to > > rely on only one hash algorithm for security. > > None of this is wrong, but I think this would be better off as a simple > "See Documentation/technical/hash-function-transition.txt for why we're > switching to SHA-256", and to the extent that something is said here > that isn't said there it could be a patch to amend that document. I can certainly shorten this somewhat. I wrote this back when there wasn't a consensus on hash algorithm and Junio was going to leave it to me to make a decision. I was therefore obligated to provide a coherent rationale for that decision. > > Add a basic implementation of SHA-256 based off libtomcrypt, which is in > > the public domain. Optimize it and tidy it somewhat. > > For future changes & maintenance of this, let's do that in two > steps. One where we add the upstream code as-is, and another where the > tidying / cleanup / git specific stuff is wired, which makes it easy to > audit upstream as-is v.s. our changes in isolation. Also in the first of > those commits, say in the commit message "add a [libtomcrypt] copy from > such-and-such a URL at such-and-such a version", so that it's easy to > reproduce the import & find out how to re-update it. Doing what you suggest basically means importing a large amount of libtomcrypt into our codebase, since there are a large number of reused macros all over libtomcrypt (such as for processing a generic hash and for memcpy). This isn't surprising for a general purpose crypto library, but I did a decent amount to change it, condense it into a small number of files, and make it meet our code standards. The phrase "somewhat" may have been an understatement. This is also why I added tests: because I'm human and making a small change in a crypto library can result in wrong output very quickly. > Is this something we see ourselves perma-forking? Or as with sha1dc are > we likely to pull in upstream changes from time-to-time?SHA256 obiously > isn't under active development, but there's been some churn in the > upstream code since it was added, and if you're doing some optimizing / > tidying that's presumably something upstream could benefit from as well, > as well as just us being nicer open source citizens feeding > e.g. portability fixes to upstream (since git tends to get ported a > lot). This is a permafork. We need a basic SHA-256 implementation, and this one was faster than the one I'd written some time ago. Similarly to the block-sha1 implementation, I see this as something that we'll be shipping forever with little updating. I expect with the amount of changes we're making, they're unlikely to want our code. Also, any changes to our code would be under the GPLv2, which would be unappealing to a public domain library. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: [RFC PATCH 09/12] Add a base implementation of SHA-256 support
On 8/28/2018 8:58 PM, brian m. carlson wrote: SHA-256 is somewhat slower to compute than SHA-1 in software. However, since our default SHA-1 implementation is collision-detecting, a reasonable cryptographic library implementation of SHA-256 will actually be faster than SHA-256. Nit: do you mean "faster than SHA-1DC"?
Re: [RFC PATCH 09/12] Add a base implementation of SHA-256 support
On Wed, Aug 29 2018, brian m. carlson wrote: > SHA-1 is weak and we need to transition to a new hash function. For > some time, we have referred to this new function as NewHash. > > The selection criteria for NewHash specify that it should (a) be 256 > bits in length, (b) have high quality implementations available, (c) > should match Git's needs in terms of security, and (d) ideally, be fast > to compute. > > SHA-256 has a variety of high quality implementations across various > libraries. It is implemented by every cryptographic library we support > and is available on every platform and in almost every programming > language. It is often highly optimized, since it is commonly used in > TLS and elsewhere. Additionally, there are various command line > utilities that implement it, which is useful for educational and testing > purposes. > > SHA-256 is presently considered secure and has received a reasonable > amount of cryptanalysis in the literature. It is, admittedly, not > resistant to length extension attacks, but Git object storage is immune > to those due to the length field at the beginning. > > SHA-256 is somewhat slower to compute than SHA-1 in software. However, > since our default SHA-1 implementation is collision-detecting, a > reasonable cryptographic library implementation of SHA-256 will actually > be faster than SHA-256. In addition, modern ARM and AMD processors (and > some Intel processors) contain instructions for implementing SHA-256 in > hardware, making it the fastest possible option. > > There are other reasons to select SHA-256. With signed commits and > tags, it's possible to use SHA-256 for signatures and therefore have to > rely on only one hash algorithm for security. None of this is wrong, but I think this would be better off as a simple "See Documentation/technical/hash-function-transition.txt for why we're switching to SHA-256", and to the extent that something is said here that isn't said there it could be a patch to amend that document. > Add a basic implementation of SHA-256 based off libtomcrypt, which is in > the public domain. Optimize it and tidy it somewhat. For future changes & maintenance of this, let's do that in two steps. One where we add the upstream code as-is, and another where the tidying / cleanup / git specific stuff is wired, which makes it easy to audit upstream as-is v.s. our changes in isolation. Also in the first of those commits, say in the commit message "add a [libtomcrypt] copy from such-and-such a URL at such-and-such a version", so that it's easy to reproduce the import & find out how to re-update it. Is this something we see ourselves perma-forking? Or as with sha1dc are we likely to pull in upstream changes from time-to-time?SHA256 obiously isn't under active development, but there's been some churn in the upstream code since it was added, and if you're doing some optimizing / tidying that's presumably something upstream could benefit from as well, as well as just us being nicer open source citizens feeding e.g. portability fixes to upstream (since git tends to get ported a lot). So I wonder if we can't convince them to add a few macros to their code, and then do something like what I did in a0103914c2 ("sha1dc: update from upstream", 2017-05-20) for sha1dc allowing us to use their code as-is with some defines in the Makefile, which both makes it easier to update, and sets up a process where our default approach is to submit changes upstream, instead of working on our perma-fork.
[RFC PATCH 09/12] Add a base implementation of SHA-256 support
SHA-1 is weak and we need to transition to a new hash function. For some time, we have referred to this new function as NewHash. The selection criteria for NewHash specify that it should (a) be 256 bits in length, (b) have high quality implementations available, (c) should match Git's needs in terms of security, and (d) ideally, be fast to compute. SHA-256 has a variety of high quality implementations across various libraries. It is implemented by every cryptographic library we support and is available on every platform and in almost every programming language. It is often highly optimized, since it is commonly used in TLS and elsewhere. Additionally, there are various command line utilities that implement it, which is useful for educational and testing purposes. SHA-256 is presently considered secure and has received a reasonable amount of cryptanalysis in the literature. It is, admittedly, not resistant to length extension attacks, but Git object storage is immune to those due to the length field at the beginning. SHA-256 is somewhat slower to compute than SHA-1 in software. However, since our default SHA-1 implementation is collision-detecting, a reasonable cryptographic library implementation of SHA-256 will actually be faster than SHA-256. In addition, modern ARM and AMD processors (and some Intel processors) contain instructions for implementing SHA-256 in hardware, making it the fastest possible option. There are other reasons to select SHA-256. With signed commits and tags, it's possible to use SHA-256 for signatures and therefore have to rely on only one hash algorithm for security. Add a basic implementation of SHA-256 based off libtomcrypt, which is in the public domain. Optimize it and tidy it somewhat. Wire up SHA-256 in the list of hash algorithms, and add a test that the algorithm works correctly. Note that with this patch, it is still not possible to switch to using SHA-256 in Git. Additional patches are needed to prepare the code to handle a larger hash algorithm and further test fixes are needed. Signed-off-by: brian m. carlson --- Makefile | 4 + cache.h| 12 ++- hash.h | 19 - sha1-file.c| 45 +++ sha256/block/sha256.c | 180 + sha256/block/sha256.h | 26 ++ t/helper/test-sha256.c | 7 ++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t0014-hash.sh| 25 ++ 10 files changed, 316 insertions(+), 4 deletions(-) create mode 100644 sha256/block/sha256.c create mode 100644 sha256/block/sha256.h create mode 100644 t/helper/test-sha256.c diff --git a/Makefile b/Makefile index e047dea988..624d852e79 100644 --- a/Makefile +++ b/Makefile @@ -733,6 +733,7 @@ TEST_BUILTINS_OBJS += test-run-command.o TEST_BUILTINS_OBJS += test-scrap-cache-tree.o TEST_BUILTINS_OBJS += test-sha1-array.o TEST_BUILTINS_OBJS += test-sha1.o +TEST_BUILTINS_OBJS += test-sha256.o TEST_BUILTINS_OBJS += test-sigchain.o TEST_BUILTINS_OBJS += test-strcmp-offset.o TEST_BUILTINS_OBJS += test-string-list.o @@ -1623,6 +1624,9 @@ endif endif endif +LIB_OBJS += sha256/block/sha256.o +BASIC_CFLAGS += -DSHA256_BLK + ifdef SHA1_MAX_BLOCK_SIZE LIB_OBJS += compat/sha1-chunked.o BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)" diff --git a/cache.h b/cache.h index c1b5a7a337..5a35497b34 100644 --- a/cache.h +++ b/cache.h @@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long); /* The block size of SHA-1. */ #define GIT_SHA1_BLKSZ 64 +/* The length in bytes and in hex digits of an object name (SHA-256 value). */ +#define GIT_SHA256_RAWSZ 32 +#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ) +/* The block size of SHA-256. */ +#define GIT_SHA256_BLKSZ 64 + /* The length in byte and in hex digits of the largest possible hash value. */ -#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ -#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ +#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ +#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ /* The largest possible block size for any supported hash. */ -#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ +#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ struct object_id { unsigned char hash[GIT_MAX_RAWSZ]; diff --git a/hash.h b/hash.h index 46dff69eb3..88d18896d7 100644 --- a/hash.h +++ b/hash.h @@ -15,6 +15,8 @@ #include "block-sha1/sha1.h" #endif +#include "sha256/block/sha256.h" + #ifndef platform_SHA_CTX /* * platform's underlying implementation of SHA-1; could be OpenSSL, @@ -34,6 +36,18 @@ #define git_SHA1_Updateplatform_SHA1_Update #define git_SHA1_Final platform_SHA1_Final +#ifndef platform_SHA256_CTX +#define platform_SHA256_CTXSHA256_CTX +#define platform_SHA256_Init SHA256_Init +#define platform_SHA256_Update SHA256_Update +#define platform_SHA256_Final SHA256_Final +#endif + +#define git_SHA256_CTX platform_SHA256_CTX +#define git_SHA256_Init