Re: [RFC PATCH 09/12] Add a base implementation of SHA-256 support

2018-08-29 Thread brian m. carlson
On Wed, Aug 29, 2018 at 11:32:08AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Aug 29 2018, brian m. carlson wrote:
> 
> > SHA-1 is weak and we need to transition to a new hash function.  For
> > some time, we have referred to this new function as NewHash.
> >
> > The selection criteria for NewHash specify that it should (a) be 256
> > bits in length, (b) have high quality implementations available, (c)
> > should match Git's needs in terms of security, and (d) ideally, be fast
> > to compute.
> >
> > SHA-256 has a variety of high quality implementations across various
> > libraries.  It is implemented by every cryptographic library we support
> > and is available on every platform and in almost every programming
> > language.  It is often highly optimized, since it is commonly used in
> > TLS and elsewhere.  Additionally, there are various command line
> > utilities that implement it, which is useful for educational and testing
> > purposes.
> >
> > SHA-256 is presently considered secure and has received a reasonable
> > amount of cryptanalysis in the literature.  It is, admittedly, not
> > resistant to length extension attacks, but Git object storage is immune
> > to those due to the length field at the beginning.
> >
> > SHA-256 is somewhat slower to compute than SHA-1 in software.  However,
> > since our default SHA-1 implementation is collision-detecting, a
> > reasonable cryptographic library implementation of SHA-256 will actually
> > be faster than SHA-256.  In addition, modern ARM and AMD processors (and
> > some Intel processors) contain instructions for implementing SHA-256 in
> > hardware, making it the fastest possible option.
> >
> > There are other reasons to select SHA-256.  With signed commits and
> > tags, it's possible to use SHA-256 for signatures and therefore have to
> > rely on only one hash algorithm for security.
> 
> None of this is wrong, but I think this would be better off as a simple
> "See Documentation/technical/hash-function-transition.txt for why we're
> switching to SHA-256", and to the extent that something is said here
> that isn't said there it could be a patch to amend that document.

I can certainly shorten this somewhat.  I wrote this back when there
wasn't a consensus on hash algorithm and Junio was going to leave it to
me to make a decision.  I was therefore obligated to provide a coherent
rationale for that decision.

> > Add a basic implementation of SHA-256 based off libtomcrypt, which is in
> > the public domain.  Optimize it and tidy it somewhat.
> 
> For future changes & maintenance of this, let's do that in two
> steps. One where we add the upstream code as-is, and another where the
> tidying / cleanup / git specific stuff is wired, which makes it easy to
> audit upstream as-is v.s. our changes in isolation. Also in the first of
> those commits, say in the commit message "add a [libtomcrypt] copy from
> such-and-such a URL at such-and-such a version", so that it's easy to
> reproduce the import & find out how to re-update it.

Doing what you suggest basically means importing a large amount of
libtomcrypt into our codebase, since there are a large number of reused
macros all over libtomcrypt (such as for processing a generic hash and
for memcpy).

This isn't surprising for a general purpose crypto library, but I did a
decent amount to change it, condense it into a small number of files,
and make it meet our code standards.  The phrase "somewhat" may have
been an understatement.

This is also why I added tests: because I'm human and making a small
change in a crypto library can result in wrong output very quickly.

> Is this something we see ourselves perma-forking? Or as with sha1dc are
> we likely to pull in upstream changes from time-to-time?SHA256 obiously
> isn't under active development, but there's been some churn in the
> upstream code since it was added, and if you're doing some optimizing /
> tidying that's presumably something upstream could benefit from as well,
> as well as just us being nicer open source citizens feeding
> e.g. portability fixes to upstream (since git tends to get ported a
> lot).

This is a permafork.  We need a basic SHA-256 implementation, and this
one was faster than the one I'd written some time ago.  Similarly to the
block-sha1 implementation, I see this as something that we'll be
shipping forever with little updating.

I expect with the amount of changes we're making, they're unlikely to
want our code.  Also, any changes to our code would be under the GPLv2,
which would be unappealing to a public domain library.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [RFC PATCH 09/12] Add a base implementation of SHA-256 support

2018-08-29 Thread Derrick Stolee

On 8/28/2018 8:58 PM, brian m. carlson wrote:

SHA-256 is somewhat slower to compute than SHA-1 in software.  However,
since our default SHA-1 implementation is collision-detecting, a
reasonable cryptographic library implementation of SHA-256 will actually
be faster than SHA-256.


Nit: do you mean "faster than SHA-1DC"?



Re: [RFC PATCH 09/12] Add a base implementation of SHA-256 support

2018-08-29 Thread Ævar Arnfjörð Bjarmason


On Wed, Aug 29 2018, brian m. carlson wrote:

> SHA-1 is weak and we need to transition to a new hash function.  For
> some time, we have referred to this new function as NewHash.
>
> The selection criteria for NewHash specify that it should (a) be 256
> bits in length, (b) have high quality implementations available, (c)
> should match Git's needs in terms of security, and (d) ideally, be fast
> to compute.
>
> SHA-256 has a variety of high quality implementations across various
> libraries.  It is implemented by every cryptographic library we support
> and is available on every platform and in almost every programming
> language.  It is often highly optimized, since it is commonly used in
> TLS and elsewhere.  Additionally, there are various command line
> utilities that implement it, which is useful for educational and testing
> purposes.
>
> SHA-256 is presently considered secure and has received a reasonable
> amount of cryptanalysis in the literature.  It is, admittedly, not
> resistant to length extension attacks, but Git object storage is immune
> to those due to the length field at the beginning.
>
> SHA-256 is somewhat slower to compute than SHA-1 in software.  However,
> since our default SHA-1 implementation is collision-detecting, a
> reasonable cryptographic library implementation of SHA-256 will actually
> be faster than SHA-256.  In addition, modern ARM and AMD processors (and
> some Intel processors) contain instructions for implementing SHA-256 in
> hardware, making it the fastest possible option.
>
> There are other reasons to select SHA-256.  With signed commits and
> tags, it's possible to use SHA-256 for signatures and therefore have to
> rely on only one hash algorithm for security.

None of this is wrong, but I think this would be better off as a simple
"See Documentation/technical/hash-function-transition.txt for why we're
switching to SHA-256", and to the extent that something is said here
that isn't said there it could be a patch to amend that document.

> Add a basic implementation of SHA-256 based off libtomcrypt, which is in
> the public domain.  Optimize it and tidy it somewhat.

For future changes & maintenance of this, let's do that in two
steps. One where we add the upstream code as-is, and another where the
tidying / cleanup / git specific stuff is wired, which makes it easy to
audit upstream as-is v.s. our changes in isolation. Also in the first of
those commits, say in the commit message "add a [libtomcrypt] copy from
such-and-such a URL at such-and-such a version", so that it's easy to
reproduce the import & find out how to re-update it.

Is this something we see ourselves perma-forking? Or as with sha1dc are
we likely to pull in upstream changes from time-to-time?SHA256 obiously
isn't under active development, but there's been some churn in the
upstream code since it was added, and if you're doing some optimizing /
tidying that's presumably something upstream could benefit from as well,
as well as just us being nicer open source citizens feeding
e.g. portability fixes to upstream (since git tends to get ported a
lot).

So I wonder if we can't convince them to add a few macros to their code,
and then do something like what I did in a0103914c2 ("sha1dc: update
from upstream", 2017-05-20) for sha1dc allowing us to use their code
as-is with some defines in the Makefile, which both makes it easier to
update, and sets up a process where our default approach is to submit
changes upstream, instead of working on our perma-fork.


[RFC PATCH 09/12] Add a base implementation of SHA-256 support

2018-08-28 Thread brian m. carlson
SHA-1 is weak and we need to transition to a new hash function.  For
some time, we have referred to this new function as NewHash.

The selection criteria for NewHash specify that it should (a) be 256
bits in length, (b) have high quality implementations available, (c)
should match Git's needs in terms of security, and (d) ideally, be fast
to compute.

SHA-256 has a variety of high quality implementations across various
libraries.  It is implemented by every cryptographic library we support
and is available on every platform and in almost every programming
language.  It is often highly optimized, since it is commonly used in
TLS and elsewhere.  Additionally, there are various command line
utilities that implement it, which is useful for educational and testing
purposes.

SHA-256 is presently considered secure and has received a reasonable
amount of cryptanalysis in the literature.  It is, admittedly, not
resistant to length extension attacks, but Git object storage is immune
to those due to the length field at the beginning.

SHA-256 is somewhat slower to compute than SHA-1 in software.  However,
since our default SHA-1 implementation is collision-detecting, a
reasonable cryptographic library implementation of SHA-256 will actually
be faster than SHA-256.  In addition, modern ARM and AMD processors (and
some Intel processors) contain instructions for implementing SHA-256 in
hardware, making it the fastest possible option.

There are other reasons to select SHA-256.  With signed commits and
tags, it's possible to use SHA-256 for signatures and therefore have to
rely on only one hash algorithm for security.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain.  Optimize it and tidy it somewhat.

Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.

Note that with this patch, it is still not possible to switch to using
SHA-256 in Git.  Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.

Signed-off-by: brian m. carlson 
---
 Makefile   |   4 +
 cache.h|  12 ++-
 hash.h |  19 -
 sha1-file.c|  45 +++
 sha256/block/sha256.c  | 180 +
 sha256/block/sha256.h  |  26 ++
 t/helper/test-sha256.c |   7 ++
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/t0014-hash.sh|  25 ++
 10 files changed, 316 insertions(+), 4 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 t/helper/test-sha256.c

diff --git a/Makefile b/Makefile
index e047dea988..624d852e79 100644
--- a/Makefile
+++ b/Makefile
@@ -733,6 +733,7 @@ TEST_BUILTINS_OBJS += test-run-command.o
 TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
 TEST_BUILTINS_OBJS += test-sha1-array.o
 TEST_BUILTINS_OBJS += test-sha1.o
+TEST_BUILTINS_OBJS += test-sha256.o
 TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
@@ -1623,6 +1624,9 @@ endif
 endif
 endif
 
+LIB_OBJS += sha256/block/sha256.o
+BASIC_CFLAGS += -DSHA256_BLK
+
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
diff --git a/cache.h b/cache.h
index c1b5a7a337..5a35497b34 100644
--- a/cache.h
+++ b/cache.h
@@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The block size of SHA-1. */
 #define GIT_SHA1_BLKSZ 64
 
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
 /* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 46dff69eb3..88d18896d7 100644
--- a/hash.h
+++ b/hash.h
@@ -15,6 +15,8 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#include "sha256/block/sha256.h"
+
 #ifndef platform_SHA_CTX
 /*
  * platform's underlying implementation of SHA-1; could be OpenSSL,
@@ -34,6 +36,18 @@
 #define git_SHA1_Updateplatform_SHA1_Update
 #define git_SHA1_Final platform_SHA1_Final
 
+#ifndef platform_SHA256_CTX
+#define platform_SHA256_CTXSHA256_CTX
+#define platform_SHA256_Init   SHA256_Init
+#define platform_SHA256_Update SHA256_Update
+#define platform_SHA256_Final  SHA256_Final
+#endif
+
+#define git_SHA256_CTX platform_SHA256_CTX
+#define git_SHA256_Init