Re: git, monorepos, and access control

2018-12-05 Thread brian m. carlson
On Wed, Dec 05, 2018 at 04:01:05PM -0500, Jeff King wrote:
> You could work around that by teaching the server to refuse to use "X"
> in any way when the client does not have the right permissions. But:
> 
>   - that's just one example; there are probably a number of such leaks
> 
>   - you're fighting an uphill battle against the way Git is implemented;
> there's no single point to enforce this kind of segmentation
> 
> The model that fits more naturally with how Git is implemented would be
> to use submodules. There you leak the hash of the commit from the
> private submodule, but that's probably obscure enough (and if you're
> really worried, you can add a random nonce to the commit messages in the
> submodule to make their hashes unguessable).

I tend to agree that submodules are the way to go here over a monorepo.
Not only do you have much better access control opportunities, in
general, smaller repositories are going to perform better and be easier
to work with than monorepos. While VFS for Git is a great technology,
using a set of smaller repositories is going to outperform it every
time, both on developer machines, and on your source code hosting
platform.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc

2018-12-03 Thread brian m. carlson
On Mon, Dec 03, 2018 at 03:37:35PM -0800, Jonathan Tan wrote:
> Signed-off-by: Jonathan Tan 
> ---
>  Documentation/technical/packfile-uri.txt | 83 
>  Documentation/technical/protocol-v2.txt  |  6 +-
>  2 files changed, 88 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/technical/packfile-uri.txt
> 
> diff --git a/Documentation/technical/packfile-uri.txt 
> b/Documentation/technical/packfile-uri.txt
> new file mode 100644
> index 00..6535801486
> --- /dev/null
> +++ b/Documentation/technical/packfile-uri.txt
> @@ -0,0 +1,83 @@
> +Packfile URIs
> +=
> +
> +This feature allows servers to serve part of their packfile response as URIs.
> +This allows server designs that improve scalability in bandwidth and CPU 
> usage
> +(for example, by serving some data through a CDN), and (in the future) 
> provides
> +some measure of resumability to clients.
> +
> +This feature is available only in protocol version 2.
> +
> +Protocol
> +
> +
> +The server advertises `packfile-uris`.
> +
> +If the client replies with the following arguments:
> +
> + * packfile-uris
> + * thin-pack
> + * ofs-delta
> +
> +when the server sends the packfile, it MAY send a `packfile-uris` section
> +directly before the `packfile` section (right after `wanted-refs` if it is
> +sent) containing HTTP(S) URIs. See protocol-v2.txt for the documentation of
> +this section.
> +
> +Clients then should understand that the returned packfile could be 
> incomplete,
> +and that it needs to download all the given URIs before the fetch or clone is
> +complete. Each URI should point to a Git packfile (which may be a thin pack 
> and
> +which may contain offset deltas).


Some thoughts here:

First, I'd like to see a section (and a bit in the implementation)
requiring HTTPS if the original protocol is secure (SSH or HTTPS).
Allowing the server to downgrade to HTTP, even by accident, would be a
security problem.

Second, this feature likely should be opt-in for SSH. One issue I've
seen repeatedly is that people don't want to use HTTPS to fetch things
when they're using SSH for Git. Many people in corporate environments
have proxies that break HTTP for non-browser use cases[0], and using SSH
is the only way that they can make a functional Git connection.

Third, I think the server needs to be required to both support Range
headers and never change the content of a URI, so that we can have
resumable clone implicit in this design. There are some places in the
world where connections are poor and fetching even the initial packfile
at once might be a problem. (I've seen such questions on Stack
Overflow, for example.)

Having said that, I think overall this is a good idea and I'm glad to
see a proposal for it.

[0] For example, a naughty-word filter may corrupt or block certain byte
sequences that occur incidentally in the pack stream.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] tests: avoid syntax triggering old dash bug

2018-11-27 Thread brian m. carlson
On Tue, Nov 27, 2018 at 05:42:53PM +0100, Ævar Arnfjörð Bjarmason wrote:
> Avoid a bug in dash that's been fixed ever since its
> ec2c84d ("[PARSER] Fix clobbering of checkkwd", 2011-03-15)[1] first
> released with dash v0.5.7 in July 2011.
> 
> This fixes 1/2 tests failing on Debian Lenny & Squeeze. The other
> failure is due to 1b42f45255 ("git-svn: apply "svn.pathnameencoding"
> before URL encoding", 2016-02-09).

Are people still using such versions of Debian?  I only see wheezy (7)
on the mirrors, not squeeze (6) or lenny (5).  It might be better for us
to encourage users to upgrade to an OS that has security support rather
than work around bugs in obsolete OSes.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [RFC PATCH] Introduce "precious" file concept

2018-11-27 Thread brian m. carlson
On Tue, Nov 27, 2018 at 02:50:34PM +, Per Lundberg wrote:
> I agree strongly with this personally; if we must choose between "might
> break automation" and "might delete non-garbage files", I would say the
> former is the lesser evil of the two.
> 
> But, if I had 10 000 000 servers set up using automated scripts that
> would break because of this, I might think differently. Quite likely so,
> in fact.
> 
> What are these automation scenarios _more specifically_? Junio or Brian,
> would you care to elaborate? Is it for build servers where you want "git
> clean -dfx" to always reset the working copy to a pristine state or are
> we talking about some other scenarios?

We had long-running CI servers, since bootstrapping a new system took an
hour.  These would check out the branch to test and run some process
(essentially, a "make" and "make test").  Then, another branch would be
tested, and so on.  The old branch would likely not be merged at this
point.

The scenario I'm thinking of is when a file (say a CSS file) became
built instead of stored in the repository.  Then the file would be added
to .gitignore in the new commit, and it would be generated as part of
the make step.  It would be important to blow away that file away when
checking out a new commit, because not doing so would mean that the CI
system would simply fail to work and require manual intervention.

Moreover, a CI job might fail, due to either a flaky test or a
legitimate failures, so the job might need to be re-run multiple times.
Requiring human intervention, especially when such jobs might be running
at odd hours, would be undesirable.

Another thing we did was to use a specially named gitignore file in our
build step.  We created a new repository, copied the special gitignore
file in as .gitignore, copied in the source and build products, ran git
add and git commit, and then ran git clean -dfx to remove proprietary
source code, packaging the result.  A change to the behavior of git
clean -dfx would be devastating here.

I point this out to underscore how fundamental this change is.  People
overwhelmingly do not read the release notes, so expecting people to
realize that a change has been made, especially when many people only
upgrade Git because of a security issue, may result in unexpected
consequences.  Just because we don't think of this use of Git as normal
or desirable doesn't mean people don't do it and don't expect it to
keep working.  People do read and rely on our documentation.

I think any change we make here has to be opt-in, at least until Git
3.0.  A config knob would probably be the right way to go.  I realize
that may not provide all the benefits we want out of the box, but it
lets people turn the option on once and forget about it.  It also lets
people who don't desire this new behavior explicitly turn it off.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: BUG: CR marker ^M doesn't show up in '-' lines of diffs when the ending of the removed line is CR+LF

2018-11-25 Thread brian m. carlson
On Sun, Nov 25, 2018 at 03:03:11PM +0100, Frank Schäfer wrote:
> Am 24.11.18 um 23:07 schrieb Johannes Sixt:
> > I don't think that there is anything to fix. If you have a file with
> > CRLF in it, but you did not declare to Git that CRLF is the expected
> > end-of-line indicator, then the CR *is* trailing whitespace (because
> > the line ends at LF), and 'git diff' highlights it. 
> 
> Sure, it's correct to highlight it.
> But it doesn't highlight it in removed lines, just in added lines.
> I can see no good reason why removed and added lines should be treated
> differently.

The default behavior is to highlight whitespace errors only in new
lines, because the assumption is that while you don't want to introduce
new errors, you can't do anything about old mistakes without rewriting
history.

I agree that in many circumstances, such as code review, this may be
undesirable.  In the past, I've done code reviews where I may let
existing trailing whitespace go but am strict about not introducing
new trailing whitespace, and being able to see both is helpful.

If you want to see whitespace errors in both the old and the new, use
--ws-error-highlight or set diff.wsErrorHighlight.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: insteadOf and git-request-pull output

2018-11-16 Thread brian m. carlson
On Thu, Nov 15, 2018 at 01:28:26PM -0500, Konstantin Ryabitsev wrote:
> Hi, all:
> 
> Looks like setting url.insteadOf rules alters the output of
> git-request-pull. I'm not sure that's the intended use of insteadOf,
> which is supposed to replace URLs for local use, not to expose them
> publicly (but I may be wrong). E.g.:
> 
> $ git request-pull HEAD^ git://foo.example.com/example | grep example
>   git://foo.example.com/example
> 
> $ git config url.ssh://bar.insteadOf git://foo
> 
> $ git request-pull HEAD^ git://foo.example.com/example | grep example
>   ssh://bar.example.com/example
> 
> I think that if we use the "principle of least surprise," insteadOf
> rules shouldn't be applied for git-request-pull URLs.

I'd like to point out a different use that may change your view.  I have
an insteadOf alias, gh:, that points to GitHub.  Performing the rewrite
is definitely the right thing to do, since other users may not have my
alias available.

I agree that in your case, a rewrite seems less appropriate, but I think
we should only skip the rewrite if the value already matches a valid
URL.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH v6 11/12] sha256: add an SHA-256 implementation using libgcrypt

2018-11-13 Thread brian m. carlson
Generally, one gets better performance out of cryptographic routines
written in assembly than C, and this is also true for SHA-256.  In
addition, most Linux distributions cannot distribute Git linked against
OpenSSL for licensing reasons.

Most systems with GnuPG will also have libgcrypt, since it is a
dependency of GnuPG.  libgcrypt is also faster than the SHA1DC
implementation for messages of a few KiB and larger.

For comparison, on a Core i7-6600U, this implementation processes 16 KiB
chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
MiB/s.

In addition, libgcrypt is licensed under the LGPL 2.1, which is
compatible with the GPL.  Add an implementation of SHA-256 that uses
libgcrypt.

Signed-off-by: brian m. carlson 
---
 Makefile| 13 +++--
 hash.h  |  4 
 sha256/gcrypt.h | 30 ++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 sha256/gcrypt.h

diff --git a/Makefile b/Makefile
index 1302c07f5c..6c99844aa8 100644
--- a/Makefile
+++ b/Makefile
@@ -179,6 +179,10 @@ all::
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
 #
+# Define BLK_SHA256 to use the built-in SHA-256 routines.
+#
+# Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1647,8 +1651,13 @@ endif
 endif
 endif
 
-LIB_OBJS += sha256/block/sha256.o
-BASIC_CFLAGS += -DSHA256_BLK
+ifdef GCRYPT_SHA256
+   BASIC_CFLAGS += -DSHA256_GCRYPT
+   EXTLIBS += -lgcrypt
+else
+   LIB_OBJS += sha256/block/sha256.o
+   BASIC_CFLAGS += -DSHA256_BLK
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index a9bc624020..2ef098052d 100644
--- a/hash.h
+++ b/hash.h
@@ -15,7 +15,11 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#if defined(SHA256_GCRYPT)
+#include "sha256/gcrypt.h"
+#else
 #include "sha256/block/sha256.h"
+#endif
 
 #ifndef platform_SHA_CTX
 /*
diff --git a/sha256/gcrypt.h b/sha256/gcrypt.h
new file mode 100644
index 00..09bd8bb200
--- /dev/null
+++ b/sha256/gcrypt.h
@@ -0,0 +1,30 @@
+#ifndef SHA256_GCRYPT_H
+#define SHA256_GCRYPT_H
+
+#include 
+
+#define SHA256_DIGEST_SIZE 32
+
+typedef gcry_md_hd_t gcrypt_SHA256_CTX;
+
+inline void gcrypt_SHA256_Init(gcrypt_SHA256_CTX *ctx)
+{
+   gcry_md_open(ctx, GCRY_MD_SHA256, 0);
+}
+
+inline void gcrypt_SHA256_Update(gcrypt_SHA256_CTX *ctx, const void *data, 
size_t len)
+{
+   gcry_md_write(*ctx, data, len);
+}
+
+inline void gcrypt_SHA256_Final(unsigned char *digest, gcrypt_SHA256_CTX *ctx)
+{
+   memcpy(digest, gcry_md_read(*ctx, GCRY_MD_SHA256), SHA256_DIGEST_SIZE);
+}
+
+#define platform_SHA256_CTX gcrypt_SHA256_CTX
+#define platform_SHA256_Init gcrypt_SHA256_Init
+#define platform_SHA256_Update gcrypt_SHA256_Update
+#define platform_SHA256_Final gcrypt_SHA256_Final
+
+#endif


[PATCH v6 09/12] commit-graph: convert to using the_hash_algo

2018-11-13 Thread brian m. carlson
Instead of using hard-coded constants for object sizes, use
the_hash_algo to look them up.  In addition, use a function call to look
up the object ID version and produce the correct value.  For now, we use
version 1, which means to use the default algorithm used in the rest of
the repository.

Signed-off-by: brian m. carlson 
---
 commit-graph.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..6763d19288 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -23,16 +23,11 @@
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
 #define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
 
-#define GRAPH_DATA_WIDTH 36
+#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
 #define GRAPH_VERSION_1 0x1
 #define GRAPH_VERSION GRAPH_VERSION_1
 
-#define GRAPH_OID_VERSION_SHA1 1
-#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
-#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
-#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
-
 #define GRAPH_OCTOPUS_EDGES_NEEDED 0x8000
 #define GRAPH_PARENT_MISSING 0x7fff
 #define GRAPH_EDGE_LAST_MASK 0x7fff
@@ -44,13 +39,18 @@
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
+   + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static uint8_t oid_version(void)
+{
+   return 1;
+}
+
 static struct commit_graph *alloc_commit_graph(void)
 {
struct commit_graph *g = xcalloc(1, sizeof(*g));
@@ -125,15 +125,15 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
}
 
hash_version = *(unsigned char*)(data + 5);
-   if (hash_version != GRAPH_OID_VERSION) {
+   if (hash_version != oid_version()) {
error(_("hash version %X does not match version %X"),
- hash_version, GRAPH_OID_VERSION);
+ hash_version, oid_version());
goto cleanup_fail;
}
 
graph = alloc_commit_graph();
 
-   graph->hash_len = GRAPH_OID_LEN;
+   graph->hash_len = the_hash_algo->rawsz;
graph->num_chunks = *(unsigned char*)(data + 6);
graph->graph_fd = fd;
graph->data = graph_map;
@@ -149,7 +149,7 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
 
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
-   if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+   if (chunk_offset > graph_size - the_hash_algo->rawsz) {
error(_("improper chunk offset %08x%08x"), 
(uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
goto cleanup_fail;
@@ -764,6 +764,7 @@ void write_commit_graph(const char *obj_dir,
int num_extra_edges;
struct commit_list *parent;
struct progress *progress = NULL;
+   const unsigned hashsz = the_hash_algo->rawsz;
 
if (!commit_graph_compatible(the_repository))
return;
@@ -909,7 +910,7 @@ void write_commit_graph(const char *obj_dir,
hashwrite_be32(f, GRAPH_SIGNATURE);
 
hashwrite_u8(f, GRAPH_VERSION);
-   hashwrite_u8(f, GRAPH_OID_VERSION);
+   hashwrite_u8(f, oid_version());
hashwrite_u8(f, num_chunks);
hashwrite_u8(f, 0); /* unused padding byte */
 
@@ -924,8 +925,8 @@ void write_commit_graph(const char *obj_dir,
 
chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-   chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
-   chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+   chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
+   chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
 
for (i = 0; i <= num_chunks; i++) {
@@ -938,8 +939,8 @@ void write_commit_graph(const char *obj_dir,
}
 
write_graph_chunk_fanout(f, commits.list, commits.nr);
-   write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-   write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+   write_graph_chunk_oids(f, hashsz, commits.list, commits.nr);
+   write_graph_chunk_data(f, hashsz, commits.list, commits.nr);
write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
close_commit_graph(the_repository);


[PATCH v6 02/12] sha1-file: provide functions to look up hash algorithms

2018-11-13 Thread brian m. carlson
There are several ways we might refer to a hash algorithm: by name, such
as in the config file; by format ID, such as in a pack; or internally,
by a pointer to the hash_algos array.  Provide functions to look up hash
algorithms based on these various forms and return the internal constant
used for them.  If conversion to another form is necessary, this
internal constant can be used to look up the proper data in the
hash_algos array.

Signed-off-by: brian m. carlson 
---
 hash.h  | 13 +
 sha1-file.c | 21 +
 2 files changed, 34 insertions(+)

diff --git a/hash.h b/hash.h
index 7c8238bc2e..80881eea47 100644
--- a/hash.h
+++ b/hash.h
@@ -98,4 +98,17 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
+/*
+ * Return a GIT_HASH_* constant based on the name.  Returns GIT_HASH_UNKNOWN if
+ * the name doesn't match a known algorithm.
+ */
+int hash_algo_by_name(const char *name);
+/* Identical, except based on the format ID. */
+int hash_algo_by_id(uint32_t format_id);
+/* Identical, except for a pointer to struct git_hash_algo. */
+static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+{
+   return p - hash_algos;
+}
+
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index 3b9c042691..93ed8c8686 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -122,6 +122,27 @@ const char *empty_blob_oid_hex(void)
return oid_to_hex_r(buf, the_hash_algo->empty_blob);
 }
 
+int hash_algo_by_name(const char *name)
+{
+   int i;
+   if (!name)
+   return GIT_HASH_UNKNOWN;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (!strcmp(name, hash_algos[i].name))
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+int hash_algo_by_id(uint32_t format_id)
+{
+   int i;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (format_id == hash_algos[i].format_id)
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want


[PATCH v6 10/12] Add a base implementation of SHA-256 support

2018-11-13 Thread brian m. carlson
SHA-1 is weak and we need to transition to a new hash function.  For
some time, we have referred to this new function as NewHash.  Recently,
we decided to pick SHA-256 as NewHash.  The reasons behind the choice of
SHA-256 are outlined in the thread starting at [1] and in the commit
history for the hash function transition document.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain.  Optimize it and restructure it to meet our coding
standards.  Pull in the update and final functions from the SHA-1 block
implementation, as we know these function correctly with all compilers.
This implementation is slower than SHA-1, but more performant
implementations will be introduced in future commits.

Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.

Note that with this patch, it is still not possible to switch to using
SHA-256 in Git.  Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.

[1] 
https://public-inbox.org/git/20180609224913.gc38...@genre.crustytoothpaste.net/

Signed-off-by: brian m. carlson 
---
 Makefile   |   4 +
 cache.h|  12 ++-
 hash.h |  19 +++-
 sha1-file.c|  45 ++
 sha256/block/sha256.c  | 196 +
 sha256/block/sha256.h  |  24 +
 t/helper/test-sha256.c |   7 ++
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/t0015-hash.sh|  26 ++
 10 files changed, 331 insertions(+), 4 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 t/helper/test-sha256.c

diff --git a/Makefile b/Makefile
index c6f06bf50b..1302c07f5c 100644
--- a/Makefile
+++ b/Makefile
@@ -749,6 +749,7 @@ TEST_BUILTINS_OBJS += test-run-command.o
 TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
 TEST_BUILTINS_OBJS += test-sha1.o
 TEST_BUILTINS_OBJS += test-sha1-array.o
+TEST_BUILTINS_OBJS += test-sha256.o
 TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
@@ -1646,6 +1647,9 @@ endif
 endif
 endif
 
+LIB_OBJS += sha256/block/sha256.o
+BASIC_CFLAGS += -DSHA256_BLK
+
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
diff --git a/cache.h b/cache.h
index 62b2f3a5e8..b0dfe42736 100644
--- a/cache.h
+++ b/cache.h
@@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The block size of SHA-1. */
 #define GIT_SHA1_BLKSZ 64
 
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
 /* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 1bcf7ab6fd..a9bc624020 100644
--- a/hash.h
+++ b/hash.h
@@ -15,6 +15,8 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#include "sha256/block/sha256.h"
+
 #ifndef platform_SHA_CTX
 /*
  * platform's underlying implementation of SHA-1; could be OpenSSL,
@@ -34,6 +36,18 @@
 #define git_SHA1_Updateplatform_SHA1_Update
 #define git_SHA1_Final platform_SHA1_Final
 
+#ifndef platform_SHA256_CTX
+#define platform_SHA256_CTXSHA256_CTX
+#define platform_SHA256_Init   SHA256_Init
+#define platform_SHA256_Update SHA256_Update
+#define platform_SHA256_Final  SHA256_Final
+#endif
+
+#define git_SHA256_CTX platform_SHA256_CTX
+#define git_SHA256_Initplatform_SHA256_Init
+#define git_SHA256_Update  platform_SHA256_Update
+#define git_SHA256_Final   platform_SHA256_Final
+
 #ifdef SHA1_MAX_BLOCK_SIZE
 #include "compat/sha1-chunked.h"
 #undef git_SHA1_Update
@@ -52,12 +66,15 @@
 #define GIT_HASH_UNKNOWN 0
 /* SHA-1 */
 #define GIT_HASH_SHA1 1
+/* SHA-256  */
+#define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
-#define GIT_HASH_NALGOS (GIT_HASH_SHA1 + 1)
+#define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 
 /* A suitably aligned type for stack allocations of hash contexts. */
 union git_hash_ctx {
git_SHA_CTX sha1;
+   git_SHA256_CTX sha256;
 };
 typedef union git_hash_ctx git_hash_ctx;
 
diff --git a/sha1-file.c b/sha1-file.c
index c47349a5f8..e7764822ea 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -40,10 +40,20 @@
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
 &

[PATCH v6 12/12] hash: add an SHA-256 implementation using OpenSSL

2018-11-13 Thread brian m. carlson
We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.

On a Core i7-6600U, this SHA-256 implementation compares favorably to
the SHA1DC SHA-1 implementation:

SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)

Signed-off-by: brian m. carlson 
---
 Makefile | 7 +++
 hash.h   | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index 6c99844aa8..6844deb92d 100644
--- a/Makefile
+++ b/Makefile
@@ -183,6 +183,8 @@ all::
 #
 # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
 #
+# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1651,6 +1653,10 @@ endif
 endif
 endif
 
+ifdef OPENSSL_SHA256
+   EXTLIBS += $(LIB_4_CRYPTO)
+   BASIC_CFLAGS += -DSHA256_OPENSSL
+else
 ifdef GCRYPT_SHA256
BASIC_CFLAGS += -DSHA256_GCRYPT
EXTLIBS += -lgcrypt
@@ -1658,6 +1664,7 @@ else
LIB_OBJS += sha256/block/sha256.o
BASIC_CFLAGS += -DSHA256_BLK
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 2ef098052d..adde708cf2 100644
--- a/hash.h
+++ b/hash.h
@@ -17,6 +17,8 @@
 
 #if defined(SHA256_GCRYPT)
 #include "sha256/gcrypt.h"
+#elif defined(SHA256_OPENSSL)
+#include 
 #else
 #include "sha256/block/sha256.h"
 #endif


[PATCH v6 08/12] t/helper: add a test helper to compute hash speed

2018-11-13 Thread brian m. carlson
Add a utility (which is less for the testsuite and more for developers)
that can compute hash speeds for whatever hash algorithms are
implemented.  This allows developers to test their personal systems to
determine the performance characteristics of various algorithms.

Signed-off-by: brian m. carlson 
---
 Makefile   |  1 +
 t/helper/test-hash-speed.c | 61 ++
 t/helper/test-tool.c   |  1 +
 t/helper/test-tool.h   |  1 +
 4 files changed, 64 insertions(+)
 create mode 100644 t/helper/test-hash-speed.c

diff --git a/Makefile b/Makefile
index daf0b198ec..c6f06bf50b 100644
--- a/Makefile
+++ b/Makefile
@@ -726,6 +726,7 @@ TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
+TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
diff --git a/t/helper/test-hash-speed.c b/t/helper/test-hash-speed.c
new file mode 100644
index 00..432233c7f0
--- /dev/null
+++ b/t/helper/test-hash-speed.c
@@ -0,0 +1,61 @@
+#include "test-tool.h"
+#include "cache.h"
+
+#define NUM_SECONDS 3
+
+static inline void compute_hash(const struct git_hash_algo *algo, git_hash_ctx 
*ctx, uint8_t *final, const void *p, size_t len)
+{
+   algo->init_fn(ctx);
+   algo->update_fn(ctx, p, len);
+   algo->final_fn(final, ctx);
+}
+
+int cmd__hash_speed(int ac, const char **av)
+{
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_RAWSZ];
+   clock_t initial, start, end;
+   unsigned bufsizes[] = { 64, 256, 1024, 8192, 16384 };
+   int i;
+   void *p;
+   const struct git_hash_algo *algo = NULL;
+
+   if (ac == 2) {
+   for (i = 1; i < GIT_HASH_NALGOS; i++) {
+   if (!strcmp(av[1], hash_algos[i].name)) {
+   algo = _algos[i];
+   break;
+   }
+   }
+   }
+   if (!algo)
+   die("usage: test-tool hash-speed algo_name");
+
+   /* Use this as an offset to make overflow less likely. */
+   initial = clock();
+
+   printf("algo: %s\n", algo->name);
+
+   for (i = 0; i < ARRAY_SIZE(bufsizes); i++) {
+   unsigned long j, kb;
+   double kb_per_sec;
+   p = xcalloc(1, bufsizes[i]);
+   start = end = clock() - initial;
+   for (j = 0; ((end - start) / CLOCKS_PER_SEC) < NUM_SECONDS; 
j++) {
+   compute_hash(algo, , hash, p, bufsizes[i]);
+
+   /*
+* Only check elapsed time every 128 iterations to avoid
+* dominating the runtime with system calls.
+*/
+   if (!(j & 127))
+   end = clock() - initial;
+   }
+   kb = j * bufsizes[i];
+   kb_per_sec = kb / (1024 * ((double)end - start) / 
CLOCKS_PER_SEC);
+   printf("size %u: %lu iters; %lu KiB; %0.2f KiB/s\n", 
bufsizes[i], j, kb, kb_per_sec);
+   free(p);
+   }
+
+   exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index bfb195b1a8..c222d532a2 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -20,6 +20,7 @@ static struct test_cmd cmds[] = {
{ "example-decorate", cmd__example_decorate },
{ "genrandom", cmd__genrandom },
{ "hashmap", cmd__hashmap },
+   { "hash-speed", cmd__hash_speed },
{ "index-version", cmd__index_version },
{ "json-writer", cmd__json_writer },
{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index b5b7712a1d..70dafb7649 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -16,6 +16,7 @@ int cmd__dump_untracked_cache(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
+int cmd__hash_speed(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);


[PATCH v6 03/12] hex: introduce functions to print arbitrary hashes

2018-11-13 Thread brian m. carlson
Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer.  Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant.  Update the
documentation as well.

While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.

We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.

Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).

Signed-off-by: brian m. carlson 
---
 cache.h | 15 +--
 hex.c   | 32 
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index ca36b44ee0..cb7268deea 100644
--- a/cache.h
+++ b/cache.h
@@ -1365,9 +1365,9 @@ extern int get_oid_hex(const char *hex, struct object_id 
*sha1);
 extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
 
 /*
- * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
+ * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
  * and writes the NUL-terminated output to the buffer `out`, which must be at
- * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
+ * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
  * convenience.
  *
  * The non-`_r` variant returns a static buffer, but uses a ring of 4
@@ -1375,10 +1375,13 @@ extern int hex_to_bytes(unsigned char *binary, const 
char *hex, size_t len);
  *
  *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
  */
-extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1);   /* static buffer 
result! */
-extern char *oid_to_hex(const struct object_id *oid);  /* same static buffer 
as sha1_to_hex */
+char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const 
struct git_hash_algo *);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo 
*);  /* static buffer result! */
+char *sha1_to_hex(const unsigned char *sha1);  
/* same static buffer */
+char *hash_to_hex(const unsigned char *hash);  
/* same static buffer */
+char *oid_to_hex(const struct object_id *oid); 
/* same static buffer */
 
 /*
  * Parse a 40-character hexadecimal object ID starting from hex, updating the
diff --git a/hex.c b/hex.c
index 10af1a29e8..7850a8879d 100644
--- a/hex.c
+++ b/hex.c
@@ -73,14 +73,15 @@ int parse_oid_hex(const char *hex, struct object_id *oid, 
const char **end)
return ret;
 }
 
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
+ const struct git_hash_algo *algop)
 {
static const char hex[] = "0123456789abcdef";
char *buf = buffer;
int i;
 
-   for (i = 0; i < the_hash_algo->rawsz; i++) {
-   unsigned int val = *sha1++;
+   for (i = 0; i < algop->rawsz; i++) {
+   unsigned int val = *hash++;
*buf++ = hex[val >> 4];
*buf++ = hex[val & 0xf];
}
@@ -89,20 +90,35 @@ char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
return buffer;
 }
 
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 {
-   return sha1_to_hex_r(buffer, oid->hash);
+   return hash_to_hex_algop_r(buffer, sha1, _algos[GIT_HASH_SHA1]);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+   return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo 
*algop)
 {
static int bufno;
static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-   return sha1_to_hex_r(hexbuffer[bufno], sha1);
+   return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
+   return hash_to_hex_algop(sha1, _algos[GIT_HASH_SHA1]);
+}
+
+char *hash_to_hex(const unsigned char *hash)
+{
+   return hash_to_hex_algop(hash, the_hash_algo);
 }
 
 char *oid_to_hex(const struct object_id *oid)
 {
-   return sha1_to_hex(oid->hash);
+   return hash_to_hex_algop(oid->hash, the_hash_algo);
 }


[PATCH v6 00/12] Base SHA-256 implementation

2018-11-13 Thread brian m. carlson
This series provides a functional SHA-256 implementation and wires it
up, along with some housekeeping patches to make it suitable for
testing.

Changes from v5:
* Remove inclusion of "git-compat-util.h" in header.
* Remove "inline" from definition of hash_to_hex_algop_r.
* Switch perl invocations in t0015 to use -e instead of -E.
* Switch perl invocations in t0015 to use autoflush and postfix for.

Changes from v4:
* Downcase hex constants for consistency.
* Remove needless parentheses in return statement.
* Remove braces for single statement loops.
* Switch to +=.
* Add references to rationale for SHA-256.

Changes from v3:
* Switch to using inline functions instead of macros in many cases.
* Undefine remaining macros at the top.

Changes from v2:
* Improve commit messages to include timing and performance information.
* Improve commit messages to be less ambiguous and more friendly to a
  wider variety of English speakers.
* Prefer functions taking struct git_hash_algo in hex.c.
* Port pieces of the block-sha1 implementation over to the block-sha256
  implementation for better compatibility.
* Drop patch 13 in favor of further discussion about the best way
  forward for versioning commit graph.
* Rename the test so as to have a different number from other tests.
* Rebase on master.

Changes from v1:
* Add a hash_to_hex function mirroring sha1_to_hex, but for
  the_hash_algo.
* Strip commit message explanation about why we chose SHA-256.
* Rebase on master
* Strip leading whitespace from commit message.
* Improve commit-graph patch to cover new code added since v1.
* Be more honest about the scope of work involved in porting the SHA-256
  implementation out of libtomcrypt.
* Revert change to limit hashcmp to 20 bytes.

brian m. carlson (12):
  sha1-file: rename algorithm to "sha1"
  sha1-file: provide functions to look up hash algorithms
  hex: introduce functions to print arbitrary hashes
  cache: make hashcmp and hasheq work with larger hashes
  t: add basic tests for our SHA-1 implementation
  t: make the sha1 test-tool helper generic
  sha1-file: add a constant for hash block size
  t/helper: add a test helper to compute hash speed
  commit-graph: convert to using the_hash_algo
  Add a base implementation of SHA-256 support
  sha256: add an SHA-256 implementation using libgcrypt
  hash: add an SHA-256 implementation using OpenSSL

 Makefile  |  22 +++
 cache.h   |  51 ---
 commit-graph.c|  33 ++---
 hash.h|  41 +-
 hex.c |  32 +++--
 sha1-file.c   |  70 -
 sha256/block/sha256.c | 196 ++
 sha256/block/sha256.h |  24 
 sha256/gcrypt.h   |  30 
 t/helper/test-hash-speed.c|  61 
 t/helper/{test-sha1.c => test-hash.c} |  19 +--
 t/helper/test-sha1.c  |  52 +--
 t/helper/test-sha256.c|   7 +
 t/helper/test-tool.c  |   2 +
 t/helper/test-tool.h  |   4 +
 t/t0015-hash.sh   |  55 
 16 files changed, 595 insertions(+), 104 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 sha256/gcrypt.h
 create mode 100644 t/helper/test-hash-speed.c
 copy t/helper/{test-sha1.c => test-hash.c} (65%)
 create mode 100644 t/helper/test-sha256.c
 create mode 100755 t/t0015-hash.sh

Range-diff against v5:
 1:  a004a4c982 <  -:  -- :hash-impl
 2:  cf9f7f5620 =  1:  fa21b5948c sha1-file: rename algorithm to "sha1"
 3:  0144deaebe =  2:  4e146c92af sha1-file: provide functions to look up hash 
algorithms
 4:  b74858fb03 !  3:  10e7893242 hex: introduce functions to print arbitrary 
hashes
@@ -64,8 +64,8 @@
  }
  
 -char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
-+inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
-+  const struct git_hash_algo *algop)
++char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
++const struct git_hash_algo *algop)
  {
static const char hex[] = "0123456789abcdef";
char *buf = buffer;
 5:  e9703017a4 =  4:  6396a0ff57 cache: make hashcmp and hasheq work with 
larger hashes
 6:  ab85a834fd !  5:  a51a8b5638 t: add basic tests for our SHA-1 
implementation
@@ -33,7 +33,7 @@
 +  grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
 +  printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
 +  grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
-+  perl -E "for (1..10) { print q{aa}; }" | \
++  perl -e "$| = 1; print q{aa} for 1..1000

[PATCH v6 06/12] t: make the sha1 test-tool helper generic

2018-11-13 Thread brian m. carlson
Since we're going to have multiple hash algorithms to test, it makes
sense to share as much of the test code as possible.  Convert the sha1
helper for the test-tool to be generic and move it out into its own
module.  This will allow us to share most of this code with our NewHash
implementation.

Signed-off-by: brian m. carlson 
---
 Makefile  |  1 +
 t/helper/{test-sha1.c => test-hash.c} | 19 +-
 t/helper/test-sha1.c  | 52 +--
 t/helper/test-tool.h  |  2 ++
 4 files changed, 14 insertions(+), 60 deletions(-)
 copy t/helper/{test-sha1.c => test-hash.c} (65%)

diff --git a/Makefile b/Makefile
index 016fdcdb81..daf0b198ec 100644
--- a/Makefile
+++ b/Makefile
@@ -724,6 +724,7 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
+TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
diff --git a/t/helper/test-sha1.c b/t/helper/test-hash.c
similarity index 65%
copy from t/helper/test-sha1.c
copy to t/helper/test-hash.c
index 1ba0675c75..0a31de66f3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-hash.c
@@ -1,13 +1,14 @@
 #include "test-tool.h"
 #include "cache.h"
 
-int cmd__sha1(int ac, const char **av)
+int cmd_hash_impl(int ac, const char **av, int algo)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_HEXSZ];
unsigned bufsz = 8192;
int binary = 0;
char *buffer;
+   const struct git_hash_algo *algop = _algos[algo];
 
if (ac == 2) {
if (!strcmp(av[1], "-b"))
@@ -26,7 +27,7 @@ int cmd__sha1(int ac, const char **av)
die("OOPS");
}
 
-   git_SHA1_Init();
+   algop->init_fn();
 
while (1) {
ssize_t sz, this_sz;
@@ -38,20 +39,20 @@ int cmd__sha1(int ac, const char **av)
if (sz == 0)
break;
if (sz < 0)
-   die_errno("test-sha1");
+   die_errno("test-hash");
this_sz += sz;
cp += sz;
room -= sz;
}
if (this_sz == 0)
break;
-   git_SHA1_Update(, buffer, this_sz);
+   algop->update_fn(, buffer, this_sz);
}
-   git_SHA1_Final(sha1, );
+   algop->final_fn(hash, );
 
if (binary)
-   fwrite(sha1, 1, 20, stdout);
+   fwrite(hash, 1, algop->rawsz, stdout);
else
-   puts(sha1_to_hex(sha1));
+   puts(hash_to_hex_algop(hash, algop));
exit(0);
 }
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index 1ba0675c75..d860c387c3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,55 +3,5 @@
 
 int cmd__sha1(int ac, const char **av)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
-   unsigned bufsz = 8192;
-   int binary = 0;
-   char *buffer;
-
-   if (ac == 2) {
-   if (!strcmp(av[1], "-b"))
-   binary = 1;
-   else
-   bufsz = strtoul(av[1], NULL, 10) * 1024 * 1024;
-   }
-
-   if (!bufsz)
-   bufsz = 8192;
-
-   while ((buffer = malloc(bufsz)) == NULL) {
-   fprintf(stderr, "bufsz %u is too big, halving...\n", bufsz);
-   bufsz /= 2;
-   if (bufsz < 1024)
-   die("OOPS");
-   }
-
-   git_SHA1_Init();
-
-   while (1) {
-   ssize_t sz, this_sz;
-   char *cp = buffer;
-   unsigned room = bufsz;
-   this_sz = 0;
-   while (room) {
-   sz = xread(0, cp, room);
-   if (sz == 0)
-   break;
-   if (sz < 0)
-   die_errno("test-sha1");
-   this_sz += sz;
-   cp += sz;
-   room -= sz;
-   }
-   if (this_sz == 0)
-   break;
-   git_SHA1_Update(, buffer, this_sz);
-   }
-   git_SHA1_Final(sha1, );
-
-   if (binary)
-   fwrite(sha1, 1, 20, stdout);
-   else
-   puts(sha1_to_hex(sha1));
-   exit(0);
+   return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 042f12464b..b5b7712a1d 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -51

[PATCH v6 05/12] t: add basic tests for our SHA-1 implementation

2018-11-13 Thread brian m. carlson
We have in the past had some unfortunate endianness issues with some
SHA-1 implementations we ship, especially on big-endian machines.  Add
an explicit test using the test helper to catch these issues and point
them out prominently.  This test can also be used as a staging ground
for people testing additional algorithms to verify that their
implementations are working as expected.

Signed-off-by: brian m. carlson 
---
 t/t0015-hash.sh | 29 +
 1 file changed, 29 insertions(+)
 create mode 100755 t/t0015-hash.sh

diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
new file mode 100755
index 00..884fe5acd1
--- /dev/null
+++ b/t/t0015-hash.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test basic hash implementation'
+. ./test-lib.sh
+
+
+test_expect_success 'test basic SHA-1 hash values' '
+   test-tool sha1 actual &&
+   grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual &&
+   printf "a" | test-tool sha1 >actual &&
+   grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual &&
+   printf "abc" | test-tool sha1 >actual &&
+   grep a9993e364706816aba3e25717850c26c9cd0d89d actual &&
+   printf "message digest" | test-tool sha1 >actual &&
+   grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
+   printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
+   grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
+   perl -e "$| = 1; print q{aa} for 1..10;" | \
+   test-tool sha1 >actual &&
+   grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
+   printf "blob 0\0" | test-tool sha1 >actual &&
+   grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual &&
+   printf "blob 3\0abc" | test-tool sha1 >actual &&
+   grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual &&
+   printf "tree 0\0" | test-tool sha1 >actual &&
+   grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
+'
+
+test_done


[PATCH v6 04/12] cache: make hashcmp and hasheq work with larger hashes

2018-11-13 Thread brian m. carlson
In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we
modified hashcmp to assert that the hash size was always 20 to help it
optimize and inline calls to memcmp.  In a future series, we replaced
many calls to hashcmp and oidcmp with calls to hasheq and oideq to
improve inlining further.

However, we want to support hash algorithms other than SHA-1, namely
SHA-256.  When doing so, we must handle the case where these values are
32 bytes long as well as 20.  Adjust hashcmp to handle two cases:
20-byte matches, and maximum-size matches.  Therefore, when we include
SHA-256, we'll automatically handle it properly, while at the same time
teaching the compiler that there are only two possible options to
consider.  This will allow the compiler to write the most efficient
possible code.

Copy similar code into hasheq and perform an identical transformation.
At least with GCC 8.2.0, making hasheq defer to hashcmp when there are
two branches prevents the compiler from inlining the comparison, while
the code in this patch is inlined properly.  Add a comment to avoid an
accidental performance regression from well-intentioned refactoring.

Signed-off-by: brian m. carlson 
---
 cache.h | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index cb7268deea..8607878dbc 100644
--- a/cache.h
+++ b/cache.h
@@ -1028,16 +1028,12 @@ extern const struct object_id null_oid;
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
/*
-* This is a temporary optimization hack. By asserting the size here,
-* we let the compiler know that it's always going to be 20, which lets
-* it turn this fixed-size memcmp into a few inline instructions.
-*
-* This will need to be extended or ripped out when we learn about
-* hashes of different sizes.
+* Teach the compiler that there are only two possibilities of hash size
+* here, so that it can optimize for this case as much as possible.
 */
-   if (the_hash_algo->rawsz != 20)
-   BUG("hash size not yet supported by hashcmp");
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+   return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+   return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
@@ -1047,7 +1043,13 @@ static inline int oidcmp(const struct object_id *oid1, 
const struct object_id *o
 
 static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return !hashcmp(sha1, sha2);
+   /*
+* We write this here instead of deferring to hashcmp so that the
+* compiler can properly inline it and avoid calling memcmp.
+*/
+   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+   return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+   return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oideq(const struct object_id *oid1, const struct object_id 
*oid2)


[PATCH v6 07/12] sha1-file: add a constant for hash block size

2018-11-13 Thread brian m. carlson
There is one place we need the hash algorithm block size: the HMAC code
for push certs.  Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.

Signed-off-by: brian m. carlson 
---
 cache.h | 4 
 hash.h  | 3 +++
 sha1-file.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/cache.h b/cache.h
index 8607878dbc..62b2f3a5e8 100644
--- a/cache.h
+++ b/cache.h
@@ -45,10 +45,14 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+/* The block size of SHA-1. */
+#define GIT_SHA1_BLKSZ 64
 
 /* The length in byte and in hex digits of the largest possible hash value. */
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+/* The largest possible block size for any supported hash. */
+#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 80881eea47..1bcf7ab6fd 100644
--- a/hash.h
+++ b/hash.h
@@ -81,6 +81,9 @@ struct git_hash_algo {
/* The length of the hash in hex characters. */
size_t hexsz;
 
+   /* The block size of the hash. */
+   size_t blksz;
+
/* The hash initialization function. */
git_hash_init_fn init_fn;
 
diff --git a/sha1-file.c b/sha1-file.c
index 93ed8c8686..c47349a5f8 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -90,6 +90,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x,
0,
0,
+   0,
git_hash_unknown_init,
git_hash_unknown_update,
git_hash_unknown_final,
@@ -102,6 +103,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x73686131,
GIT_SHA1_RAWSZ,
GIT_SHA1_HEXSZ,
+   GIT_SHA1_BLKSZ,
git_hash_sha1_init,
git_hash_sha1_update,
git_hash_sha1_final,


[PATCH v6 01/12] sha1-file: rename algorithm to "sha1"

2018-11-13 Thread brian m. carlson
The transition plan anticipates us using a syntax such as "^{sha1}" for
disambiguation.  Since this is a syntax some people will be typing a
lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
the dash doesn't create any ambiguity; however, it does make the syntax
shorter and easier to type, especially for touch typists.  In addition,
the transition plan already uses "sha1" in this context.

Rename the name of SHA-1 implementation to "sha1".

Note that this change creates no backwards compatibility concerns, since
we haven't yet used this field in any configuration settings.

Signed-off-by: brian m. carlson 
---
 sha1-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index 2daf7d9935..3b9c042691 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -97,7 +97,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
NULL,
},
{
-   "sha-1",
+   "sha1",
/* "sha1", big-endian */
0x73686131,
GIT_SHA1_RAWSZ,


Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms

2018-11-13 Thread brian m. carlson
On Wed, Nov 14, 2018 at 12:11:07AM +, Ramsay Jones wrote:
> 
> 
> On 13/11/2018 18:42, Derrick Stolee wrote:
> > On 11/4/2018 6:44 PM, brian m. carlson wrote:
> >> +int hash_algo_by_name(const char *name)
> >> +{
> >> +    int i;
> >> +    if (!name)
> >> +    return GIT_HASH_UNKNOWN;
> >> +    for (i = 1; i < GIT_HASH_NALGOS; i++)
> >> +    if (!strcmp(name, hash_algos[i].name))
> >> +    return i;
> >> +    return GIT_HASH_UNKNOWN;
> >> +}
> >> +
> > 
> > Today's test coverage report [1] shows this method is not covered in the 
> > test suite. Looking at 'pu', it doesn't have any callers.
> > 
> > Do you have a work in progress series that will use this? Could we add a 
> > test-tool to exercise this somehow?
> 
> There are actually 4 unused external symbols resulting from Brian's
> 'bc/sha-256' branch. The new unused externals in 'pu' looks like:
> 
> $ diff nsc psc
> 37a38,39
> > hex.o   - hash_to_hex

I have code that uses this in my object-id-part15 series.  I also have
another series coming after this one that makes heavy use of it.

> > hex.o   - hash_to_hex_algop_r

I believe this is because it's inline, since it is indeed used just a
few lines below its definition.  I'll drop the inline, since it's meant
to be externally visible.

> > sha1-file.o - hash_algo_by_id

This will be used when I write pack index v3, which will be in my
object-id-part15 series.

> > sha1-file.o - hash_algo_by_name

This is used in my object-id-part15 series.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] git-compat-util: prefer poll.h to sys/poll.h

2018-11-13 Thread brian m. carlson
On Wed, Nov 14, 2018 at 08:10:43AM +0700, Đoàn Trần Công Danh wrote:
> POSIX specifies that  is the correct header for poll(2)
> whereas  is only needed for some old libc.
> 
> Let's follow the POSIX way by default.
> 
> This effectively eliminates musl's warning:
> 
> warning redirecting incorrect #include  to 
> 
> Signed-off-by: Đoàn Trần Công Danh 

I think this patch is fine.  This was in SUSv2, and I don't feel bad
about siding with a spec that's at least 17 years old.

> t0028, t1308.23, t3900.34 is failing under musl,
> Those test cases in question also fails without this patch.
> 
> - t0028 is failing because musl `iconv` output UTF-16 without BOM.
> I'm not sure if my installation is broken, or it's musl's default behavior.
> But, I think RFC2781, section 4.3 allows the missing BOM

While the spec may allow this, we cannot for practical reasons.  There
are a large number of broken Windows programs that don't honor the spec
when it says to interpret UTF-16 byte sequences without a BOM as
big-endian, and instead use little-endian.  Since we cannot fix all the
broken Windows programs people use, we need to emit the BOM in UTF-16
mode.

> - t1308.23 is failing because musl `fopen` is success when open directory
> in readonly mode. POSIX allows this behavior:
> http://pubs.opengroup.org/onlinepubs/7908799/xsh/fopen.html
> [EISDIR]
> The named file is a directory and mode requires write access.

Does setting FREAD_READS_DIRECTORIES fix this?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms

2018-11-13 Thread brian m. carlson
On Tue, Nov 13, 2018 at 07:45:41PM +0100, Duy Nguyen wrote:
> On Tue, Nov 13, 2018 at 7:42 PM Derrick Stolee  wrote:
> >
> > On 11/4/2018 6:44 PM, brian m. carlson wrote:
> > > +int hash_algo_by_name(const char *name)
> > > +{
> > > + int i;
> > > + if (!name)
> > > + return GIT_HASH_UNKNOWN;
> > > + for (i = 1; i < GIT_HASH_NALGOS; i++)
> > > + if (!strcmp(name, hash_algos[i].name))
> > > + return i;
> > > + return GIT_HASH_UNKNOWN;
> > > +}
> > > +
> >
> > Today's test coverage report [1] shows this method is not covered in the
> > test suite. Looking at 'pu', it doesn't have any callers.
> >
> > Do you have a work in progress series that will use this? Could we add a
> > test-tool to exercise this somehow?
> 
> Going by the function name, it should be used when hash-object or
> other commands learn about --object-format=.

Correct.  I have extensions.objectFormat code that will use this.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [RFC PATCH] Introduce "precious" file concept

2018-11-12 Thread brian m. carlson
On Sun, Nov 11, 2018 at 01:33:44PM +0100, Ævar Arnfjörð Bjarmason wrote:
> The users who need protection against git deleting their files the most
> are exactly the sort of users who aren't expert-level enough to
> understand the nuances of how the semantics of .gitignore and "precious"
> are going to interact before git eats their data.
> 
> This is pretty apparent from the bug reports we're getting about
> this. None of them are:
> 
> "Hey, I 100% understood .gitignore semantics including this one part
> of the docs where you say you'll do this, but just forgot one day
> and deleted my work. Can we get some more safety?"
> 
> But rather (with some hyperbole for effect):
> 
> "ZOMG git deleted my file! Is this a bug??"
> 
> So I think we should have the inverse of this "precious"
> attribute". Just a change to the docs to say that .gitignore doesn't
> imply these eager deletion semantics on tree unpacking anymore, and if
> users want it back they can define a "garbage" attribute
> (s/precious/garbage/).
> 
> That will lose no data, and in the very rare cases where a checkout of
> tracked files would overwrite an ignored pattern, we can just error out
> (as we do with the "Ok to overwrite" branch removed) and tell the user
> to delete the files to proceed.

This is going to totally hose automation.  My last job had files which
might move from tracked to untracked (a file that had become generated),
and long-running CI and build systems would need to be able to check out
one status and switch to the other.  Your proposed change will prevent
those systems from working, whereas they previously did.

I agree that your proposal would have been a better design originally,
but breaking the way automated systems currently work is probably going
to be a dealbreaker.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 04/10] fast-export: avoid dying when filtering by paths and old tags exist

2018-11-12 Thread brian m. carlson
On Sun, Nov 11, 2018 at 01:44:43AM -0500, Jeff King wrote:
> > +   git fast-export --tag-of-filtered-object=rewrite --all -- bar 
> > >output &&
> > +   grep -A 1 refs/tags/v0.0.0.0.0.0.1 output | grep -E ^from.0{40}
> 
> I don't think "grep -A" is portable (and we don't seem to otherwise use
> it). You can probably do something similar with sed.
> 
> Use $ZERO_OID instead of hard-coding 40, which future-proofs for the
> hash transition (though I suppose the hash is not likely to get
> _shorter_ ;) ).

It would indeed be nice if we used $ZERO_OID.  Also, we prefer to write
"egrep", since some less capable systems don't have a grep with -E.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 0/1] http: add support selecting http version

2018-11-07 Thread brian m. carlson
On Wed, Nov 07, 2018 at 02:44:51PM +0100, Daniel Stenberg wrote:
> On Wed, 7 Nov 2018, Force.Charlie-I via GitGitGadget wrote:
> 
> > Normally, git doesn't need to set curl to select the HTTP version, it
> > works fine without HTTP2. Adding HTTP2 support is a icing on the cake.
> 
> Just a FYI:
> 
> Starting with libcurl 7.62.0 (released a week ago), it now defaults to the
> "2TLS" setting unless you tell it otherwise. With 2TLS, libcurl will attempt
> to use HTTP/2 for HTTPS URLs.

With this information, I think I would rather we rely on libcurl to do
this rather than putting it in Git.  Users will automatically get the
best supported protocol instead of having to configure it manually.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v5 10/12] Add a base implementation of SHA-256 support

2018-11-06 Thread brian m. carlson
On Mon, Nov 05, 2018 at 12:39:14PM +0100, Ævar Arnfjörð Bjarmason wrote:
> On Sun, Nov 04 2018, brian m. carlson wrote:
> > +   {
> > +   "sha256",
> > +   /* "s256", big-endian */
> 
> The existing entry/comment for sha1 is:
> 
>   "sha1",
>   /* "sha1", big-endian */
> 
> So why the sha256/s256 difference in the code/comment? Wondering if I'm
> missing something and we're using "s256" for something.

Ah, good question.  The comment refers to the format_id field which
follows this comment.  The value is the big-endian representation of
"s256" as a 32-bit value.  I picked that over "sha2" to avoid confusion
in case we add SHA-512 in the future, since that's also an SHA-2
algorithm.

Config files and command-line interfaces will use "sha1" or "sha256",
and binary formats will use those 32-bit values ("sha1" or "s256").

> >  const char *empty_tree_oid_hex(void)
> > diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
> > [...]
> 
> I had a question before about whether we see ourselves perma-forking
> this implementation based off libtomcrypt, as I recall you said yes.

Yes.

> Still, I think it would be better to introduce this in at least two-four
> commits where the upstream code is added as-is, then trimmed down to
> size, then adapted to our coding style, and finally we add our own
> utility functions.

At this point, the only code that's actually used from libtomcrypt is
the block transform.  The upstream code is split over multiple files in
multiple directories and won't compile in our codebase without many
files and a lot of work, so I don't feel good about either including
code that doesn't compile or including large numbers of files that don't
meet our coding standards (and that may still not compile because of
platform issues).

> It'll make it easier to forward-port any future upstream changes.

I don't foresee many, if any, changes to this code.  It either
implements the specification or it doesn't, and it's objectively easy to
determine which.  There's not even an argument to port performance
improvements, since almost everyone will be using a crypto library to
provide this code because libraries perform so dramatically better.
I've tried to not make the code perform worse than it did originally,
but that's it.

Furthermore, the modified code carries a relatively small amount of
resemblance to the original, so if we did port changes forward, we'd
probably have conflicts.

It seems like you really want to include the upstream code as a separate
commit and I understand where you're coming from with wanting to have
this split out into logical commits, but due to the specific nature of
this code, I see a lot of downsides and not a lot of upsides.

> > +   perl -E "for (1..10) { print q{aa}; }" | \
> > +   test-tool sha256 >actual &&
> > +   grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 
> > actual &&
> > +   perl -E "for (1..10) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \
> > +   test-tool sha256 >actual &&
> 
> I've been wanting to make use depend on perl >= 5.10 (previous noises
> about that on-list), but for now we claim to support >=5.8, which
> doesn't have the -E switch.

Good point.  I'll fix that.  After having written a lot of one-liners,
I always write -E, and this was originally a one-liner.

> But most importantly you aren't even using -E features here, and this
> isn't very idoimatic Perl. Instead do, respectively:
> 
> perl -e 'print q{aa} x 10'
> perl -e "print q{abcdefghijklmnopqrstuvwxyz} x 10"

I considered the more idiomatic version originally, but the latter could
allocate a decent amount of memory in one chunk, and I wanted to avoid
that.  I think what I'd like to do, actually, is turn on autoflush and
use a postfix for, which would be more idiomatic and could potentially
provide better testing of the chunking code.  I'll add a comment to that
effect.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Design of multiple hash support

2018-11-05 Thread brian m. carlson
On Mon, Nov 05, 2018 at 02:00:42PM -0800, Jonathan Nieder wrote:
> Hi,
> 
> Duy Nguyen wrote:
> > On Mon, Nov 5, 2018 at 2:02 AM brian m. carlson
> >  wrote:
> 
> >> There are basically two approaches I can take.  The first is to provide
> >> each command that needs to learn about this with its own --hash
> >> argument.  So we'd have:
> >>
> >>   git init --hash=sha256
> >>   git show-index --hash=sha256  >>
> >> The other alternative is that we provide a global option to git, which
> >> is parsed by all programs, like so:
> >>
> >>   git --hash=sha256 init
> >>   git --hash=sha256 show-index  [...]
> > I'm leaning towards "git foo --hash=".
> 
> Can you say a little more about the semantics of the option?  For
> commands like "git init", I tend to agree with Duy here, since it
> allows each command's manual to describe what the option means in the
> context of that command.

Sure.  The semantics for git init are "produce a repository with this
hash algorithm".  The semantics for git index-pack are "the pack I want
you to index uses this hash algorithm".  Essentially, more generically,
the semantics are "the repository or data object uses this hash
algorithm".

> For "git show-index", ideally Git should use the object format named
> in the idx file.

I agree that will be the eventual goal.  It will also be what I ship in
the final series, in all likelihood.  I have most of pack v3
implemented, but it's not complete yet.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Design of multiple hash support

2018-11-05 Thread brian m. carlson
On Mon, Nov 05, 2018 at 10:03:21AM -0800, Stefan Beller wrote:
> On Sun, Nov 4, 2018 at 6:36 PM Junio C Hamano  wrote:
> >
> > "brian m. carlson"  writes:
> >
> > > I'm currently working on getting Git to support multiple hash algorithms
> > > in the same binary (SHA-1 and SHA-256).  In order to have a fully
> > > functional binary, we'll need to have some way of indicating to certain
> > > commands (such as init and show-index) that they should assume a certain
> > > hash algorithm.
> > >
> > > There are basically two approaches I can take.  The first is to provide
> > > each command that needs to learn about this with its own --hash
> > > argument.  So we'd have:
> > >
> > >   git init --hash=sha256
> > >   git show-index --hash=sha256  > >
> > > The other alternative is that we provide a global option to git, which
> > > is parsed by all programs, like so:
> > >
> > >   git --hash=sha256 init
> > >   git --hash=sha256 show-index  >
> > I am assuming that "show-index" above is a typo for something like
> > "hash-object"?

> Actually both seem plausible, as both do not require
> RUN_SETUP, which means they cannot rely on the
> extensions.objectFormat setting.

Correct.  In general, I assume that options that want a repository will
use the repository for that information.  There are a small number of
programs, such as init, that need to either set up a repository (without
reference to another repository) or need to inspect files without
necessarily being in a repository.

For example, we will want to have a way of indicating which hash we
would like to use in a fresh repository.  I am for the moment assuming
that we're in a stage 4 configuration: that is, that we're all SHA-1 or
all SHA-256.  A clone will provide this for us, but a git init will not.

Also, our pack index v3 format knows about which hash algorithm is in
use, but packs are not labeled with the algorithm they use.  This isn't
really a problem in normal use, since we always know from context which
algorithm is in use, but we'll need to indicate to index-pack (which
technically need not run in a repository) which algorithm it should use.

show-index will eventually learn to parse the index itself to learn
which algorithms are in use, so it is technically not required here.

> When having a global setting, would that override the configured
> object format extension in a repository, or do we error out?
> 
> So maybe
> 
>   git -c extensions.objectFormat=sha256 init
> 
> is the way to go, for now? (Are repository format extensions parsed
> just like normal config, such that non-RUN_SETUP commands
> can rely on the (non-)existence to determine whether to use
> the default or the given hash function?)

The extensions callbacks are only handled in check_repo_format, so they
necessarily require a repository.  This is not new with my code.

Furthermore, one would have to specify "-c
core.repositoryformatversion=1" as well, as extensions require that
version in order to have any effect.

My current approach for the testsuite is to have git init honor a new
GIT_DEFAULT_HASH environment variable so we need not modify every place
in the testsuite that calls git init (of which there are many).  That
may or may not be greeted with joy by reviewers, but it seemed to be the
minimum viable approach.

> There is a section "Object names on the command line"
> in Documentation/technical/hash-function-transition.txt
> and I assume that this before the "dark launch"
> phase, so I would expect the latter to work (no error
> but conversion/translation on the fly) eventually as a goal.
> But the former might be in scope of one series.

Currently, I'm not implementing the stage 1-3 implementations.  I'm
merely going from the point where we have a binary that does only
SHA-256 and cannot perform SHA-1 operations at all to a stage 4
implementation, where the binary can do either, but a repository is
wholly one or the other.

> > It can work this way:
> >
> >  - read HEAD, discover that I am on 'master' branch, read refs/heads/master
> >to learn the object name in 40-hex, realize that it cannot be
> >sha256 and report "corrupt ref".
> >
> > Or it can work this way:
> >
> >  - read repository format, realize it is a good old sha1 repository.
> >
> >  - do the usual thing to get to read_object() to read the commit
> >object data for the commit at HEAD, doing all of it in sha1.
> >
> >  - in the commit object data, locate references to other objects
> >that use sha1 name.
> >
> >  - replace these sha1 references with their sha256 counter

Wildcard URL config matching

2018-11-05 Thread brian m. carlson
In a272b9e70a ("urlmatch: allow globbing for the URL host part",
2017-01-31), we added support for wildcard matching for URLs when
reading from .git/config.  Now it's possible to specify an option like
http.http://*.example.com/.cookieFile and have it match for the URL
http://foo.example.com.  However, since this option also allows
wildcards at any level, the following also matches:
http.http://*.*.*/.cookieFile.

I'm wondering if it was intentional to allow this behavior or if we
intended to allow only the leftmost label (or labels) to be a wildcard.
The tests seem to test only the leftmost label, and that is the behavior
that one has for TLS certificates, for example.  I don't really see a
situation where one would want to match hostname labels in an arbitrary
position, but perhaps I'm simply not being imaginative enough in
thinking through the use cases.

Regardless of what we decide, I'll be sending a patch, either to add
additional tests, or correct the code (or both).

I ask because we're implementing this behavior for Git LFS, where we
don't iterate over all configuration keys, instead looking up certain
values in a hash.  We'll need to make some changes in order to have
things work correctly if we want to implement the current Git behavior
to prevent combinatorial explosion.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Design of multiple hash support

2018-11-04 Thread brian m. carlson
I'm currently working on getting Git to support multiple hash algorithms
in the same binary (SHA-1 and SHA-256).  In order to have a fully
functional binary, we'll need to have some way of indicating to certain
commands (such as init and show-index) that they should assume a certain
hash algorithm.

There are basically two approaches I can take.  The first is to provide
each command that needs to learn about this with its own --hash
argument.  So we'd have:

  git init --hash=sha256
  git show-index --hash=sha256 https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH v5 09/12] commit-graph: convert to using the_hash_algo

2018-11-04 Thread brian m. carlson
Instead of using hard-coded constants for object sizes, use
the_hash_algo to look them up.  In addition, use a function call to look
up the object ID version and produce the correct value.  For now, we use
version 1, which means to use the default algorithm used in the rest of
the repository.

Signed-off-by: brian m. carlson 
---
 commit-graph.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..6763d19288 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -23,16 +23,11 @@
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
 #define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
 
-#define GRAPH_DATA_WIDTH 36
+#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
 #define GRAPH_VERSION_1 0x1
 #define GRAPH_VERSION GRAPH_VERSION_1
 
-#define GRAPH_OID_VERSION_SHA1 1
-#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
-#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
-#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
-
 #define GRAPH_OCTOPUS_EDGES_NEEDED 0x8000
 #define GRAPH_PARENT_MISSING 0x7fff
 #define GRAPH_EDGE_LAST_MASK 0x7fff
@@ -44,13 +39,18 @@
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
+   + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static uint8_t oid_version(void)
+{
+   return 1;
+}
+
 static struct commit_graph *alloc_commit_graph(void)
 {
struct commit_graph *g = xcalloc(1, sizeof(*g));
@@ -125,15 +125,15 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
}
 
hash_version = *(unsigned char*)(data + 5);
-   if (hash_version != GRAPH_OID_VERSION) {
+   if (hash_version != oid_version()) {
error(_("hash version %X does not match version %X"),
- hash_version, GRAPH_OID_VERSION);
+ hash_version, oid_version());
goto cleanup_fail;
}
 
graph = alloc_commit_graph();
 
-   graph->hash_len = GRAPH_OID_LEN;
+   graph->hash_len = the_hash_algo->rawsz;
graph->num_chunks = *(unsigned char*)(data + 6);
graph->graph_fd = fd;
graph->data = graph_map;
@@ -149,7 +149,7 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
 
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
-   if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+   if (chunk_offset > graph_size - the_hash_algo->rawsz) {
error(_("improper chunk offset %08x%08x"), 
(uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
goto cleanup_fail;
@@ -764,6 +764,7 @@ void write_commit_graph(const char *obj_dir,
int num_extra_edges;
struct commit_list *parent;
struct progress *progress = NULL;
+   const unsigned hashsz = the_hash_algo->rawsz;
 
if (!commit_graph_compatible(the_repository))
return;
@@ -909,7 +910,7 @@ void write_commit_graph(const char *obj_dir,
hashwrite_be32(f, GRAPH_SIGNATURE);
 
hashwrite_u8(f, GRAPH_VERSION);
-   hashwrite_u8(f, GRAPH_OID_VERSION);
+   hashwrite_u8(f, oid_version());
hashwrite_u8(f, num_chunks);
hashwrite_u8(f, 0); /* unused padding byte */
 
@@ -924,8 +925,8 @@ void write_commit_graph(const char *obj_dir,
 
chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-   chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
-   chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+   chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
+   chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
 
for (i = 0; i <= num_chunks; i++) {
@@ -938,8 +939,8 @@ void write_commit_graph(const char *obj_dir,
}
 
write_graph_chunk_fanout(f, commits.list, commits.nr);
-   write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-   write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+   write_graph_chunk_oids(f, hashsz, commits.list, commits.nr);
+   write_graph_chunk_data(f, hashsz, commits.list, commits.nr);
write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
close_commit_graph(the_repository);


[PATCH v5 12/12] hash: add an SHA-256 implementation using OpenSSL

2018-11-04 Thread brian m. carlson
We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.

On a Core i7-6600U, this SHA-256 implementation compares favorably to
the SHA1DC SHA-1 implementation:

SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)

Signed-off-by: brian m. carlson 
---
 Makefile | 7 +++
 hash.h   | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index 5a07e03100..36fd3a149b 100644
--- a/Makefile
+++ b/Makefile
@@ -183,6 +183,8 @@ all::
 #
 # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
 #
+# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1638,6 +1640,10 @@ endif
 endif
 endif
 
+ifdef OPENSSL_SHA256
+   EXTLIBS += $(LIB_4_CRYPTO)
+   BASIC_CFLAGS += -DSHA256_OPENSSL
+else
 ifdef GCRYPT_SHA256
BASIC_CFLAGS += -DSHA256_GCRYPT
EXTLIBS += -lgcrypt
@@ -1645,6 +1651,7 @@ else
LIB_OBJS += sha256/block/sha256.o
BASIC_CFLAGS += -DSHA256_BLK
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 2ef098052d..adde708cf2 100644
--- a/hash.h
+++ b/hash.h
@@ -17,6 +17,8 @@
 
 #if defined(SHA256_GCRYPT)
 #include "sha256/gcrypt.h"
+#elif defined(SHA256_OPENSSL)
+#include 
 #else
 #include "sha256/block/sha256.h"
 #endif


[PATCH v5 11/12] sha256: add an SHA-256 implementation using libgcrypt

2018-11-04 Thread brian m. carlson
Generally, one gets better performance out of cryptographic routines
written in assembly than C, and this is also true for SHA-256.  In
addition, most Linux distributions cannot distribute Git linked against
OpenSSL for licensing reasons.

Most systems with GnuPG will also have libgcrypt, since it is a
dependency of GnuPG.  libgcrypt is also faster than the SHA1DC
implementation for messages of a few KiB and larger.

For comparison, on a Core i7-6600U, this implementation processes 16 KiB
chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
MiB/s.

In addition, libgcrypt is licensed under the LGPL 2.1, which is
compatible with the GPL.  Add an implementation of SHA-256 that uses
libgcrypt.

Signed-off-by: brian m. carlson 
---
 Makefile| 13 +++--
 hash.h  |  4 
 sha256/gcrypt.h | 30 ++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 sha256/gcrypt.h

diff --git a/Makefile b/Makefile
index e99b7712f6..5a07e03100 100644
--- a/Makefile
+++ b/Makefile
@@ -179,6 +179,10 @@ all::
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
 #
+# Define BLK_SHA256 to use the built-in SHA-256 routines.
+#
+# Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1634,8 +1638,13 @@ endif
 endif
 endif
 
-LIB_OBJS += sha256/block/sha256.o
-BASIC_CFLAGS += -DSHA256_BLK
+ifdef GCRYPT_SHA256
+   BASIC_CFLAGS += -DSHA256_GCRYPT
+   EXTLIBS += -lgcrypt
+else
+   LIB_OBJS += sha256/block/sha256.o
+   BASIC_CFLAGS += -DSHA256_BLK
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index a9bc624020..2ef098052d 100644
--- a/hash.h
+++ b/hash.h
@@ -15,7 +15,11 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#if defined(SHA256_GCRYPT)
+#include "sha256/gcrypt.h"
+#else
 #include "sha256/block/sha256.h"
+#endif
 
 #ifndef platform_SHA_CTX
 /*
diff --git a/sha256/gcrypt.h b/sha256/gcrypt.h
new file mode 100644
index 00..09bd8bb200
--- /dev/null
+++ b/sha256/gcrypt.h
@@ -0,0 +1,30 @@
+#ifndef SHA256_GCRYPT_H
+#define SHA256_GCRYPT_H
+
+#include 
+
+#define SHA256_DIGEST_SIZE 32
+
+typedef gcry_md_hd_t gcrypt_SHA256_CTX;
+
+inline void gcrypt_SHA256_Init(gcrypt_SHA256_CTX *ctx)
+{
+   gcry_md_open(ctx, GCRY_MD_SHA256, 0);
+}
+
+inline void gcrypt_SHA256_Update(gcrypt_SHA256_CTX *ctx, const void *data, 
size_t len)
+{
+   gcry_md_write(*ctx, data, len);
+}
+
+inline void gcrypt_SHA256_Final(unsigned char *digest, gcrypt_SHA256_CTX *ctx)
+{
+   memcpy(digest, gcry_md_read(*ctx, GCRY_MD_SHA256), SHA256_DIGEST_SIZE);
+}
+
+#define platform_SHA256_CTX gcrypt_SHA256_CTX
+#define platform_SHA256_Init gcrypt_SHA256_Init
+#define platform_SHA256_Update gcrypt_SHA256_Update
+#define platform_SHA256_Final gcrypt_SHA256_Final
+
+#endif


[PATCH v5 10/12] Add a base implementation of SHA-256 support

2018-11-04 Thread brian m. carlson
SHA-1 is weak and we need to transition to a new hash function.  For
some time, we have referred to this new function as NewHash.  Recently,
we decided to pick SHA-256 as NewHash.  The reasons behind the choice of
SHA-256 are outlined in the thread starting at [1] and in the commit
history for the hash function transition document.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain.  Optimize it and restructure it to meet our coding
standards.  Pull in the update and final functions from the SHA-1 block
implementation, as we know these function correctly with all compilers.
This implementation is slower than SHA-1, but more performant
implementations will be introduced in future commits.

Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.

Note that with this patch, it is still not possible to switch to using
SHA-256 in Git.  Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.

[1] 
https://public-inbox.org/git/20180609224913.gc38...@genre.crustytoothpaste.net/

Signed-off-by: brian m. carlson 
---
 Makefile   |   4 +
 cache.h|  12 ++-
 hash.h |  19 +++-
 sha1-file.c|  45 ++
 sha256/block/sha256.c  | 196 +
 sha256/block/sha256.h  |  26 ++
 t/helper/test-sha256.c |   7 ++
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/t0015-hash.sh|  25 ++
 10 files changed, 332 insertions(+), 4 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 t/helper/test-sha256.c

diff --git a/Makefile b/Makefile
index 68169a7abb..e99b7712f6 100644
--- a/Makefile
+++ b/Makefile
@@ -739,6 +739,7 @@ TEST_BUILTINS_OBJS += test-run-command.o
 TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
 TEST_BUILTINS_OBJS += test-sha1.o
 TEST_BUILTINS_OBJS += test-sha1-array.o
+TEST_BUILTINS_OBJS += test-sha256.o
 TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
@@ -1633,6 +1634,9 @@ endif
 endif
 endif
 
+LIB_OBJS += sha256/block/sha256.o
+BASIC_CFLAGS += -DSHA256_BLK
+
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
diff --git a/cache.h b/cache.h
index 9e5d1dd85a..48ce1565e6 100644
--- a/cache.h
+++ b/cache.h
@@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The block size of SHA-1. */
 #define GIT_SHA1_BLKSZ 64
 
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
 /* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 1bcf7ab6fd..a9bc624020 100644
--- a/hash.h
+++ b/hash.h
@@ -15,6 +15,8 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#include "sha256/block/sha256.h"
+
 #ifndef platform_SHA_CTX
 /*
  * platform's underlying implementation of SHA-1; could be OpenSSL,
@@ -34,6 +36,18 @@
 #define git_SHA1_Updateplatform_SHA1_Update
 #define git_SHA1_Final platform_SHA1_Final
 
+#ifndef platform_SHA256_CTX
+#define platform_SHA256_CTXSHA256_CTX
+#define platform_SHA256_Init   SHA256_Init
+#define platform_SHA256_Update SHA256_Update
+#define platform_SHA256_Final  SHA256_Final
+#endif
+
+#define git_SHA256_CTX platform_SHA256_CTX
+#define git_SHA256_Initplatform_SHA256_Init
+#define git_SHA256_Update  platform_SHA256_Update
+#define git_SHA256_Final   platform_SHA256_Final
+
 #ifdef SHA1_MAX_BLOCK_SIZE
 #include "compat/sha1-chunked.h"
 #undef git_SHA1_Update
@@ -52,12 +66,15 @@
 #define GIT_HASH_UNKNOWN 0
 /* SHA-1 */
 #define GIT_HASH_SHA1 1
+/* SHA-256  */
+#define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
-#define GIT_HASH_NALGOS (GIT_HASH_SHA1 + 1)
+#define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 
 /* A suitably aligned type for stack allocations of hash contexts. */
 union git_hash_ctx {
git_SHA_CTX sha1;
+   git_SHA256_CTX sha256;
 };
 typedef union git_hash_ctx git_hash_ctx;
 
diff --git a/sha1-file.c b/sha1-file.c
index 9bdd04ea45..c97d93a14a 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -40,10 +40,20 @@
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
 &

[PATCH v5 07/12] sha1-file: add a constant for hash block size

2018-11-04 Thread brian m. carlson
There is one place we need the hash algorithm block size: the HMAC code
for push certs.  Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.

Signed-off-by: brian m. carlson 
---
 cache.h | 4 
 hash.h  | 3 +++
 sha1-file.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/cache.h b/cache.h
index bab8e8964f..9e5d1dd85a 100644
--- a/cache.h
+++ b/cache.h
@@ -45,10 +45,14 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+/* The block size of SHA-1. */
+#define GIT_SHA1_BLKSZ 64
 
 /* The length in byte and in hex digits of the largest possible hash value. */
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+/* The largest possible block size for any supported hash. */
+#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 80881eea47..1bcf7ab6fd 100644
--- a/hash.h
+++ b/hash.h
@@ -81,6 +81,9 @@ struct git_hash_algo {
/* The length of the hash in hex characters. */
size_t hexsz;
 
+   /* The block size of the hash. */
+   size_t blksz;
+
/* The hash initialization function. */
git_hash_init_fn init_fn;
 
diff --git a/sha1-file.c b/sha1-file.c
index 7e9dedc744..9bdd04ea45 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -90,6 +90,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x,
0,
0,
+   0,
git_hash_unknown_init,
git_hash_unknown_update,
git_hash_unknown_final,
@@ -102,6 +103,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x73686131,
GIT_SHA1_RAWSZ,
GIT_SHA1_HEXSZ,
+   GIT_SHA1_BLKSZ,
git_hash_sha1_init,
git_hash_sha1_update,
git_hash_sha1_final,


[PATCH v5 01/12] sha1-file: rename algorithm to "sha1"

2018-11-04 Thread brian m. carlson
The transition plan anticipates us using a syntax such as "^{sha1}" for
disambiguation.  Since this is a syntax some people will be typing a
lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
the dash doesn't create any ambiguity; however, it does make the syntax
shorter and easier to type, especially for touch typists.  In addition,
the transition plan already uses "sha1" in this context.

Rename the name of SHA-1 implementation to "sha1".

Note that this change creates no backwards compatibility concerns, since
we haven't yet used this field in any configuration settings.

Signed-off-by: brian m. carlson 
---
 sha1-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..91311ebb3d 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -97,7 +97,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
NULL,
},
{
-   "sha-1",
+   "sha1",
/* "sha1", big-endian */
0x73686131,
GIT_SHA1_RAWSZ,


[PATCH v5 06/12] t: make the sha1 test-tool helper generic

2018-11-04 Thread brian m. carlson
Since we're going to have multiple hash algorithms to test, it makes
sense to share as much of the test code as possible.  Convert the sha1
helper for the test-tool to be generic and move it out into its own
module.  This will allow us to share most of this code with our NewHash
implementation.

Signed-off-by: brian m. carlson 
---
 Makefile  |  1 +
 t/helper/{test-sha1.c => test-hash.c} | 19 +-
 t/helper/test-sha1.c  | 52 +--
 t/helper/test-tool.h  |  2 ++
 4 files changed, 14 insertions(+), 60 deletions(-)
 copy t/helper/{test-sha1.c => test-hash.c} (65%)

diff --git a/Makefile b/Makefile
index d18ab0fe78..81dc9ac819 100644
--- a/Makefile
+++ b/Makefile
@@ -714,6 +714,7 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
+TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
diff --git a/t/helper/test-sha1.c b/t/helper/test-hash.c
similarity index 65%
copy from t/helper/test-sha1.c
copy to t/helper/test-hash.c
index 1ba0675c75..0a31de66f3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-hash.c
@@ -1,13 +1,14 @@
 #include "test-tool.h"
 #include "cache.h"
 
-int cmd__sha1(int ac, const char **av)
+int cmd_hash_impl(int ac, const char **av, int algo)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_HEXSZ];
unsigned bufsz = 8192;
int binary = 0;
char *buffer;
+   const struct git_hash_algo *algop = _algos[algo];
 
if (ac == 2) {
if (!strcmp(av[1], "-b"))
@@ -26,7 +27,7 @@ int cmd__sha1(int ac, const char **av)
die("OOPS");
}
 
-   git_SHA1_Init();
+   algop->init_fn();
 
while (1) {
ssize_t sz, this_sz;
@@ -38,20 +39,20 @@ int cmd__sha1(int ac, const char **av)
if (sz == 0)
break;
if (sz < 0)
-   die_errno("test-sha1");
+   die_errno("test-hash");
this_sz += sz;
cp += sz;
room -= sz;
}
if (this_sz == 0)
break;
-   git_SHA1_Update(, buffer, this_sz);
+   algop->update_fn(, buffer, this_sz);
}
-   git_SHA1_Final(sha1, );
+   algop->final_fn(hash, );
 
if (binary)
-   fwrite(sha1, 1, 20, stdout);
+   fwrite(hash, 1, algop->rawsz, stdout);
else
-   puts(sha1_to_hex(sha1));
+   puts(hash_to_hex_algop(hash, algop));
exit(0);
 }
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index 1ba0675c75..d860c387c3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,55 +3,5 @@
 
 int cmd__sha1(int ac, const char **av)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
-   unsigned bufsz = 8192;
-   int binary = 0;
-   char *buffer;
-
-   if (ac == 2) {
-   if (!strcmp(av[1], "-b"))
-   binary = 1;
-   else
-   bufsz = strtoul(av[1], NULL, 10) * 1024 * 1024;
-   }
-
-   if (!bufsz)
-   bufsz = 8192;
-
-   while ((buffer = malloc(bufsz)) == NULL) {
-   fprintf(stderr, "bufsz %u is too big, halving...\n", bufsz);
-   bufsz /= 2;
-   if (bufsz < 1024)
-   die("OOPS");
-   }
-
-   git_SHA1_Init();
-
-   while (1) {
-   ssize_t sz, this_sz;
-   char *cp = buffer;
-   unsigned room = bufsz;
-   this_sz = 0;
-   while (room) {
-   sz = xread(0, cp, room);
-   if (sz == 0)
-   break;
-   if (sz < 0)
-   die_errno("test-sha1");
-   this_sz += sz;
-   cp += sz;
-   room -= sz;
-   }
-   if (this_sz == 0)
-   break;
-   git_SHA1_Update(, buffer, this_sz);
-   }
-   git_SHA1_Final(sha1, );
-
-   if (binary)
-   fwrite(sha1, 1, 20, stdout);
-   else
-   puts(sha1_to_hex(sha1));
-   exit(0);
+   return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e4890566da..29ac7b0b0d 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -50

[PATCH v5 08/12] t/helper: add a test helper to compute hash speed

2018-11-04 Thread brian m. carlson
Add a utility (which is less for the testsuite and more for developers)
that can compute hash speeds for whatever hash algorithms are
implemented.  This allows developers to test their personal systems to
determine the performance characteristics of various algorithms.

Signed-off-by: brian m. carlson 
---
 Makefile   |  1 +
 t/helper/test-hash-speed.c | 61 ++
 t/helper/test-tool.c   |  1 +
 t/helper/test-tool.h   |  1 +
 4 files changed, 64 insertions(+)
 create mode 100644 t/helper/test-hash-speed.c

diff --git a/Makefile b/Makefile
index 81dc9ac819..68169a7abb 100644
--- a/Makefile
+++ b/Makefile
@@ -716,6 +716,7 @@ TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
+TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
diff --git a/t/helper/test-hash-speed.c b/t/helper/test-hash-speed.c
new file mode 100644
index 00..432233c7f0
--- /dev/null
+++ b/t/helper/test-hash-speed.c
@@ -0,0 +1,61 @@
+#include "test-tool.h"
+#include "cache.h"
+
+#define NUM_SECONDS 3
+
+static inline void compute_hash(const struct git_hash_algo *algo, git_hash_ctx 
*ctx, uint8_t *final, const void *p, size_t len)
+{
+   algo->init_fn(ctx);
+   algo->update_fn(ctx, p, len);
+   algo->final_fn(final, ctx);
+}
+
+int cmd__hash_speed(int ac, const char **av)
+{
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_RAWSZ];
+   clock_t initial, start, end;
+   unsigned bufsizes[] = { 64, 256, 1024, 8192, 16384 };
+   int i;
+   void *p;
+   const struct git_hash_algo *algo = NULL;
+
+   if (ac == 2) {
+   for (i = 1; i < GIT_HASH_NALGOS; i++) {
+   if (!strcmp(av[1], hash_algos[i].name)) {
+   algo = _algos[i];
+   break;
+   }
+   }
+   }
+   if (!algo)
+   die("usage: test-tool hash-speed algo_name");
+
+   /* Use this as an offset to make overflow less likely. */
+   initial = clock();
+
+   printf("algo: %s\n", algo->name);
+
+   for (i = 0; i < ARRAY_SIZE(bufsizes); i++) {
+   unsigned long j, kb;
+   double kb_per_sec;
+   p = xcalloc(1, bufsizes[i]);
+   start = end = clock() - initial;
+   for (j = 0; ((end - start) / CLOCKS_PER_SEC) < NUM_SECONDS; 
j++) {
+   compute_hash(algo, , hash, p, bufsizes[i]);
+
+   /*
+* Only check elapsed time every 128 iterations to avoid
+* dominating the runtime with system calls.
+*/
+   if (!(j & 127))
+   end = clock() - initial;
+   }
+   kb = j * bufsizes[i];
+   kb_per_sec = kb / (1024 * ((double)end - start) / 
CLOCKS_PER_SEC);
+   printf("size %u: %lu iters; %lu KiB; %0.2f KiB/s\n", 
bufsizes[i], j, kb, kb_per_sec);
+   free(p);
+   }
+
+   exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 6b5836dc1b..e009c8186d 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -20,6 +20,7 @@ static struct test_cmd cmds[] = {
{ "example-decorate", cmd__example_decorate },
{ "genrandom", cmd__genrandom },
{ "hashmap", cmd__hashmap },
+   { "hash-speed", cmd__hash_speed },
{ "index-version", cmd__index_version },
{ "json-writer", cmd__json_writer },
{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 29ac7b0b0d..19a7e8332a 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -16,6 +16,7 @@ int cmd__dump_untracked_cache(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
+int cmd__hash_speed(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);


[PATCH v5 05/12] t: add basic tests for our SHA-1 implementation

2018-11-04 Thread brian m. carlson
We have in the past had some unfortunate endianness issues with some
SHA-1 implementations we ship, especially on big-endian machines.  Add
an explicit test using the test helper to catch these issues and point
them out prominently.  This test can also be used as a staging ground
for people testing additional algorithms to verify that their
implementations are working as expected.

Signed-off-by: brian m. carlson 
---
 t/t0015-hash.sh | 29 +
 1 file changed, 29 insertions(+)
 create mode 100755 t/t0015-hash.sh

diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
new file mode 100755
index 00..8e763c2c3d
--- /dev/null
+++ b/t/t0015-hash.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test basic hash implementation'
+. ./test-lib.sh
+
+
+test_expect_success 'test basic SHA-1 hash values' '
+   test-tool sha1 actual &&
+   grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual &&
+   printf "a" | test-tool sha1 >actual &&
+   grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual &&
+   printf "abc" | test-tool sha1 >actual &&
+   grep a9993e364706816aba3e25717850c26c9cd0d89d actual &&
+   printf "message digest" | test-tool sha1 >actual &&
+   grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
+   printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
+   grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
+   perl -E "for (1..10) { print q{aa}; }" | \
+   test-tool sha1 >actual &&
+   grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
+   printf "blob 0\0" | test-tool sha1 >actual &&
+   grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual &&
+   printf "blob 3\0abc" | test-tool sha1 >actual &&
+   grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual &&
+   printf "tree 0\0" | test-tool sha1 >actual &&
+   grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
+'
+
+test_done


[PATCH v5 04/12] cache: make hashcmp and hasheq work with larger hashes

2018-11-04 Thread brian m. carlson
In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we
modified hashcmp to assert that the hash size was always 20 to help it
optimize and inline calls to memcmp.  In a future series, we replaced
many calls to hashcmp and oidcmp with calls to hasheq and oideq to
improve inlining further.

However, we want to support hash algorithms other than SHA-1, namely
SHA-256.  When doing so, we must handle the case where these values are
32 bytes long as well as 20.  Adjust hashcmp to handle two cases:
20-byte matches, and maximum-size matches.  Therefore, when we include
SHA-256, we'll automatically handle it properly, while at the same time
teaching the compiler that there are only two possible options to
consider.  This will allow the compiler to write the most efficient
possible code.

Copy similar code into hasheq and perform an identical transformation.
At least with GCC 8.2.0, making hasheq defer to hashcmp when there are
two branches prevents the compiler from inlining the comparison, while
the code in this patch is inlined properly.  Add a comment to avoid an
accidental performance regression from well-intentioned refactoring.

Signed-off-by: brian m. carlson 
---
 cache.h | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index 51580c4b77..bab8e8964f 100644
--- a/cache.h
+++ b/cache.h
@@ -1027,16 +1027,12 @@ extern const struct object_id null_oid;
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
/*
-* This is a temporary optimization hack. By asserting the size here,
-* we let the compiler know that it's always going to be 20, which lets
-* it turn this fixed-size memcmp into a few inline instructions.
-*
-* This will need to be extended or ripped out when we learn about
-* hashes of different sizes.
+* Teach the compiler that there are only two possibilities of hash size
+* here, so that it can optimize for this case as much as possible.
 */
-   if (the_hash_algo->rawsz != 20)
-   BUG("hash size not yet supported by hashcmp");
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+   return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+   return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
@@ -1046,7 +1042,13 @@ static inline int oidcmp(const struct object_id *oid1, 
const struct object_id *o
 
 static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return !hashcmp(sha1, sha2);
+   /*
+* We write this here instead of deferring to hashcmp so that the
+* compiler can properly inline it and avoid calling memcmp.
+*/
+   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+   return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+   return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oideq(const struct object_id *oid1, const struct object_id 
*oid2)


[PATCH v5 00/12] Base SHA-256 implementation

2018-11-04 Thread brian m. carlson
This series provides a functional SHA-256 implementation and wires it
up, along with some housekeeping patches to make it suitable for
testing.

Changes from v4:
* Downcase hex constants for consistency.
* Remove needless parentheses in return statement.
* Remove braces for single statement loops.
* Switch to +=.
* Add references to rationale for SHA-256.
* Remove inclusion of "git-compat-util.h" in header.

Changes from v3:
* Switch to using inline functions instead of macros in many cases.
* Undefine remaining macros at the top.

Changes from v2:
* Improve commit messages to include timing and performance information.
* Improve commit messages to be less ambiguous and more friendly to a
  wider variety of English speakers.
* Prefer functions taking struct git_hash_algo in hex.c.
* Port pieces of the block-sha1 implementation over to the block-sha256
  implementation for better compatibility.
* Drop patch 13 in favor of further discussion about the best way
  forward for versioning commit graph.
* Rename the test so as to have a different number from other tests.
* Rebase on master.

Changes from v1:
* Add a hash_to_hex function mirroring sha1_to_hex, but for
  the_hash_algo.
* Strip commit message explanation about why we chose SHA-256.
* Rebase on master
* Strip leading whitespace from commit message.
* Improve commit-graph patch to cover new code added since v1.
* Be more honest about the scope of work involved in porting the SHA-256
  implementation out of libtomcrypt.
* Revert change to limit hashcmp to 20 bytes.

brian m. carlson (12):
  sha1-file: rename algorithm to "sha1"
  sha1-file: provide functions to look up hash algorithms
  hex: introduce functions to print arbitrary hashes
  cache: make hashcmp and hasheq work with larger hashes
  t: add basic tests for our SHA-1 implementation
  t: make the sha1 test-tool helper generic
  sha1-file: add a constant for hash block size
  t/helper: add a test helper to compute hash speed
  commit-graph: convert to using the_hash_algo
  Add a base implementation of SHA-256 support
  sha256: add an SHA-256 implementation using libgcrypt
  hash: add an SHA-256 implementation using OpenSSL

 Makefile  |  22 +++
 cache.h   |  51 ---
 commit-graph.c|  33 ++---
 hash.h|  41 +-
 hex.c |  32 +++--
 sha1-file.c   |  70 -
 sha256/block/sha256.c | 196 ++
 sha256/block/sha256.h |  26 
 sha256/gcrypt.h   |  30 
 t/helper/test-hash-speed.c|  61 
 t/helper/{test-sha1.c => test-hash.c} |  19 +--
 t/helper/test-sha1.c  |  52 +--
 t/helper/test-sha256.c|   7 +
 t/helper/test-tool.c  |   2 +
 t/helper/test-tool.h  |   4 +
 t/t0015-hash.sh   |  54 +++
 16 files changed, 596 insertions(+), 104 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 sha256/gcrypt.h
 create mode 100644 t/helper/test-hash-speed.c
 copy t/helper/{test-sha1.c => test-hash.c} (65%)
 create mode 100644 t/helper/test-sha256.c
 create mode 100755 t/t0015-hash.sh

Range-diff against v4:
 1:  a004a4c982 <  -:  -- :hash-impl
 2:  cf9f7f5620 =  1:  cf9f7f5620 sha1-file: rename algorithm to "sha1"
 3:  0144deaebe =  2:  0144deaebe sha1-file: provide functions to look up hash 
algorithms
 4:  b74858fb03 =  3:  b74858fb03 hex: introduce functions to print arbitrary 
hashes
 5:  e9703017a4 =  4:  e9703017a4 cache: make hashcmp and hasheq work with 
larger hashes
 6:  ab85a834fd =  5:  ab85a834fd t: add basic tests for our SHA-1 
implementation
 7:  962f6d8903 =  6:  962f6d8903 t: make the sha1 test-tool helper generic
 8:  53addf4d58 =  7:  53addf4d58 sha1-file: add a constant for hash block size
 9:  9ace10faa2 =  8:  9ace10faa2 t/helper: add a test helper to compute hash 
speed
10:  9adc56d01e =  9:  9adc56d01e commit-graph: convert to using the_hash_algo
11:  f48cb1ad27 ! 10:  90544c504c Add a base implementation of SHA-256 support
@@ -4,7 +4,9 @@
 
 SHA-1 is weak and we need to transition to a new hash function.  For
 some time, we have referred to this new function as NewHash.  Recently,
-we decided to pick SHA-256 as NewHash.
+we decided to pick SHA-256 as NewHash.  The reasons behind the choice 
of
+SHA-256 are outlined in the thread starting at [1] and in the commit
+history for the hash function transition document.
 
 Add a basic implementation of SHA-256 based off libtomcrypt, which is 
in
 the public domain.  Optimize it and restructure it to meet our coding
@@ -20,6 +22,8 @@
 SHA-256 in Git.  Additional patches are needed to prepa

[PATCH v5 03/12] hex: introduce functions to print arbitrary hashes

2018-11-04 Thread brian m. carlson
Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer.  Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant.  Update the
documentation as well.

While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.

We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.

Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).

Signed-off-by: brian m. carlson 
---
 cache.h | 15 +--
 hex.c   | 32 
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index 59c8a93046..51580c4b77 100644
--- a/cache.h
+++ b/cache.h
@@ -1364,9 +1364,9 @@ extern int get_oid_hex(const char *hex, struct object_id 
*sha1);
 extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
 
 /*
- * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
+ * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
  * and writes the NUL-terminated output to the buffer `out`, which must be at
- * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
+ * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
  * convenience.
  *
  * The non-`_r` variant returns a static buffer, but uses a ring of 4
@@ -1374,10 +1374,13 @@ extern int hex_to_bytes(unsigned char *binary, const 
char *hex, size_t len);
  *
  *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
  */
-extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1);   /* static buffer 
result! */
-extern char *oid_to_hex(const struct object_id *oid);  /* same static buffer 
as sha1_to_hex */
+char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const 
struct git_hash_algo *);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo 
*);  /* static buffer result! */
+char *sha1_to_hex(const unsigned char *sha1);  
/* same static buffer */
+char *hash_to_hex(const unsigned char *hash);  
/* same static buffer */
+char *oid_to_hex(const struct object_id *oid); 
/* same static buffer */
 
 /*
  * Parse a 40-character hexadecimal object ID starting from hex, updating the
diff --git a/hex.c b/hex.c
index 10af1a29e8..d2e8bb9540 100644
--- a/hex.c
+++ b/hex.c
@@ -73,14 +73,15 @@ int parse_oid_hex(const char *hex, struct object_id *oid, 
const char **end)
return ret;
 }
 
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
+   const struct git_hash_algo *algop)
 {
static const char hex[] = "0123456789abcdef";
char *buf = buffer;
int i;
 
-   for (i = 0; i < the_hash_algo->rawsz; i++) {
-   unsigned int val = *sha1++;
+   for (i = 0; i < algop->rawsz; i++) {
+   unsigned int val = *hash++;
*buf++ = hex[val >> 4];
*buf++ = hex[val & 0xf];
}
@@ -89,20 +90,35 @@ char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
return buffer;
 }
 
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 {
-   return sha1_to_hex_r(buffer, oid->hash);
+   return hash_to_hex_algop_r(buffer, sha1, _algos[GIT_HASH_SHA1]);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+   return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo 
*algop)
 {
static int bufno;
static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-   return sha1_to_hex_r(hexbuffer[bufno], sha1);
+   return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
+   return hash_to_hex_algop(sha1, _algos[GIT_HASH_SHA1]);
+}
+
+char *hash_to_hex(const unsigned char *hash)
+{
+   return hash_to_hex_algop(hash, the_hash_algo);
 }
 
 char *oid_to_hex(const struct object_id *oid)
 {
-   return

[PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms

2018-11-04 Thread brian m. carlson
There are several ways we might refer to a hash algorithm: by name, such
as in the config file; by format ID, such as in a pack; or internally,
by a pointer to the hash_algos array.  Provide functions to look up hash
algorithms based on these various forms and return the internal constant
used for them.  If conversion to another form is necessary, this
internal constant can be used to look up the proper data in the
hash_algos array.

Signed-off-by: brian m. carlson 
---
 hash.h  | 13 +
 sha1-file.c | 21 +
 2 files changed, 34 insertions(+)

diff --git a/hash.h b/hash.h
index 7c8238bc2e..80881eea47 100644
--- a/hash.h
+++ b/hash.h
@@ -98,4 +98,17 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
+/*
+ * Return a GIT_HASH_* constant based on the name.  Returns GIT_HASH_UNKNOWN if
+ * the name doesn't match a known algorithm.
+ */
+int hash_algo_by_name(const char *name);
+/* Identical, except based on the format ID. */
+int hash_algo_by_id(uint32_t format_id);
+/* Identical, except for a pointer to struct git_hash_algo. */
+static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+{
+   return p - hash_algos;
+}
+
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index 91311ebb3d..7e9dedc744 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -122,6 +122,27 @@ const char *empty_blob_oid_hex(void)
return oid_to_hex_r(buf, the_hash_algo->empty_blob);
 }
 
+int hash_algo_by_name(const char *name)
+{
+   int i;
+   if (!name)
+   return GIT_HASH_UNKNOWN;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (!strcmp(name, hash_algos[i].name))
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+int hash_algo_by_id(uint32_t format_id)
+{
+   int i;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (format_id == hash_algos[i].format_id)
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want


Re: [PATCH/RFC v2] sequencer.c: record revert/cherry-pick commit with trailer lines

2018-11-04 Thread brian m. carlson
On Sun, Nov 04, 2018 at 07:10:26PM +0100, Nguyễn Thái Ngọc Duy wrote:
> When a commit is reverted (or cherry-picked with -x) we add an English
> sentence recording that commit id in the new commit message. Make
> these real trailer lines instead so that they are more friendly to
> parsers (especially "git interpret-trailers").
> 
> A reverted commit will have a new trailer
> 
> Revert: 
> 
> Similarly a cherry-picked commit with -x will have
> 
> Cherry-picked-from: 
> 
> When reverting or cherry picking a merge, the reverted/cherry-picked
> branch will be shown using extended SHA-1 syntax, e.g.
> 
> Revert: ~2
> 
> Since we're not producing the old lines "This reverts commit ..." and
> "(cherry picked from commit .." anymore, scripts that look for these
> lines will need to be updated to handle both. Fresh new history could
> just rely on git-interpret-trailers instead.

Overall, I like the idea of this series.  This is a much cleaner way to
handle things and much better for machine-readability.  I foresee git
cherry potentially learning how to parse this, for example, for cases
where the patch-id doesn't match due to context changes.

However, I do have concerns about breaking compatibility with existing
scripts.  I wonder if we could add a long alias for git cherry-pick -x,
say "--notate" and have "--notate=text" mean "-x" and "--notate=trailer"
mean this new format.  Similarly, git revert could learn such an option
as well.

One final thought: since our trailers seem to act as if we wrote "this
commit" (has been), I wonder if we should say "Reverts" instead of
"Revert" for consistency.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [RFC v1] Add virtual file system settings and hook proc

2018-11-04 Thread brian m. carlson
On Sun, Nov 04, 2018 at 07:34:01AM +0100, Duy Nguyen wrote:
> On Wed, Oct 31, 2018 at 9:53 PM Ben Peart  wrote:
> > It's more than a dynamic sparse-checkout because the same list is also
> > used to exclude any file/folder not listed.  That means any file not
> > listed won't ever be updated by git (like in 'checkout' for example) so
> > 'stale' files could be left in the working directory.  It also means git
> > won't find new/untracked files unless they are specifically added to the
> > list.
> 
> OK. I'm not at all interested in carrying maintenance burden for some
> software behind closed doors. I could see values in having a more
> flexible sparse checkout but this now seems like very tightly designed
> for GVFS. So unless there's another use case (preferably open source)
> for this, I don't think this should be added in git.git.

I should point out that VFS for Git is an open-source project and will
likely have larger use than just at Microsoft.  There are both Windows
and Mac clients and there are plans for a Linux client as well.
Ideally, it would work with an unmodified upstream Git, which is (I
assume) why Ben is sending this series.

Personally, I don't love the current name used in this series.  I don't
see this patch as introducing a virtual file system in the Unix sense of
that word, and I think calling it that in Git core will be confusing to
Unix users.  I would prefer to see it as a hook (maybe called
"sparse-checkout" or "sparse-exclude"; better names are okay), and
simply turn it on based on whether or not there's an appropriate hook
file there and whether core.sparseCheckout is on (or possibly with
hook.sparseExclude or something).  With a design more like that, I don't
see a problem with it in principle.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: git-rebase is ignoring working-tree-encoding

2018-11-04 Thread brian m. carlson
On Sun, Nov 04, 2018 at 05:37:09PM +0100, Adrián Gimeno Balaguer wrote:
> I wrote a "counterpart" easy fix which instead only prohibites BOM for
> the opposite endianness (for example if
> working-tree-encoding=UTF-16LE, then finding an UTF-16BE BOM in the
> file would cause Git to signal the error right before committing,
> diffing, etc.). That way user would be encouraged to modify the file's
> encoding to match the one specified in working-tree-encoding before
> allowing these actions, therefore preventing Git from encoding to the
> wrong endianness after file is written out. With few repository tests,
> this new behaviour worked as expected. But then I realized this
> solution would perhaps be unacceptable for Git's source code as it
> would violate that Unicode standard. Anyways, here is a PR in my Git
> fork with the changes I did, for reference:

I actually think such a solution (although I haven't looked at your
patch) would be fine, and I would encourage you to send it to the list.
It's my understanding that many people on Windows want to write things
in UTF-16 encoding but only little-endian with a BOM.  Allowing them to
write that, even if Git won't be able to guarantee producing that, would
be fine, as long as the data is what we expect.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: git-rebase is ignoring working-tree-encoding

2018-11-04 Thread brian m. carlson
On Fri, Nov 02, 2018 at 03:30:17AM +0100, Adrián Gimeno Balaguer wrote:
> I’m attempting to perform fixups via git-rebase of UTF-16 LE files
> (the project I’m working on requires that exact encoding on certain
> files). When the rebase is complete, Git changes that file’s encoding
> to UTF-16 BE. I have been using the newer working-tree-encoding
> attribute in .gitattributes. I’m using Git for Windows.
> 
> $ git version
> git version 2.19.1.windows.1
> 
> Here is a sample UTF-16 LE file (with BOM and LF endings) with
> following atributes in .gitattributes:
> 
> test.txt eol=lf -text working-tree-encoding=UTF-16

Do things work for you if you write this as "UTF-16LE"?  When you use
working-tree-encoding, the file is stored internally as UTF-8, but it's
serialized to the specified encoding when written out.

Asking for "UTF-16" is ambiguous: there are two endiannesses, and so as
long as you get a BOM in the output, either one is an acceptable option.
Which one you get is dependent on what the underlying code thinks is the
default, and traditionally for Unix systems and Unix tools that's been
big-endian.  If you want a particular endianness, you should specify it.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 2/2] t/t7510-signed-commit.sh: add signing subkey to Eris Discordia key

2018-11-04 Thread brian m. carlson
On Sun, Nov 04, 2018 at 10:47:10AM +0100, Michał Górny wrote:
> diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
> index e8377286d..86d3f93fa 100755
> --- a/t/t7510-signed-commit.sh
> +++ b/t/t7510-signed-commit.sh
> @@ -197,9 +197,9 @@ test_expect_success GPG 'show bad signature with custom 
> format' '
>  test_expect_success GPG 'show untrusted signature with custom format' '
>   cat >expect <<-\EOF &&
>   U
> - 61092E85B7227189
> + 65A0EEA02E30CAD7
>   Eris Discordia 
> - D4BE22311AD3131E5EDA29A461092E85B7227189
> + F8364A59E07FFE9F4D63005A65A0EEA02E30CAD7
>   D4BE22311AD3131E5EDA29A461092E85B7227189
>   EOF
>   git log -1 --format="%G?%n%GK%n%GS%n%GF%n%GP" eighth-signed-alt >actual 
> &&
> @@ -209,7 +209,7 @@ test_expect_success GPG 'show untrusted signature with 
> custom format' '
>  test_expect_success GPG 'show unknown signature with custom format' '
>   cat >expect <<-\EOF &&
>   E
> - 61092E85B7227189
> + 65A0EEA02E30CAD7

It's my understanding that GnuPG will use the most recent subkey
suitable for a particular purpose, and I think the test relies on that
behavior.  However, I'm not sure that's documented.  Do we want to rely
on that behavior or be more explicit?  (This is a question, not an
opinion.)
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Git Slowness on Windows w/o Internet

2018-11-03 Thread brian m. carlson
On Fri, Nov 02, 2018 at 11:10:51AM -0500, Peter Kostyukov wrote:
> Wanted to bring to your attention an issue that we discovered on our
> Windows Jenkins nodes with git scm installed (git.exe). Our Jenkins
> servers don't have Internet access. It appears that git.exe is trying
> to connect to various Cloudflare and Akamai CDN instances over the
> Internet when it first runs and it keeps trying to connect to these
> CDNs every git.exe execution until it makes a successful attempt. See
> the screenshot attached with the details.
> 
> Enabling Internet access via proxy fixes the issue and git.exe
> continues to work fast on the next attempts to run git.exe
> 
> Is there any configuration setting that can disable this git's
> behavior or is there any other workaround without allowing Internet
> access? Otherwise, every git command run on a server without the
> Internet takes about 30 seconds to complete.

Git itself doesn't make any attempt to access those systems unless it's
configured to do so (e.g. a remote is set up to talk to those systems
and fetch or pull is used).

It's possible that you're using a distribution package that performs
this behavior, say, to check for updates.  I'd recommend that you
contact the distributor, which in this case might be Git for Windows,
and see if they can tell you more about what's going on.  The URL for
that project is at https://github.com/git-for-windows/git.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] send-email: Avoid empty transfer encoding header

2018-11-01 Thread brian m. carlson
On Thu, Nov 01, 2018 at 09:09:34PM -0400, Aaron Lindsay wrote:
> Fix a small bug introduced by "7a36987ff (send-email: add an auto option
> for transfer encoding, 2018-07-14)
> 
> I saw the following message when setting --transfer-encoding for a file
> with the same encoding:
> $ git send-email --transfer-encoding=8bit example.patch
> Use of uninitialized value $xfer_encoding in concatenation (.) or string
> at /usr/lib/git-core/git-send-email line 1744.
> 
> Signed-off-by: Aaron Lindsay 
> ---
>  git-send-email.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/git-send-email.perl b/git-send-email.perl
> index 2be5dac33..39c15bcc8 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -1834,7 +1834,7 @@ sub apply_transfer_encoding {
>   my $from = shift;
>   my $to = shift;
>  
> - return $message if ($from eq $to and $from ne '7bit');
> + return ($message, $to) if ($from eq $to and $from ne '7bit');

Thanks, this is obviously correct.

Would you like to squash in the below patch to add a test?

- >% -
From 00000000 Mon Sep 17 00:00:00 2001
From: "brian m. carlson" 
Date: Fri, 2 Nov 2018 01:51:33 +
Subject: [PATCH] squash! send-email: Avoid empty transfer encoding header

Signed-off-by: brian m. carlson 
---
 t/t9001-send-email.sh | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 1ef1a19003..ee1efcc59d 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -492,6 +492,21 @@ do
--validate \
$patches longline.patch
'
+
+done
+
+for enc in 7bit 8bit quoted-printable base64
+do
+   test_expect_success $PREREQ "--transfer-encoding=$enc produces correct 
header" '
+   clean_fake_sendmail &&
+   git send-email \
+   --from="Example " \
+   --to=nob...@example.com \
+   --smtp-server="$(pwd)/fake.sendmail" \
+   --transfer-encoding=$enc \
+   $patches &&
+       grep "Content-Transfer-Encoding: $enc" msgtxt1
+   '
 done
 
 test_expect_success $PREREQ 'Invalid In-Reply-To' '
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: using --force-with-lease after git rebase

2018-10-31 Thread brian m. carlson
On Wed, Oct 31, 2018 at 12:13:06PM -0700, Alexander Mills wrote:
> I have been confused about the need for --force-with-lease after rebasing
> 
> Imagine I have a feature branch:
> 
> git checkout --no-track -b 'feature' 'origin/dev'
> git push -u origin feature
> 
> I do some work, and then I rebase against origin/dev to keep up to
> date with the integration branch.
> 
> git fetch origin/dev
> git rebase origin/dev
> 
> then I try to push to the remote
> 
> git push origin feature
> 
> but that is rejected, I have to do:
> 
> git push --force-with-lease origin feature
> 
> why is that? Why do I need to force push my feature branch to the
> remote tracking branch after rebasing against the integration branch?

When you perform a push to an existing branch, Git checks to make sure
that the push is a fast-forward; that is, that your changes are a strict
superset of the commits that are in that branch.  When you rebase a
branch, you rewrite the commits and their object IDs (their hashes).
Since you've rewritten some of the commits that were on the version of
your branch that you pushed to the server, the commits you want to push
are no longer a strict superset, and Git requires a force push.

This provision is in place to prevent data loss.  For example, if you
and a colleague are both collaborating on a branch that isn't rebased,
you would want to avoid pushing changes that overwrote those that your
colleague had made.  Git making your push fail is its way of helping
you realize that there are changes you have not included and making you
decide how you want to handle them.

You strictly do not need to use --force-with-lease; you could just use
--force instead.  But using --force-with-lease over --force when
possible ensures that you are overwriting what you think you're
overwriting, and not additional working changes that someone else has
made, so it's a good practice.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [BUG?] protocol.version=2 sends HTTP "Expect" headers

2018-10-31 Thread brian m. carlson
On Wed, Oct 31, 2018 at 12:03:53PM -0400, Jeff King wrote:
> Since 959dfcf42f (smart-http: Really never use Expect: 100-continue,
> 2011-03-14), we try to avoid sending "Expect" headers, since some
> proxies apparently don't handle them well. There we have to explicitly
> tell curl not to use them.
> 
> The exception is large requests with GSSAPI, as explained in c80d96ca0c
> (remote-curl: fix large pushes with GSSAPI, 2013-10-31).
> 
> However, Jon Simons noticed that when using protocol.version=2, we've
> started sending Expect headers again. That's because rather than going
> through post_rpc(), we push the stateless data through a proxy_state
> struct. And in proxy_state_init(), when we set up the headers, we do not
> disable curl's Expect handling.
> 
> So a few questions:
> 
>   - is this a bug or not? I.e., do we still need to care about proxies
> that can't handle Expect? The original commit was from 2011. Maybe
> things are better now. Or maybe that's blind optimism.
> 
> Nobody has complained yet, but that's probably just because v2 isn't
> widely deployed yet.

HTTP/1.1 requires support for 100 Continue on the server side and in
proxies (it is mandatory to implement).  The original commit disabling
it (959dfcf42f ("smart-http: Really never use Expect: 100-continue",
2011-03-14)), describes proxies as the reason for disabling it.

It's my understanding that all major proxies (including, as of version
3.2, Squid) support HTTP/1.1 at this point.  Moreover, Kerberos is more
likely to be used in enterprises, where proxies (especially poorly
behaved and outright broken proxies) are more common.  We haven't seen
any complaints about Kerberos support in a long time.  This leads me to
believe that things generally work[0].

Finally, modern versions of libcurl also have a timeout after which they
assume that the server is not going to respond and just send the request
body anyways.  This makes the issue mostly moot.

>   - alternatively, should we just leave it on for v2, and provide a
> config switch to disable it if you have a crappy proxy? I don't know
> how widespread the problem is, but I can imagine that the issue is
> subtle enough that most users wouldn't even know.

For the reasons I mentioned above, I'd leave it on for now.  Between
libcurl and better proxy support, I think this issue is mostly solved.
If we need to fix it in the future, we can, and people can fall back to
older protocols via config in the interim.

[0] In some environments, people use SSH because the proxy breaks
everything that looks like HTTP, but that's beyond the scope of this
discussion.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v4 10/12] Add a base implementation of SHA-256 support

2018-10-31 Thread brian m. carlson
On Mon, Oct 29, 2018 at 09:39:33AM +0900, Junio C Hamano wrote:
> "brian m. carlson"  writes:
> > This may not strictly be needed, but removing it makes the header no
> > longer self-contained, which blows up my (and others') in-editor
> > linting.
> 
> That sounds like bending backwards to please tools, though.  Can't
> these in-editor linting learn the local rules of the codebase they
> are asked to operate on?

Doing so involves writing (in my case) Vim code to configure settings
for this repo.

Since language linting tools invoke compilers and other language
runtimes, you need to specify command-line arguments to those tools, and
in general, that's not a safe thing you can read from the repository
configuration, since just viewing files should not be able to execute
arbitrary code[0].  Languages such as Perl which can execute arbitrary
code with compile checks have to be enabled explicitly with ALE (which
is what I'm using).

I need to do a reroll to address some other issues people brought up, so
I can remove this line.

[0] Pass "-o .bashrc" to the preprocessor, for example.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v4 10/12] Add a base implementation of SHA-256 support

2018-10-28 Thread brian m. carlson
On Wed, Oct 24, 2018 at 08:02:55PM -0700, Carlo Arenas wrote:
> On Wed, Oct 24, 2018 at 7:41 PM brian m. carlson
>  wrote:
> > diff --git a/sha256/block/sha256.h b/sha256/block/sha256.h
> > new file mode 100644
> > index 00..38f02f7e6c
> > --- /dev/null
> > +++ b/sha256/block/sha256.h
> > @@ -0,0 +1,26 @@
> > +#ifndef SHA256_BLOCK_SHA256_H
> > +#define SHA256_BLOCK_SHA256_H
> > +
> > +#include "git-compat-util.h"
> 
> this shouldn't be needed and might be discouraged as per the
> instructions in Documentation/CodingGuidelines

This may not strictly be needed, but removing it makes the header no
longer self-contained, which blows up my (and others') in-editor
linting.  I think it's okay to add this extra header here to keep it
self-contained, even if we know that it's not going to be absolutely
required.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 0/3] some more hdr-check clean headers

2018-10-28 Thread brian m. carlson
On Sat, Oct 27, 2018 at 02:47:47AM +0100, Ramsay Jones wrote:
> 
> I have some changes to the hdr-check Makefile target in the works, but
> these clean-ups don't have to be held up by those changes.
> 
> The 'fetch-object.h' patch is the same one I sent separately last time,
> since it only applied on 'next' at the time. The 'ewok_rlw.h' patch is
> just something I noticed while messing around the the Makefile changes.
> The 'commit-reach.h' patch is the patch #9 from the original series, but
> now with a commit message that explains the '#include "commit.h"' issue.
> 
> These changes are on top of current master (@c670b1f876) and merge
> without conflict to 'next' and with a 'minor' conflict on 'pu'.

All three of these patches look sane to me.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Git Test Coverage Report (Saturday, Oct 27)

2018-10-27 Thread brian m. carlson
On Sat, Oct 27, 2018 at 09:27:21AM -0400, Derrick Stolee wrote:
> In an effort to ensure new code is reasonably covered by the test suite, we
> now have contrib/coverage-diff.sh to combine the gcov output from 'make
> coverage-test ; make coverage-report' with the output from 'git diff A B' to
> discover _new_lines of code that are not covered. This report ignores lines
> including "BUG(".

Thanks for producing this.

> Commits introducing uncovered code:
> brian m. carlson  2f90b9d9b: sha1-file: provide functions to look up
> hash algorithms
> brian m. carlson  b3a41547c: hex: introduce functions to print arbitrary
> hashes

These two are expected.  One this series makes its way to master, I'll
send patches that use it in a bunch more places, and these functions
will be adequately covered by multiple code paths.

> Uncovered in next not in master
> ---
> 
> apply.c
> eccb5a5f3d 4071) return get_oid_hex(p->old_oid_prefix, oid);

This one is just a name change.  However, I think it's interesting that
this code path isn't covered in the normal case (and presumably isn't
covered before my patch).  From the comment, this appears to be limited
to the case where the index line in the patch contains the full object
ID.

I'll try to see if I can come up with a test to cover this case.

> builtin/repack.c
> 2f0c9e9a9b 239) die("repack: Expecting full hex object ID lines only from
> pack-objects.");
> 2f0c9e9a9b 411) die("repack: Expecting full hex object ID lines only from
> pack-objects.");

These are a change solely in text from what was there before.
Considering that git pack-objects would have to be modified to produce
invalid data in order to trigger these (as discussed in the series that
changed them), I think it's okay that these are uncovered.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH v4 09/12] commit-graph: convert to using the_hash_algo

2018-10-24 Thread brian m. carlson
Instead of using hard-coded constants for object sizes, use
the_hash_algo to look them up.  In addition, use a function call to look
up the object ID version and produce the correct value.  For now, we use
version 1, which means to use the default algorithm used in the rest of
the repository.

Signed-off-by: brian m. carlson 
---
 commit-graph.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..6763d19288 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -23,16 +23,11 @@
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
 #define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
 
-#define GRAPH_DATA_WIDTH 36
+#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
 #define GRAPH_VERSION_1 0x1
 #define GRAPH_VERSION GRAPH_VERSION_1
 
-#define GRAPH_OID_VERSION_SHA1 1
-#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
-#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
-#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
-
 #define GRAPH_OCTOPUS_EDGES_NEEDED 0x8000
 #define GRAPH_PARENT_MISSING 0x7fff
 #define GRAPH_EDGE_LAST_MASK 0x7fff
@@ -44,13 +39,18 @@
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
+   + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static uint8_t oid_version(void)
+{
+   return 1;
+}
+
 static struct commit_graph *alloc_commit_graph(void)
 {
struct commit_graph *g = xcalloc(1, sizeof(*g));
@@ -125,15 +125,15 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
}
 
hash_version = *(unsigned char*)(data + 5);
-   if (hash_version != GRAPH_OID_VERSION) {
+   if (hash_version != oid_version()) {
error(_("hash version %X does not match version %X"),
- hash_version, GRAPH_OID_VERSION);
+ hash_version, oid_version());
goto cleanup_fail;
}
 
graph = alloc_commit_graph();
 
-   graph->hash_len = GRAPH_OID_LEN;
+   graph->hash_len = the_hash_algo->rawsz;
graph->num_chunks = *(unsigned char*)(data + 6);
graph->graph_fd = fd;
graph->data = graph_map;
@@ -149,7 +149,7 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
 
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
-   if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+   if (chunk_offset > graph_size - the_hash_algo->rawsz) {
error(_("improper chunk offset %08x%08x"), 
(uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
goto cleanup_fail;
@@ -764,6 +764,7 @@ void write_commit_graph(const char *obj_dir,
int num_extra_edges;
struct commit_list *parent;
struct progress *progress = NULL;
+   const unsigned hashsz = the_hash_algo->rawsz;
 
if (!commit_graph_compatible(the_repository))
return;
@@ -909,7 +910,7 @@ void write_commit_graph(const char *obj_dir,
hashwrite_be32(f, GRAPH_SIGNATURE);
 
hashwrite_u8(f, GRAPH_VERSION);
-   hashwrite_u8(f, GRAPH_OID_VERSION);
+   hashwrite_u8(f, oid_version());
hashwrite_u8(f, num_chunks);
hashwrite_u8(f, 0); /* unused padding byte */
 
@@ -924,8 +925,8 @@ void write_commit_graph(const char *obj_dir,
 
chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-   chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
-   chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+   chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
+   chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
 
for (i = 0; i <= num_chunks; i++) {
@@ -938,8 +939,8 @@ void write_commit_graph(const char *obj_dir,
}
 
write_graph_chunk_fanout(f, commits.list, commits.nr);
-   write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-   write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+   write_graph_chunk_oids(f, hashsz, commits.list, commits.nr);
+   write_graph_chunk_data(f, hashsz, commits.list, commits.nr);
write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
close_commit_graph(the_repository);


[PATCH v4 01/12] sha1-file: rename algorithm to "sha1"

2018-10-24 Thread brian m. carlson
The transition plan anticipates us using a syntax such as "^{sha1}" for
disambiguation.  Since this is a syntax some people will be typing a
lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
the dash doesn't create any ambiguity; however, it does make the syntax
shorter and easier to type, especially for touch typists.  In addition,
the transition plan already uses "sha1" in this context.

Rename the name of SHA-1 implementation to "sha1".

Note that this change creates no backwards compatibility concerns, since
we haven't yet used this field in any configuration settings.

Signed-off-by: brian m. carlson 
---
 sha1-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..91311ebb3d 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -97,7 +97,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
NULL,
},
{
-   "sha-1",
+   "sha1",
/* "sha1", big-endian */
0x73686131,
GIT_SHA1_RAWSZ,


[PATCH v4 11/12] sha256: add an SHA-256 implementation using libgcrypt

2018-10-24 Thread brian m. carlson
Generally, one gets better performance out of cryptographic routines
written in assembly than C, and this is also true for SHA-256.  In
addition, most Linux distributions cannot distribute Git linked against
OpenSSL for licensing reasons.

Most systems with GnuPG will also have libgcrypt, since it is a
dependency of GnuPG.  libgcrypt is also faster than the SHA1DC
implementation for messages of a few KiB and larger.

For comparison, on a Core i7-6600U, this implementation processes 16 KiB
chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
MiB/s.

In addition, libgcrypt is licensed under the LGPL 2.1, which is
compatible with the GPL.  Add an implementation of SHA-256 that uses
libgcrypt.

Signed-off-by: brian m. carlson 
---
 Makefile| 13 +++--
 hash.h  |  4 
 sha256/gcrypt.h | 30 ++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 sha256/gcrypt.h

diff --git a/Makefile b/Makefile
index e99b7712f6..5a07e03100 100644
--- a/Makefile
+++ b/Makefile
@@ -179,6 +179,10 @@ all::
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
 #
+# Define BLK_SHA256 to use the built-in SHA-256 routines.
+#
+# Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1634,8 +1638,13 @@ endif
 endif
 endif
 
-LIB_OBJS += sha256/block/sha256.o
-BASIC_CFLAGS += -DSHA256_BLK
+ifdef GCRYPT_SHA256
+   BASIC_CFLAGS += -DSHA256_GCRYPT
+   EXTLIBS += -lgcrypt
+else
+   LIB_OBJS += sha256/block/sha256.o
+   BASIC_CFLAGS += -DSHA256_BLK
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index a9bc624020..2ef098052d 100644
--- a/hash.h
+++ b/hash.h
@@ -15,7 +15,11 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#if defined(SHA256_GCRYPT)
+#include "sha256/gcrypt.h"
+#else
 #include "sha256/block/sha256.h"
+#endif
 
 #ifndef platform_SHA_CTX
 /*
diff --git a/sha256/gcrypt.h b/sha256/gcrypt.h
new file mode 100644
index 00..09bd8bb200
--- /dev/null
+++ b/sha256/gcrypt.h
@@ -0,0 +1,30 @@
+#ifndef SHA256_GCRYPT_H
+#define SHA256_GCRYPT_H
+
+#include 
+
+#define SHA256_DIGEST_SIZE 32
+
+typedef gcry_md_hd_t gcrypt_SHA256_CTX;
+
+inline void gcrypt_SHA256_Init(gcrypt_SHA256_CTX *ctx)
+{
+   gcry_md_open(ctx, GCRY_MD_SHA256, 0);
+}
+
+inline void gcrypt_SHA256_Update(gcrypt_SHA256_CTX *ctx, const void *data, 
size_t len)
+{
+   gcry_md_write(*ctx, data, len);
+}
+
+inline void gcrypt_SHA256_Final(unsigned char *digest, gcrypt_SHA256_CTX *ctx)
+{
+   memcpy(digest, gcry_md_read(*ctx, GCRY_MD_SHA256), SHA256_DIGEST_SIZE);
+}
+
+#define platform_SHA256_CTX gcrypt_SHA256_CTX
+#define platform_SHA256_Init gcrypt_SHA256_Init
+#define platform_SHA256_Update gcrypt_SHA256_Update
+#define platform_SHA256_Final gcrypt_SHA256_Final
+
+#endif


[PATCH v4 08/12] t/helper: add a test helper to compute hash speed

2018-10-24 Thread brian m. carlson
Add a utility (which is less for the testsuite and more for developers)
that can compute hash speeds for whatever hash algorithms are
implemented.  This allows developers to test their personal systems to
determine the performance characteristics of various algorithms.

Signed-off-by: brian m. carlson 
---
 Makefile   |  1 +
 t/helper/test-hash-speed.c | 61 ++
 t/helper/test-tool.c   |  1 +
 t/helper/test-tool.h   |  1 +
 4 files changed, 64 insertions(+)
 create mode 100644 t/helper/test-hash-speed.c

diff --git a/Makefile b/Makefile
index 81dc9ac819..68169a7abb 100644
--- a/Makefile
+++ b/Makefile
@@ -716,6 +716,7 @@ TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
+TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
diff --git a/t/helper/test-hash-speed.c b/t/helper/test-hash-speed.c
new file mode 100644
index 00..432233c7f0
--- /dev/null
+++ b/t/helper/test-hash-speed.c
@@ -0,0 +1,61 @@
+#include "test-tool.h"
+#include "cache.h"
+
+#define NUM_SECONDS 3
+
+static inline void compute_hash(const struct git_hash_algo *algo, git_hash_ctx 
*ctx, uint8_t *final, const void *p, size_t len)
+{
+   algo->init_fn(ctx);
+   algo->update_fn(ctx, p, len);
+   algo->final_fn(final, ctx);
+}
+
+int cmd__hash_speed(int ac, const char **av)
+{
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_RAWSZ];
+   clock_t initial, start, end;
+   unsigned bufsizes[] = { 64, 256, 1024, 8192, 16384 };
+   int i;
+   void *p;
+   const struct git_hash_algo *algo = NULL;
+
+   if (ac == 2) {
+   for (i = 1; i < GIT_HASH_NALGOS; i++) {
+   if (!strcmp(av[1], hash_algos[i].name)) {
+   algo = _algos[i];
+   break;
+   }
+   }
+   }
+   if (!algo)
+   die("usage: test-tool hash-speed algo_name");
+
+   /* Use this as an offset to make overflow less likely. */
+   initial = clock();
+
+   printf("algo: %s\n", algo->name);
+
+   for (i = 0; i < ARRAY_SIZE(bufsizes); i++) {
+   unsigned long j, kb;
+   double kb_per_sec;
+   p = xcalloc(1, bufsizes[i]);
+   start = end = clock() - initial;
+   for (j = 0; ((end - start) / CLOCKS_PER_SEC) < NUM_SECONDS; 
j++) {
+   compute_hash(algo, , hash, p, bufsizes[i]);
+
+   /*
+* Only check elapsed time every 128 iterations to avoid
+* dominating the runtime with system calls.
+*/
+   if (!(j & 127))
+   end = clock() - initial;
+   }
+   kb = j * bufsizes[i];
+   kb_per_sec = kb / (1024 * ((double)end - start) / 
CLOCKS_PER_SEC);
+   printf("size %u: %lu iters; %lu KiB; %0.2f KiB/s\n", 
bufsizes[i], j, kb, kb_per_sec);
+   free(p);
+   }
+
+   exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 6b5836dc1b..e009c8186d 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -20,6 +20,7 @@ static struct test_cmd cmds[] = {
{ "example-decorate", cmd__example_decorate },
{ "genrandom", cmd__genrandom },
{ "hashmap", cmd__hashmap },
+   { "hash-speed", cmd__hash_speed },
{ "index-version", cmd__index_version },
{ "json-writer", cmd__json_writer },
{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 29ac7b0b0d..19a7e8332a 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -16,6 +16,7 @@ int cmd__dump_untracked_cache(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
+int cmd__hash_speed(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);


[PATCH v4 00/12] Base SHA-256 implementation

2018-10-24 Thread brian m. carlson
This series provides a functional SHA-256 implementation and wires it
up, along with some housekeeping patches to make it suitable for
testing.

While I was fixing the macros, I wondered if I could make the code a bit
cleaner by using inline functions.  I tried it and found that not only
did it make the code cleaner, it made the code significantly faster
across all sizes of output, with larger gains on larger chunks (e.g.,
214 MiB/s on 16 KiB chunks vs 151 MiB/s).  I'm unsure why this effect
occurs, but I figured nobody would complain about improved performance.

Changes from v3:
* Switch to using inline functions instead of macros in many cases.
* Undefine remaining macros at the top.

Changes from v2:
* Improve commit messages to include timing and performance information.
* Improve commit messages to be less ambiguous and more friendly to a
  wider variety of English speakers.
* Prefer functions taking struct git_hash_algo in hex.c.
* Port pieces of the block-sha1 implementation over to the block-sha256
  implementation for better compatibility.
* Drop patch 13 in favor of further discussion about the best way
  forward for versioning commit graph.
* Rename the test so as to have a different number from other tests.
* Rebase on master.

Changes from v1:
* Add a hash_to_hex function mirroring sha1_to_hex, but for
  the_hash_algo.
* Strip commit message explanation about why we chose SHA-256.
* Rebase on master
* Strip leading whitespace from commit message.
* Improve commit-graph patch to cover new code added since v1.
* Be more honest about the scope of work involved in porting the SHA-256
  implementation out of libtomcrypt.
* Revert change to limit hashcmp to 20 bytes.

brian m. carlson (12):
  sha1-file: rename algorithm to "sha1"
  sha1-file: provide functions to look up hash algorithms
  hex: introduce functions to print arbitrary hashes
  cache: make hashcmp and hasheq work with larger hashes
  t: add basic tests for our SHA-1 implementation
  t: make the sha1 test-tool helper generic
  sha1-file: add a constant for hash block size
  t/helper: add a test helper to compute hash speed
  commit-graph: convert to using the_hash_algo
  Add a base implementation of SHA-256 support
  sha256: add an SHA-256 implementation using libgcrypt
  hash: add an SHA-256 implementation using OpenSSL

 Makefile  |  22 +++
 cache.h   |  51 ---
 commit-graph.c|  33 +++--
 hash.h|  41 +-
 hex.c |  32 +++-
 sha1-file.c   |  70 -
 sha256/block/sha256.c | 201 ++
 sha256/block/sha256.h |  26 
 sha256/gcrypt.h   |  30 
 t/helper/test-hash-speed.c|  61 
 t/helper/{test-sha1.c => test-hash.c} |  19 +--
 t/helper/test-sha1.c  |  52 +--
 t/helper/test-sha256.c|   7 +
 t/helper/test-tool.c  |   2 +
 t/helper/test-tool.h  |   4 +
 t/t0015-hash.sh   |  54 +++
 16 files changed, 601 insertions(+), 104 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 sha256/gcrypt.h
 create mode 100644 t/helper/test-hash-speed.c
 copy t/helper/{test-sha1.c => test-hash.c} (65%)
 create mode 100644 t/helper/test-sha256.c
 create mode 100755 t/t0015-hash.sh

Range-diff against v3:
 1:  a004a4c982 <  -:  -- :hash-impl
 2:  cf9f7f5620 =  1:  cf9f7f5620 sha1-file: rename algorithm to "sha1"
 3:  0144deaebe =  2:  0144deaebe sha1-file: provide functions to look up hash 
algorithms
 4:  b74858fb03 =  3:  b74858fb03 hex: introduce functions to print arbitrary 
hashes
 5:  e9703017a4 =  4:  e9703017a4 cache: make hashcmp and hasheq work with 
larger hashes
 6:  ab85a834fd =  5:  ab85a834fd t: add basic tests for our SHA-1 
implementation
 7:  962f6d8903 =  6:  962f6d8903 t: make the sha1 test-tool helper generic
 8:  53addf4d58 =  7:  53addf4d58 sha1-file: add a constant for hash block size
 9:  9ace10faa2 =  8:  9ace10faa2 t/helper: add a test helper to compute hash 
speed
10:  9adc56d01e =  9:  9adc56d01e commit-graph: convert to using the_hash_algo
11:  8e82cb0dfb ! 10:  f48cb1ad27 Add a base implementation of SHA-256 support
@@ -207,6 +207,9 @@
 +#include "git-compat-util.h"
 +#include "./sha256.h"
 +
++#undef RND
++#undef BLKSIZE
++
 +#define BLKSIZE blk_SHA256_BLKSIZE
 +
 +void blk_SHA256_Init(blk_SHA256_CTX *ctx)
@@ -228,14 +231,35 @@
 +  return (x >> n) | (x << (32 - n));
 +}
 +
-+#define Ch(x,y,z)   (z ^ (x & (y ^ z)))
-+#define Maj(x,y,z)  (((x | y) & z) | (x & y))
-+#define S(x, n) ror((x),(n))
-+#define R(x, n) ((x)>>(n))
-

[PATCH v4 10/12] Add a base implementation of SHA-256 support

2018-10-24 Thread brian m. carlson
SHA-1 is weak and we need to transition to a new hash function.  For
some time, we have referred to this new function as NewHash.  Recently,
we decided to pick SHA-256 as NewHash.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain.  Optimize it and restructure it to meet our coding
standards.  Pull in the update and final functions from the SHA-1 block
implementation, as we know these function correctly with all compilers.
This implementation is slower than SHA-1, but more performant
implementations will be introduced in future commits.

Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.

Note that with this patch, it is still not possible to switch to using
SHA-256 in Git.  Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.

Signed-off-by: brian m. carlson 
---
 Makefile   |   4 +
 cache.h|  12 ++-
 hash.h |  19 +++-
 sha1-file.c|  45 +
 sha256/block/sha256.c  | 201 +
 sha256/block/sha256.h  |  26 ++
 t/helper/test-sha256.c |   7 ++
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/t0015-hash.sh|  25 +
 10 files changed, 337 insertions(+), 4 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 t/helper/test-sha256.c

diff --git a/Makefile b/Makefile
index 68169a7abb..e99b7712f6 100644
--- a/Makefile
+++ b/Makefile
@@ -739,6 +739,7 @@ TEST_BUILTINS_OBJS += test-run-command.o
 TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
 TEST_BUILTINS_OBJS += test-sha1.o
 TEST_BUILTINS_OBJS += test-sha1-array.o
+TEST_BUILTINS_OBJS += test-sha256.o
 TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
@@ -1633,6 +1634,9 @@ endif
 endif
 endif
 
+LIB_OBJS += sha256/block/sha256.o
+BASIC_CFLAGS += -DSHA256_BLK
+
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
diff --git a/cache.h b/cache.h
index 9e5d1dd85a..48ce1565e6 100644
--- a/cache.h
+++ b/cache.h
@@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The block size of SHA-1. */
 #define GIT_SHA1_BLKSZ 64
 
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
 /* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 1bcf7ab6fd..a9bc624020 100644
--- a/hash.h
+++ b/hash.h
@@ -15,6 +15,8 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#include "sha256/block/sha256.h"
+
 #ifndef platform_SHA_CTX
 /*
  * platform's underlying implementation of SHA-1; could be OpenSSL,
@@ -34,6 +36,18 @@
 #define git_SHA1_Updateplatform_SHA1_Update
 #define git_SHA1_Final platform_SHA1_Final
 
+#ifndef platform_SHA256_CTX
+#define platform_SHA256_CTXSHA256_CTX
+#define platform_SHA256_Init   SHA256_Init
+#define platform_SHA256_Update SHA256_Update
+#define platform_SHA256_Final  SHA256_Final
+#endif
+
+#define git_SHA256_CTX platform_SHA256_CTX
+#define git_SHA256_Initplatform_SHA256_Init
+#define git_SHA256_Update  platform_SHA256_Update
+#define git_SHA256_Final   platform_SHA256_Final
+
 #ifdef SHA1_MAX_BLOCK_SIZE
 #include "compat/sha1-chunked.h"
 #undef git_SHA1_Update
@@ -52,12 +66,15 @@
 #define GIT_HASH_UNKNOWN 0
 /* SHA-1 */
 #define GIT_HASH_SHA1 1
+/* SHA-256  */
+#define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
-#define GIT_HASH_NALGOS (GIT_HASH_SHA1 + 1)
+#define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 
 /* A suitably aligned type for stack allocations of hash contexts. */
 union git_hash_ctx {
git_SHA_CTX sha1;
+   git_SHA256_CTX sha256;
 };
 typedef union git_hash_ctx git_hash_ctx;
 
diff --git a/sha1-file.c b/sha1-file.c
index 9bdd04ea45..c97d93a14a 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -40,10 +40,20 @@
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
 "\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
 "\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"
+#define EMPTY_TREE_SHA256_BIN_LITERAL \
+   "\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
+   "\x04\xd4\x5d\x8d\x

[PATCH v4 06/12] t: make the sha1 test-tool helper generic

2018-10-24 Thread brian m. carlson
Since we're going to have multiple hash algorithms to test, it makes
sense to share as much of the test code as possible.  Convert the sha1
helper for the test-tool to be generic and move it out into its own
module.  This will allow us to share most of this code with our NewHash
implementation.

Signed-off-by: brian m. carlson 
---
 Makefile  |  1 +
 t/helper/{test-sha1.c => test-hash.c} | 19 +-
 t/helper/test-sha1.c  | 52 +--
 t/helper/test-tool.h  |  2 ++
 4 files changed, 14 insertions(+), 60 deletions(-)
 copy t/helper/{test-sha1.c => test-hash.c} (65%)

diff --git a/Makefile b/Makefile
index d18ab0fe78..81dc9ac819 100644
--- a/Makefile
+++ b/Makefile
@@ -714,6 +714,7 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
+TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
diff --git a/t/helper/test-sha1.c b/t/helper/test-hash.c
similarity index 65%
copy from t/helper/test-sha1.c
copy to t/helper/test-hash.c
index 1ba0675c75..0a31de66f3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-hash.c
@@ -1,13 +1,14 @@
 #include "test-tool.h"
 #include "cache.h"
 
-int cmd__sha1(int ac, const char **av)
+int cmd_hash_impl(int ac, const char **av, int algo)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_HEXSZ];
unsigned bufsz = 8192;
int binary = 0;
char *buffer;
+   const struct git_hash_algo *algop = _algos[algo];
 
if (ac == 2) {
if (!strcmp(av[1], "-b"))
@@ -26,7 +27,7 @@ int cmd__sha1(int ac, const char **av)
die("OOPS");
}
 
-   git_SHA1_Init();
+   algop->init_fn();
 
while (1) {
ssize_t sz, this_sz;
@@ -38,20 +39,20 @@ int cmd__sha1(int ac, const char **av)
if (sz == 0)
break;
if (sz < 0)
-   die_errno("test-sha1");
+   die_errno("test-hash");
this_sz += sz;
cp += sz;
room -= sz;
}
if (this_sz == 0)
break;
-   git_SHA1_Update(, buffer, this_sz);
+   algop->update_fn(, buffer, this_sz);
}
-   git_SHA1_Final(sha1, );
+   algop->final_fn(hash, );
 
if (binary)
-   fwrite(sha1, 1, 20, stdout);
+   fwrite(hash, 1, algop->rawsz, stdout);
else
-   puts(sha1_to_hex(sha1));
+   puts(hash_to_hex_algop(hash, algop));
exit(0);
 }
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index 1ba0675c75..d860c387c3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,55 +3,5 @@
 
 int cmd__sha1(int ac, const char **av)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
-   unsigned bufsz = 8192;
-   int binary = 0;
-   char *buffer;
-
-   if (ac == 2) {
-   if (!strcmp(av[1], "-b"))
-   binary = 1;
-   else
-   bufsz = strtoul(av[1], NULL, 10) * 1024 * 1024;
-   }
-
-   if (!bufsz)
-   bufsz = 8192;
-
-   while ((buffer = malloc(bufsz)) == NULL) {
-   fprintf(stderr, "bufsz %u is too big, halving...\n", bufsz);
-   bufsz /= 2;
-   if (bufsz < 1024)
-   die("OOPS");
-   }
-
-   git_SHA1_Init();
-
-   while (1) {
-   ssize_t sz, this_sz;
-   char *cp = buffer;
-   unsigned room = bufsz;
-   this_sz = 0;
-   while (room) {
-   sz = xread(0, cp, room);
-   if (sz == 0)
-   break;
-   if (sz < 0)
-   die_errno("test-sha1");
-   this_sz += sz;
-   cp += sz;
-   room -= sz;
-   }
-   if (this_sz == 0)
-   break;
-   git_SHA1_Update(, buffer, this_sz);
-   }
-   git_SHA1_Final(sha1, );
-
-   if (binary)
-   fwrite(sha1, 1, 20, stdout);
-   else
-   puts(sha1_to_hex(sha1));
-   exit(0);
+   return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e4890566da..29ac7b0b0d 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -50

[PATCH v4 05/12] t: add basic tests for our SHA-1 implementation

2018-10-24 Thread brian m. carlson
We have in the past had some unfortunate endianness issues with some
SHA-1 implementations we ship, especially on big-endian machines.  Add
an explicit test using the test helper to catch these issues and point
them out prominently.  This test can also be used as a staging ground
for people testing additional algorithms to verify that their
implementations are working as expected.

Signed-off-by: brian m. carlson 
---
 t/t0015-hash.sh | 29 +
 1 file changed, 29 insertions(+)
 create mode 100755 t/t0015-hash.sh

diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
new file mode 100755
index 00..8e763c2c3d
--- /dev/null
+++ b/t/t0015-hash.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test basic hash implementation'
+. ./test-lib.sh
+
+
+test_expect_success 'test basic SHA-1 hash values' '
+   test-tool sha1 actual &&
+   grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual &&
+   printf "a" | test-tool sha1 >actual &&
+   grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual &&
+   printf "abc" | test-tool sha1 >actual &&
+   grep a9993e364706816aba3e25717850c26c9cd0d89d actual &&
+   printf "message digest" | test-tool sha1 >actual &&
+   grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
+   printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
+   grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
+   perl -E "for (1..10) { print q{aa}; }" | \
+   test-tool sha1 >actual &&
+   grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
+   printf "blob 0\0" | test-tool sha1 >actual &&
+   grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual &&
+   printf "blob 3\0abc" | test-tool sha1 >actual &&
+   grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual &&
+   printf "tree 0\0" | test-tool sha1 >actual &&
+   grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
+'
+
+test_done


[PATCH v4 04/12] cache: make hashcmp and hasheq work with larger hashes

2018-10-24 Thread brian m. carlson
In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we
modified hashcmp to assert that the hash size was always 20 to help it
optimize and inline calls to memcmp.  In a future series, we replaced
many calls to hashcmp and oidcmp with calls to hasheq and oideq to
improve inlining further.

However, we want to support hash algorithms other than SHA-1, namely
SHA-256.  When doing so, we must handle the case where these values are
32 bytes long as well as 20.  Adjust hashcmp to handle two cases:
20-byte matches, and maximum-size matches.  Therefore, when we include
SHA-256, we'll automatically handle it properly, while at the same time
teaching the compiler that there are only two possible options to
consider.  This will allow the compiler to write the most efficient
possible code.

Copy similar code into hasheq and perform an identical transformation.
At least with GCC 8.2.0, making hasheq defer to hashcmp when there are
two branches prevents the compiler from inlining the comparison, while
the code in this patch is inlined properly.  Add a comment to avoid an
accidental performance regression from well-intentioned refactoring.

Signed-off-by: brian m. carlson 
---
 cache.h | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index 51580c4b77..bab8e8964f 100644
--- a/cache.h
+++ b/cache.h
@@ -1027,16 +1027,12 @@ extern const struct object_id null_oid;
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
/*
-* This is a temporary optimization hack. By asserting the size here,
-* we let the compiler know that it's always going to be 20, which lets
-* it turn this fixed-size memcmp into a few inline instructions.
-*
-* This will need to be extended or ripped out when we learn about
-* hashes of different sizes.
+* Teach the compiler that there are only two possibilities of hash size
+* here, so that it can optimize for this case as much as possible.
 */
-   if (the_hash_algo->rawsz != 20)
-   BUG("hash size not yet supported by hashcmp");
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+   return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+   return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
@@ -1046,7 +1042,13 @@ static inline int oidcmp(const struct object_id *oid1, 
const struct object_id *o
 
 static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return !hashcmp(sha1, sha2);
+   /*
+* We write this here instead of deferring to hashcmp so that the
+* compiler can properly inline it and avoid calling memcmp.
+*/
+   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+   return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+   return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oideq(const struct object_id *oid1, const struct object_id 
*oid2)


[PATCH v4 02/12] sha1-file: provide functions to look up hash algorithms

2018-10-24 Thread brian m. carlson
There are several ways we might refer to a hash algorithm: by name, such
as in the config file; by format ID, such as in a pack; or internally,
by a pointer to the hash_algos array.  Provide functions to look up hash
algorithms based on these various forms and return the internal constant
used for them.  If conversion to another form is necessary, this
internal constant can be used to look up the proper data in the
hash_algos array.

Signed-off-by: brian m. carlson 
---
 hash.h  | 13 +
 sha1-file.c | 21 +
 2 files changed, 34 insertions(+)

diff --git a/hash.h b/hash.h
index 7c8238bc2e..80881eea47 100644
--- a/hash.h
+++ b/hash.h
@@ -98,4 +98,17 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
+/*
+ * Return a GIT_HASH_* constant based on the name.  Returns GIT_HASH_UNKNOWN if
+ * the name doesn't match a known algorithm.
+ */
+int hash_algo_by_name(const char *name);
+/* Identical, except based on the format ID. */
+int hash_algo_by_id(uint32_t format_id);
+/* Identical, except for a pointer to struct git_hash_algo. */
+static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+{
+   return p - hash_algos;
+}
+
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index 91311ebb3d..7e9dedc744 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -122,6 +122,27 @@ const char *empty_blob_oid_hex(void)
return oid_to_hex_r(buf, the_hash_algo->empty_blob);
 }
 
+int hash_algo_by_name(const char *name)
+{
+   int i;
+   if (!name)
+   return GIT_HASH_UNKNOWN;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (!strcmp(name, hash_algos[i].name))
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+int hash_algo_by_id(uint32_t format_id)
+{
+   int i;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (format_id == hash_algos[i].format_id)
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want


[PATCH v4 03/12] hex: introduce functions to print arbitrary hashes

2018-10-24 Thread brian m. carlson
Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer.  Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant.  Update the
documentation as well.

While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.

We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.

Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).

Signed-off-by: brian m. carlson 
---
 cache.h | 15 +--
 hex.c   | 32 
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index 59c8a93046..51580c4b77 100644
--- a/cache.h
+++ b/cache.h
@@ -1364,9 +1364,9 @@ extern int get_oid_hex(const char *hex, struct object_id 
*sha1);
 extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
 
 /*
- * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
+ * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
  * and writes the NUL-terminated output to the buffer `out`, which must be at
- * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
+ * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
  * convenience.
  *
  * The non-`_r` variant returns a static buffer, but uses a ring of 4
@@ -1374,10 +1374,13 @@ extern int hex_to_bytes(unsigned char *binary, const 
char *hex, size_t len);
  *
  *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
  */
-extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1);   /* static buffer 
result! */
-extern char *oid_to_hex(const struct object_id *oid);  /* same static buffer 
as sha1_to_hex */
+char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const 
struct git_hash_algo *);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo 
*);  /* static buffer result! */
+char *sha1_to_hex(const unsigned char *sha1);  
/* same static buffer */
+char *hash_to_hex(const unsigned char *hash);  
/* same static buffer */
+char *oid_to_hex(const struct object_id *oid); 
/* same static buffer */
 
 /*
  * Parse a 40-character hexadecimal object ID starting from hex, updating the
diff --git a/hex.c b/hex.c
index 10af1a29e8..d2e8bb9540 100644
--- a/hex.c
+++ b/hex.c
@@ -73,14 +73,15 @@ int parse_oid_hex(const char *hex, struct object_id *oid, 
const char **end)
return ret;
 }
 
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
+   const struct git_hash_algo *algop)
 {
static const char hex[] = "0123456789abcdef";
char *buf = buffer;
int i;
 
-   for (i = 0; i < the_hash_algo->rawsz; i++) {
-   unsigned int val = *sha1++;
+   for (i = 0; i < algop->rawsz; i++) {
+   unsigned int val = *hash++;
*buf++ = hex[val >> 4];
*buf++ = hex[val & 0xf];
}
@@ -89,20 +90,35 @@ char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
return buffer;
 }
 
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 {
-   return sha1_to_hex_r(buffer, oid->hash);
+   return hash_to_hex_algop_r(buffer, sha1, _algos[GIT_HASH_SHA1]);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+   return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo 
*algop)
 {
static int bufno;
static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-   return sha1_to_hex_r(hexbuffer[bufno], sha1);
+   return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
+   return hash_to_hex_algop(sha1, _algos[GIT_HASH_SHA1]);
+}
+
+char *hash_to_hex(const unsigned char *hash)
+{
+   return hash_to_hex_algop(hash, the_hash_algo);
 }
 
 char *oid_to_hex(const struct object_id *oid)
 {
-   return

[PATCH v4 12/12] hash: add an SHA-256 implementation using OpenSSL

2018-10-24 Thread brian m. carlson
We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.

On a Core i7-6600U, this SHA-256 implementation compares favorably to
the SHA1DC SHA-1 implementation:

SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)

Signed-off-by: brian m. carlson 
---
 Makefile | 7 +++
 hash.h   | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index 5a07e03100..36fd3a149b 100644
--- a/Makefile
+++ b/Makefile
@@ -183,6 +183,8 @@ all::
 #
 # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
 #
+# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1638,6 +1640,10 @@ endif
 endif
 endif
 
+ifdef OPENSSL_SHA256
+   EXTLIBS += $(LIB_4_CRYPTO)
+   BASIC_CFLAGS += -DSHA256_OPENSSL
+else
 ifdef GCRYPT_SHA256
BASIC_CFLAGS += -DSHA256_GCRYPT
EXTLIBS += -lgcrypt
@@ -1645,6 +1651,7 @@ else
LIB_OBJS += sha256/block/sha256.o
BASIC_CFLAGS += -DSHA256_BLK
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 2ef098052d..adde708cf2 100644
--- a/hash.h
+++ b/hash.h
@@ -17,6 +17,8 @@
 
 #if defined(SHA256_GCRYPT)
 #include "sha256/gcrypt.h"
+#elif defined(SHA256_OPENSSL)
+#include 
 #else
 #include "sha256/block/sha256.h"
 #endif


[PATCH v4 07/12] sha1-file: add a constant for hash block size

2018-10-24 Thread brian m. carlson
There is one place we need the hash algorithm block size: the HMAC code
for push certs.  Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.

Signed-off-by: brian m. carlson 
---
 cache.h | 4 
 hash.h  | 3 +++
 sha1-file.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/cache.h b/cache.h
index bab8e8964f..9e5d1dd85a 100644
--- a/cache.h
+++ b/cache.h
@@ -45,10 +45,14 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+/* The block size of SHA-1. */
+#define GIT_SHA1_BLKSZ 64
 
 /* The length in byte and in hex digits of the largest possible hash value. */
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+/* The largest possible block size for any supported hash. */
+#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 80881eea47..1bcf7ab6fd 100644
--- a/hash.h
+++ b/hash.h
@@ -81,6 +81,9 @@ struct git_hash_algo {
/* The length of the hash in hex characters. */
size_t hexsz;
 
+   /* The block size of the hash. */
+   size_t blksz;
+
/* The hash initialization function. */
git_hash_init_fn init_fn;
 
diff --git a/sha1-file.c b/sha1-file.c
index 7e9dedc744..9bdd04ea45 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -90,6 +90,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x,
0,
0,
+   0,
git_hash_unknown_init,
git_hash_unknown_update,
git_hash_unknown_final,
@@ -102,6 +103,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x73686131,
GIT_SHA1_RAWSZ,
GIT_SHA1_HEXSZ,
+   GIT_SHA1_BLKSZ,
git_hash_sha1_init,
git_hash_sha1_update,
git_hash_sha1_final,


Re: [PATCH 2/3] mingw: replace MSVCRT's fstat() with a Win32-based implementation

2018-10-24 Thread brian m. carlson
On Wed, Oct 24, 2018 at 09:37:43AM +0200, Johannes Schindelin wrote:
> Hi brian,
> 
> On Wed, 24 Oct 2018, brian m. carlson wrote:
> > These lines strike me as a bit odd.  As far as I'm aware, Unix systems
> > don't return anything useful in this field when calling fstat on a pipe.
> > Is there a reason we fill this in on Windows?  If so, could the commit
> > message explain what that is?
> 
> AFAICT the idea was to imitate MSVCRT's fstat() in these cases.
> 
> But a quick web search suggests that you are right:
> https://bugzilla.redhat.com/show_bug.cgi?id=58768#c4 (I could not find any
> official documentation talking about fstat() and pipes, but I trust Alan
> to know their stuff).

Yeah, that behavior is quite old.  I'm surprised that Linux ever did
that.

> Do note, please, that according to the issue described in that link, at
> least *some* glibc/Linux combinations behave in exactly the way this patch
> implements it.
> 
> At this point, I am wary of changing this, too, as the code in question
> has been in production (read: tested thoroughly) in the current form for
> *years*, and I am really loathe to introduce a bug where even
> Windows-specific code in compat/ might rely on this behavior. (And no, I
> do not trust our test suite to find all of those use cases.)

I don't feel strongly either way.  I feel confident the rest of Git
doesn't use that field, so I don't see any downsides to keeping it other
than the slight overhead of populating it.  I just thought I'd ask in
case there was something important I was missing.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Add config option to make "--keep-index" the default for "git stash push"

2018-10-23 Thread brian m. carlson
On Mon, Oct 22, 2018 at 10:11:50AM +0200, Ralf Jung wrote:
> Hi all,
> 
> I have been repeatedly bitten by the default behavior of `git stash` to stash
> not just the unstaged but also the staged changes, and then `git stash pop`
> making all changes unstaged. This means `git stash && git stash pop`, per
> default, loses information, and when I just spent 10min carefully selecting
> hunks to be staged that's quite frustrating.
> 
> I found that for myself, I usually expect `git stash` to only stash the 
> changes
> that are not yet staged.  So I'd like to configure git such that 
> `--keep-index`
> is the default.  That would also fix the information loss I mentioned above.
> (By now I am also aware of `git stash pop --index`, but (a) that's not the
> default either and (b) its documentation indicates it might not always work 
> very
> well.)  However, going over the `git-config` man page, I have not found any 
> way
> to change the default behavior.  Is that possible somehow?  And if not, could
> you consider this a feature request to add such an option?

First, let me say that I'm speaking for myself and not the whole list.
Other people may have other opinions, and some may want to see a patch
before deciding.

Personally, I am hesitant about such an option, since I know people use
stash in scripts and hooks (whether that's a good idea or not), and
ending up with an unclean tree after git stash would be surprising and
could potentially cause data loss.  (I have a co-worker who is doing
exactly this.)

We do have a --no-keep-index option, so there is an out for people who
don't want that behavior, and maybe that's enough to avoid problems.  I
usually deal with this situation by using an alias, which lets me have
more control over the exact behavior I want.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 2/3] mingw: replace MSVCRT's fstat() with a Win32-based implementation

2018-10-23 Thread brian m. carlson
On Tue, Oct 23, 2018 at 03:23:21AM -0700, Karsten Blees via GitGitGadget wrote:
> - if (!get_file_info_by_handle(fh, buf))
> + case FILE_TYPE_CHAR:
> + case FILE_TYPE_PIPE:
> + /* initialize stat fields */
> + memset(buf, 0, sizeof(*buf));
> + buf->st_nlink = 1;
> +
> + if (type == FILE_TYPE_CHAR) {
> + buf->st_mode = _S_IFCHR;
> + } else {
> + buf->st_mode = _S_IFIFO;
> + if (PeekNamedPipe(fh, NULL, 0, NULL, , NULL))
> + buf->st_size = avail;

These lines strike me as a bit odd.  As far as I'm aware, Unix systems
don't return anything useful in this field when calling fstat on a pipe.
Is there a reason we fill this in on Windows?  If so, could the commit
message explain what that is?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: git pull defaults for recursesubmodules

2018-10-23 Thread brian m. carlson
On Wed, Oct 24, 2018 at 12:04:06AM +0300, Tommi Vainikainen wrote:
> I configured my local git to fetch with recurseSubmodules = on-demand,
> which I found the most convenient setting. However then I noticed that
> I mostly use git pull actually to fetch from remotes, but git pull
> does not utilize any recurseSubmoddules setting now, or at least I
> could not find such.
> 
> I would expect that if git-config has fetch.recurseSubmodules set,
> also git pull should use this setting, or at least similar option such
> as pull.recurseSubmodules should be available. I'd prefer sharing
> fetch.recurseSubmodules setting here.
> 
> I've attached a minimal patch, which I believe implements this
> configuration usage, and a test case to show my expected behavior for
> git pull.

Typically, we prefer patches to be inline; descriptive content like this
goes after the --- line in the patch, or in a cover letter.

> From e4483ec5b3d9c38a6e30aa0ab9fa521cba582345 Mon Sep 17 00:00:00 2001
> From: Tommi Vainikainen 
> Date: Tue, 23 Oct 2018 23:47:58 +0300
> Subject: [PATCH 1/1] pull: obey fetch.recurseSubmodules when fetching
> 
> "git pull" now uses same recurse-submodules config for fetching as "git
> fetch" by default if not overridden from command line.1

You have an extra '1" at the end of this line.

Also, missing sign-off.  See Documentation/SubmittingPatches.

> diff --git a/builtin/pull.c b/builtin/pull.c
> index 798ecf7faf..ed39b0e8ed 100644
> --- a/builtin/pull.c
> +++ b/builtin/pull.c
> @@ -347,6 +347,9 @@ static int git_pull_config(const char *var, const char 
> *value, void *cb)
>   recurse_submodules = git_config_bool(var, value) ?
>   RECURSE_SUBMODULES_ON : RECURSE_SUBMODULES_OFF;
>   return 0;
> + } else if (!strcmp(var, "fetch.recursesubmodules")) {
> + recurse_submodules = parse_fetch_recurse_submodules_arg(var, 
> value);
> + return 0;
>   }
>   return git_default_config(var, value, cb);
>  }
> diff --git a/t/t5572-pull-submodule.sh b/t/t5572-pull-submodule.sh
> index f916729a12..579ae5c374 100755
> --- a/t/t5572-pull-submodule.sh
> +++ b/t/t5572-pull-submodule.sh
> @@ -75,6 +75,17 @@ test_expect_success "submodule.recurse option triggers 
> recursive pull" '
>   test_path_is_file super/sub/merge_strategy_2.t
>  '
>  
> +test_expect_success "fetch.recurseSubmodules=true triggers recursive pull" '
> + test_commit -C child fetch_recurse_submodules &&
> + git -C parent submodule update --remote &&
> + git -C parent add sub &&
> + git -C parent commit -m "update submodule" &&
> +
> +     git -C super config fetch.recurseSubmodules true &&
> + git -C super pull --no-rebase &&
> + test_path_is_file super/sub/fetch_recurse_submodules.t
> +'

Can we have a test that --no-recurse-submodules overrides
fetch.recurseSubmodules?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 1/3] gpg-interface.c: use flags to determine key/signer info presence

2018-10-23 Thread brian m. carlson
On Mon, Oct 22, 2018 at 06:38:19PM +0200, Michał Górny wrote:
> Replace the logic used to determine whether key and signer information
> is present to use explicit flags in sigcheck_gpg_status[] array.  This
> is more future-proof, since it makes it possible to add additional
> statuses without having to explicitly update the conditions.

This series looks good to me.  I was going to ask after patch 2 whether
you were printing the subkey or primary key fingerprint, and then you
answered my question in patch 3.  Thanks for including both.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v3 10/12] Add a base implementation of SHA-256 support

2018-10-22 Thread brian m. carlson
On Mon, Oct 22, 2018 at 11:44:40AM +0200, SZEDER Gábor wrote:
> To protect us from potential "macro redefinition" errors, these
> #undefs should come before the #defines above.  Note also that BLKSIZE
> is not #undef-ed.

Ah, okay.  I think I misread your suggestion.  I'll see if anyone has
more comments, and then reroll with that change.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH v3 09/12] commit-graph: convert to using the_hash_algo

2018-10-21 Thread brian m. carlson
Instead of using hard-coded constants for object sizes, use
the_hash_algo to look them up.  In addition, use a function call to look
up the object ID version and produce the correct value.  For now, we use
version 1, which means to use the default algorithm used in the rest of
the repository.

Signed-off-by: brian m. carlson 
---
 commit-graph.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..6763d19288 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -23,16 +23,11 @@
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
 #define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
 
-#define GRAPH_DATA_WIDTH 36
+#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
 #define GRAPH_VERSION_1 0x1
 #define GRAPH_VERSION GRAPH_VERSION_1
 
-#define GRAPH_OID_VERSION_SHA1 1
-#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
-#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
-#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
-
 #define GRAPH_OCTOPUS_EDGES_NEEDED 0x8000
 #define GRAPH_PARENT_MISSING 0x7fff
 #define GRAPH_EDGE_LAST_MASK 0x7fff
@@ -44,13 +39,18 @@
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
+   + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static uint8_t oid_version(void)
+{
+   return 1;
+}
+
 static struct commit_graph *alloc_commit_graph(void)
 {
struct commit_graph *g = xcalloc(1, sizeof(*g));
@@ -125,15 +125,15 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
}
 
hash_version = *(unsigned char*)(data + 5);
-   if (hash_version != GRAPH_OID_VERSION) {
+   if (hash_version != oid_version()) {
error(_("hash version %X does not match version %X"),
- hash_version, GRAPH_OID_VERSION);
+ hash_version, oid_version());
goto cleanup_fail;
}
 
graph = alloc_commit_graph();
 
-   graph->hash_len = GRAPH_OID_LEN;
+   graph->hash_len = the_hash_algo->rawsz;
graph->num_chunks = *(unsigned char*)(data + 6);
graph->graph_fd = fd;
graph->data = graph_map;
@@ -149,7 +149,7 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
 
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
-   if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+   if (chunk_offset > graph_size - the_hash_algo->rawsz) {
error(_("improper chunk offset %08x%08x"), 
(uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
goto cleanup_fail;
@@ -764,6 +764,7 @@ void write_commit_graph(const char *obj_dir,
int num_extra_edges;
struct commit_list *parent;
struct progress *progress = NULL;
+   const unsigned hashsz = the_hash_algo->rawsz;
 
if (!commit_graph_compatible(the_repository))
return;
@@ -909,7 +910,7 @@ void write_commit_graph(const char *obj_dir,
hashwrite_be32(f, GRAPH_SIGNATURE);
 
hashwrite_u8(f, GRAPH_VERSION);
-   hashwrite_u8(f, GRAPH_OID_VERSION);
+   hashwrite_u8(f, oid_version());
hashwrite_u8(f, num_chunks);
hashwrite_u8(f, 0); /* unused padding byte */
 
@@ -924,8 +925,8 @@ void write_commit_graph(const char *obj_dir,
 
chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-   chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
-   chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+   chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
+   chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
 
for (i = 0; i <= num_chunks; i++) {
@@ -938,8 +939,8 @@ void write_commit_graph(const char *obj_dir,
}
 
write_graph_chunk_fanout(f, commits.list, commits.nr);
-   write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-   write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+   write_graph_chunk_oids(f, hashsz, commits.list, commits.nr);
+   write_graph_chunk_data(f, hashsz, commits.list, commits.nr);
write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
close_commit_graph(the_repository);


[PATCH v3 06/12] t: make the sha1 test-tool helper generic

2018-10-21 Thread brian m. carlson
Since we're going to have multiple hash algorithms to test, it makes
sense to share as much of the test code as possible.  Convert the sha1
helper for the test-tool to be generic and move it out into its own
module.  This will allow us to share most of this code with our NewHash
implementation.

Signed-off-by: brian m. carlson 
---
 Makefile  |  1 +
 t/helper/{test-sha1.c => test-hash.c} | 19 +-
 t/helper/test-sha1.c  | 52 +--
 t/helper/test-tool.h  |  2 ++
 4 files changed, 14 insertions(+), 60 deletions(-)
 copy t/helper/{test-sha1.c => test-hash.c} (65%)

diff --git a/Makefile b/Makefile
index d18ab0fe78..81dc9ac819 100644
--- a/Makefile
+++ b/Makefile
@@ -714,6 +714,7 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
+TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
diff --git a/t/helper/test-sha1.c b/t/helper/test-hash.c
similarity index 65%
copy from t/helper/test-sha1.c
copy to t/helper/test-hash.c
index 1ba0675c75..0a31de66f3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-hash.c
@@ -1,13 +1,14 @@
 #include "test-tool.h"
 #include "cache.h"
 
-int cmd__sha1(int ac, const char **av)
+int cmd_hash_impl(int ac, const char **av, int algo)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_HEXSZ];
unsigned bufsz = 8192;
int binary = 0;
char *buffer;
+   const struct git_hash_algo *algop = _algos[algo];
 
if (ac == 2) {
if (!strcmp(av[1], "-b"))
@@ -26,7 +27,7 @@ int cmd__sha1(int ac, const char **av)
die("OOPS");
}
 
-   git_SHA1_Init();
+   algop->init_fn();
 
while (1) {
ssize_t sz, this_sz;
@@ -38,20 +39,20 @@ int cmd__sha1(int ac, const char **av)
if (sz == 0)
break;
if (sz < 0)
-   die_errno("test-sha1");
+   die_errno("test-hash");
this_sz += sz;
cp += sz;
room -= sz;
}
if (this_sz == 0)
break;
-   git_SHA1_Update(, buffer, this_sz);
+   algop->update_fn(, buffer, this_sz);
}
-   git_SHA1_Final(sha1, );
+   algop->final_fn(hash, );
 
if (binary)
-   fwrite(sha1, 1, 20, stdout);
+   fwrite(hash, 1, algop->rawsz, stdout);
else
-   puts(sha1_to_hex(sha1));
+   puts(hash_to_hex_algop(hash, algop));
exit(0);
 }
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index 1ba0675c75..d860c387c3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,55 +3,5 @@
 
 int cmd__sha1(int ac, const char **av)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
-   unsigned bufsz = 8192;
-   int binary = 0;
-   char *buffer;
-
-   if (ac == 2) {
-   if (!strcmp(av[1], "-b"))
-   binary = 1;
-   else
-   bufsz = strtoul(av[1], NULL, 10) * 1024 * 1024;
-   }
-
-   if (!bufsz)
-   bufsz = 8192;
-
-   while ((buffer = malloc(bufsz)) == NULL) {
-   fprintf(stderr, "bufsz %u is too big, halving...\n", bufsz);
-   bufsz /= 2;
-   if (bufsz < 1024)
-   die("OOPS");
-   }
-
-   git_SHA1_Init();
-
-   while (1) {
-   ssize_t sz, this_sz;
-   char *cp = buffer;
-   unsigned room = bufsz;
-   this_sz = 0;
-   while (room) {
-   sz = xread(0, cp, room);
-   if (sz == 0)
-   break;
-   if (sz < 0)
-   die_errno("test-sha1");
-   this_sz += sz;
-   cp += sz;
-   room -= sz;
-   }
-   if (this_sz == 0)
-   break;
-   git_SHA1_Update(, buffer, this_sz);
-   }
-   git_SHA1_Final(sha1, );
-
-   if (binary)
-   fwrite(sha1, 1, 20, stdout);
-   else
-   puts(sha1_to_hex(sha1));
-   exit(0);
+   return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e4890566da..29ac7b0b0d 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -50

[PATCH v3 02/12] sha1-file: provide functions to look up hash algorithms

2018-10-21 Thread brian m. carlson
There are several ways we might refer to a hash algorithm: by name, such
as in the config file; by format ID, such as in a pack; or internally,
by a pointer to the hash_algos array.  Provide functions to look up hash
algorithms based on these various forms and return the internal constant
used for them.  If conversion to another form is necessary, this
internal constant can be used to look up the proper data in the
hash_algos array.

Signed-off-by: brian m. carlson 
---
 hash.h  | 13 +
 sha1-file.c | 21 +
 2 files changed, 34 insertions(+)

diff --git a/hash.h b/hash.h
index 7c8238bc2e..80881eea47 100644
--- a/hash.h
+++ b/hash.h
@@ -98,4 +98,17 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
+/*
+ * Return a GIT_HASH_* constant based on the name.  Returns GIT_HASH_UNKNOWN if
+ * the name doesn't match a known algorithm.
+ */
+int hash_algo_by_name(const char *name);
+/* Identical, except based on the format ID. */
+int hash_algo_by_id(uint32_t format_id);
+/* Identical, except for a pointer to struct git_hash_algo. */
+static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+{
+   return p - hash_algos;
+}
+
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index 91311ebb3d..7e9dedc744 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -122,6 +122,27 @@ const char *empty_blob_oid_hex(void)
return oid_to_hex_r(buf, the_hash_algo->empty_blob);
 }
 
+int hash_algo_by_name(const char *name)
+{
+   int i;
+   if (!name)
+   return GIT_HASH_UNKNOWN;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (!strcmp(name, hash_algos[i].name))
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+int hash_algo_by_id(uint32_t format_id)
+{
+   int i;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (format_id == hash_algos[i].format_id)
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want


[PATCH v3 12/12] hash: add an SHA-256 implementation using OpenSSL

2018-10-21 Thread brian m. carlson
We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.

On a Core i7-6600U, this SHA-256 implementation compares favorably to
the SHA1DC SHA-1 implementation:

SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)

Signed-off-by: brian m. carlson 
---
 Makefile | 7 +++
 hash.h   | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index 5a07e03100..36fd3a149b 100644
--- a/Makefile
+++ b/Makefile
@@ -183,6 +183,8 @@ all::
 #
 # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
 #
+# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1638,6 +1640,10 @@ endif
 endif
 endif
 
+ifdef OPENSSL_SHA256
+   EXTLIBS += $(LIB_4_CRYPTO)
+   BASIC_CFLAGS += -DSHA256_OPENSSL
+else
 ifdef GCRYPT_SHA256
BASIC_CFLAGS += -DSHA256_GCRYPT
EXTLIBS += -lgcrypt
@@ -1645,6 +1651,7 @@ else
LIB_OBJS += sha256/block/sha256.o
BASIC_CFLAGS += -DSHA256_BLK
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 2ef098052d..adde708cf2 100644
--- a/hash.h
+++ b/hash.h
@@ -17,6 +17,8 @@
 
 #if defined(SHA256_GCRYPT)
 #include "sha256/gcrypt.h"
+#elif defined(SHA256_OPENSSL)
+#include 
 #else
 #include "sha256/block/sha256.h"
 #endif


[PATCH v3 04/12] cache: make hashcmp and hasheq work with larger hashes

2018-10-21 Thread brian m. carlson
In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we
modified hashcmp to assert that the hash size was always 20 to help it
optimize and inline calls to memcmp.  In a future series, we replaced
many calls to hashcmp and oidcmp with calls to hasheq and oideq to
improve inlining further.

However, we want to support hash algorithms other than SHA-1, namely
SHA-256.  When doing so, we must handle the case where these values are
32 bytes long as well as 20.  Adjust hashcmp to handle two cases:
20-byte matches, and maximum-size matches.  Therefore, when we include
SHA-256, we'll automatically handle it properly, while at the same time
teaching the compiler that there are only two possible options to
consider.  This will allow the compiler to write the most efficient
possible code.

Copy similar code into hasheq and perform an identical transformation.
At least with GCC 8.2.0, making hasheq defer to hashcmp when there are
two branches prevents the compiler from inlining the comparison, while
the code in this patch is inlined properly.  Add a comment to avoid an
accidental performance regression from well-intentioned refactoring.

Signed-off-by: brian m. carlson 
---
 cache.h | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index 51580c4b77..bab8e8964f 100644
--- a/cache.h
+++ b/cache.h
@@ -1027,16 +1027,12 @@ extern const struct object_id null_oid;
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
/*
-* This is a temporary optimization hack. By asserting the size here,
-* we let the compiler know that it's always going to be 20, which lets
-* it turn this fixed-size memcmp into a few inline instructions.
-*
-* This will need to be extended or ripped out when we learn about
-* hashes of different sizes.
+* Teach the compiler that there are only two possibilities of hash size
+* here, so that it can optimize for this case as much as possible.
 */
-   if (the_hash_algo->rawsz != 20)
-   BUG("hash size not yet supported by hashcmp");
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+   return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+   return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
@@ -1046,7 +1042,13 @@ static inline int oidcmp(const struct object_id *oid1, 
const struct object_id *o
 
 static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return !hashcmp(sha1, sha2);
+   /*
+* We write this here instead of deferring to hashcmp so that the
+* compiler can properly inline it and avoid calling memcmp.
+*/
+   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+   return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+   return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oideq(const struct object_id *oid1, const struct object_id 
*oid2)


[PATCH v3 08/12] t/helper: add a test helper to compute hash speed

2018-10-21 Thread brian m. carlson
Add a utility (which is less for the testsuite and more for developers)
that can compute hash speeds for whatever hash algorithms are
implemented.  This allows developers to test their personal systems to
determine the performance characteristics of various algorithms.

Signed-off-by: brian m. carlson 
---
 Makefile   |  1 +
 t/helper/test-hash-speed.c | 61 ++
 t/helper/test-tool.c   |  1 +
 t/helper/test-tool.h   |  1 +
 4 files changed, 64 insertions(+)
 create mode 100644 t/helper/test-hash-speed.c

diff --git a/Makefile b/Makefile
index 81dc9ac819..68169a7abb 100644
--- a/Makefile
+++ b/Makefile
@@ -716,6 +716,7 @@ TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
+TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
diff --git a/t/helper/test-hash-speed.c b/t/helper/test-hash-speed.c
new file mode 100644
index 00..432233c7f0
--- /dev/null
+++ b/t/helper/test-hash-speed.c
@@ -0,0 +1,61 @@
+#include "test-tool.h"
+#include "cache.h"
+
+#define NUM_SECONDS 3
+
+static inline void compute_hash(const struct git_hash_algo *algo, git_hash_ctx 
*ctx, uint8_t *final, const void *p, size_t len)
+{
+   algo->init_fn(ctx);
+   algo->update_fn(ctx, p, len);
+   algo->final_fn(final, ctx);
+}
+
+int cmd__hash_speed(int ac, const char **av)
+{
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_RAWSZ];
+   clock_t initial, start, end;
+   unsigned bufsizes[] = { 64, 256, 1024, 8192, 16384 };
+   int i;
+   void *p;
+   const struct git_hash_algo *algo = NULL;
+
+   if (ac == 2) {
+   for (i = 1; i < GIT_HASH_NALGOS; i++) {
+   if (!strcmp(av[1], hash_algos[i].name)) {
+   algo = _algos[i];
+   break;
+   }
+   }
+   }
+   if (!algo)
+   die("usage: test-tool hash-speed algo_name");
+
+   /* Use this as an offset to make overflow less likely. */
+   initial = clock();
+
+   printf("algo: %s\n", algo->name);
+
+   for (i = 0; i < ARRAY_SIZE(bufsizes); i++) {
+   unsigned long j, kb;
+   double kb_per_sec;
+   p = xcalloc(1, bufsizes[i]);
+   start = end = clock() - initial;
+   for (j = 0; ((end - start) / CLOCKS_PER_SEC) < NUM_SECONDS; 
j++) {
+   compute_hash(algo, , hash, p, bufsizes[i]);
+
+   /*
+* Only check elapsed time every 128 iterations to avoid
+* dominating the runtime with system calls.
+*/
+   if (!(j & 127))
+   end = clock() - initial;
+   }
+   kb = j * bufsizes[i];
+   kb_per_sec = kb / (1024 * ((double)end - start) / 
CLOCKS_PER_SEC);
+   printf("size %u: %lu iters; %lu KiB; %0.2f KiB/s\n", 
bufsizes[i], j, kb, kb_per_sec);
+   free(p);
+   }
+
+   exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 6b5836dc1b..e009c8186d 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -20,6 +20,7 @@ static struct test_cmd cmds[] = {
{ "example-decorate", cmd__example_decorate },
{ "genrandom", cmd__genrandom },
{ "hashmap", cmd__hashmap },
+   { "hash-speed", cmd__hash_speed },
{ "index-version", cmd__index_version },
{ "json-writer", cmd__json_writer },
{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 29ac7b0b0d..19a7e8332a 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -16,6 +16,7 @@ int cmd__dump_untracked_cache(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
+int cmd__hash_speed(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);


[PATCH v3 01/12] sha1-file: rename algorithm to "sha1"

2018-10-21 Thread brian m. carlson
The transition plan anticipates us using a syntax such as "^{sha1}" for
disambiguation.  Since this is a syntax some people will be typing a
lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
the dash doesn't create any ambiguity; however, it does make the syntax
shorter and easier to type, especially for touch typists.  In addition,
the transition plan already uses "sha1" in this context.

Rename the name of SHA-1 implementation to "sha1".

Note that this change creates no backwards compatibility concerns, since
we haven't yet used this field in any configuration settings.

Signed-off-by: brian m. carlson 
---
 sha1-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..91311ebb3d 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -97,7 +97,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
NULL,
},
{
-   "sha-1",
+   "sha1",
/* "sha1", big-endian */
0x73686131,
GIT_SHA1_RAWSZ,


[PATCH v3 00/12] Base SHA-256 implementation

2018-10-21 Thread brian m. carlson
This series provides a functional SHA-256 implementation and wires it
up, along with some housekeeping patches to make it suitable for
testing.

Changes from v2:
* Improve commit messages to include timing and performance information.
* Improve commit messages to be less ambiguous and more friendly to a
  wider variety of English speakers.
* Prefer functions taking struct git_hash_algo in hex.c.
* Port pieces of the block-sha1 implementation over to the block-sha256
  implementation for better compatibility.
* Drop patch 13 in favor of further discussion about the best way
  forward for versioning commit graph.
* Rename the test so as to have a different number from other tests.
* Rebase on master.

Changes from v1:
* Add a hash_to_hex function mirroring sha1_to_hex, but for
  the_hash_algo.
* Strip commit message explanation about why we chose SHA-256.
* Rebase on master
* Strip leading whitespace from commit message.
* Improve commit-graph patch to cover new code added since v1.
* Be more honest about the scope of work involved in porting the SHA-256
  implementation out of libtomcrypt.
* Revert change to limit hashcmp to 20 bytes.

brian m. carlson (12):
  sha1-file: rename algorithm to "sha1"
  sha1-file: provide functions to look up hash algorithms
  hex: introduce functions to print arbitrary hashes
  cache: make hashcmp and hasheq work with larger hashes
  t: add basic tests for our SHA-1 implementation
  t: make the sha1 test-tool helper generic
  sha1-file: add a constant for hash block size
  t/helper: add a test helper to compute hash speed
  commit-graph: convert to using the_hash_algo
  Add a base implementation of SHA-256 support
  sha256: add an SHA-256 implementation using libgcrypt
  hash: add an SHA-256 implementation using OpenSSL

 Makefile  |  22 +++
 cache.h   |  51 ---
 commit-graph.c|  33 ++---
 hash.h|  41 +-
 hex.c |  32 +++--
 sha1-file.c   |  70 +-
 sha256/block/sha256.c | 186 ++
 sha256/block/sha256.h |  26 
 sha256/gcrypt.h   |  30 +
 t/helper/test-hash-speed.c|  61 +
 t/helper/{test-sha1.c => test-hash.c} |  19 +--
 t/helper/test-sha1.c  |  52 +--
 t/helper/test-sha256.c|   7 +
 t/helper/test-tool.c  |   2 +
 t/helper/test-tool.h  |   4 +
 t/t0015-hash.sh   |  54 
 16 files changed, 586 insertions(+), 104 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 sha256/gcrypt.h
 create mode 100644 t/helper/test-hash-speed.c
 copy t/helper/{test-sha1.c => test-hash.c} (65%)
 create mode 100644 t/helper/test-sha256.c
 create mode 100755 t/t0015-hash.sh

Range-diff against v2:
 1:  b5845548ac <  -:  -- :hash-impl
 2:  804ec2fd27 !  1:  cf9f7f5620 sha1-file: rename algorithm to "sha1"
@@ -5,14 +5,14 @@
 The transition plan anticipates us using a syntax such as "^{sha1}" for
 disambiguation.  Since this is a syntax some people will be typing a
 lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
-the dash doesn't create any ambiguity, but it does make it shorter and
-easier to type, especially for touch typists.  In addition, the
-transition plan already uses "sha1" in this context.
+the dash doesn't create any ambiguity; however, it does make the syntax
+shorter and easier to type, especially for touch typists.  In addition,
+the transition plan already uses "sha1" in this context.
 
 Rename the name of SHA-1 implementation to "sha1".
 
 Note that this change creates no backwards compatibility concerns, 
since
-we haven't yet used this field in any serialized data formats.
+we haven't yet used this field in any configuration settings.
 
 Signed-off-by: brian m. carlson 
 
 3:  5196e00b26 !  2:  0144deaebe sha1-file: provide functions to look up hash 
algorithms
@@ -27,7 +27,7 @@
 +/* Identical, except based on the format ID. */
 +int hash_algo_by_id(uint32_t format_id);
 +/* Identical, except for a pointer to struct git_hash_algo. */
-+inline int hash_algo_by_ptr(const struct git_hash_algo *p)
++static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
 +{
 +  return p - hash_algos;
 +}
 4:  5873510d0a !  3:  b74858fb03 hex: introduce functions to print arbitrary 
hashes
@@ -45,13 +45,13 @@
 -extern char *oid_to_hex_r(char *out, const struct object_id *oid);
 -extern char *sha1_to_hex(const unsigned char *sha1);  /* static 
buffer result! */
 -extern char

[PATCH v3 11/12] sha256: add an SHA-256 implementation using libgcrypt

2018-10-21 Thread brian m. carlson
Generally, one gets better performance out of cryptographic routines
written in assembly than C, and this is also true for SHA-256.  In
addition, most Linux distributions cannot distribute Git linked against
OpenSSL for licensing reasons.

Most systems with GnuPG will also have libgcrypt, since it is a
dependency of GnuPG.  libgcrypt is also faster than the SHA1DC
implementation for messages of a few KiB and larger.

For comparison, on a Core i7-6600U, this implementation processes 16 KiB
chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
MiB/s.

In addition, libgcrypt is licensed under the LGPL 2.1, which is
compatible with the GPL.  Add an implementation of SHA-256 that uses
libgcrypt.

Signed-off-by: brian m. carlson 
---
 Makefile| 13 +++--
 hash.h  |  4 
 sha256/gcrypt.h | 30 ++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 sha256/gcrypt.h

diff --git a/Makefile b/Makefile
index e99b7712f6..5a07e03100 100644
--- a/Makefile
+++ b/Makefile
@@ -179,6 +179,10 @@ all::
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
 #
+# Define BLK_SHA256 to use the built-in SHA-256 routines.
+#
+# Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1634,8 +1638,13 @@ endif
 endif
 endif
 
-LIB_OBJS += sha256/block/sha256.o
-BASIC_CFLAGS += -DSHA256_BLK
+ifdef GCRYPT_SHA256
+   BASIC_CFLAGS += -DSHA256_GCRYPT
+   EXTLIBS += -lgcrypt
+else
+   LIB_OBJS += sha256/block/sha256.o
+   BASIC_CFLAGS += -DSHA256_BLK
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index a9bc624020..2ef098052d 100644
--- a/hash.h
+++ b/hash.h
@@ -15,7 +15,11 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#if defined(SHA256_GCRYPT)
+#include "sha256/gcrypt.h"
+#else
 #include "sha256/block/sha256.h"
+#endif
 
 #ifndef platform_SHA_CTX
 /*
diff --git a/sha256/gcrypt.h b/sha256/gcrypt.h
new file mode 100644
index 00..09bd8bb200
--- /dev/null
+++ b/sha256/gcrypt.h
@@ -0,0 +1,30 @@
+#ifndef SHA256_GCRYPT_H
+#define SHA256_GCRYPT_H
+
+#include 
+
+#define SHA256_DIGEST_SIZE 32
+
+typedef gcry_md_hd_t gcrypt_SHA256_CTX;
+
+inline void gcrypt_SHA256_Init(gcrypt_SHA256_CTX *ctx)
+{
+   gcry_md_open(ctx, GCRY_MD_SHA256, 0);
+}
+
+inline void gcrypt_SHA256_Update(gcrypt_SHA256_CTX *ctx, const void *data, 
size_t len)
+{
+   gcry_md_write(*ctx, data, len);
+}
+
+inline void gcrypt_SHA256_Final(unsigned char *digest, gcrypt_SHA256_CTX *ctx)
+{
+   memcpy(digest, gcry_md_read(*ctx, GCRY_MD_SHA256), SHA256_DIGEST_SIZE);
+}
+
+#define platform_SHA256_CTX gcrypt_SHA256_CTX
+#define platform_SHA256_Init gcrypt_SHA256_Init
+#define platform_SHA256_Update gcrypt_SHA256_Update
+#define platform_SHA256_Final gcrypt_SHA256_Final
+
+#endif


[PATCH v3 05/12] t: add basic tests for our SHA-1 implementation

2018-10-21 Thread brian m. carlson
We have in the past had some unfortunate endianness issues with some
SHA-1 implementations we ship, especially on big-endian machines.  Add
an explicit test using the test helper to catch these issues and point
them out prominently.  This test can also be used as a staging ground
for people testing additional algorithms to verify that their
implementations are working as expected.

Signed-off-by: brian m. carlson 
---
 t/t0015-hash.sh | 29 +
 1 file changed, 29 insertions(+)
 create mode 100755 t/t0015-hash.sh

diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
new file mode 100755
index 00..8e763c2c3d
--- /dev/null
+++ b/t/t0015-hash.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test basic hash implementation'
+. ./test-lib.sh
+
+
+test_expect_success 'test basic SHA-1 hash values' '
+   test-tool sha1 actual &&
+   grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual &&
+   printf "a" | test-tool sha1 >actual &&
+   grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual &&
+   printf "abc" | test-tool sha1 >actual &&
+   grep a9993e364706816aba3e25717850c26c9cd0d89d actual &&
+   printf "message digest" | test-tool sha1 >actual &&
+   grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
+   printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
+   grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
+   perl -E "for (1..10) { print q{aa}; }" | \
+   test-tool sha1 >actual &&
+   grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
+   printf "blob 0\0" | test-tool sha1 >actual &&
+   grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual &&
+   printf "blob 3\0abc" | test-tool sha1 >actual &&
+   grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual &&
+   printf "tree 0\0" | test-tool sha1 >actual &&
+   grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
+'
+
+test_done


[PATCH v3 03/12] hex: introduce functions to print arbitrary hashes

2018-10-21 Thread brian m. carlson
Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer.  Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant.  Update the
documentation as well.

While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.

We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.

Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).

Signed-off-by: brian m. carlson 
---
 cache.h | 15 +--
 hex.c   | 32 
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index 59c8a93046..51580c4b77 100644
--- a/cache.h
+++ b/cache.h
@@ -1364,9 +1364,9 @@ extern int get_oid_hex(const char *hex, struct object_id 
*sha1);
 extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
 
 /*
- * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
+ * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
  * and writes the NUL-terminated output to the buffer `out`, which must be at
- * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
+ * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
  * convenience.
  *
  * The non-`_r` variant returns a static buffer, but uses a ring of 4
@@ -1374,10 +1374,13 @@ extern int hex_to_bytes(unsigned char *binary, const 
char *hex, size_t len);
  *
  *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
  */
-extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1);   /* static buffer 
result! */
-extern char *oid_to_hex(const struct object_id *oid);  /* same static buffer 
as sha1_to_hex */
+char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const 
struct git_hash_algo *);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo 
*);  /* static buffer result! */
+char *sha1_to_hex(const unsigned char *sha1);  
/* same static buffer */
+char *hash_to_hex(const unsigned char *hash);  
/* same static buffer */
+char *oid_to_hex(const struct object_id *oid); 
/* same static buffer */
 
 /*
  * Parse a 40-character hexadecimal object ID starting from hex, updating the
diff --git a/hex.c b/hex.c
index 10af1a29e8..d2e8bb9540 100644
--- a/hex.c
+++ b/hex.c
@@ -73,14 +73,15 @@ int parse_oid_hex(const char *hex, struct object_id *oid, 
const char **end)
return ret;
 }
 
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
+   const struct git_hash_algo *algop)
 {
static const char hex[] = "0123456789abcdef";
char *buf = buffer;
int i;
 
-   for (i = 0; i < the_hash_algo->rawsz; i++) {
-   unsigned int val = *sha1++;
+   for (i = 0; i < algop->rawsz; i++) {
+   unsigned int val = *hash++;
*buf++ = hex[val >> 4];
*buf++ = hex[val & 0xf];
}
@@ -89,20 +90,35 @@ char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
return buffer;
 }
 
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 {
-   return sha1_to_hex_r(buffer, oid->hash);
+   return hash_to_hex_algop_r(buffer, sha1, _algos[GIT_HASH_SHA1]);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+   return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo 
*algop)
 {
static int bufno;
static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-   return sha1_to_hex_r(hexbuffer[bufno], sha1);
+   return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
+   return hash_to_hex_algop(sha1, _algos[GIT_HASH_SHA1]);
+}
+
+char *hash_to_hex(const unsigned char *hash)
+{
+   return hash_to_hex_algop(hash, the_hash_algo);
 }
 
 char *oid_to_hex(const struct object_id *oid)
 {
-   return

[PATCH v3 10/12] Add a base implementation of SHA-256 support

2018-10-21 Thread brian m. carlson
SHA-1 is weak and we need to transition to a new hash function.  For
some time, we have referred to this new function as NewHash.  Recently,
we decided to pick SHA-256 as NewHash.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain.  Optimize it and restructure it to meet our coding
standards.  Pull in the update and final functions from the SHA-1 block
implementation, as we know these function correctly with all compilers.
This implementation is slower than SHA-1, but more performant
implementations will be introduced in future commits.

Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.

Note that with this patch, it is still not possible to switch to using
SHA-256 in Git.  Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.

Signed-off-by: brian m. carlson 
---
 Makefile   |   4 +
 cache.h|  12 ++-
 hash.h |  19 -
 sha1-file.c|  45 ++
 sha256/block/sha256.c  | 186 +
 sha256/block/sha256.h  |  26 ++
 t/helper/test-sha256.c |   7 ++
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/t0015-hash.sh|  25 ++
 10 files changed, 322 insertions(+), 4 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 t/helper/test-sha256.c

diff --git a/Makefile b/Makefile
index 68169a7abb..e99b7712f6 100644
--- a/Makefile
+++ b/Makefile
@@ -739,6 +739,7 @@ TEST_BUILTINS_OBJS += test-run-command.o
 TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
 TEST_BUILTINS_OBJS += test-sha1.o
 TEST_BUILTINS_OBJS += test-sha1-array.o
+TEST_BUILTINS_OBJS += test-sha256.o
 TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
@@ -1633,6 +1634,9 @@ endif
 endif
 endif
 
+LIB_OBJS += sha256/block/sha256.o
+BASIC_CFLAGS += -DSHA256_BLK
+
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
diff --git a/cache.h b/cache.h
index 9e5d1dd85a..48ce1565e6 100644
--- a/cache.h
+++ b/cache.h
@@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The block size of SHA-1. */
 #define GIT_SHA1_BLKSZ 64
 
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
 /* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 1bcf7ab6fd..a9bc624020 100644
--- a/hash.h
+++ b/hash.h
@@ -15,6 +15,8 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#include "sha256/block/sha256.h"
+
 #ifndef platform_SHA_CTX
 /*
  * platform's underlying implementation of SHA-1; could be OpenSSL,
@@ -34,6 +36,18 @@
 #define git_SHA1_Updateplatform_SHA1_Update
 #define git_SHA1_Final platform_SHA1_Final
 
+#ifndef platform_SHA256_CTX
+#define platform_SHA256_CTXSHA256_CTX
+#define platform_SHA256_Init   SHA256_Init
+#define platform_SHA256_Update SHA256_Update
+#define platform_SHA256_Final  SHA256_Final
+#endif
+
+#define git_SHA256_CTX platform_SHA256_CTX
+#define git_SHA256_Initplatform_SHA256_Init
+#define git_SHA256_Update  platform_SHA256_Update
+#define git_SHA256_Final   platform_SHA256_Final
+
 #ifdef SHA1_MAX_BLOCK_SIZE
 #include "compat/sha1-chunked.h"
 #undef git_SHA1_Update
@@ -52,12 +66,15 @@
 #define GIT_HASH_UNKNOWN 0
 /* SHA-1 */
 #define GIT_HASH_SHA1 1
+/* SHA-256  */
+#define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
-#define GIT_HASH_NALGOS (GIT_HASH_SHA1 + 1)
+#define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 
 /* A suitably aligned type for stack allocations of hash contexts. */
 union git_hash_ctx {
git_SHA_CTX sha1;
+   git_SHA256_CTX sha256;
 };
 typedef union git_hash_ctx git_hash_ctx;
 
diff --git a/sha1-file.c b/sha1-file.c
index 9bdd04ea45..c97d93a14a 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -40,10 +40,20 @@
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
 "\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
 "\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"
+#define EMPTY_TREE_SHA256_BIN_LITERAL \
+   "\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
+   "\x04\xd4\x5d\x8d\x

[PATCH v3 07/12] sha1-file: add a constant for hash block size

2018-10-21 Thread brian m. carlson
There is one place we need the hash algorithm block size: the HMAC code
for push certs.  Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.

Signed-off-by: brian m. carlson 
---
 cache.h | 4 
 hash.h  | 3 +++
 sha1-file.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/cache.h b/cache.h
index bab8e8964f..9e5d1dd85a 100644
--- a/cache.h
+++ b/cache.h
@@ -45,10 +45,14 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+/* The block size of SHA-1. */
+#define GIT_SHA1_BLKSZ 64
 
 /* The length in byte and in hex digits of the largest possible hash value. */
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+/* The largest possible block size for any supported hash. */
+#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 80881eea47..1bcf7ab6fd 100644
--- a/hash.h
+++ b/hash.h
@@ -81,6 +81,9 @@ struct git_hash_algo {
/* The length of the hash in hex characters. */
size_t hexsz;
 
+   /* The block size of the hash. */
+   size_t blksz;
+
/* The hash initialization function. */
git_hash_init_fn init_fn;
 
diff --git a/sha1-file.c b/sha1-file.c
index 7e9dedc744..9bdd04ea45 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -90,6 +90,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x,
0,
0,
+   0,
git_hash_unknown_init,
git_hash_unknown_update,
git_hash_unknown_final,
@@ -102,6 +103,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x73686131,
GIT_SHA1_RAWSZ,
GIT_SHA1_HEXSZ,
+   GIT_SHA1_BLKSZ,
git_hash_sha1_init,
git_hash_sha1_update,
git_hash_sha1_final,


Re: Shouldn't git be able to apply diffs that it created with --ignore-whitespace?

2018-10-19 Thread brian m. carlson
On Thu, Oct 18, 2018 at 03:12:09PM -0500, Mahmoud Al-Qudsi wrote:
> Hello again all,
> 
> I think I've previously broached this subject before, but I think I perhaps
> wasn't clear enough about what I was trying to do or why I feel that git is at
> fault here.
> 
> (I'm running git 2.19.1)
> 
> Starting with a fully-committed, not-dirty codebase, I open(ed) a poorly
> formatted, mixed-whitespace file (that I absolutely did not author!) under
> version control and make some very localized changes. My editor, being very
> smart and helpful, fixes up the line ending on save, and I exit.
> 
> At this point, my source file contains a) deliberate changes I want, and b)
> whitespace changes I wish I could commit but that should not be a part of my
> patch.
> 
> Shouldn't the following workflow be supported:
> 
> ~> git diff -w > foo.diff
> ~> git reset --hard
> ~> git apply [--ignore-whitespace] < foo.diff

In general, git diff -w doesn't produce a patch that can be applied.
That's because it ignores all whitespace changes in a particular way.
The diff output is rendered as if the whitespace in the lines were
written as it is in the postimage (the changed file), not the preimage
(the original file).

This is useful because usually if you're, say, indenting a block of
code, you want to see the output properly indented with the new lines of
code (say, the loop or conditional) you wrote around it.  And since this
feature is designed for visual inspection, it makes sense to do it this
way.  However, it essentially means that your changes can't, in general,
be applied.

There are, of course, situations in which this might have worked in the
past, and it may indeed work for some situations still, but it won't in
the general case.  git apply --ignore-whitespace only modifies context
lines, so it doesn't affect the actual content lines in the diff.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-19 Thread brian m. carlson
On Thu, Oct 18, 2018 at 09:03:22AM -0400, Derrick Stolee wrote:
> We should coordinate before incrementing the version number. I was working
> on making the file formats incremental (think split-index) but couldn't come
> up with a way to do it without incrementing the file format. It would be
> best to combine these features into v2 of each format.

For the moment, I'm happy to leave things as they are, and I'll
interpret version 1 of the hash version field as whatever hash is in use
elsewhere in .git.  If you're going to bump the version in the future,
feel free to CC me on the patches, and we'll make sure that we serialize
the format_version field in the file then.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-17 Thread brian m. carlson
On Wed, Oct 17, 2018 at 04:31:19PM +0200, Duy Nguyen wrote:
> On Wed, Oct 17, 2018 at 12:44 AM brian m. carlson
>  wrote:
> > Honestly, anything in the .git directory that is not the v3 pack indexes
> > or the loose object file should be in exactly one hash algorithm.  We
> > could simply just leave this value at 1 all the time and ignore the
> > field, since we already know what algorithm it will use.
> 
> In this particular case, I agree, but not as a general principle. It's
> nice to have independence for fsck-like tools. I don't know if we have
> a tool that simply validates commit-graph file format (and not trying
> to access any real object). But for such a tool, I guess we can just
> pass the hash algorithm from command line. The user would have to
> guess a bit.

I'm going to drop this patch for now.  I'll send a follow-up series
later which bumps the format version for this and the multi-pack index
and serializes them with the four-byte value.  I probably should have
caught this earlier, but unfortunately I don't always have the time to
look at every series that hits the list.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 03/13] hex: introduce functions to print arbitrary hashes

2018-10-17 Thread brian m. carlson
On Tue, Oct 16, 2018 at 10:54:23AM +0900, Junio C Hamano wrote:
> Even though in hex.c I see mixture of *_algo and *_algop helper
> functions, I see only "algo" variants above.  Is it our official
> stance to use primarily the integer index into the algo array when
> specifying the hash, and when a caller into 'multi-hash' API happens
> to have other things, it should use functions in 2/13 to convert it
> to the canonical "int algo" beforehand?

That was my intention, yes.

> I am not saying it is bad or good to choose the index into the algo
> array as the primary way to specify the algorithm.  I think it is a
> good idea to pick one and stick to it, and I wanted to make sure
> that the choice we made here is clearly communicated to any future
> developer who read this code.

Yeah, that was my feeling as well.  I wanted to pick something fixed and
stick to it.

> Having said the above, seeing the use of hash_algo_by_ptr() here
> makes me suspect if it makes more sense to use the algop as the
> primary way to specify which algorithm the caller wants to use.
> IOW, making the set of helpers in 02/13 to allow quering by name,
> format-id, or the integer index and have them all return a pointer
> to "const struct git_hash_algo".  Two immediate downsides I can see
> is that it exposes the actual structure to the callers (but is it
> really a problem?  Outside callers learn hash sizes etc. by accessing
> its fields anyway without treating the algo struct as opaque.), and
> passing an 8-byte pointer may be more costly than passing a small
> integer index that ranges between 0 and 1 at most (assuming that
> we'd only use SHA-1 and "the current NewHash" in the code).

I thought about this.  The one downside to this is that we can't use
those values anywhere we need an integer constant expression, like a
switch.  I suppose that just means we need hash_algo_by_ptr in those
cases, and not everywhere else, which would make the code cleaner.

Let me reroll with that change, and we'll see if we like it better.  If
we don't, I can pull the old version out of history.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 10/13] Add a base implementation of SHA-256 support

2018-10-17 Thread brian m. carlson
On Wed, Oct 17, 2018 at 06:12:41PM +0200, SZEDER Gábor wrote:
> On macOS there is a MIN macro already defined in the system headers,
> resulting in the following error:
> 
>   CC sha256/block/sha256.o
>   sha256/block/sha256.c:133:9: error: 'MIN' macro redefined 
> [-Werror,-Wmacro-redefined]
>   #define MIN(x, y) ((x) < (y) ? (x) : (y))
>   ^
>   /usr/include/sys/param.h:215:9: note: previous definition is here
>   #define MIN(a,b) (((a)<(b))?(a):(b))
>   ^
>   1 error generated.
>   make: *** [sha256/block/sha256.o] Error 1
> 
> A simple "#undef MIN" solves this issue.  However, I wonder whether we
> should #undef the other #define directives as well, just to be sure
> (and perhaps overly cautious).

I'll undefine the macros at the top.

For MIN, I'm going to go with the simple approach and pull pieces from
the block-sha1 implementation, which doesn't require this code path...

> Some GCC versions (e.g. gcc-4.8 with -O2 -Wall -Werror) complain about
> the above line:
> 
>   CC sha256/block/sha256.o
>   sha256/block/sha256.c: In function ‘blk_SHA256_Final’:
>   sha256/block/sha256.c:174:2: error: dereferencing type-punned pointer will 
> break strict-aliasing rules [-Werror=strict-aliasing]
> put_be64(ctx->buf + trip, ctx->length);
> ^
>   cc1: all warnings being treated as errors
>   make: *** [sha256/block/sha256.o] Error 1
> 
> Something like this makes it compile:
> 
>   void *ptr = ctx->buf + trip;
>   put_be64(ptr, ctx->length);
> 
> However, it's not immediately obvious to me why the compiler
> complains, or why that intermediate void* variable makes any
> difference, but now it's not the time to put on my language lawyer
> hat.
> 
> Perhaps an old compiler bug?  Clang in general, newer GCC versions, or
> gcc-4.8 with -Wall -Werror but without -O2 don't seem to be affected.

...or this code path.  Presumably GCC 4.8 and macOS are happy with the
code that we already have in block-sha1.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 01/13] sha1-file: rename algorithm to "sha1"

2018-10-17 Thread brian m. carlson
On Tue, Oct 16, 2018 at 05:17:53PM +0200, Duy Nguyen wrote:
> On Mon, Oct 15, 2018 at 4:21 AM brian m. carlson
>  wrote:
> >
> > The transition plan anticipates us using a syntax such as "^{sha1}" for
> > disambiguation.  Since this is a syntax some people will be typing a
> > lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
> > the dash doesn't create any ambiguity, but it does make it shorter and
> 
> "but" or "and"? I think both clauses are on the same side ... or did
> you mean omitting the dash does create ambiguity?

I think "but" is correct here.  This is a standard "This doesn't make it
worse, but it does make it better" phrase.  The "but" creates a contrast
between what it doesn't do and what it does.

I'm trying to come up with a different way to say this that may be
easier to understand, but I'm failing to do so in a natural-sounding
way.  Does the following sound better?

  Omitting the dash doesn't introduce any ambiguity; however, it does
  make the text shorter and easier to type, especially for touch
  typists.

> > easier to type, especially for touch typists.  In addition, the
> > transition plan already uses "sha1" in this context.
> >
> > Rename the name of SHA-1 implementation to "sha1".
> >
> > Note that this change creates no backwards compatibility concerns, since
> > we haven't yet used this field in any serialized data formats.
> 
> But we're not going to use this _string_ in any data format either
> because we'll stick to format_id field anyway, right?

We'll use it in extensions.objectFormat and other config files.  But in
anything binary, we'll use format_id.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-17 Thread brian m. carlson
On Wed, Oct 17, 2018 at 08:21:42AM -0400, Derrick Stolee wrote:
> On 10/14/2018 10:19 PM, brian m. carlson wrote:
> > Since the commit-graph code wants to serialize the hash algorithm into
> > the data store, specify a version number for each supported algorithm.
> > Note that we don't use the values of the constants themselves, as they
> > are internal and could change in the future.
> > 
> > Signed-off-by: brian m. carlson 
> > ---
> >   commit-graph.c | 9 -
> >   1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 7a28fbb03f..e587c21bb6 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
> >   static uint8_t oid_version(void)
> >   {
> > -   return 1;
> > +   switch (hash_algo_by_ptr(the_hash_algo)) {
> hash_algo_by_ptr is specified as an inline function in hash.h, so this leads
> to a linking error in jch and pu right now.
> 
> I think the fix is simply to add '#include "hash.h"' to the list of
> includes.

hash.h is already included by cache.h, so it should be present.  We
probably need to make it static.  I'll make that change; can you see if
it fixes the problem for you?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: problem with not being able to enforce git content filters

2018-10-16 Thread brian m. carlson
On Tue, Oct 16, 2018 at 04:48:13PM -0700, Stas Bekman wrote:
> 
> >> And the devs honestly try to do their best to remember to configure the
> >> filters, but for some reason they disappear for them, don't ask me why,
> >> I don't know. This is an open source project team, not a work place.
> > 
> > This sounds like it could be easily solved by continuous integration.
> > You could set up a job on any of a variety of services that checks that
> > a pull request or other commit is clean when when the filter runs.  If
> > it doesn't pass, the code doesn't merge.
> 
> This is an excellent idea wrt to PRs. Thank you, Brian! I will implement
> that.
> 
> It doesn't help with direct commits to master, since CI would be
> detecting it after it was committed. And when that happens we all know
> that already because 'git pull' fails.

Typically projects that have CI set up don't allow direct pushes to
master, since that tends to allow to bypass CI, or they allow it only in
exceptional circumstances (e.g., reverts).  Also, typically most
projects want to have some sort of review before code (resp. documents,
other contributions) are merged.  Most Git hosting platforms, including
GitHub, offer protected branches, so it's something to consider.

There is a possible alternative, and that's that if your project has
some sort of build file (e.g., a Makefile), you can set it up to
automatically insert hooks or git configuration into developers'
systems, optionally only if it's in a Git repository.  For example, you
could add a pre-commit hook that fails if the filters are disabled.

These are the approaches that tend to work well for most projects.  I
tend to prefer the CI approach with restricted branches because often I
want to customize my hooks, but your project will decide what works best
for it.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 2/3] pack-objects (mingw): demonstrate a segmentation fault with large deltas

2018-10-16 Thread brian m. carlson
On Tue, Oct 16, 2018 at 02:02:50PM -0700, Johannes Schindelin via GitGitGadget 
wrote:
> From: Johannes Schindelin 
> 
> There is a problem in the way 9ac3f0e5b3e4 (pack-objects: fix
> performance issues on packing large deltas, 2018-07-22) initializes that
> mutex in the `packing_data` struct. The problem manifests in a
> segmentation fault on Windows, when a mutex (AKA critical section) is
> accessed without being initialized. (With pthreads, you apparently do
> not really have to initialize them?)

This is a good catch.  You do have to initialize a pthread mutex, but on
amd64 Linux the default initializer is all zeros, so since we use a
static variable, it happens to coincidentally have the right value, so
we don't notice.

Thanks for fixing this.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: problem with not being able to enforce git content filters

2018-10-16 Thread brian m. carlson
On Tue, Oct 16, 2018 at 01:36:37PM -0700, Stas Bekman wrote:
> The problem is that it can't be enforced.
> 
> When it's not enforced, we end up with some devs using it and others
> don't, or more often is the same dev sometimes doesn't have it configured.
> 
> When a person has a stripped out notebook checked out, when another
> person commits un-stripped out notebook, it leads to: invalid `git
> status` reports, `git pull` breaks, `git stash` doesn't work, since it
> tries to stash using the filters, and `git pull' can never succeed
> because it thinks that it'll overwrite the local changes, but `git diff`
> returns no changes.
> 
> So the only solution when this happens is to disable the filters, clean
> up the mess, re-enable the filters. Many people just make a new clone -
> ouch!
> 
> And the biggest problem is that it affects all users who may have the
> filters enabled, e.g. because they worked on a PR, and not just devs -
> i.e. the repercussions are much bigger than just a few devs affected.
> 
> We can't use server-side hooks to enforce this because the project is on
> github.
> 
> And the devs honestly try to do their best to remember to configure the
> filters, but for some reason they disappear for them, don't ask me why,
> I don't know. This is an open source project team, not a work place.

This sounds like it could be easily solved by continuous integration.
You could set up a job on any of a variety of services that checks that
a pull request or other commit is clean when when the filter runs.  If
it doesn't pass, the code doesn't merge.

This is what other projects do for style-related and tidiness issues.
Similar approaches can be used in other situations to enforce that all
line endings are LF, or whatever your project desires.

I don't think it's a good idea to provide Git configuration to end
users, even with prompting, since there are many novice users who don't
know what the security implications of various config options are.  I
also personally never would want to be prompted for such a thing, so
even if that were a feature, people would turn if off, and you'd be no
better off than you were before.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 00/13] Base SHA-256 implementation

2018-10-16 Thread brian m. carlson
On Tue, Oct 16, 2018 at 01:01:07PM +0900, Junio C Hamano wrote:
> "brian m. carlson"  writes:
> 
> >  t/t0014-hash.sh   |  54 
> >  create mode 100755 t/t0014-hash.sh
> 
> If I am not mistaken, 0014 is already used by another topic in
> flight, and will cause test-lint failure on 'pu'.

This series was written a while back.  I'll rename it.  Feel free to
wait for v3 before picking it up.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-16 Thread brian m. carlson
On Tue, Oct 16, 2018 at 06:09:41PM +0200, Duy Nguyen wrote:
> On Tue, Oct 16, 2018 at 6:01 PM Derrick Stolee  wrote:
> >
> > On 10/16/2018 11:35 AM, Duy Nguyen wrote:
> > > On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
> > >  wrote:
> > >> Since the commit-graph code wants to serialize the hash algorithm into
> > >> the data store, specify a version number for each supported algorithm.
> > >> Note that we don't use the values of the constants themselves, as they
> > >> are internal and could change in the future.
> > >>
> > >> Signed-off-by: brian m. carlson 
> > >> ---
> > >>   commit-graph.c | 9 -
> > >>   1 file changed, 8 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/commit-graph.c b/commit-graph.c
> > >> index 7a28fbb03f..e587c21bb6 100644
> > >> --- a/commit-graph.c
> > >> +++ b/commit-graph.c
> > >> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
> > >>
> > >>   static uint8_t oid_version(void)
> > >>   {
> > >> -   return 1;
> > >> +   switch (hash_algo_by_ptr(the_hash_algo)) {
> > >> +   case GIT_HASH_SHA1:
> > >> +   return 1;
> > >> +   case GIT_HASH_SHA256:
> > >> +   return 2;
> > > Should we just increase this field to uint32_t and store format_id
> > > instead? That will keep oid version unique in all data formats.
> > Both the commit-graph and multi-pack-index store a single byte for the
> > hash version, so that ship has sailed (without incrementing the full
> > file version number in each format).
> 
> And it's probably premature to add the oid version field when multiple
> hash support has not been fully realized. Now we have different ways
> of storing hash id and need separate mappings.

Honestly, anything in the .git directory that is not the v3 pack indexes
or the loose object file should be in exactly one hash algorithm.  We
could simply just leave this value at 1 all the time and ignore the
field, since we already know what algorithm it will use.

> I would go for incrementing file version. Otherwise maybe we just
> update format_id to be one byte instead, and the way of storing hash
> version in commit-graph will be used everywhere.

It needs to be four bytes for pack files so that it's four-byte aligned.
Otherwise accessing pack index fields will cause alignment issues if we
don't massively rewrite the pack handling code.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


  1   2   3   4   5   6   7   8   9   10   >