Re: [RFC PATCH 0/9] Use a structure for object IDs.

2014-05-04 Thread Michael Haggerty
On 05/03/2014 10:12 PM, brian m. carlson wrote:
 This is a preliminary RFC patch series to move all the relevant uses of
 unsigned char [20] to struct object_id.  It should not be applied to any
 branch yet.
 
 The goal of this series to improve type-checking in the codebase and to
 make it easier to move to a different hash function if the project
 decides to do that.  This series does not convert all of the codebase,
 but only parts.  I'm looking for feedback to see if there is consensus
 that this is the right direction before investing a large amount of
 time.
 
 Certain parts of the code have to be converted before others to keep the
 patch sizes small, maintainable, and bisectable, so functions and
 structures that are used across the codebase (e.g. hashcmp and struct
 object) will be converted later.  Conversion has been done in a roughly
 alphabetical order by name of file.
 
 The constants for raw and hex sizes of SHA-1 values are maintained.
 These constants are used where the quantity is the size of an SHA-1
 value, and sizeof(struct object_id) is used wherever memory is to be
 allocated.  This is done to permit the struct to turn into a union later
 if multiple hashes are supported.  I left the names at GIT_OID_RAWSZ and
 GIT_OID_HEXSZ because that's what libgit2 uses and what Junio seemed to
 prefer, but they can be changed later if there's a desire to do that.
 
 I called the structure member oid because it was easily grepable and
 distinct from the rest of the codebase.  It, too, can be changed if we
 decide on a better name.  I specifically did not choose sha1 since it
 looks weird to have sha1-sha1 and I didn't want to rename lots of
 variables.

That means that we will have sha1-oid all over the place, right?
That's unfortunate, because it is exactly backwards from what we would
want in a hypothetical future where OIDs are not necessarily SHA-1s.  In
that future we would certainly have to support SHA-1s in parallel with
the new hash.  So (in that hypothetical future) we will probably want
these expressions to look like oid-sha1, to allow, say, a second struct
or union field oid-sha256 [1].

If that future would come to pass, then we would also want to have
distinct constants like GIT_SHA1_RAWSZ and GIT_SHA256_RAWSZ rather than
the generically-named GIT_OID_RAWSZ.

I think that this patch series will improve the code clarity and type
safety independent of thoughts about supporting different hash
algorithms, so I'm not objecting to your naming decision.  But *if* such
support is part of your long-term hope, then you might ease the future
transition by choosing different names now.

(Maybe renaming local variables sha1 - oid might be a handy way of
making clear which code has been converted to the new style.)

Just to be clear, the above are just some random thoughts for your
consideration, but feel free to disregard them.

In any case, it sure will be a lot of code churn.  If you succeed in
this project, then git blame will probably consider you the author of
about 2/3 of git :-)

Michael

[1] I'm certainly not advocating that we want to support a different
hash, let alone that that hash should be SHA-256; these examples are
just for illustration.

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/9] Use a structure for object IDs.

2014-05-04 Thread Johannes Sixt

Am 04.05.2014 08:35, schrieb Michael Haggerty:

On 05/03/2014 10:12 PM, brian m. carlson wrote:

I specifically did not choose sha1 since it
looks weird to have sha1-sha1 and I didn't want to rename lots of
variables.


That means that we will have sha1-oid all over the place, right?


Only during the transition period. When all functions that currently take 
unsigned char[20] are converted to struct object_id *, this additional 
dereferences go away again.


-- Hannes

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/9] Use a structure for object IDs.

2014-05-04 Thread brian m. carlson
On Sun, May 04, 2014 at 08:35:00AM +0200, Michael Haggerty wrote:
 On 05/03/2014 10:12 PM, brian m. carlson wrote:
  I called the structure member oid because it was easily grepable and
  distinct from the rest of the codebase.  It, too, can be changed if we
  decide on a better name.  I specifically did not choose sha1 since it
  looks weird to have sha1-sha1 and I didn't want to rename lots of
  variables.
 
 That means that we will have sha1-oid all over the place, right?
 That's unfortunate, because it is exactly backwards from what we would
 want in a hypothetical future where OIDs are not necessarily SHA-1s.  In
 that future we would certainly have to support SHA-1s in parallel with
 the new hash.  So (in that hypothetical future) we will probably want
 these expressions to look like oid-sha1, to allow, say, a second struct
 or union field oid-sha256 [1].

As Johannes pointed out, only during the transition period.

 If that future would come to pass, then we would also want to have
 distinct constants like GIT_SHA1_RAWSZ and GIT_SHA256_RAWSZ rather than
 the generically-named GIT_OID_RAWSZ.

You have a point.  I'll make the change.

 I think that this patch series will improve the code clarity and type
 safety independent of thoughts about supporting different hash
 algorithms, so I'm not objecting to your naming decision.  But *if* such
 support is part of your long-term hope, then you might ease the future
 transition by choosing different names now.

It is an eventual goal, but without this series, it's not even worth
discussing since it's too hard to implement.  Even if that doesn't
happen, my hope is that we'll at least improve the safety of the code
and hopefully avoid a bug or two out of it.

 (Maybe renaming local variables sha1 - oid might be a handy way of
 making clear which code has been converted to the new style.)

This is a good idea as well.  I'll walk through the patches and fix
that.

 Just to be clear, the above are just some random thoughts for your
 consideration, but feel free to disregard them.

I appreciate the well-thought-out response.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187


signature.asc
Description: Digital signature


[RFC PATCH 0/9] Use a structure for object IDs.

2014-05-03 Thread brian m. carlson
This is a preliminary RFC patch series to move all the relevant uses of
unsigned char [20] to struct object_id.  It should not be applied to any
branch yet.

The goal of this series to improve type-checking in the codebase and to
make it easier to move to a different hash function if the project
decides to do that.  This series does not convert all of the codebase,
but only parts.  I'm looking for feedback to see if there is consensus
that this is the right direction before investing a large amount of
time.

Certain parts of the code have to be converted before others to keep the
patch sizes small, maintainable, and bisectable, so functions and
structures that are used across the codebase (e.g. hashcmp and struct
object) will be converted later.  Conversion has been done in a roughly
alphabetical order by name of file.

The constants for raw and hex sizes of SHA-1 values are maintained.
These constants are used where the quantity is the size of an SHA-1
value, and sizeof(struct object_id) is used wherever memory is to be
allocated.  This is done to permit the struct to turn into a union later
if multiple hashes are supported.  I left the names at GIT_OID_RAWSZ and
GIT_OID_HEXSZ because that's what libgit2 uses and what Junio seemed to
prefer, but they can be changed later if there's a desire to do that.

I called the structure member oid because it was easily grepable and
distinct from the rest of the codebase.  It, too, can be changed if we
decide on a better name.  I specifically did not choose sha1 since it
looks weird to have sha1-sha1 and I didn't want to rename lots of
variables.

Comments?

brian m. carlson (9):
  Define a structure for object IDs.
  bisect.c: convert to use struct object_id
  archive.c: convert to use struct object_id
  zip: use GIT_OID_HEXSZ for trailers
  branch.c: convert to use struct object_id
  bulk-checkin.c: convert to use struct object_id
  bundle.c: convert leaf functions to struct object_id
  cache-tree: convert struct cache_tree to use object_id
  diff: convert struct combine_diff_path to object_id

 archive-zip.c  |  4 ++--
 archive.c  | 16 +++
 archive.h  |  1 +
 bisect.c   | 30 ++--
 branch.c   | 16 +++
 builtin/commit.c   |  2 +-
 builtin/fsck.c |  4 ++--
 bulk-checkin.c | 12 +--
 bundle.c   | 38 +--
 cache-tree.c   | 30 ++--
 cache-tree.h   |  3 ++-
 combine-diff.c | 54 +-
 diff-lib.c | 10 +-
 diff.h |  5 +++--
 merge-recursive.c  |  2 +-
 object.h   | 13 +++-
 reachable.c|  2 +-
 sequencer.c|  2 +-
 test-dump-cache-tree.c |  4 ++--
 19 files changed, 131 insertions(+), 117 deletions(-)

-- 
2.0.0.rc0

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/9] Use a structure for object IDs.

2014-05-03 Thread brian m. carlson
On Sat, May 03, 2014 at 08:12:13PM +, brian m. carlson wrote:
 This is a preliminary RFC patch series to move all the relevant uses of
 unsigned char [20] to struct object_id.  It should not be applied to any
 branch yet.
 
 The goal of this series to improve type-checking in the codebase and to
 make it easier to move to a different hash function if the project
 decides to do that.  This series does not convert all of the codebase,
 but only parts.  I'm looking for feedback to see if there is consensus
 that this is the right direction before investing a large amount of
 time.

I would like to point out that to get something with as few calls as
hashclr converted to use struct object_id requires an insane amount of
work, because often major parts of several files have to be converted
first.  So the list should be aware that this will likely be an
extensive series, although it is bisectable, so it could theoretically
be done in batches.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187


signature.asc
Description: Digital signature