On 4/28/2018 6:35 PM, Jakub Narebski wrote:
Derrick Stolee <dsto...@microsoft.com> writes:

The generation number of a commit is defined recursively as follows:

* If a commit A has no parents, then the generation number of A is one.
* If a commit A has parents, then the generation number of A is one
   more than the maximum generation number among the parents of A.
Very minor nitpick: it would be more readable wrapped differently:

   * If a commit A has parents, then the generation number of A is
     one more than the maximum generation number among parents of A.

Very minor nitpick: possibly "parents", not "the parents", but I am
not native English speaker.

Add a uint32_t generation field to struct commit so we can pass this
information to revision walks. We use three special values to signal
the generation number is invalid:

GENERATION_NUMBER_INFINITY 0xFFFFFFFF
GENERATION_NUMBER_MAX 0x3FFFFFFF
GENERATION_NUMBER_ZERO 0

The first (_INFINITY) means the generation number has not been loaded or
computed. The second (_MAX) means the generation number is too large to
store in the commit-graph file. The third (_ZERO) means the generation
number was loaded from a commit graph file that was written by a version
of git that did not support generation numbers.
Good explanation; I wonder if we want to have it in some shortened form
also in comments, and not only in the commit message.

Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
  alloc.c        | 1 +
  commit-graph.c | 2 ++
  commit.h       | 4 ++++
  3 files changed, 7 insertions(+)
I have reordered patches to make it easier to review.

diff --git a/commit.h b/commit.h
index 23a3f364ed..aac3b8c56f 100644
--- a/commit.h
+++ b/commit.h
@@ -10,6 +10,9 @@
  #include "pretty.h"
#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
+#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF
+#define GENERATION_NUMBER_MAX 0x3FFFFFFF
+#define GENERATION_NUMBER_ZERO 0
I wonder if it wouldn't be good to have some short in-line comments
explaining those constants, or a block comment above them.

struct commit_list {
        struct commit *item;
@@ -30,6 +33,7 @@ struct commit {
         */
        struct tree *maybe_tree;
        uint32_t graph_pos;
+       uint32_t generation;
  };
extern int save_commit_buffer;
All right, simple addition of the new field.  Nothing to go wrong here.

Sidenote: With 0x7FFFFFFF being (if I am not wrong) maximum graph_pos
and maximum number of nodes in commit graph, we won't hit 0x3FFFFFFF
generation number limit for all except very, very linear histories.

Both of these limits are far away from being realistic. But we could extend the maximum graph_pos independently from the maximum generation number now that we have the "capped" logic.


diff --git a/alloc.c b/alloc.c
index cf4f8b61e1..e8ab14f4a1 100644
--- a/alloc.c
+++ b/alloc.c
@@ -94,6 +94,7 @@ void *alloc_commit_node(void)
        c->object.type = OBJ_COMMIT;
        c->index = alloc_commit_index();
        c->graph_pos = COMMIT_NOT_FROM_GRAPH;
+       c->generation = GENERATION_NUMBER_INFINITY;
        return c;
  }
All right, start with initializing it with "not from commit-graph" value
after allocation.

diff --git a/commit-graph.c b/commit-graph.c
index 70fa1b25fd..9ad21c3ffb 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -262,6 +262,8 @@ static int fill_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
        date_low = get_be32(commit_data + g->hash_len + 12);
        item->date = (timestamp_t)((date_high << 32) | date_low);
+ item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+
I guess we should not worry about these "magical constants" sprinkled
here, like "+ 8" above.

Let's examine how it goes, taking a look at commit-graph-format.txt
in Documentation/technical/commit-graph-format.txt

  * The first H (g->hash_len) bytes are for the OID of the root tree.
  * The next 8 bytes are for the positions of the first two parents [...]

So 'commit_data + g->hash_len + 8' is our offset from the start of
commit data.  All right.

   * The next 8 bytes store the generation number of the commit and
     the commit time in seconds since EPOCH.  The generation number
     uses the higher 30 bits of the first 4 bytes. [...]

The higher 30 bits of the 4 bytes, which is 32 bits, means that we need
to shift 32-bit value 2 bits right, so that we get lower 30 bits of
32-bit value.  All right.

   All 4-byte numbers are in network order.

Shouldn't it be ntohl() to convert from network order to host order, and
not get_be32()?  I guess they are the same (network order is big-endian
order), and get_be32() is what rest of git uses...

ntohl() takes a 32-bit value, while get_be32() takes a pointer. This makes pulling network-bytes out of streams much cleaner with get_be32(), so I try to use that whenever possible.

Reply via email to