Re: [PATCH v4 04/13] commit-graph: implement write_commit_graph()

2018-02-28 Thread Junio C Hamano
Derrick Stolee  writes:

> diff --git a/commit-graph.h b/commit-graph.h
> new file mode 100644
> index 000..dc8c73a
> --- /dev/null
> +++ b/commit-graph.h
> @@ -0,0 +1,7 @@
> +#ifndef COMMIT_GRAPH_H
> +#define COMMIT_GRAPH_H
> +
> +extern char *write_commit_graph(const char *obj_dir);
> +
> +#endif
> +

Trailing blank line at the end of the file.  Remove it.

t5318 has the same issue.

Thanks.


Re: [PATCH v4 04/13] commit-graph: implement write_commit_graph()

2018-02-26 Thread SZEDER Gábor
On Mon, Feb 19, 2018 at 7:53 PM, Derrick Stolee  wrote:

> +static int if_packed_commit_add_to_list(const struct object_id *oid,
> +   struct packed_git *pack,
> +   uint32_t pos,
> +   void *data)
> +{
> +   struct packed_oid_list *list = (struct packed_oid_list*)data;
> +   enum object_type type;
> +   unsigned long size;
> +   void *inner_data;
> +   off_t offset = nth_packed_object_offset(pack, pos);
> +   inner_data = unpack_entry(pack, offset, , );
> +
> +   if (inner_data)
> +   free(inner_data);

The condition is unnecessary, free() can handle a NULL argument just fine.

(Suggested by Coccinelle and 'contrib/coccinelle/free.cocci.patch'.)


Re: [PATCH v4 04/13] commit-graph: implement write_commit_graph()

2018-02-23 Thread Derrick Stolee



On 2/23/2018 2:30 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


jt/binsearch-with-fanout introduces one when there is a 256-entry
fanout table (not the case here).

The bsearch() method in search.h (and used in
pack-write.c:need_large_offset) does not return the _position_ of a
found element.

Neither of these suit my needs, but I could just be searching for the
wrong strings. Also, I could divert my energies in this area to create
a generic search in the style of jt/binsearch-with-fanout.

... me goes and digs ...

What I had in mind was the one in sha1-lookup.c, actually.  Having
said that, hand-rolling another one is not too huge a deal;
eventually people will notice and consolidate code after the series
stabilizes anyway ;-)


Ah, sha1_pos(). That definitely satisfies my use case. Thanks! My local 
branch has this replacement.





+   num_large_edges++;
+   parent = parent->next;
+   } while (parent);

It feels somewhat wasteful to traverse the commit's parents list
only to count, without populating the octopus table (which I
understand is assumed to be minority case under this design).

Since we are writing the commit graph file in-order, we cannot write
the octopus table until after the chunk lengths are known.

Oh, my "minority case" comment was me wondering "since we expect
there are only a few, why not keep them in memory while we discover
them here, so that the writing phase that come later do not have to
go through all commits again counting their parents?  would it be
more performant and a better trade-off?"  We can certainly do such
an optimization later (iow, it is not all that crucial issue and
certainly I didn't mention the above as something that needs to be
"fixed"--there is nothing broken).


store the octopus table in memory and then dump it into the file
later, but walking the parents is quite fast after all the commits are
loaded. I'm not sure the time optimization merits the extra complexity
here. (I'm happy to revisit this if we do see this performance
lacking.)

P.S. I really like the name "octopus table" and will use that for
informal discussions of this format.

I actually do not mind that name used as the official term.  I find
it far more descriptive and understandable than "long edge" / "large
edge" at least to a Git person.


I will consider using this in the format and design documents.




You're right that int_id isn't great, and your more-specific
"oid_table_pos" shows an extra reason why it isn't great: when we add
the GRAPH_LAST_EDGE bit or set it to GRAPH_PARENT_MISSING, the value
is NOT a table position.

Perhaps I am somewhat biased, but it is quite natural for our
codebase and internal API to say something like this:

 x_pos(table, key) function's return value is the non-negative
 position for the key in the table when the key is there; when it
 returns a negative value, it is (-1 - position) where the "position"
 is the position in the table they key would have been found if
 it was in there.

and store the return value from such a function in a variable called
"pos".  Surely, sometimes "pos" does not have _the_ position, but
that does not make it a bad name.

Saying "MISSING is a special value that denotes 'nothing is here'"
and allowing it to be set to a variable that meant to hold the
position is not such a bad thing, though.  After all, that is how
you use NULL as a special value for a pointer variable ;-).

Same for using the high bit to mean something else.  Taking these
together you would explain "low 31-bit of pos holds the position for
the item in the table.  MISSING is a special value that you can use
to denote there is nothing.  The LAST_EDGE bit denotes that one
group of positions ends there", or something like that.


I think the current name makes the following call very clear:

It is still a strange name nevertheless.


+char *write_commit_graph(const char *obj_dir)
+{
+   struct packed_oid_list oids;
+   struct packed_commit_list commits;
+   struct sha1file *f;
+   int i, count_distinct = 0;
+   DIR *info_dir;
+   struct strbuf tmp_file = STRBUF_INIT;
+   struct strbuf graph_file = STRBUF_INIT;
+   unsigned char final_hash[GIT_MAX_RAWSZ];
+   char *graph_name;
+   int fd;
+   uint32_t chunk_ids[5];
+   uint64_t chunk_offsets[5];
+   int num_chunks;
+   int num_long_edges;
+   struct commit_list *parent;
+
+   oids.nr = 0;
+   oids.alloc = (int)(0.15 * approximate_object_count());

Heh, traditionalist would probably avoid unnecessary use of float
and use something like 1/4 or 1/8 ;-)  After all, it is merely a
ballpark guestimate.


+   num_long_edges = 0;

This again is about naming, but I find it a bit unnatural to call
the edge between a chind and its octopus parents "long".  Individual
edges are not long--the only thing that is long is your "list of

Re: [PATCH v4 04/13] commit-graph: implement write_commit_graph()

2018-02-23 Thread Junio C Hamano
Junio C Hamano  writes:

>> I think the current name makes the following call very clear:
>
> It is still a strange name nevertheless.

Sorry for simply repeating "strange" without spelling out why in the
previous message.  This certainly is subjective and depends on your
cultural background, but in our codebase, I tried to name functions
after "what" they do and "why", rather than "how" they do so.  In a
way, it's the same kind of uneasiness I feel when I see variables
named in hungarian notation.

You would inspect the object and treat 'data' as a list and add to
the object if it is a commit, and if_packed_commit_add_to_list()
certainly is a name that describes all of that well, but does it
give readers a good answer when they wonder why the function is
doing so?  You described with the name of the function how it
collects commits that are in the pack, without explicitly saying
that you want to collect packed commits and that is why you are
inspecting for type and doing so only for commit (i.e.
"if_packed_commit" part of the name) and why you are adding to a
list.


Re: [PATCH v4 04/13] commit-graph: implement write_commit_graph()

2018-02-23 Thread Junio C Hamano
Derrick Stolee  writes:

> jt/binsearch-with-fanout introduces one when there is a 256-entry
> fanout table (not the case here).
>
> The bsearch() method in search.h (and used in
> pack-write.c:need_large_offset) does not return the _position_ of a
> found element.
>
> Neither of these suit my needs, but I could just be searching for the
> wrong strings. Also, I could divert my energies in this area to create
> a generic search in the style of jt/binsearch-with-fanout.

... me goes and digs ...

What I had in mind was the one in sha1-lookup.c, actually.  Having
said that, hand-rolling another one is not too huge a deal;
eventually people will notice and consolidate code after the series
stabilizes anyway ;-)

>>> +   num_large_edges++;
>>> +   parent = parent->next;
>>> +   } while (parent);
>> It feels somewhat wasteful to traverse the commit's parents list
>> only to count, without populating the octopus table (which I
>> understand is assumed to be minority case under this design).
>
> Since we are writing the commit graph file in-order, we cannot write
> the octopus table until after the chunk lengths are known.

Oh, my "minority case" comment was me wondering "since we expect
there are only a few, why not keep them in memory while we discover
them here, so that the writing phase that come later do not have to
go through all commits again counting their parents?  would it be
more performant and a better trade-off?"  We can certainly do such
an optimization later (iow, it is not all that crucial issue and
certainly I didn't mention the above as something that needs to be
"fixed"--there is nothing broken).

> store the octopus table in memory and then dump it into the file
> later, but walking the parents is quite fast after all the commits are
> loaded. I'm not sure the time optimization merits the extra complexity
> here. (I'm happy to revisit this if we do see this performance
> lacking.)
>
> P.S. I really like the name "octopus table" and will use that for
> informal discussions of this format.

I actually do not mind that name used as the official term.  I find
it far more descriptive and understandable than "long edge" / "large
edge" at least to a Git person.

> You're right that int_id isn't great, and your more-specific
> "oid_table_pos" shows an extra reason why it isn't great: when we add
> the GRAPH_LAST_EDGE bit or set it to GRAPH_PARENT_MISSING, the value
> is NOT a table position.

Perhaps I am somewhat biased, but it is quite natural for our
codebase and internal API to say something like this:

x_pos(table, key) function's return value is the non-negative
position for the key in the table when the key is there; when it
returns a negative value, it is (-1 - position) where the "position"
is the position in the table they key would have been found if
it was in there.

and store the return value from such a function in a variable called
"pos".  Surely, sometimes "pos" does not have _the_ position, but
that does not make it a bad name.

Saying "MISSING is a special value that denotes 'nothing is here'"
and allowing it to be set to a variable that meant to hold the
position is not such a bad thing, though.  After all, that is how
you use NULL as a special value for a pointer variable ;-).

Same for using the high bit to mean something else.  Taking these
together you would explain "low 31-bit of pos holds the position for
the item in the table.  MISSING is a special value that you can use
to denote there is nothing.  The LAST_EDGE bit denotes that one
group of positions ends there", or something like that.

> I think the current name makes the following call very clear:

It is still a strange name nevertheless.

>>> +char *write_commit_graph(const char *obj_dir)
>>> +{
>>> +   struct packed_oid_list oids;
>>> +   struct packed_commit_list commits;
>>> +   struct sha1file *f;
>>> +   int i, count_distinct = 0;
>>> +   DIR *info_dir;
>>> +   struct strbuf tmp_file = STRBUF_INIT;
>>> +   struct strbuf graph_file = STRBUF_INIT;
>>> +   unsigned char final_hash[GIT_MAX_RAWSZ];
>>> +   char *graph_name;
>>> +   int fd;
>>> +   uint32_t chunk_ids[5];
>>> +   uint64_t chunk_offsets[5];
>>> +   int num_chunks;
>>> +   int num_long_edges;
>>> +   struct commit_list *parent;
>>> +
>>> +   oids.nr = 0;
>>> +   oids.alloc = (int)(0.15 * approximate_object_count());
>> Heh, traditionalist would probably avoid unnecessary use of float
>> and use something like 1/4 or 1/8 ;-)  After all, it is merely a
>> ballpark guestimate.
>>
>>> +   num_long_edges = 0;
>> This again is about naming, but I find it a bit unnatural to call
>> the edge between a chind and its octopus parents "long".  Individual
>> edges are not long--the only thing that is long is your "list of
>> edges".  Some other codepaths in this patch seems to call the same
>> concept with s/long/large/, which I found somewhat puzzling.
>
> How about 

Re: [PATCH v4 04/13] commit-graph: implement write_commit_graph()

2018-02-23 Thread Derrick Stolee

On 2/20/2018 5:57 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 20

This hardcoded 20 on the right hand side of this #define is probably
problematic.   Unless you are planning to possibly store truncated
hash value for some future hash algorithm, GRAPH_OID_LEN_$HASH should
always be the same as GIT_$HASH_RAWSZ, I would think.  IOW

 #define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ

perhaps?


Yes.




+static void write_graph_chunk_fanout(struct sha1file *f,
+struct commit **commits,
+int nr_commits)
+{
+   uint32_t i, count = 0;
+   struct commit **list = commits;
+   struct commit **last = commits + nr_commits;
+
+   /*
+* Write the first-level table (the list is sorted,
+* but we use a 256-entry lookup to be able to avoid
+* having to do eight extra binary search iterations).
+*/
+   for (i = 0; i < 256; i++) {
+   while (list < last) {
+   if ((*list)->object.oid.hash[0] != i)
+   break;
+   count++;
+   list++;
+   }

If count and list are always incremented in unison, perhaps you do
not need an extra variable "last".  If typeof(nr_commits) is wider
than typeof(count), this loop and the next write-be32 is screwed
anyway ;-)

This comment probably applies equally to some other uses of the same
"compute last pointer to compare with running pointer for
termination" pattern in this patch.


Yes. Also turning i and count into int to match nr_commits.




+   sha1write_be32(f, count);
+   }
+}
+static int commit_pos(struct commit **commits, int nr_commits,
+ const struct object_id *oid, uint32_t *pos)
+{

It is a bit unusual to see something_pos() that returns an integer
that does *NOT* return the position as its return value.  Dropping
the *pos parameter, and returning "mid" when commits[mid] is what we
wanted to see, and otherwise returning "-1 - first" to signal the
position at which we _would_ have found the object, if it were in
the table, would make it more consistent with the usual convention.


I can make this change. I found the boolean return to make the 
consumer's logic simpler, but it isn't that much simpler.



Don't we even have such a generalized binary search helper already
somewhere in the system?


jt/binsearch-with-fanout introduces one when there is a 256-entry fanout 
table (not the case here).


The bsearch() method in search.h (and used in 
pack-write.c:need_large_offset) does not return the _position_ of a 
found element.


Neither of these suit my needs, but I could just be searching for the 
wrong strings. Also, I could divert my energies in this area to create a 
generic search in the style of jt/binsearch-with-fanout.





+static void write_graph_chunk_data(struct sha1file *f, int hash_len,
+  struct commit **commits, int nr_commits)
+{
+   struct commit **list = commits;
+   struct commit **last = commits + nr_commits;
+   uint32_t num_large_edges = 0;
+
+   while (list < last) {
+   struct commit_list *parent;
+   uint32_t int_id;
+   uint32_t packedDate[2];
+
+...
+   if (!parent)
+   int_id = GRAPH_PARENT_NONE;
+   else if (parent->next)
+   int_id = GRAPH_LARGE_EDGES_NEEDED | num_large_edges;
+   else if (!commit_pos(commits, nr_commits,
+   &(parent->item->object.oid), _id))
+   int_id = GRAPH_PARENT_MISSING;
+
+   sha1write_be32(f, int_id);
+
+   if (parent && parent->next) {

This is equivalent to checking "int_id & GRAPH_LARGE_EDGES_NEEDED",
right?  Not suggesting to use the other form of checks, but trying
to see what's the best way to express it in the most readable way.


You're right, we already set the bit above, so let's make use of that 
check. Important to note that GRAPH_LARGE_EDGES_NEEDED & 
GRAPH_PARENT_MISSING == 0.





+   do {
+   num_large_edges++;
+   parent = parent->next;
+   } while (parent);

It feels somewhat wasteful to traverse the commit's parents list
only to count, without populating the octopus table (which I
understand is assumed to be minority case under this design).


Since we are writing the commit graph file in-order, we cannot write the 
octopus table until after the chunk lengths are known. We could store 
the octopus table in memory and then dump it into the file later, but 
walking the parents is quite fast after all the commits are loaded. I'm 
not sure the time optimization merits the extra complexity here. (I'm 
happy to revisit this if we do see this 

Re: [PATCH v4 04/13] commit-graph: implement write_commit_graph()

2018-02-20 Thread Junio C Hamano
Derrick Stolee  writes:

> +#define GRAPH_OID_VERSION_SHA1 1
> +#define GRAPH_OID_LEN_SHA1 20

This hardcoded 20 on the right hand side of this #define is probably
problematic.   Unless you are planning to possibly store truncated
hash value for some future hash algorithm, GRAPH_OID_LEN_$HASH should
always be the same as GIT_$HASH_RAWSZ, I would think.  IOW

#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ

perhaps?

> +static void write_graph_chunk_fanout(struct sha1file *f,
> +  struct commit **commits,
> +  int nr_commits)
> +{
> + uint32_t i, count = 0;
> + struct commit **list = commits;
> + struct commit **last = commits + nr_commits;
> +
> + /*
> +  * Write the first-level table (the list is sorted,
> +  * but we use a 256-entry lookup to be able to avoid
> +  * having to do eight extra binary search iterations).
> +  */
> + for (i = 0; i < 256; i++) {
> + while (list < last) {
> + if ((*list)->object.oid.hash[0] != i)
> + break;
> + count++;
> + list++;
> + }

If count and list are always incremented in unison, perhaps you do
not need an extra variable "last".  If typeof(nr_commits) is wider
than typeof(count), this loop and the next write-be32 is screwed
anyway ;-)

This comment probably applies equally to some other uses of the same
"compute last pointer to compare with running pointer for
termination" pattern in this patch.

> + sha1write_be32(f, count);
> + }
> +}

> +static int commit_pos(struct commit **commits, int nr_commits,
> +   const struct object_id *oid, uint32_t *pos)
> +{

It is a bit unusual to see something_pos() that returns an integer
that does *NOT* return the position as its return value.  Dropping
the *pos parameter, and returning "mid" when commits[mid] is what we
wanted to see, and otherwise returning "-1 - first" to signal the
position at which we _would_ have found the object, if it were in
the table, would make it more consistent with the usual convention.

Don't we even have such a generalized binary search helper already
somewhere in the system?

> +static void write_graph_chunk_data(struct sha1file *f, int hash_len,
> +struct commit **commits, int nr_commits)
> +{
> + struct commit **list = commits;
> + struct commit **last = commits + nr_commits;
> + uint32_t num_large_edges = 0;
> +
> + while (list < last) {
> + struct commit_list *parent;
> + uint32_t int_id;
> + uint32_t packedDate[2];
> +
> +...
> + if (!parent)
> + int_id = GRAPH_PARENT_NONE;
> + else if (parent->next)
> + int_id = GRAPH_LARGE_EDGES_NEEDED | num_large_edges;
> + else if (!commit_pos(commits, nr_commits,
> + &(parent->item->object.oid), _id))
> + int_id = GRAPH_PARENT_MISSING;
> +
> + sha1write_be32(f, int_id);
> +
> + if (parent && parent->next) {

This is equivalent to checking "int_id & GRAPH_LARGE_EDGES_NEEDED",
right?  Not suggesting to use the other form of checks, but trying
to see what's the best way to express it in the most readable way.

> + do {
> + num_large_edges++;
> + parent = parent->next;
> + } while (parent);

It feels somewhat wasteful to traverse the commit's parents list
only to count, without populating the octopus table (which I
understand is assumed to be minority case under this design).

> + }
> +
> + if (sizeof((*list)->date) > 4)
> + packedDate[0] = htonl(((*list)->date >> 32) & 0x3);
> + else
> + packedDate[0] = 0;

OK, the undefined pattern in the previous round is now gone ;-)  Good.

> + packedDate[1] = htonl((*list)->date);
> + sha1write(f, packedDate, 8);
> +
> + list++;
> + }
> +}
> +
> +static void write_graph_chunk_large_edges(struct sha1file *f,
> +   struct commit **commits,
> +   int nr_commits)
> +{
> + struct commit **list = commits;
> + struct commit **last = commits + nr_commits;
> + struct commit_list *parent;
> +
> + while (list < last) {
> + int num_parents = 0;
> + for (parent = (*list)->parents; num_parents < 3 && parent;
> +  parent = parent->next)
> + num_parents++;
> +
> + if (num_parents <= 2) {
> + list++;
> + continue;
> + }
> +
> + /* Since num_parents > 2, this initializer is safe. */
> + for (parent = 

[PATCH v4 04/13] commit-graph: implement write_commit_graph()

2018-02-19 Thread Derrick Stolee
Teach Git to write a commit graph file by checking all packed objects
to see if they are commits, then store the file in the given object
directory.

Signed-off-by: Derrick Stolee 
---
 Makefile   |   1 +
 commit-graph.c | 370 +
 commit-graph.h |   7 ++
 3 files changed, 378 insertions(+)
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h

diff --git a/Makefile b/Makefile
index fc40b81..eeaeb6a 100644
--- a/Makefile
+++ b/Makefile
@@ -761,6 +761,7 @@ LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
+LIB_OBJS += commit-graph.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
diff --git a/commit-graph.c b/commit-graph.c
new file mode 100644
index 000..f9e39b0
--- /dev/null
+++ b/commit-graph.c
@@ -0,0 +1,370 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "revision.h"
+#include "sha1-lookup.h"
+#include "commit-graph.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 20
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_LARGE_EDGES_NEEDED 0x8000
+#define GRAPH_PARENT_MISSING 0x7fff
+#define GRAPH_EDGE_LAST_MASK 0x7fff
+#define GRAPH_PARENT_NONE 0x7000
+
+#define GRAPH_LAST_EDGE 0x8000
+
+#define GRAPH_FANOUT_SIZE (4 * 256)
+#define GRAPH_CHUNKLOOKUP_WIDTH 12
+#define GRAPH_CHUNKLOOKUP_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH)
+#define GRAPH_MIN_SIZE (GRAPH_CHUNKLOOKUP_SIZE + GRAPH_FANOUT_SIZE + \
+   GRAPH_OID_LEN + 8)
+
+static void write_graph_chunk_fanout(struct sha1file *f,
+struct commit **commits,
+int nr_commits)
+{
+   uint32_t i, count = 0;
+   struct commit **list = commits;
+   struct commit **last = commits + nr_commits;
+
+   /*
+* Write the first-level table (the list is sorted,
+* but we use a 256-entry lookup to be able to avoid
+* having to do eight extra binary search iterations).
+*/
+   for (i = 0; i < 256; i++) {
+   while (list < last) {
+   if ((*list)->object.oid.hash[0] != i)
+   break;
+   count++;
+   list++;
+   }
+
+   sha1write_be32(f, count);
+   }
+}
+
+static void write_graph_chunk_oids(struct sha1file *f, int hash_len,
+  struct commit **commits, int nr_commits)
+{
+   struct commit **list, **last = commits + nr_commits;
+   for (list = commits; list < last; list++)
+   sha1write(f, (*list)->object.oid.hash, (int)hash_len);
+}
+
+static int commit_pos(struct commit **commits, int nr_commits,
+ const struct object_id *oid, uint32_t *pos)
+{
+   uint32_t first = 0, last = nr_commits;
+
+   while (first < last) {
+   uint32_t mid = first + (last - first) / 2;
+   struct object_id *current;
+   int cmp;
+
+   current = &(commits[mid]->object.oid);
+   cmp = oidcmp(oid, current);
+   if (!cmp) {
+   *pos = mid;
+   return 1;
+   }
+   if (cmp > 0) {
+   first = mid + 1;
+   continue;
+   }
+   last = mid;
+   }
+
+   *pos = first;
+   return 0;
+}
+
+static void write_graph_chunk_data(struct sha1file *f, int hash_len,
+  struct commit **commits, int nr_commits)
+{
+   struct commit **list = commits;
+   struct commit **last = commits + nr_commits;
+   uint32_t num_large_edges = 0;
+
+   while (list < last) {
+   struct commit_list *parent;
+   uint32_t int_id;
+   uint32_t packedDate[2];
+
+   parse_commit(*list);
+   sha1write(f, (*list)->tree->object.oid.hash, hash_len);
+
+   parent = (*list)->parents;
+
+   if (!parent)
+   int_id = GRAPH_PARENT_NONE;
+   else if (!commit_pos(commits, nr_commits,
+&(parent->item->object.oid), _id))
+   int_id = GRAPH_PARENT_MISSING;
+
+   sha1write_be32(f, int_id);
+
+   if (parent)
+