Re: [PATCH v7 10/13] pack-objects: clarify the use of object_entry::size

2018-03-31 Thread Jeff King
On Sat, Mar 31, 2018 at 06:35:40AM +0200, Duy Nguyen wrote:

> On Fri, Mar 30, 2018 at 11:04 PM, Jeff King  wrote:
> > The subject says "clarify" so I was a little surprised to see code
> > changes. It looks like we're just avoiding reassigning on top of the
> > value repeatedly, which is part of that clarification. It looks like a
> > noop to me.
> 
> Oh well... I was counting on the new name (in_pack_size, which follows
> in_pack_type naming convention) to emphasize it (and the new "delta
> size" comment to point out where in_pack_size contains a delta size.

Just to be clear, my final "it looks like a noop" means "good, it looks
like it is a pure cosmetic change and no change to the behavior."

-Peff


Re: [PATCH v7 10/13] pack-objects: clarify the use of object_entry::size

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 11:04 PM, Jeff King  wrote:
> The subject says "clarify" so I was a little surprised to see code
> changes. It looks like we're just avoiding reassigning on top of the
> value repeatedly, which is part of that clarification. It looks like a
> noop to me.

Oh well... I was counting on the new name (in_pack_size, which follows
in_pack_type naming convention) to emphasize it (and the new "delta
size" comment to point out where in_pack_size contains a delta size.
-- 
Duy


Re: [PATCH v7 10/13] pack-objects: clarify the use of object_entry::size

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:50AM +0100, Nguyễn Thái Ngọc Duy wrote:

> While this field most of the time contains the canonical object size,
> there is one case it does not: when we have found that the base object
> of the delta in question is also to be packed, we will very happily
> reuse the delta by copying it over instead of regenerating the new
> delta.
> 
> "size" in this case will record the delta size, not canonical object
> size. Later on in write_reuse_object(), we reconstruct the delta
> header and "size" is used for this purpose. When this happens, the
> "type" field contains a delta type instead of a canonical type.
> Highlight this in the code since it could be tricky to see.

Thanks for digging down here. I have definitely been confused by this in
the past.

The subject says "clarify" so I was a little surprised to see code
changes. It looks like we're just avoiding reassigning on top of the
value repeatedly, which is part of that clarification. It looks like a
noop to me.

-Peff


[PATCH v7 10/13] pack-objects: clarify the use of object_entry::size

2018-03-24 Thread Nguyễn Thái Ngọc Duy
While this field most of the time contains the canonical object size,
there is one case it does not: when we have found that the base object
of the delta in question is also to be packed, we will very happily
reuse the delta by copying it over instead of regenerating the new
delta.

"size" in this case will record the delta size, not canonical object
size. Later on in write_reuse_object(), we reconstruct the delta
header and "size" is used for this purpose. When this happens, the
"type" field contains a delta type instead of a canonical type.
Highlight this in the code since it could be tricky to see.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 11 ---
 pack-objects.h |  4 +++-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index da010f7d19..f054ba9dfa 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1417,6 +1417,7 @@ static void check_object(struct object_entry *entry)
off_t ofs;
unsigned char *buf, c;
enum object_type type;
+   unsigned long in_pack_size;
 
buf = use_pack(p, _curs, entry->in_pack_offset, );
 
@@ -1426,7 +1427,7 @@ static void check_object(struct object_entry *entry)
 */
used = unpack_object_header_buffer(buf, avail,
   ,
-  >size);
+  _pack_size);
if (used == 0)
goto give_up;
 
@@ -1443,6 +1444,7 @@ static void check_object(struct object_entry *entry)
default:
/* Not a delta hence we've already got all we need. */
oe_set_type(entry, entry->in_pack_type);
+   entry->size = in_pack_size;
entry->in_pack_header_size = used;
if (oe_type(entry) < OBJ_COMMIT || oe_type(entry) > 
OBJ_BLOB)
goto give_up;
@@ -1499,6 +1501,7 @@ static void check_object(struct object_entry *entry)
 * circular deltas.
 */
oe_set_type(entry, entry->in_pack_type);
+   entry->size = in_pack_size; /* delta size */
SET_DELTA(entry, base_entry);
entry->delta_size = entry->size;
entry->delta_sibling_idx = base_entry->delta_child_idx;
@@ -1508,13 +1511,15 @@ static void check_object(struct object_entry *entry)
}
 
if (oe_type(entry)) {
+   off_t delta_pos;
+
/*
 * This must be a delta and we already know what the
 * final object type is.  Let's extract the actual
 * object size from the delta header.
 */
-   entry->size = get_size_from_delta(p, _curs,
-   entry->in_pack_offset + 
entry->in_pack_header_size);
+   delta_pos = entry->in_pack_offset + 
entry->in_pack_header_size;
+   entry->size = get_size_from_delta(p, _curs, 
delta_pos);
if (entry->size == 0)
goto give_up;
unuse_pack(_curs);
diff --git a/pack-objects.h b/pack-objects.h
index 415b961245..d23e17050c 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -34,7 +34,9 @@ enum dfs_state {
  *
  * "size" is the uncompressed object size. Compressed size of the raw
  * data for an object in a pack is not stored anywhere but is computed
- * and made available when reverse .idx is made.
+ * and made available when reverse .idx is made. Note that when an
+ * delta is reused, "size" is the uncompressed _delta_ size, not the
+ * canonical one after the delta has been applied.
  *
  * "hash" contains a path name hash which is used for sorting the
  * delta list and also during delta searching. Once prepare_pack()
-- 
2.17.0.rc0.348.gd5a49e0b6f