Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-31 Thread Duy Nguyen
On Sat, Mar 31, 2018 at 06:20:07AM -0400, Jeff King wrote:
> On Sat, Mar 31, 2018 at 06:51:10AM +0200, Duy Nguyen wrote:
> 
> > >> +#define IN_PACK(obj) oe_in_pack(_pack, obj)
> > >
> > > How come this one gets a macro, but the earlier conversions don't?
> > >
> > > I guess the problem is that oe_in_pack() is defined in the generic
> > > pack-objects.h, but _pack is only in builtin/pack-objects.c?
> > >
> > > I wonder if it would be that bad to just say oe_in_pack(_pack, obj)
> > > everywhere. It's longer, but it makes the code slightly less magical to
> > > read.
> > 
> > Longer was exactly why I added these macros (with the hope that the
> > macro upper case names already ring a "it's magical" bell). Should I
> > drop all these macros? Some code becomes a lot more verbose though.
> 
> I'm on the fence. I agree that the macro screams "magical". I just
> sometimes see a macro and think something really weird and
> unfunction-like is going on. But really we're just replacing a default
> parameter.
> 
> So I dunno. If you get rid of the macros and I look at it, I give even
> odds that I'll say "yech, put them back!". :)

It would look like this (on top of v8). I think the "_pack" part is
most distracting when it's used as part of an expression (or a
function argument). I probably went overboard with SET_ macros though.

-- 8< --
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b5bba2c228..dec849b755 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -29,18 +29,6 @@
 #include "list.h"
 #include "packfile.h"
 
-#define IN_PACK(obj) oe_in_pack(_pack, obj)
-#define SIZE(obj) oe_size(_pack, obj)
-#define SET_SIZE(obj,size) oe_set_size(_pack, obj, size)
-#define DELTA_SIZE(obj) oe_delta_size(_pack, obj)
-#define DELTA(obj) oe_delta(_pack, obj)
-#define DELTA_CHILD(obj) oe_delta_child(_pack, obj)
-#define DELTA_SIBLING(obj) oe_delta_sibling(_pack, obj)
-#define SET_DELTA(obj, val) oe_set_delta(_pack, obj, val)
-#define SET_DELTA_SIZE(obj, val) oe_set_delta_size(_pack, obj, val)
-#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(_pack, obj, val)
-#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(_pack, obj, val)
-
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
N_("git pack-objects [...]  [<  | < 
]"),
@@ -137,14 +125,14 @@ static void *get_delta(struct object_entry *entry)
buf = read_sha1_file(entry->idx.oid.hash, , );
if (!buf)
die("unable to read %s", oid_to_hex(>idx.oid));
-   base_buf = read_sha1_file(DELTA(entry)->idx.oid.hash, ,
+   base_buf = read_sha1_file(oe_delta(_pack, entry)->idx.oid.hash, 
,
  _size);
if (!base_buf)
die("unable to read %s",
-   oid_to_hex((entry)->idx.oid));
+   oid_to_hex(_delta(_pack, entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
   buf, size, _size, 0);
-   if (!delta_buf || delta_size != DELTA_SIZE(entry))
+   if (!delta_buf || delta_size != oe_delta_size(_pack, entry))
die("delta size changed");
free(buf);
free(base_buf);
@@ -295,15 +283,15 @@ static unsigned long write_no_reuse_object(struct 
hashfile *f, struct object_ent
FREE_AND_NULL(entry->delta_data);
entry->z_delta_size = 0;
} else if (entry->delta_data) {
-   size = DELTA_SIZE(entry);
+   size = oe_delta_size(_pack, entry);
buf = entry->delta_data;
entry->delta_data = NULL;
-   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
+   type = (allow_ofs_delta && oe_delta(_pack, 
entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
-   size = DELTA_SIZE(entry);
-   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
+   size = oe_delta_size(_pack, entry);
+   type = (allow_ofs_delta && oe_delta(_pack, 
entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
 
@@ -327,7 +315,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
 * encoding of the relative offset for the delta
 * base from this object's position in the pack.
 */
-   off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
+   off_t ofs = entry->idx.offset - oe_delta(_pack, 
entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -353,7 +341,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
return 0;
}
hashwrite(f, header, hdrlen);
-   hashwrite(f, 

Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-31 Thread Jeff King
On Sat, Mar 31, 2018 at 06:51:10AM +0200, Duy Nguyen wrote:

> >> +#define IN_PACK(obj) oe_in_pack(_pack, obj)
> >
> > How come this one gets a macro, but the earlier conversions don't?
> >
> > I guess the problem is that oe_in_pack() is defined in the generic
> > pack-objects.h, but _pack is only in builtin/pack-objects.c?
> >
> > I wonder if it would be that bad to just say oe_in_pack(_pack, obj)
> > everywhere. It's longer, but it makes the code slightly less magical to
> > read.
> 
> Longer was exactly why I added these macros (with the hope that the
> macro upper case names already ring a "it's magical" bell). Should I
> drop all these macros? Some code becomes a lot more verbose though.

I'm on the fence. I agree that the macro screams "magical". I just
sometimes see a macro and think something really weird and
unfunction-like is going on. But really we're just replacing a default
parameter.

So I dunno. If you get rid of the macros and I look at it, I give even
odds that I'll say "yech, put them back!". :)

-Peff


Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-30 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 10:48 PM, Jeff King  wrote:
> On Sat, Mar 24, 2018 at 07:33:46AM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index e1244918a5..b41610569e 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -29,6 +29,8 @@
>>  #include "list.h"
>>  #include "packfile.h"
>>
>> +#define IN_PACK(obj) oe_in_pack(_pack, obj)
>
> How come this one gets a macro, but the earlier conversions don't?
>
> I guess the problem is that oe_in_pack() is defined in the generic
> pack-objects.h, but _pack is only in builtin/pack-objects.c?
>
> I wonder if it would be that bad to just say oe_in_pack(_pack, obj)
> everywhere. It's longer, but it makes the code slightly less magical to
> read.

Longer was exactly why I added these macros (with the hope that the
macro upper case names already ring a "it's magical" bell). Should I
drop all these macros? Some code becomes a lot more verbose though.

>> +static void prepare_in_pack_by_idx(struct packing_data *pdata)
>> +{
>> + struct packed_git **mapping, *p;
>> + int cnt = 0, nr = 1 << OE_IN_PACK_BITS;
>> +
>> + if (getenv("GIT_TEST_FULL_IN_PACK_ARRAY")) {
>> + /*
>> +  * leave in_pack_by_idx NULL to force in_pack[] to be
>> +  * used instead
>> +  */
>> + return;
>> + }
>> +
>> + ALLOC_ARRAY(mapping, nr);
>> + mapping[cnt++] = NULL; /* zero index must be mapped to NULL */
>
> Why? I guess because index==0 is a sentinel for "we're using the small
> index numbers?"

No because by default all values in object_entry is zero (or NULL). If
I remember correctly, some code will skip setting in_pack pointer to
leave it NULL. When we convert it to an index, it should also point to
NULL.

>> + prepare_packed_git();
>> + for (p = packed_git; p; p = p->next, cnt++) {
>> + if (cnt == nr) {
>> + free(mapping);
>> + return;
>> + }
>> + p->index = cnt;
>> + mapping[cnt] = p;
>> + }
>> + pdata->in_pack_by_idx = mapping;
>> +}
>
> What happens if we later have to reprepare_packed_git() and end up with
> more packs? We only call this for the first pack.
>
> It may well be handled, but I'm having trouble following the code to see
> if it is. And I doubt that case is covered by our test suite (since it
> inherently involves a race).

I don't think I covered this case. But since "index" field in
packed_git should be zero for the new packs, we could check and either
add it to in_pack_by_idx[].

>>  /*
>> + * The size of struct nearly determines pack-objects's memory
>> + * consumption. This struct is packed tight for that reason. When you
>> + * add or reorder something in this struct, think a bit about this.
>> + *
>
> It's funny to see this warning come in the middle. Should it be part of
> the final struct reordering at the end?

It was at the end in some version, the I shuffled the patches and
forgot about this one :)
-- 
Duy


Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-30 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:46AM +0100, Nguyễn Thái Ngọc Duy wrote:

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index e1244918a5..b41610569e 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -29,6 +29,8 @@
>  #include "list.h"
>  #include "packfile.h"
>  
> +#define IN_PACK(obj) oe_in_pack(_pack, obj)

How come this one gets a macro, but the earlier conversions don't?

I guess the problem is that oe_in_pack() is defined in the generic
pack-objects.h, but _pack is only in builtin/pack-objects.c?

I wonder if it would be that bad to just say oe_in_pack(_pack, obj)
everywhere. It's longer, but it makes the code slightly less magical to
read.

> @@ -1074,7 +1076,7 @@ static void create_object_entry(const struct object_id 
> *oid,
>   else
>   nr_result++;
>   if (found_pack) {
> - entry->in_pack = found_pack;
> + oe_set_in_pack(_pack, entry, found_pack);
>   entry->in_pack_offset = found_offset;
>   }

it's funny to see in_pack as an external thing, but in_pack_offset still
in the struct. I guess there's nothing to be gained there, since the
offset really does need to be individual (and large).

> diff --git a/cache.h b/cache.h
> index 862bdff83a..b90feb3802 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1635,6 +1635,7 @@ extern struct packed_git {
>   int index_version;
>   time_t mtime;
>   int pack_fd;
> + int index;  /* for builtin/pack-objects.c */
>   unsigned pack_local:1,
>pack_keep:1,
>freshened:1,

It's pretty gross to infect this global struct. But I'm not sure there's
an easier way to do it with constant-time lookups. You'd have to build
the packed_git index preemptively in pack-objects, and then always just
pass around the index numbers.  And even that is kind of dicey, since
the packed_git list can grow while we're running.

The alternative is a hash table mapping packed_git pointers into numeric
indices. Yuck.

> +static void prepare_in_pack_by_idx(struct packing_data *pdata)
> +{
> + struct packed_git **mapping, *p;
> + int cnt = 0, nr = 1 << OE_IN_PACK_BITS;
> +
> + if (getenv("GIT_TEST_FULL_IN_PACK_ARRAY")) {
> + /*
> +  * leave in_pack_by_idx NULL to force in_pack[] to be
> +  * used instead
> +  */
> + return;
> + }
> +
> + ALLOC_ARRAY(mapping, nr);
> + mapping[cnt++] = NULL; /* zero index must be mapped to NULL */

Why? I guess because index==0 is a sentinel for "we're using the small
index numbers?"

> + prepare_packed_git();
> + for (p = packed_git; p; p = p->next, cnt++) {
> + if (cnt == nr) {
> + free(mapping);
> + return;
> + }
> + p->index = cnt;
> + mapping[cnt] = p;
> + }
> + pdata->in_pack_by_idx = mapping;
> +}

What happens if we later have to reprepare_packed_git() and end up with
more packs? We only call this for the first pack.

It may well be handled, but I'm having trouble following the code to see
if it is. And I doubt that case is covered by our test suite (since it
inherently involves a race).

>  /*
> + * The size of struct nearly determines pack-objects's memory
> + * consumption. This struct is packed tight for that reason. When you
> + * add or reorder something in this struct, think a bit about this.
> + *

It's funny to see this warning come in the middle. Should it be part of
the final struct reordering at the end?

-Peff


Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-24 Thread Duy Nguyen
On Sat, Mar 24, 2018 at 10:42 AM, Ævar Arnfjörð Bjarmason
 wrote:
>
> On Sat, Mar 24 2018, Nguyễn Thái Ngọc Duy wrote:
>
>> + if (pack->in_pack_by_idx) {
>> + if (p->index <= 0)
>> + die("BUG: found_pack should be NULL "
>> + "instead of having non-positive 
>> index");
>> + e->in_pack_idx = p->index;
>> + } else
>
> The indentation after the die() here is wrong. GCC complaining about it:
>
> ./pack-objects.h: In function ‘oe_set_in_pack’:
> ./pack-objects.h:203:3: warning: this ‘if’ clause does not guard... 
> [-Wmisleading-indentation]
>if (p->index <= 0)
>^~
> ./pack-objects.h:206:4: note: ...this statement, but the latter is 
> misleadingly indented as if it were guarded by the ‘if’
> e->in_pack_idx = p->index;
> ^

Thanks. My gcc reported the same thing but only when not used with ccache, hm...
-- 
Duy


Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-24 Thread Ævar Arnfjörð Bjarmason

On Sat, Mar 24 2018, Nguyễn Thái Ngọc Duy wrote:

> Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
> pack. Use an index instead since the number of packs should be
> relatively small.
>
> This limits the number of packs we can handle to 1k. Since we can't be
> sure people can never run into the situation where they have more than
> 1k pack files. Provide a fall back route for it.
>
> If we find out they have too many packs, the new in_pack_by_idx[]
> array (which has at most 1k elements) will not be used. Instead we
> allocate in_pack[] array that holds nr_objects elements. This is
> similar to how the optional in_pack_pos field is handled.
>
> The new simple test is just to make sure the too-many-packs code path
> is at least executed. The true test is running
>
> make test GIT_TEST_FULL_IN_PACK_ARRAY=1

Aside from the tiny nit in 87efk9yfm2@evledraar.gmail.com this looks
good to me.

I've tested this with the same method noted in
87vadpxv27@evledraar.gmail.com against the version before it on
similar test data, and got:

 * Reduction in user time by 0.42%
 * Reduction in system time by 3.17%
 * Reduction in RSS by 0.003209%
 * Reduction in page faults by 0% & 0.006539% (time(1) reports two different 
numbers)
 * Reduction in the I of I/O by 99.504132% (note: from 4840 bytes to 24 bytes, 
so some fluke...)
 * Reduction in the O of I/O by 0%

I.e. there's no notable change at all, but I thought it would be useful
to re-run this for context.


Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-24 Thread Ævar Arnfjörð Bjarmason

On Sat, Mar 24 2018, Nguyễn Thái Ngọc Duy wrote:

> + if (pack->in_pack_by_idx) {
> + if (p->index <= 0)
> + die("BUG: found_pack should be NULL "
> + "instead of having non-positive index");
> + e->in_pack_idx = p->index;
> + } else

The indentation after the die() here is wrong. GCC complaining about it:

./pack-objects.h: In function ‘oe_set_in_pack’:
./pack-objects.h:203:3: warning: this ‘if’ clause does not guard... 
[-Wmisleading-indentation]
   if (p->index <= 0)
   ^~
./pack-objects.h:206:4: note: ...this statement, but the latter is 
misleadingly indented as if it were guarded by the ‘if’
e->in_pack_idx = p->index;
^