Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation

2017-10-11 Thread Derrick Stolee

On 10/10/2017 9:30 AM, Jeff King wrote:

On Tue, Oct 10, 2017 at 09:11:15AM -0400, Derrick Stolee wrote:


On 10/10/2017 8:56 AM, Junio C Hamano wrote:

Jeff King  writes:


OK, I think that makes more sense. But note the p->num_objects thing I
mentioned. If I do:

git pack-objects .git/objects/pack/pack num_objects)
return;

Technically that also covers open_pack_index() failure, too, but that's
a subtlety I don't think we should rely on.

True.  I notice that the early part of the two functions look almost
identical.  Do we need error condition handling for the other one,
too?

I prefer to fix the problem in all code clones when they cause review
friction, so I'll send a fifth commit showing just the diff for these
packfile issues in sha1_name.c. See patch below.

Ah, that answers my earlier question. Junio mean unique_in_pack(). And
yeah, I think it suffers from the same problem.


Should open_pack_index() return a non-zero status if the packfile is empty?
Or, is there a meaningful reason to have empty packfiles?

I can't think of a compelling reason to have an empty packfile. But nor
do I think we should consider them an error when we can handle them
quietly (and returning non-zero status would cause Git to complain on
many operations in a repository that has such a file).

-Peff


Thanks for the comments. I found some typos in my commit messages, too.

I plan to send a (hopefully) final version tomorrow (Thursday). It will 
include:


* Make 'pos' unsigned in get_hex_char_from_oid()

* Check response from open_pack_index()

* Small typos in commit messages

If there are other issues, then please let me know.

Thanks,
-Stolee


Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation

2017-10-10 Thread Jeff King
On Tue, Oct 10, 2017 at 09:11:15AM -0400, Derrick Stolee wrote:

> On 10/10/2017 8:56 AM, Junio C Hamano wrote:
> > Jeff King  writes:
> > 
> > > OK, I think that makes more sense. But note the p->num_objects thing I
> > > mentioned. If I do:
> > > 
> > >git pack-objects .git/objects/pack/pack  > > 
> > > then I have a pack with zero objects, which I think we'd similarly want
> > > to return early from. I.e., I think we need:
> > > 
> > >if (p->num_objects)
> > >   return;
> > > 
> > > Technically that also covers open_pack_index() failure, too, but that's
> > > a subtlety I don't think we should rely on.
> > True.  I notice that the early part of the two functions look almost
> > identical.  Do we need error condition handling for the other one,
> > too?
> 
> I prefer to fix the problem in all code clones when they cause review
> friction, so I'll send a fifth commit showing just the diff for these
> packfile issues in sha1_name.c. See patch below.

Ah, that answers my earlier question. Junio mean unique_in_pack(). And
yeah, I think it suffers from the same problem.

> Should open_pack_index() return a non-zero status if the packfile is empty?
> Or, is there a meaningful reason to have empty packfiles?

I can't think of a compelling reason to have an empty packfile. But nor
do I think we should consider them an error when we can handle them
quietly (and returning non-zero status would cause Git to complain on
many operations in a repository that has such a file).

-Peff


Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation

2017-10-10 Thread Derrick Stolee

On 10/10/2017 8:56 AM, Junio C Hamano wrote:

Jeff King  writes:


OK, I think that makes more sense. But note the p->num_objects thing I
mentioned. If I do:

   git pack-objects .git/objects/pack/pack num_objects)
return;

Technically that also covers open_pack_index() failure, too, but that's
a subtlety I don't think we should rely on.

True.  I notice that the early part of the two functions look almost
identical.  Do we need error condition handling for the other one,
too?


I prefer to fix the problem in all code clones when they cause review 
friction, so I'll send a fifth commit showing just the diff for these 
packfile issues in sha1_name.c. See patch below.


Should open_pack_index() return a non-zero status if the packfile is 
empty? Or, is there a meaningful reason to have empty packfiles?


Thanks,
-Stolee


diff --git a/sha1_name.c b/sha1_name.c
index 49ba67955..9f8a33e82 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -153,7 +153,9 @@ static void unique_in_pack(struct packed_git *p,
    uint32_t num, last, i, first = 0;
    const struct object_id *current = NULL;

-   open_pack_index(p);
+   if (open_pack_index(p) || !p->num_objects)
+   return;
+
    num = p->num_objects;
    last = num;
    while (first < last) {
@@ -513,7 +515,9 @@ static void find_abbrev_len_for_pack(struct 
packed_git *p,

    uint32_t num, last, first = 0;
    struct object_id oid;

-   open_pack_index(p);
+   if (open_pack_index(p) || !p->num_objects)
+   return;
+
    num = p->num_objects;
    last = num;
    while (first < last) {



Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation

2017-10-10 Thread Jeff King
On Tue, Oct 10, 2017 at 09:56:38PM +0900, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > OK, I think that makes more sense. But note the p->num_objects thing I
> > mentioned. If I do:
> >
> >   git pack-objects .git/objects/pack/pack  >
> > then I have a pack with zero objects, which I think we'd similarly want
> > to return early from. I.e., I think we need:
> >
> >   if (p->num_objects)
> > return;
> >
> > Technically that also covers open_pack_index() failure, too, but that's
> > a subtlety I don't think we should rely on.
> 
> True.  I notice that the early part of the two functions look almost
> identical.  Do we need error condition handling for the other one,
> too?

I'm not sure which two you mean. Do you mean find_pack_entry_one() in
packfile.c as the other one? If so, I think it is fine in the
zero-object case, because it does not do the "this is the sha1 at the
position where it _would_ be found" trick, which is what causes us to
potentially dereference nonsense.

-Peff


Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation

2017-10-10 Thread Junio C Hamano
Jeff King  writes:

> OK, I think that makes more sense. But note the p->num_objects thing I
> mentioned. If I do:
>
>   git pack-objects .git/objects/pack/pack 
> then I have a pack with zero objects, which I think we'd similarly want
> to return early from. I.e., I think we need:
>
>   if (p->num_objects)
>   return;
>
> Technically that also covers open_pack_index() failure, too, but that's
> a subtlety I don't think we should rely on.

True.  I notice that the early part of the two functions look almost
identical.  Do we need error condition handling for the other one,
too?


Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation

2017-10-10 Thread Jeff King
On Tue, Oct 10, 2017 at 08:16:27AM -0400, Derrick Stolee wrote:

> > > + mad->init_len = 0;
> > > + if (!match) {
> > > + nth_packed_object_oid(, p, first);
> > > + extend_abbrev_len(, mad);
> > If we have zero objects in the pack, what would nth_packed_object_oid()
> > be returning here?
> > 
> > So I actually think we do want an early return, not just when
> > open_packed_index() fails, but also when p->num_objects is zero.
> > 
> > -Peff
> 
> Sorry about this. I caught this while I was writing my cover letter and
> amended my last commit to include the following:
> 
>     if (open_pack_index(p))
>         return;
> 
> After I amended the commit, I forgot to 'format-patch' again. I can send a
> diff between the commits after review has calmed.

OK, I think that makes more sense. But note the p->num_objects thing I
mentioned. If I do:

  git pack-objects .git/objects/pack/pack num_objects)
return;

Technically that also covers open_pack_index() failure, too, but that's
a subtlety I don't think we should rely on.

-Peff


Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation

2017-10-10 Thread Derrick Stolee

On 10/9/2017 9:49 AM, Jeff King wrote:

On Sun, Oct 08, 2017 at 02:49:42PM -0400, Derrick Stolee wrote:


@@ -505,6 +506,65 @@ static int extend_abbrev_len(const struct object_id *oid, 
void *cb_data)
return 0;
  }
  
+static void find_abbrev_len_for_pack(struct packed_git *p,

+struct min_abbrev_data *mad)
+{
+   int match = 0;
+   uint32_t num, last, first = 0;
+   struct object_id oid;
+
+   open_pack_index(p);
+   num = p->num_objects;
+   last = num;
+   while (first < last) {
[...]

Your cover letter lists:

   * Silently skip packfiles that fail to open with open_pack_index()

as a change from the previous version. But this looks the same as the
last round. I think this _does_ end up skipping such packfiles because
p->num_objects will be zero. Is it worth having a comment to that
effect (or even just an early return) to make it clear that the
situation is intentional?

Although...


+   /*
+* first is now the position in the packfile where we would insert
+* mad->hash if it does not exist (or the position of mad->hash if
+* it does exist). Hence, we consider a maximum of three objects
+* nearby for the abbreviation length.
+*/
+   mad->init_len = 0;
+   if (!match) {
+   nth_packed_object_oid(, p, first);
+   extend_abbrev_len(, mad);

If we have zero objects in the pack, what would nth_packed_object_oid()
be returning here?

So I actually think we do want an early return, not just when
open_packed_index() fails, but also when p->num_objects is zero.

-Peff


Sorry about this. I caught this while I was writing my cover letter and 
amended my last commit to include the following:


    if (open_pack_index(p))
        return;

After I amended the commit, I forgot to 'format-patch' again. I can send 
a diff between the commits after review has calmed.


Thanks,
-Stolee


Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation

2017-10-09 Thread Jeff King
On Sun, Oct 08, 2017 at 02:49:42PM -0400, Derrick Stolee wrote:

> @@ -505,6 +506,65 @@ static int extend_abbrev_len(const struct object_id 
> *oid, void *cb_data)
>   return 0;
>  }
>  
> +static void find_abbrev_len_for_pack(struct packed_git *p,
> +  struct min_abbrev_data *mad)
> +{
> + int match = 0;
> + uint32_t num, last, first = 0;
> + struct object_id oid;
> +
> + open_pack_index(p);
> + num = p->num_objects;
> + last = num;
> + while (first < last) {
> [...]

Your cover letter lists:

  * Silently skip packfiles that fail to open with open_pack_index()

as a change from the previous version. But this looks the same as the
last round. I think this _does_ end up skipping such packfiles because
p->num_objects will be zero. Is it worth having a comment to that
effect (or even just an early return) to make it clear that the
situation is intentional?

Although...

> + /*
> +  * first is now the position in the packfile where we would insert
> +  * mad->hash if it does not exist (or the position of mad->hash if
> +  * it does exist). Hence, we consider a maximum of three objects
> +  * nearby for the abbreviation length.
> +  */
> + mad->init_len = 0;
> + if (!match) {
> + nth_packed_object_oid(, p, first);
> + extend_abbrev_len(, mad);

If we have zero objects in the pack, what would nth_packed_object_oid()
be returning here?

So I actually think we do want an early return, not just when
open_packed_index() fails, but also when p->num_objects is zero.

-Peff