Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
On 10/10/2017 9:30 AM, Jeff King wrote: On Tue, Oct 10, 2017 at 09:11:15AM -0400, Derrick Stolee wrote: On 10/10/2017 8:56 AM, Junio C Hamano wrote: Jeff Kingwrites: OK, I think that makes more sense. But note the p->num_objects thing I mentioned. If I do: git pack-objects .git/objects/pack/pack num_objects) return; Technically that also covers open_pack_index() failure, too, but that's a subtlety I don't think we should rely on. True. I notice that the early part of the two functions look almost identical. Do we need error condition handling for the other one, too? I prefer to fix the problem in all code clones when they cause review friction, so I'll send a fifth commit showing just the diff for these packfile issues in sha1_name.c. See patch below. Ah, that answers my earlier question. Junio mean unique_in_pack(). And yeah, I think it suffers from the same problem. Should open_pack_index() return a non-zero status if the packfile is empty? Or, is there a meaningful reason to have empty packfiles? I can't think of a compelling reason to have an empty packfile. But nor do I think we should consider them an error when we can handle them quietly (and returning non-zero status would cause Git to complain on many operations in a repository that has such a file). -Peff Thanks for the comments. I found some typos in my commit messages, too. I plan to send a (hopefully) final version tomorrow (Thursday). It will include: * Make 'pos' unsigned in get_hex_char_from_oid() * Check response from open_pack_index() * Small typos in commit messages If there are other issues, then please let me know. Thanks, -Stolee
Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
On Tue, Oct 10, 2017 at 09:11:15AM -0400, Derrick Stolee wrote: > On 10/10/2017 8:56 AM, Junio C Hamano wrote: > > Jeff Kingwrites: > > > > > OK, I think that makes more sense. But note the p->num_objects thing I > > > mentioned. If I do: > > > > > >git pack-objects .git/objects/pack/pack > > > > > then I have a pack with zero objects, which I think we'd similarly want > > > to return early from. I.e., I think we need: > > > > > >if (p->num_objects) > > > return; > > > > > > Technically that also covers open_pack_index() failure, too, but that's > > > a subtlety I don't think we should rely on. > > True. I notice that the early part of the two functions look almost > > identical. Do we need error condition handling for the other one, > > too? > > I prefer to fix the problem in all code clones when they cause review > friction, so I'll send a fifth commit showing just the diff for these > packfile issues in sha1_name.c. See patch below. Ah, that answers my earlier question. Junio mean unique_in_pack(). And yeah, I think it suffers from the same problem. > Should open_pack_index() return a non-zero status if the packfile is empty? > Or, is there a meaningful reason to have empty packfiles? I can't think of a compelling reason to have an empty packfile. But nor do I think we should consider them an error when we can handle them quietly (and returning non-zero status would cause Git to complain on many operations in a repository that has such a file). -Peff
Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
On 10/10/2017 8:56 AM, Junio C Hamano wrote: Jeff Kingwrites: OK, I think that makes more sense. But note the p->num_objects thing I mentioned. If I do: git pack-objects .git/objects/pack/pack num_objects) return; Technically that also covers open_pack_index() failure, too, but that's a subtlety I don't think we should rely on. True. I notice that the early part of the two functions look almost identical. Do we need error condition handling for the other one, too? I prefer to fix the problem in all code clones when they cause review friction, so I'll send a fifth commit showing just the diff for these packfile issues in sha1_name.c. See patch below. Should open_pack_index() return a non-zero status if the packfile is empty? Or, is there a meaningful reason to have empty packfiles? Thanks, -Stolee diff --git a/sha1_name.c b/sha1_name.c index 49ba67955..9f8a33e82 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -153,7 +153,9 @@ static void unique_in_pack(struct packed_git *p, uint32_t num, last, i, first = 0; const struct object_id *current = NULL; - open_pack_index(p); + if (open_pack_index(p) || !p->num_objects) + return; + num = p->num_objects; last = num; while (first < last) { @@ -513,7 +515,9 @@ static void find_abbrev_len_for_pack(struct packed_git *p, uint32_t num, last, first = 0; struct object_id oid; - open_pack_index(p); + if (open_pack_index(p) || !p->num_objects) + return; + num = p->num_objects; last = num; while (first < last) {
Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
On Tue, Oct 10, 2017 at 09:56:38PM +0900, Junio C Hamano wrote: > Jeff Kingwrites: > > > OK, I think that makes more sense. But note the p->num_objects thing I > > mentioned. If I do: > > > > git pack-objects .git/objects/pack/pack > > > then I have a pack with zero objects, which I think we'd similarly want > > to return early from. I.e., I think we need: > > > > if (p->num_objects) > > return; > > > > Technically that also covers open_pack_index() failure, too, but that's > > a subtlety I don't think we should rely on. > > True. I notice that the early part of the two functions look almost > identical. Do we need error condition handling for the other one, > too? I'm not sure which two you mean. Do you mean find_pack_entry_one() in packfile.c as the other one? If so, I think it is fine in the zero-object case, because it does not do the "this is the sha1 at the position where it _would_ be found" trick, which is what causes us to potentially dereference nonsense. -Peff
Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
Jeff Kingwrites: > OK, I think that makes more sense. But note the p->num_objects thing I > mentioned. If I do: > > git pack-objects .git/objects/pack/pack > then I have a pack with zero objects, which I think we'd similarly want > to return early from. I.e., I think we need: > > if (p->num_objects) > return; > > Technically that also covers open_pack_index() failure, too, but that's > a subtlety I don't think we should rely on. True. I notice that the early part of the two functions look almost identical. Do we need error condition handling for the other one, too?
Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
On Tue, Oct 10, 2017 at 08:16:27AM -0400, Derrick Stolee wrote: > > > + mad->init_len = 0; > > > + if (!match) { > > > + nth_packed_object_oid(, p, first); > > > + extend_abbrev_len(, mad); > > If we have zero objects in the pack, what would nth_packed_object_oid() > > be returning here? > > > > So I actually think we do want an early return, not just when > > open_packed_index() fails, but also when p->num_objects is zero. > > > > -Peff > > Sorry about this. I caught this while I was writing my cover letter and > amended my last commit to include the following: > > if (open_pack_index(p)) > return; > > After I amended the commit, I forgot to 'format-patch' again. I can send a > diff between the commits after review has calmed. OK, I think that makes more sense. But note the p->num_objects thing I mentioned. If I do: git pack-objects .git/objects/pack/pack num_objects) return; Technically that also covers open_pack_index() failure, too, but that's a subtlety I don't think we should rely on. -Peff
Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
On 10/9/2017 9:49 AM, Jeff King wrote: On Sun, Oct 08, 2017 at 02:49:42PM -0400, Derrick Stolee wrote: @@ -505,6 +506,65 @@ static int extend_abbrev_len(const struct object_id *oid, void *cb_data) return 0; } +static void find_abbrev_len_for_pack(struct packed_git *p, +struct min_abbrev_data *mad) +{ + int match = 0; + uint32_t num, last, first = 0; + struct object_id oid; + + open_pack_index(p); + num = p->num_objects; + last = num; + while (first < last) { [...] Your cover letter lists: * Silently skip packfiles that fail to open with open_pack_index() as a change from the previous version. But this looks the same as the last round. I think this _does_ end up skipping such packfiles because p->num_objects will be zero. Is it worth having a comment to that effect (or even just an early return) to make it clear that the situation is intentional? Although... + /* +* first is now the position in the packfile where we would insert +* mad->hash if it does not exist (or the position of mad->hash if +* it does exist). Hence, we consider a maximum of three objects +* nearby for the abbreviation length. +*/ + mad->init_len = 0; + if (!match) { + nth_packed_object_oid(, p, first); + extend_abbrev_len(, mad); If we have zero objects in the pack, what would nth_packed_object_oid() be returning here? So I actually think we do want an early return, not just when open_packed_index() fails, but also when p->num_objects is zero. -Peff Sorry about this. I caught this while I was writing my cover letter and amended my last commit to include the following: if (open_pack_index(p)) return; After I amended the commit, I forgot to 'format-patch' again. I can send a diff between the commits after review has calmed. Thanks, -Stolee
Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
On Sun, Oct 08, 2017 at 02:49:42PM -0400, Derrick Stolee wrote: > @@ -505,6 +506,65 @@ static int extend_abbrev_len(const struct object_id > *oid, void *cb_data) > return 0; > } > > +static void find_abbrev_len_for_pack(struct packed_git *p, > + struct min_abbrev_data *mad) > +{ > + int match = 0; > + uint32_t num, last, first = 0; > + struct object_id oid; > + > + open_pack_index(p); > + num = p->num_objects; > + last = num; > + while (first < last) { > [...] Your cover letter lists: * Silently skip packfiles that fail to open with open_pack_index() as a change from the previous version. But this looks the same as the last round. I think this _does_ end up skipping such packfiles because p->num_objects will be zero. Is it worth having a comment to that effect (or even just an early return) to make it clear that the situation is intentional? Although... > + /* > + * first is now the position in the packfile where we would insert > + * mad->hash if it does not exist (or the position of mad->hash if > + * it does exist). Hence, we consider a maximum of three objects > + * nearby for the abbreviation length. > + */ > + mad->init_len = 0; > + if (!match) { > + nth_packed_object_oid(, p, first); > + extend_abbrev_len(, mad); If we have zero objects in the pack, what would nth_packed_object_oid() be returning here? So I actually think we do want an early return, not just when open_packed_index() fails, but also when p->num_objects is zero. -Peff