Hi Junio,

On Tue, 23 Jan 2018, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schinde...@gmx.de> writes:
> 
> > structure (similar in spirit to --preserve-merges, but with a
> > substantially less-broken design).
> > ...
> > @@ -2785,6 +2787,335 @@ void append_signoff(struct strbuf *msgbuf, int 
> > ignore_footer, unsigned flag)
> >     strbuf_release(&sob);
> >  }
> >  
> > +struct labels_entry {
> > +   struct hashmap_entry entry;
> > +   char label[FLEX_ARRAY];
> > +};
> > +
> > +static int labels_cmp(const void *fndata, const struct labels_entry *a,
> > +                 const struct labels_entry *b, const void *key)
> > +{
> > +   return key ? strcmp(a->label, key) : strcmp(a->label, b->label);
> > +}
> 
> label_oid() accesses state->labels hash using strihash() as the hash
> function, but the final comparison between the entries in the same
> hash buckets are done with case sensitivity.  It is unclear to me if
> that is what was intended, and why.

Heh... you were almost there with your analysis. strihash() is needed for
case-insensitive comparisons, and labels are... ref names.

So the idea (which I implemented only partially) was to make the labels
case-insensitive based on `ignore_case`.

I think the best way forward will be to copy
merge-recursive.c:path_hash().

> > +struct string_entry {
> > +   struct oidmap_entry entry;
> > +   char string[FLEX_ARRAY];
> > +};
> > +
> > +struct label_state {
> > +   struct oidmap commit2label;
> > +   struct hashmap labels;
> > +   struct strbuf buf;
> > +};
> > +
> > +static const char *label_oid(struct object_id *oid, const char *label,
> > +                        struct label_state *state)
> > +{
> > +   struct labels_entry *labels_entry;
> > +   struct string_entry *string_entry;
> > +   struct object_id dummy;
> > +   size_t len;
> > +   int i;
> > +
> > +   string_entry = oidmap_get(&state->commit2label, oid);
> > +   if (string_entry)
> > +           return string_entry->string;
> > +
> > +   /*
> > +    * For "uninteresting" commits, i.e. commits that are not to be
> > +    * rebased, and which can therefore not be labeled, we use a unique
> > +    * abbreviation of the commit name. This is slightly more complicated
> > +    * than calling find_unique_abbrev() because we also need to make
> > +    * sure that the abbreviation does not conflict with any other
> > +    * label.
> > +    *
> > +    * We disallow "interesting" commits to be labeled by a string that
> > +    * is a valid full-length hash, to ensure that we always can find an
> > +    * abbreviation for any uninteresting commit's names that does not
> > +    * clash with any other label.
> > +    */
> > +   if (!label) {
> > +           char *p;
> > +
> > +           strbuf_reset(&state->buf);
> > +           strbuf_grow(&state->buf, GIT_SHA1_HEXSZ);
> > +           label = p = state->buf.buf;
> > +
> > +           find_unique_abbrev_r(p, oid->hash, default_abbrev);
> > +
> > +           /*
> > +            * We may need to extend the abbreviated hash so that there is
> > +            * no conflicting label.
> > +            */
> > +           if (hashmap_get_from_hash(&state->labels, strihash(p), p)) {
> > +                   size_t i = strlen(p) + 1;
> > +
> > +                   oid_to_hex_r(p, oid);
> > +                   for (; i < GIT_SHA1_HEXSZ; i++) {
> > +                           char save = p[i];
> > +                           p[i] = '\0';
> > +                           if (!hashmap_get_from_hash(&state->labels,
> > +                                                      strihash(p), p))
> > +                                   break;
> > +                           p[i] = save;
> > +                   }
> > +           }
> 
> If oid->hash required full 40-hex to disambiguate, then
> find-unique-abbrev would give 40-hex and we'd want the same "-<num>"
> suffix technique employed below to make it consistently unique.  I
> wonder if organizing the function this way ...
> 
>       if (!label)
>               label = oid-to-hex(oid);
> 
>       if (label already exists or full oid) {
>               make it unambiguous;
>       }

It was hard to miss, I agree. The first arm of the if() is not trying to
make a label unambiguous by adding `-<num>`, but by extending the
abbreviated 40-hex until it is unique. (And we guarantee that there is a
solution, as we do not allow valid 40-hex strings to be used as label.)

The reason: if no `label` is provided, we want a valid raw (possibly
abbreviated) commit name. If `label` is provided, we want a valid ref name
(possibly made unique by appending `-<num>`).

Different beasts. Cannot be put into the same if() arm.

> A related tangent.  Does an auto-label given to "uninteresting"
> commit need to be visible to end users?

Yes. The reason for this is the `merge`/`reset` commands when *not*
rebasing cousins. Because in that case, we have to refer to commits that
we could not possibly have `label`ed, because they were never the current
revision. Therefore we cannot assign <name>-<num> labels, but have to use
the abbreviated commit hash.

> > +static int make_script_with_merges(struct pretty_print_context *pp,
> > +                              struct rev_info *revs, FILE *out,
> > +                              unsigned flags)
> > +{
> > + ...
> > +   int abbr = flags & TODO_LIST_ABBREVIATE_CMDS;
> > +   const char *p = abbr ? "p" : "pick", *l = abbr ? "l" : "label",
> > +            *t = abbr ? "t" : "reset", *b = abbr ? "b" : "bud",
> > +            *m = abbr ? "m" : "merge";
> 
> It would be easier to understand if these short named variables are
> reserved only for temporary use, not as constants.  It is not too
> much to spell 
> 
>       fprintf(out, "%s onto\n", cmd_label);
> 
> than
> 
>       fprintf(out, "%s onto\n", l);
> 
> and would save readers from head-scratching, wondering where the
> last assignment to variable "l" is.

Sure.

> > +   oidmap_init(&commit2todo, 0);
> > +   oidmap_init(&state.commit2label, 0);
> > +   hashmap_init(&state.labels, (hashmap_cmp_fn) labels_cmp, NULL, 0);
> > +   strbuf_init(&state.buf, 32);
> > +
> > +   if (revs->cmdline.nr && (revs->cmdline.rev[0].flags & BOTTOM)) {
> > +           struct object_id *oid = &revs->cmdline.rev[0].item->oid;
> > +           FLEX_ALLOC_STR(entry, string, "onto");
> > +           oidcpy(&entry->entry.oid, oid);
> > +           oidmap_put(&state.commit2label, entry);
> > +   }
> > +
> > +   /*
> > +    * First phase:
> > +    * - get onelines for all commits
> > +    * - gather all branch tips (i.e. 2nd or later parents of merges)
> > +    * - label all branch tips
> > +    */
> 
> When an early part of a branch is merged and then the remaining part
> of the same branch is merged again, "branch tip" and "2nd or later
> parents of merges" would become different concepts.  The 2nd parent
> of an early merge is not among the branch tips.
> 
> For the purpose of the "recreate the topology" algorithm, I am
> imagining that you would need not just the tips but all the 2nd and
> subsequent parents of merges, and my quick skimming tells me that
> the following code grabs them correctly.

Yes. I have used an earlier iteration of this patch series to create the
merging-rebase of Git for Windows v2.16.0 and that has a couple of really
good corner cases to exercise this code.

Ciao,
Dscho

Reply via email to