Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Johannes Schindelin writes: > To summarize, there are two commits recorded for that Message-Id, the > later one not mapped back, and neither is the correct commit that made it > into `master`. > > It would be nice to figure out what went wrong there, and how to fix it > for the future (and also to fix up the existing mis-mappings in `amlog`). I think what happened is that I used to have post-rewrite, but because it did not solve the real issue of multiple commits existing for the same message ID (either because of amending, or because of running "am" multiple times while looking for the best base to contruct a topic branch for the series that contains it) *and* the one that will eventually used in the final history may not be the last one (e.g. I may "am" twice to see if an older base I use in my second attempt is a better one than the base I originally used, and the patches may even apply cleanly to the older history, but may turn out to need semantic adjustment, at which point I would discard that second attempt and use the old commit from the first attempt that built on a newer base), I stopped using it. The mid-to-commit, for it to be relialble, needs to keep mapping for all the commits created from a single message, instead of being the last-one-survives mapping. I just didn't have that much interest back when I decided it was not worth and dropped the post-rewrite, I think.
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Hi Junio, On Wed, 11 Jul 2018, Junio C Hamano wrote: > Johannes Schindelin writes: > > > To summarize, there are two commits recorded for that Message-Id, the > > later one not mapped back, and neither is the correct commit that made it > > into `master`. > > > > It would be nice to figure out what went wrong there, and how to fix it > > for the future (and also to fix up the existing mis-mappings in `amlog`). > > I think what happened is that I used to have post-rewrite, but > because it did not solve the real issue of multiple commits existing > for the same message ID (either because of amending, or because of > running "am" multiple times while looking for the best base to > contruct a topic branch for the series that contains it) *and* the > one that will eventually used in the final history may not be the > last one (e.g. I may "am" twice to see if an older base I use in my > second attempt is a better one than the base I originally used, and > the patches may even apply cleanly to the older history, but may > turn out to need semantic adjustment, at which point I would discard > that second attempt and use the old commit from the first attempt > that built on a newer base), I stopped using it. > > The mid-to-commit, for it to be relialble, needs to keep mapping for > all the commits created from a single message, instead of being the > last-one-survives mapping. I just didn't have that much interest > back when I decided it was not worth and dropped the post-rewrite, I > think. I would like to ask you to reinstate the post-rewrite hook, as it still improves the situation over the current one. Of course, it would be nice to get the automation into a shape where the mappings in `refs/notes/amlog` of commits that hit `next` are fixed, if necessary, to stop referring to commits that did not make it into `next`. Because the *concept* of `amlog` is quite useful, to put back at least *some* of the information we lost by transiting Git commits via mails without any connection to their original commits. It is still the most annoying thing when I contribute patches myself. Ciao, Dscho
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Johannes Schindelin writes: > I would like to ask you to reinstate the post-rewrite hook, as it still > improves the situation over the current one. Without post-rewrite I seem to be getting correct amlog entries for commits created by "git rebase"; do our rebase--am backend still trigger post-applypatch hook in its "am" phase to apply the patches created with "format-patch"?
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Junio C Hamano writes: > Johannes Schindelin writes: > >> I would like to ask you to reinstate the post-rewrite hook, as it still >> improves the situation over the current one. > > Without post-rewrite I seem to be getting correct amlog entries for > commits created by "git rebase"; do our rebase--am backend still > trigger post-applypatch hook in its "am" phase to apply the patches > created with "format-patch"? That was a wrong line of thought that led to a dead end. format-patch won't recreate Message-Id to its output from notes/amlog, so even if the "format-patch --stdout | am" pipeline inside rebase-am triggered the post-applypatch hook, it would not have a chance to carry the notes forward that way. What was really happening was I have $ git config --list | grep amlog notes.rewriteref=refs/notes/amlog and that ought to be sufficient to carry "commit-to-original-msg-id" entries across rebases. And it seems to correctly work. I however suspect that "cherry-pick A..B" may lose the notes, but I haven't checked.
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Hi Junio, On Thu, 19 Jul 2018, Junio C Hamano wrote: > Junio C Hamano writes: > > > Johannes Schindelin writes: > > > >> I would like to ask you to reinstate the post-rewrite hook, as it still > >> improves the situation over the current one. > > > > Without post-rewrite I seem to be getting correct amlog entries for > > commits created by "git rebase"; do our rebase--am backend still > > trigger post-applypatch hook in its "am" phase to apply the patches > > created with "format-patch"? > > That was a wrong line of thought that led to a dead end. format-patch > won't recreate Message-Id to its output from notes/amlog, so even if > the "format-patch --stdout | am" pipeline inside rebase-am triggered > the post-applypatch hook, it would not have a chance to carry the > notes forward that way. > > What was really happening was I have > > $ git config --list | grep amlog > notes.rewriteref=refs/notes/amlog > > and that ought to be sufficient to carry "commit-to-original-msg-id" > entries across rebases. And it seems to correctly work. I however > suspect that "cherry-pick A..B" may lose the notes, but I haven't > checked. AFAICT there is at least one scenario where you run `rebase -i`, the notes get updated, and of course the *reverse mapping* does *not* get updated: you have a mapping both from commit to Message-Id *and crucially* from Message-Id to commit. The automatic rewrite of commit notes in `rebase -i` tackles only the commit notes, obviously, not the reverse. Hence the post-rewrite hook I think I already suggested at least once in a previous reply. Ciao, Dscho
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Johannes Schindelin writes: > AFAICT there is at least one scenario where you run `rebase -i`, the notes > get updated, and of course the *reverse mapping* does *not* get updated: It turns out that I never had a rewrite hook; the notes.rewriteref mechanism is the only thing that has been used to maintain amlog. I've stopped populating the reverse mapping, by the way. The script that I feed a message from gmane or public-inbox when I need to learn the set of commits that resulted from the message instead uses "git grep $message-id notes/amlog". And that is fast enough for my purpose. There is no good reason to abuse the notes mechanism to map a random object-name looking string (i.e. hash result of message id), other than the ease of "quick access" when somebody is making a lot of inquiry, but that "database" does not have to be stored in notes. It certainly does not belong to cycles worth spending by me *while* I work during the say with various history reshaping tools to record and/or update the reverse mapping and that is why my post-applypatch hook no longer has the "reverse map" hack. It is not like anybody (including me) needs realtime up-to-date reverse mapping from amlog while I run my "commit --amend", "rebase -i", etc. and the reverse map is constructable by reversing the forward map, obviously, with a postprocessing. And I think that is a reasonably way forward if anybody wants to have a reverse mapping. The postprocessing can be done either by me before pushing out the amlog ref, or done by any consumer after fetching the amlog ref from me. If I did the postprocessing and refuse to use rewrite hook you wouldn't even know ;-)
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
On Fri, Jul 20, 2018 at 12:35 PM Junio C Hamano wrote: > It is not like anybody (including me) needs realtime up-to-date I thought the same for a long time, but contributing to other projects showed me that this is not necessarily the case. Having a real time update, even if it would be just "your patch is labeled 'under discussion'" is beneficial as I would know where it is "in the system". In a way I'd compare our contribution process to having an incredible fine grained paper map. Most of the world moved on to digital maps, that zoom in on-demand. (C.f. spelling out "See banned.h for banned functions" in Documentation/CodingGuidelines is a fine grained detail that is not relevant for *most* of the contributions, but just burdens the bearer of the paper map with weight; if this hint is given dynamically by the compiler or build system at relevant times, it is much better; Regarding the real time aspect here, it is also very good comparison to maps: While I know how to read paper maps (or offline maps) and how to navigate my way, it sure is easier to just follow the online up-to-date navigation service, that tells me what to do. )
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Stefan Beller writes: > On Fri, Jul 20, 2018 at 12:35 PM Junio C Hamano wrote: > >> It is not like anybody (including me) needs realtime up-to-date > > I thought the same for a long time, but contributing to other projects > showed me that this is not necessarily the case. Having a real time > update, even if it would be just "your patch is labeled 'under discussion'" > is beneficial as I would know where it is "in the system". Well, you wouldn't have an access to the up-to-date amlog maintained by me *UNTIL* I push it out at the end of the day. So by definition, you do not have real-time access to the up-to-date state. And also by definition, you do not *NEED* such an access, because you won't see newly created or rewritten commits, whose originating Message-Id is not in the copy of amlog you have (yet), until I push the day's integration result out *AND* you fetch what I pushed out.
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
+cc list On Fri, Jul 20, 2018 at 2:29 PM Junio C Hamano wrote: > ... which means that it does not matter if I have an elaborate rewrite hook > that constantly updates the reverse mapping or if the reverse mapping is > made immediately before I push out. You wouldn't even be able to tell any > difference. > > And for that matter, it could even be made on the receiving end by you > after you fetch from me before you need the reverse map information.
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Hi Junio, On Fri, 20 Jul 2018, Junio C Hamano wrote: > Johannes Schindelin writes: > > > AFAICT there is at least one scenario where you run `rebase -i`, the notes > > get updated, and of course the *reverse mapping* does *not* get updated: > > It turns out that I never had a rewrite hook; the notes.rewriteref > mechanism is the only thing that has been used to maintain amlog. > > I've stopped populating the reverse mapping, by the way. That's just great. I ask you to make my life easier by keeping the information correct, and now you just drop it altogether? Just great. Seriously, I am trying to *improve* something here, because I really do care about contributors, and how hard we make it on them. I would not have expected such a backlash against that. > The script that I feed a message from gmane or public-inbox when I need > to learn the set of commits that resulted from the message instead uses > "git grep $message-id notes/amlog". And that is fast enough for my > purpose. Awesome. You might want to make sure that Peff stops advertising the amlog notes, then, though. > There is no good reason to abuse the notes mechanism to map a random > object-name looking string (i.e. hash result of message id), other > than the ease of "quick access" when somebody is making a lot of > inquiry, but that "database" does not have to be stored in notes. Right. And it does not have to be stored anywhere, because nobody used it anyway, right? Well, I hate to break it to you: I just found a really excellent use case, and you are making it very, very hard for me. Deliberately so. I don't know how I deserve that. > It certainly does not belong to cycles worth spending by me *while* > I work during the say with various history reshaping tools to record > and/or update the reverse mapping and that is why my post-applypatch > hook no longer has the "reverse map" hack. > > It is not like anybody (including me) needs realtime up-to-date > reverse mapping from amlog while I run my "commit --amend", "rebase > -i", etc. and the reverse map is constructable by reversing the > forward map, obviously, with a postprocessing. And I think that is > a reasonably way forward if anybody wants to have a reverse mapping. > The postprocessing can be done either by me before pushing out the > amlog ref, or done by any consumer after fetching the amlog ref from > me. If I did the postprocessing and refuse to use rewrite hook you > wouldn't even know ;-) The idea that you publish the amlog notes just for your own use cases, sounds a bit strange to me. So to reiterate: the information you have in amlog is useful, if faulty. Rather than "fixing" it by stopping the useful reverse-mapping, it would make a ton more sense to instate that post-rewrite hook I already drafted for you. Besides, while you spent all of that time to make things harder for me, you still did not look into the most worrisome of my findings: there are apparently Message-Id mappings where *none* of the commits returned by said `git grep` you mentioned above are valid. Not a single one. I will dig out the mail for you on Monday, because I care that much, where I provided one example of a Message-Id with two commits that match in amlog, none of which is actually reachable from any of your public branches, and I also provided the commit that *actually* corresponds to that Message-Id, and it is not annotated. So at least in this case *even you* should have a vested interest in figuring out what goes wrong because even your own use case is affected by it. Ciao, Dscho
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Hi, On Fri, 20 Jul 2018, Stefan Beller wrote: > +cc list > On Fri, Jul 20, 2018 at 2:29 PM Junio C Hamano wrote: > > ... which means that it does not matter if I have an elaborate rewrite hook > > that constantly updates the reverse mapping or if the reverse mapping is > > made immediately before I push out. You wouldn't even be able to tell any > > difference. > > > > And for that matter, it could even be made on the receiving end by you > > after you fetch from me before you need the reverse map information. I refuse to believe that the suggestion to go back to the equivalent of pencil and paper is sincere. We are developing a state of the art source code management tool here, not some hodge podge project of somebody who is trying to teach themselves C. The current state is that there is no reliable "paper trail" of code contributions. The solution to that is absolutely not to abolish what little of a paper trail we *do* have. The solution is to step up the game and correct that automated record. And I would have expected a lot better from the inventor of the pickaxe options: why care so much about source code "archeology" on the one hand, and then burning the library on the other hand? It just does not make sense. Ciao, Dscho
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
On Sat, Jul 21, 2018 at 11:56:06PM +0200, Johannes Schindelin wrote: > > The script that I feed a message from gmane or public-inbox when I need > > to learn the set of commits that resulted from the message instead uses > > "git grep $message-id notes/amlog". And that is fast enough for my > > purpose. > > Awesome. You might want to make sure that Peff stops advertising the amlog > notes, then, though. Woah, what did I do now? > > There is no good reason to abuse the notes mechanism to map a random > > object-name looking string (i.e. hash result of message id), other > > than the ease of "quick access" when somebody is making a lot of > > inquiry, but that "database" does not have to be stored in notes. > > Right. And it does not have to be stored anywhere, because nobody used it > anyway, right? If I understand the situation correctly, Junio is saying that he will continue to produce the amlog mapping, and that it contains sufficient information to produce the reverse mapping (which, as an aside, I did not even know existed -- I mostly want to go the other way, from digging in history to a mailing list conversation). E.g., the script below builds and queries an incremental reverse mapping. -- >8 -- #!/usr/bin/perl my $REF = 'refs/notes/amlog'; my $DBFILE = '.git/amlog.rev'; use DB_File; my %h; my $db = tie %h, 'DB_File', $DBFILE, O_CREAT|O_RDWR, 0644 or die "unable to open/create $DBFILE: $!"; my $db_tip = $h{TIP}; chomp(my $rev_tip = `git rev-parse $REF`); if (!defined $db_tip || $db_tip ne $rev_tip) { print STDERR "Updating reverse mapping...\n"; # using -p here is quick and easy, since we know the # shape of the data. Using --raw and cat-file might be less # hacky, though. my @cmd = (qw(git log --format= --reverse -p), $rev_tip); push @cmd, "^$db_tip" if defined $db_tip; open(my $fh, "-|", @cmd); my $commit; while (<$fh>) { if (m{^\+\+\+ b/([0-9a-f/]+)}) { $commit = $1; $commit =~ s/[^0-9a-f]//g; } elsif (/^\+Message-Id: <(.*)>/i) { print STDERR "Imported $commit => $1\n"; $h{$1} = $commit; } } $h{TIP} = $rev_tip; } print "$h{$_} $_\n" for @ARGV; -- >8 -- That stores it in a local dbm. But it could also build a git-notes tree if you really want that. And if I understand what is being said here: > > It certainly does not belong to cycles worth spending by me *while* > > I work during the say with various history reshaping tools to record > > and/or update the reverse mapping and that is why my post-applypatch > > hook no longer has the "reverse map" hack. > > > > It is not like anybody (including me) needs realtime up-to-date > > reverse mapping from amlog while I run my "commit --amend", "rebase > > -i", etc. and the reverse map is constructable by reversing the > > forward map, obviously, with a postprocessing. And I think that is > > a reasonably way forward if anybody wants to have a reverse mapping. > > The postprocessing can be done either by me before pushing out the > > amlog ref, or done by any consumer after fetching the amlog ref from > > me. If I did the postprocessing and refuse to use rewrite hook you > > wouldn't even know ;-) It is not "I refuse to push out a reverse mapping". It is "I could make the reverse mapping before push-out, and you would not need to know or care if I did it all at once, or using a rewrite hook". Though personally, I do not know if there is much point in pushing it out, given that receivers can reverse the mapping themselves. Or is there some argument that there is information in the reverse map that _cannot_ be generated from the forward map? -Peff
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Jeff King writes: > If I understand the situation correctly, Junio is saying that he will > continue to produce the amlog mapping, and that it contains sufficient > information to produce the reverse mapping (which, as an aside, I did > not even know existed -- I mostly want to go the other way, from digging > in history to a mailing list conversation). Yes, the reverse mapping in amlog was an experiment that did not work well in the end. When I use "git am" to make a commit out of a message, a post-applypatch hook picks up the "Message-Id:" from the original message and adds a git note to the resulting commit. This is in line with how the notes are meant to be used. We have a commit object, and a piece of information that we want to associate with the commit object, which is not recorded as a part of the commit object. So we say "git notes add -m 'that piece of info' $commit" (the message-id happens to be that piece of info in this example). And with notes.rewriteRef, "git commit --amend" etc. would copy the piece of info about the original commit to the rewritten commit. Side Note: there are a few workflow elements I do want to keep using but they currently *lose* the mapping info. An obvious one is $ git checkout -b to/pic master && ... review in MUA and then ... $ git am -s mbox && ... review in tree, attempt to build, tweak, etc. $ git format-patch --stdout master..to/pic >P && $ edit P && $ git reset --hard master && $ git am P which is far more versatile and efficient when doing certain transformations on the series than running "rebase -i" and reopening and editing the target files of the patches one by one in each step. But because format-patch does not generate Message-Id header of the original one out of the commit, the post-applypatch hook run by "am" at the end of the steps would not have a chance to record that for the newly created commit. For this one, I think I can use "format-patch --notes=amlog" to produce the patch file and then teach post-applypatch script to pay attention to the Notes annotation without changing anything else to record the message id of the original. Other workflow elements that lose the notes need to be identified and either a fix implemented or a workaround found for each of them. For example, I suspect there is no workaround for "cherry-pick" and it would take a real fix. A reverse mapping entry used to get created by post-applypatch to map the blob that represents the notes text added to the $commit to another text blob that contains the 40-hex of the commit object. This is the experiment that did not work well. As none of the later integrator's work e.g. "commit --amend", "rebase", "cherry-pick", etc. is about rewriting that blob, notes.rewriteRef mechanism would not kick in, and that is understandasble. And these (incomplete) reverse mapping entries get in the way to maintain and correct the forward mapping. When a commit that got unreachable gets expired, I want "git notes prune" to remove notes on them, and I do not want to even think about what should happen to the entries in the notes tree that abuse the mechanism to map blobs that are otherwise *not* even reachable from the main history. A much more important task is to make sure that the forward mapping that annotates invidual commits reachable from 'pu' and/or 'master' is maintained correctly by various tools. From a correctly maintained forward mapping, it should be straight forward to get a reverse mapping if needed. > Though personally, I do not know if there is much point in pushing it > out, given that receivers can reverse the mapping themselves. Before this thread, I was planning to construct and publish the reverse mapping at the end of the day, but do so on a separate notes ref (see above---the hacky abuse gets in the way of maintaining and debugging the forward mapping, but a separate notes-ref that only contains hacks is less worrysome). But I have changed my mind and decided not to generate or publish one. It is sort of similar to the way the pack .idx is constructed only by the receiver [*1*]. > Or is there some argument that there is information in the reverse map > that _cannot_ be generated from the forward map? I know there is no information loss (after all I was the only one who ran that experimental hack), but there is one objection that is still possible, even though I admit that is a weak argument. If a plumbing "diff-{files,tree,index}" family had a sibling "diff-notes" to compare two notes-shaped trees while pretending that the object-name fan-out did not exist (i.e. instead, the trees being compared is without a subtree and full of 40-hex filenames), then it would be less cumbersome to incrementally update the revers
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
On Mon, Jul 23, 2018 at 06:50:46PM -0700, Junio C Hamano wrote: > Side Note: there are a few workflow elements I do want to > keep using but they currently *lose* the mapping info. An > obvious one is > > $ git checkout -b to/pic master && > ... review in MUA and then ... > $ git am -s mbox && > ... review in tree, attempt to build, tweak, etc. > $ git format-patch --stdout master..to/pic >P && > $ edit P && > $ git reset --hard master && > $ git am P > > which is far more versatile and efficient when doing certain > transformations on the series than running "rebase -i" and > reopening and editing the target files of the patches one by > one in each step. But because format-patch does not > generate Message-Id header of the original one out of the > commit, the post-applypatch hook run by "am" at the end of > the steps would not have a chance to record that for the > newly created commit. > > For this one, I think I can use "format-patch --notes=amlog" > to produce the patch file and then teach post-applypatch > script to pay attention to the Notes annotation without > changing anything else to record the message id of the > original. Yes. I wonder if it would make sense to teach format-patch/am a micro-format to automatically handle this case. I.e., some machine-readable way of passing the notes in the email message. Of course it's easy to design a format that covers the relatively restricted form of these amlog notes, and much harder to cover the general case. > Other workflow elements that lose the notes need > to be identified and either a fix implemented or a > workaround found for each of them. For example, I suspect > there is no workaround for "cherry-pick" and it would take a > real fix. I think the existing notes.rewriteRef is probably a good match here. I can definitely think of notes you wouldn't want to cherry-pick, but I'm having trouble coming up with an example that should survive a rebase but not a cherry-pick. > And these (incomplete) reverse mapping entries get in the way to > maintain and correct the forward mapping. When a commit that got > unreachable gets expired, I want "git notes prune" to remove notes > on them, and I do not want to even think about what should happen to > the entries in the notes tree that abuse the mechanism to map blobs > that are otherwise *not* even reachable from the main history. Right, I think the notes tree is a poor distribution method for that reason. > > Though personally, I do not know if there is much point in pushing it > > out, given that receivers can reverse the mapping themselves. > > Before this thread, I was planning to construct and publish the > reverse mapping at the end of the day, but do so on a separate notes > ref (see above---the hacky abuse gets in the way of maintaining and > debugging the forward mapping, but a separate notes-ref that only > contains hacks is less worrysome). But I have changed my mind and > decided not to generate or publish one. It is sort of similar to > the way the pack .idx is constructed only by the receiver [*1*]. Yes, the pack .idx was the same mental model I had when writing my earlier message. > > Or is there some argument that there is information in the reverse map > > that _cannot_ be generated from the forward map? > > I know there is no information loss (after all I was the only one > who ran that experimental hack), but there is one objection that is > still possible, even though I admit that is a weak argument. I wondered if you might have a case like this (building as we go): - message-id M becomes commit X - we write the forward map X->M - we write the reverse map M->X - during a rewrite (e.g., --amend), commit X becomes commit Y - we write the forward map Y->M - we write the reverse map M->Y The difference between that result and an inverted map created at the end is that we know that M->Y is the final result. Whereas by looking at the inverted map, we do not know which of M->X and M->Y is correct. In fact they are _both_ correct. But only one of X and Y would eventually get merged (both may make it into the repo's of people fetching from you if we imagine that X is on "pu" and you push between the two steps). So I think the inverted mapping is not actually one-to-one, and in either case you'd want to retain all possible matches (pruning only when a commit is eventually dropped from the forward mapping, which rewritten things from "pu" would eventually do). And in that case it does not matter if you generate it incrementally or all at once. > If a plumbing "diff-{files,tree,index}" family had a sibling > "diff-notes" to compare two notes-shaped trees while pretending that > the object-name fan-out did not exist (i.e. instead, the trees being > compared is without