Hi Paul
This might be a bit long, noticed a few more items to be added to the documentation for git-upstream. Good thing documentation is the major item for the next release ;-) After an initial attempt at writing one response, I've decided to break it up a bit into a couple of emails, because there a few topics in here and one of them is pretty lengthy and the others follow somewhat independent issues. For this one I'll stick to just the 2nd and 4th paragraphs since they are related. And apologies in advance for anyone that didn't need to see such a dive into git log behaviour here. :-) On 17/11/16 12:47, Paul Bourke wrote: > Hi Darragh / git-upstream community, > > I've been looking at a way to easily view a log of what commits made > since the last upstream import when managing a branch with git-upstream. > Right now this can be hard to do - something like 'git log > upstream/master..HEAD' shows a lot of duplicate commits reasons I don't > understand well enough to explain. I'll go into that in a little more detail with a separate response, to help anyone else reading or in the future searching. Definitely should have been documented before now... O_o > Darragh had suggested using the --dense option will help here, so 'git > log --dense upstream/master..HEAD -- .' seems to generally give the > right result. I've put up a patch to add this as a git-upstream command > in the form of "carrying" at https://review.openstack.org/#/c/381669/. > However, it seems there are in fact cases where the above command will > show incorrect commits, and I'm struggling a bit to fully grok this. Unfortunately explanation is going to be a bit lengthy, and there are pieces of git here I'm not fully sure I understand what it's doing, and plan to ask the git-users google group for a bit more info. Possibly may require going to the git mailing list to get a complete picture (g...@vger.kernel.org). Until recently, I didn't fully grok the git-log(1) manpage, https://www.kernel.org/pub/software/scm/git/docs/git-log.html, specifically the section about "History Simplification", naturally I didn't realise that until I understood what it was saying better. Mainly to do with how TREESAME and !TREESAME apply to simplifying the history walked and displayed. TREESAME means that the resulting git tree is identical and you can have multiple commits referencing the same TREE SHA1 in git. It's one of the ways it avoids duplicating the same information, identical directories result in the same SHA1, so each commit to a git tree only adds blobs (for files) and tree (for directory) objects for what is different. !TREESAME basically means there was something different between the two directories or trees in the repo, and as git allows for path limiting, this can be applied to any subdirectory in a git repository as well. In the review I was referencing a particular scenario that we've come across locally (see the url above): 1) Dev uploads a patch 'A' for review against branch tracking latest from stable/mitaka (presumably also sent it upstream) 2) Nightly sync jobs run and bring down all the patches that landed in the stable branch upstream over the last 24 hours 2.1) Git upstream is able to replay the current local changes without conflict, dropping those that landed upstream automatically 2.2) Git-upstream merges the result into the tracking branch, and it now becomes the new head (assuming it passes a gate check locally) 3) Following day or so, reviewer approves patch 'A' to land and it merges without conflict. See following for the actual test scenario, it's a bit more complex than what I've discribed and the letters used differ: https://github.com/openstack/git-upstream/blob/a52f01f89401e79db95ad9fee9195de90e5a71f2/git_upstream/tests/searchers/scenarios/changes_upload_prior_to_import.yaml If git-upstream didn't do a special type of merge, and just produced a new branch each time, we'd have to re-target patch 'A' to the new branch, alternatively we could also have triggered a rebase of any outstanding changes to avoid the particular git graph this produces. Both of these seemed like unnecessary confusion and make-work for developers/reviewers just trying to focus on landing changes. The result is a git graph like the following (hope this displays correctly): A---- / \ -----M-------X---N---O local/mitaka / / / X'--- (rebase/cherry-pick of X onto G) / / -E---F---G upstream stable/mitaka branch M = the merge commit created by git-upstream for the previous sync X = local change not yet upsteam X' = replay of the local change X onto the latest stable/mitaka N = most recent merge commit created by git-upstream A is the commit created locally by a developer to fix a issue affecting local deployments (and may need to be reworked for upstream) O = merge commit created in Gerrit by approval of 'A' When git-upstream created N, it made the tree of N TREESAME to X', which includes commits F & G, while ignoring anything that could have come from M & X, which still retaining the history. This allows existing patches in Gerrit to be merged without needing rebases, and developers get to continue to use "git-pull --rebase" locally without needing to worry about the target branch. However as you've spotted it does complicate the view of git-log a bit. Normally if 'A' was not involved: ---M-------X---N local/mitaka / / / X'--- (rebase of X onto G) / / -E---F---G upstreams stable/mitaka branch And the command being run is: git log --dense stable/mitaka..local/mitaka This would look at N and check whether the tree of each parent was TREESAME (identical) or !TREESAME and only cares about commits that are not TREESAME to any parent. However what is not clear is that the behaviour between walking merge commits and normal commits is not the same: The only reference to this behaviour is listed with the default mode: "If the commit was a merge, and it was TREESAME to one parent, follow only that parent" What this means is that looking at the parents of N, X is ignored, and X' is followed because X' is TREESAME to N, or alternatively because X contributes nothing to the contents of N it's irrelevant. With '--dense' added N is removed from the final display since it's the same as X' and thus adds nothing to the final tree. The important thing to spot is that the man page describes two different things, one is what history is *walked* to find commits, and how TREESAME/!TREESAME as applied to merge commits is different there to what is *displayed* as contributing to the tree. In this case the only difference between the two branches is X'. And is exactly what you've seen. However when A comes into the picture. O: neither parent is TREESAME so both get walked N: X is TREESAME and so is excluded, X' is walked X': is not TREESAME to G and so is included G: is stable/mitaka, stop walking and ignore G in output A: is not TREESAME to X, continue <----- this is where the problem is X: is not TREESAME to M, continue M: is TREESAME to E so ignore the other parent E: is on stable/mitaka, stop walking and ignore E in output In this case you can see that X and X' will both be included in the output, because git only concerns itself with the TREESAME/!TREESAME when applied to directly related commits. Exclusion of X on the basis of being TREESAME to N isn't used as a condition for deciding whether to walk either it or any of it's parents when you encounter another path to the same part of the graph. Besides talking to the git maintainers and suggesting that if going to exclude uninteresting commits based on TREESAME & !TREESAME, then the graph including A represents a case that should be considered, I've not seen any way to solve this simply. (I do plan to mention this to them, in fact I'll be using some of this email to help explain it). To solve, need to identify commit X programmatically and exclude both it and any parent from the output by telling git to ignore everything reachable from X. Since "git log" also allows you to specify revisions to be excluded, it's possible to get the right information using the following: git log --dense stable/mitaka..local/mitaka --not X All that remains is to be able to find X to pass it to the Git-upstream can find the previous import merges, it has to, in order to solve this to determine which commits are changes to be carried on each import. And in git terms X == N^1, or in non git notation the first parent of the most recent import. This means that for the log output, can use the existing git-upstream code to determine N and then call: git log --dense stable/mitaka..local/mitaka --not N^1 Since git-upstream is opinionated about the parent order of import merge commits, this will work consistently. > My current understanding is we have a branch, that consists of a mixture > of upstream commits from previous imports, and custom commits. We want > to show a list of commits added since the last import. However, if those > commits also contain commits from another non upstream branch, we want > to exclude those? This makes sense with the example of say a packaging > branch, but what if commits came from say a feature branch? Does it also > make sense to ignore those? Response in separate email, cuz frankly this one is long enough already... and I've the final paragraph to go yet! > Secondly, can you recap exactly how we find the most recent import > commit? How does excluding the parent of this commit combined with > --dense give the correct result? From your comment in Gerrit you > identify it as the commit with the subject "[I] Merging E1 into E", but > I can't see exactly how you spot this. Locally in the same scenario, > taking the parent of the commit marked that subject and plugging it into > the command is not showing the same graph as you pasted. Think two questions are being asked; 1) how did I know which commit was the previous import 2) how does git-upstream identify such commits Short answer to the first part; because the test scenario was written to have "I" represent a previous import merge commit. The test case referenced in the review is simulating a particular history layout to exercise git-upstream replaying changes when one of them is based on a commit from before the most recent import merge commit. It was looking that that problem that revealed some of the behaviour of git log's history simplification. The test scenario is described by a yaml list in git_upstream/tests/searchers/scenarios/changes_upload_prior_to_import.yaml see: https://github.com/openstack/git-upstream/blob/a52f01f89401e79db95ad9fee9195de90e5a71f2/git_upstream/tests/searchers/scenarios/changes_upload_prior_to_import.yaml This is combined with some code in git_upstream/tests/base.py see the following for the code involved in taking the list and generated a git repository from it: https://github.com/openstack/git-upstream/blob/a52f01f89401e79db95ad9fee9195de90e5a71f2/git_upstream/tests/base.py#L196-L319 https://github.com/openstack/git-upstream/blob/a52f01f89401e79db95ad9fee9195de90e5a71f2/git_upstream/tests/base.py#L49-L92 Within the tree definition, there is an entry "[I, [E, =E1]]", which means: 1) Create a merge commit "I" with parents "E" and "E1", merged in that order 2) The "=" symbol means that the resulting tree contents of the merge commit "I" should be exactly the same as "E1" instead of a combination of "E" and "E1" The code in git_upstream/tests/base.py has an algorithm that understands how to workout the root commits of such graphs (no parents), and will create the commits only after any required parents exist. I won't go into detail on it here, you can look at the referenced urls above to see what it does. So this answers, when I was discussing the scenario how I knew that "[I] Merging E1 into E" identified the previous import. I noticed that I forgot to give a directly link to that file in the review, or even include the ascii representation when discussing, oops! However, I think you're also possibly asking, how does git-upstream identify that "I" was the previous import? The answer to that lies in the idea of TREESAME and !TREESAME. If git-upstream encounters a merge commit, and one of the parents is the exact same tree SHA1 (so exactly the same contents and file/directory metadata) as the merge commit itself, then it assumes that the parent commit that is TREESAME was a previous import, and the commit that had a different contents that were discarded was the previous history. There's a little bit more to it than that, since it has to cover some edge cases seen as well. The code for this is at: https://github.com/openstack/git-upstream/blob/a52f01f89401e79db95ad9fee9195de90e5a71f2/git_upstream/lib/searchers.py#L69-L249 Specifically the code that identifies the previous import is: https://github.com/openstack/git-upstream/blob/a52f01f89401e79db95ad9fee9195de90e5a71f2/git_upstream/lib/searchers.py#L153-L179 Taking another look at it, I think moving that second block out to a separate method would make it easier to complete the 'carrying' command by allowing you to simply call it to do the heavy work of identifying the previous import merge commit. > Thanks in advance for anything that might help cut through some of the > confusion. > > Cheers, > -Paul > Sorry that ended up being a bit wordy, hope it helps explain things a bit better, and hope you can help be get some of those points into the docs for a useful reference in the future :D -- Regards, Darragh Bailey IRC: electrofelix "Nothing is foolproof to a sufficiently talented fool" - Unknown
signature.asc
Description: OpenPGP digital signature
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev