Thank you both for getting back to me. The discussion of flattening in the
docs was really interesting! I should note that the git clone / git log
command pair I provided gives me almost exactly what I want, except that I
still need to combine the per-commit diffs into one. The output seems to
contain the correct changes, and the speed is pretty good too.
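For the combining step, one possible approach (a sketch, not a tested recipe for large repositories) is to replay each first-parent, non-merge commit's own patch onto a scratch index that starts from the base snapshot, then diff that scratch index against the base. The script builds a throwaway local repository shaped like the scenario described further down. The file names, the tags `base`, `A` to `G`, `U` and `M`, and the scratch index path are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: combine the first-parent, non-merge patches of C..G into one patch.
# Builds a throwaway repo; all names below (files, tags) are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

# commit_line FILE TEXT: append TEXT to FILE, commit it, tag the commit TEXT.
commit_line() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
  git tag "$2"
}

commit_line notes.txt base
git checkout -q -b my-topic-branch
for c in A B C D; do commit_line mine.txt "$c"; done
git checkout -q main
commit_line upstream.txt U              # the collaborator's change on main
git checkout -q my-topic-branch
git merge -q --no-edit main             # the merge commit on the topic branch
git tag M
for c in E F G; do commit_line mine.txt "$c"; done

# Scratch index; the path does not exist yet, so git starts from empty.
idx="$(mktemp -d)/index"
GIT_INDEX_FILE="$idx" git read-tree C

# Replay D, E, F, G (oldest first), each diffed against its first parent.
for c in $(git rev-list --reverse --first-parent --no-merges C..G); do
  git diff-tree -p --binary "$c^" "$c" | GIT_INDEX_FILE="$idx" git apply --cached
done

# The single combined patch: the scratch index relative to C.
combined=$(GIT_INDEX_FILE="$idx" git diff --no-color --cached C)
printf '%s\n' "$combined"
```

In this toy repository the combined patch only adds D, E, F and G to `mine.txt`. If a later commit's context lines include text that only arrived through a merge, `git apply` can refuse that patch, which is one concrete form of the brittleness Konstantin describes in his reply.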

Let me give an example of the situation I am optimizing for. I apologize in
advance: I am going to use GitHub terms, which I know are not pure git, but
in the end my question is a git question.

Say you're a developer working in a many-developer repository. Here's the 
sequence:

   - On Day 0 you check out "main" and create "my-topic-branch". You add 
   commits A, B, C, D to that branch. 
   - Now you open a pull request on GitHub asking to merge your branch
   "my-topic-branch" into "main".
   - You see a collaborator has landed a change to "main" since you
   started. So you do "git fetch origin main && git merge origin/main" and
   make a merge commit in your branch.
   - Then you add three more commits E, F, G on top of that and push your 
   branch again. So you have: A, B, C, D, (merge main), E, F, G.
   - A coworker has already looked at commits A, B, C and wants to see what
   you've done since then. So they ask GitHub to show the diff for commits D
   through G (including the merge).
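A minimal sketch of this scenario in a throwaway local repository. There is no remote here, so `main` is merged directly, and the file names and the tags `base`, `A` to `G`, `U` (the collaborator's commit) and `M` (the merge) are hypothetical. It shows that a plain two-snapshot diff over the range, `git diff C G`, reports the collaborator's file as well as yours.

```shell
#!/usr/bin/env bash
# Sketch: the scenario above in a throwaway repo; all names are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

# commit_line FILE TEXT: append TEXT to FILE, commit it, tag the commit TEXT.
commit_line() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
  git tag "$2"
}

commit_line notes.txt base
git checkout -q -b my-topic-branch
for c in A B C D; do commit_line mine.txt "$c"; done
git checkout -q main
commit_line upstream.txt U              # the collaborator's change on main
git checkout -q my-topic-branch
git merge -q --no-edit main             # the merge commit on the topic branch
git tag M
for c in E F G; do commit_line mine.txt "$c"; done

# "The diff for D through G" as two snapshots: C's tree versus G's tree.
naive=$(git diff --name-only C G)
printf '%s\n' "$naive"
```

Here `upstream.txt` is listed next to `mine.txt`, because U is part of G's snapshot but not of C's; that is exactly the merged-in change GitHub leaves out.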

When you do this, GitHub does something which (to me, anyway) is pretty 
magical. You are shown only the changes that you committed to your branch 
in D, E, F, and G. Changes which you merged in, which may or may not 
involve the files in your Pull Request, are not shown at all since they're 
not "yours".

Here's a public example showing a team using this pattern. This one has
multiple merges, so I may need to find a cleaner example, but hopefully this
makes sense.

   - Consider this PR: 
   https://github.com/firebase/firebase-tools/pull/5478/files
      - This is the full diff (according to GitHub) and we can see exactly 
      one added line in CHANGELOG.md
   - Here's a merge commit:
   https://github.com/firebase/firebase-tools/pull/5478/commits/ebce28ceb799f721d36b986705c54cbcd597a27a
      - We can see that on the base branch, "master", a line was added to 
      the *end* of the CHANGELOG.md file. There is no such addition 
      displayed in the full diff.
   - Here's a "magic" diff where I selected three commits (before merge,
   merge, and after merge):
   https://github.com/firebase/firebase-tools/pull/5478/files/28b8a72561b266a2086059c0d9840ab25f03d8ae..b2d89ebd67e3f8c17c4c607c630c18096303096b
      - We can see that the changes from the merge commit are not shown as 
      additions! But they are present as context lines.
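The "full diff" behaviour lines up with what, as far as I know, GitHub documents for pull requests: a three-dot comparison, from the merge base of the two branches to the tip of the topic branch. Merging main into the topic branch moves that merge base forward to the merged main commit, so upstream changes drop out. A sketch on a throwaway repository, where the files, the tags and the branch layout are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: three-dot diff in a throwaway repo; all names are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

# commit_line FILE TEXT: append TEXT to FILE, commit it, tag the commit TEXT.
commit_line() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
  git tag "$2"
}

commit_line notes.txt base
git checkout -q -b my-topic-branch
for c in A B C D; do commit_line mine.txt "$c"; done
git checkout -q main
commit_line upstream.txt U              # the collaborator's change on main
git checkout -q my-topic-branch
git merge -q --no-edit main             # the merge commit on the topic branch
git tag M
for c in E F G; do commit_line mine.txt "$c"; done

# The merge base of main and the topic branch is U, the merged-in main commit.
git merge-base main my-topic-branch
# Three-dot diff: from that merge base to the tip of my-topic-branch.
full=$(git diff --name-only main...my-topic-branch)
printf '%s\n' "$full"
```

Only `mine.txt` is listed, even though G's snapshot contains `upstream.txt`. This covers the whole-PR view but not an arbitrary sub-range such as D through G.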
   
I need to find a sequence of git commands that produces exactly the same
diff that GitHub produces (ideally very quickly, even in a large
repository), and I just can't figure it out.
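One way to approximate that range view in plain git (an assumption about how to reproduce the effect, not a description of GitHub's actual algorithm) is to rebuild "C plus whatever the merge brought in" by merging the merge commit's second parent into C, then diff that rebuilt snapshot against G. A sketch on a throwaway repository, where the files and the tags `C`, `G`, `M` and the rest are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: re-merge upstream into C, then diff against G. Names are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

# commit_line FILE TEXT: append TEXT to FILE, commit it, tag the commit TEXT.
commit_line() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
  git tag "$2"
}

commit_line notes.txt base
git checkout -q -b my-topic-branch
for c in A B C D; do commit_line mine.txt "$c"; done
git checkout -q main
commit_line upstream.txt U              # the collaborator's change on main
git checkout -q my-topic-branch
git merge -q --no-edit main             # the merge commit on the topic branch
git tag M
for c in E F G; do commit_line mine.txt "$c"; done

# Rebuild "C plus the merged-in main commit" on a detached HEAD.
git checkout -q --detach C
git merge -q --no-edit "M^2"            # M^2 is the main commit that M brought in
# Diff the rebuilt snapshot against G: only the D..G work remains.
yours=$(git diff --no-color HEAD G)
printf '%s\n' "$yours"
```

In a real repository you would probably do this in a temporary `git worktree`, or, with Git 2.38 or later, compute the merged tree without any checkout using `git merge-tree --write-tree C M^2` and diff the tree ID on its first output line against G. If the rebuilt merge hits conflicts they need resolving first, and with several merges of main you would merge the most recently merged main commit, assuming main only moves forward.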

Thanks,
Sam

   


On Tuesday, 21 February 2023 at 14:12:48 UTC-8 philip...@iee.email wrote:

> This may also be an issue of the History Simplification process and/or
> the 'flattening' processes for history linearisation and rebases.
>
> Flattening is a known phenomenon and is currently being discussed on the
> Git list, so I have noted this there [1].
> [1] https://lore.kernel.org/git/a856dd16-9876-509b-6a99-11ea0020633c@iee.email/
>
> There is a technical discussion of flattening in the docs at 
> https://github.com/git/git/blob/master/Documentation/howto/keep-canonical-history-correct.txt
>  
>
> Do note the original email title "Pull is mostly evil" ;-) (whole thread
> at <https://lore.kernel.org/git/5363bb9f.40...@xiplink.com/>)
>
> Clarifying what you mean by "excluding merge commit changes" (or any
> misunderstandings, if there were some) would be really useful. The
> existing devs do have the 'curse of knowledge', so they often can't see
> the problems.
> On Tuesday, February 21, 2023 at 5:29:36 PM UTC Konstantin Khomoutov wrote:
>
>> On Mon, Feb 20, 2023 at 09:27:20PM -0800, 'Samuel Stern' via Git for 
>> human beings wrote: 
>>
>> > This is an *extremely* specific question which I've been trying to get
>> > an answer to for quite a while now, so hopefully someone here knows the
>> > answer.
>> > 
>> > Let's say I am starting from nothing, an empty directory on a server. I 
>> > have: 
>> > 
>> > - The URL for a public git repository 
>> > - Two endpoint SHAs (commits on the same branch) 
>> > 
>> > I want to get the complete diff between those commits *excluding* merge
>> > commit changes, and I want to do this as fast as possible (so much
>> > faster than cloning everything and diffing).
>> > 
>> > I am able to get almost there with the following sequence: 
>> > 
>> > # Fast clone 
>> > git clone --verbose --no-checkout --filter=blob:limit=250k \
>> >   --single-branch --branch=${branch} --depth=${depth} $REPO_URL
>> > 
>> > # Get a series of patches 
>> > git log --no-merges --first-parent --patch ${base.sha}..${head.sha} 
>> > 
>> > However I need to get a *single* patch that represents all the changes 
>> > combined, not a series of patches from the log. 
>>
>> Isn't a mere
>>
>> git diff ${base.sha} ${head.sha}
>>
>> what you're looking for?
>>
>> Otherwise, I'm with Philipp in that your statements (rephrased) 
>>
>> - I want to get a single combined change ("patch") describing the literal 
>> set of changes between such and such commits. 
>>
>> - I want changes brought in by merge commits excluded. 
>>
>> contradict each other. I could in principle envision some algorithm which
>> would try to incrementally produce a diff as it walks a chain of commits
>> and tries to ignore the changes introduced by merge commits located in
>> that chain, but leaving aside the fact that such an algorithm would be
>> very brittle for real-world cases, I simply see no use for it, not even a
>> theoretical one.
>>
>>
>> You might have been trapped by the fact that you found `git log` first in
>> your search, and this command traverses all individual commits in the
>> subgraph it's told to traverse, including "sidelines" brought in by merge
>> commits. By contrast, plain old `git diff` does not traverse anything: it
>> takes two states of the project and compares them.
>>
>>

-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/git-users/54b4235d-2266-4c31-8b14-67b726ef7339n%40googlegroups.com.
