Sorry for the delay, I had to collect this information from the team. As you said: Crave doesn't need to know the commit history, it just needs enough information from git to be able to reproduce the filesystem state on to the remote side without having to do a full rsync. We have to account for the possibility that there's any combination of: (common merge base whether on branch or not) + (staged commits) + (unstaged commits)
Patch process: Get the merge base Use the command git rev-list --topo-order --reverse <localBranch> --not --remotes=<remote> -- to get a list of commit ID. If the list is empty, then the HEAD commit in the branch is the common merge base. Use it. If the list is NOT empty, then use the first entry in the list to get one commit ID before it. This is the merge base. With the merge base, creare a series of patches Staged changes: git --no-pager diff --src-prefix=a/ --dst-prefix=b/ --binary --no-ext-diff --ignore-submodules --no-color --cached Local changes: git --no-pager format-patch --src-prefix=a/ --dst-prefix=b/ --no-color --stdout <mergeBase> Send patches list across to the remote side, and then apply using git apply --verbose --whitespace=nowarn <patch> In the past, we did use git am , but it is all git apply now. Also, we don't use format-patch because we had to implement "incremental git patching" for users repeatedly doing incremental builds in the common case, eg: User does incremental builds as changes keep getting "collected". At the end of 3-4 days, there was 200 files modified. Patching 200 files on every incremental caused a mtime update that causes the build system from unnecessarily rebuilding far more than actually needed. A simple git diff allows us to do incremental diff patching. But besides all this explanation, we're looking to improve the entire patching process so that there's no need to repeatedly restart Github Actions. One of the ways we're trying to get debug information is to get access to the Action Runner so that we can look at the git tree so we can get a repro without having to guess every time. Thanks, -Uv On Aug 2 2023, at 10:50 am, Michael Gibney <mich...@michaelgibney.net> wrote: > Hi Uv: > > regarding https://lists.apache.org/thread/l8hgqwkm9vt8yhygoo2blw5j4n5gh121 > Would it be possible for Crave, instead of using `git format-patch` / > `git am`, to instead generate/apply a simple single diff/patch, via > `git diff <merge-base>..<pr_head> > file.patch` / `git apply > file.patch`? > > This would address a couple of problems, and should be more efficient > on IO as well (which is I gather the motivation for the patch-based > approach in the first place). > > In some cases (e.g., https://github.com/apache/solr/pull/1351), Crave > fails to apply patches even when the source PR branch is completely > up-to-date (has merged in the latest commit from `main`). The > workaround of rebasing is not ideal, in that it necessitates > force-pushes, which are considered an anti-pattern in some contexts, > and can generally be an inconvenience for transparency and > collaborative development. > > But separately relevant: Crave doesn't care about the commit history > -- it's only using patches to optimize IO bandwidth. So generating a > single diff between the merge-base and the PR head should iiuc be > exactly what you want, and would be guaranteed to always apply > cleanly. > > Would you mind sharing how you're deriving the "merge-base" commit, > and the command you're using (I assume, `git format-patch`?) to > generate the patch files? Having just merged `main` into a PR branch, > the merge-base of the PR branch and `main` should be: `main`. > > Thank you for your consideration! > Michael >