Sorry for the delay, I had to collect this information from the team.

As you said: Crave doesn't need to know the commit history, it just needs 
enough information from git to be able to reproduce the filesystem state on to 
the remote side without having to do a full rsync.
We have to account for the possibility that there's any combination of: (common 
merge base whether on branch or not) + (staged commits) + (unstaged commits)

Patch process:
Get the merge base
Use the command git rev-list --topo-order --reverse <localBranch> --not 
--remotes=<remote> -- to get a list of commit ID.

If the list is empty, then the HEAD commit in the branch is the common merge 
base. Use it.

If the list is NOT empty, then use the first entry in the list to get one 
commit ID before it. This is the merge base.

With the merge base, creare a series of patches
Staged changes: git --no-pager diff --src-prefix=a/ --dst-prefix=b/ --binary 
--no-ext-diff --ignore-submodules --no-color --cached

Local changes: git --no-pager format-patch --src-prefix=a/ --dst-prefix=b/ 
--no-color --stdout <mergeBase>

Send patches list across to the remote side, and then apply using git apply 
--verbose --whitespace=nowarn <patch>

In the past, we did use git am , but it is all git apply now.
Also, we don't use format-patch because we had to implement "incremental git 
patching" for users repeatedly doing incremental builds in the common case, eg:
User does incremental builds as changes keep getting "collected". At the end of 
3-4 days, there was 200 files modified. Patching 200 files on every incremental 
caused a mtime update that causes the build system from unnecessarily 
rebuilding far more than actually needed.
A simple git diff allows us to do incremental diff patching.

But besides all this explanation, we're looking to improve the entire patching 
process so that there's no need to repeatedly restart Github Actions.
One of the ways we're trying to get debug information is to get access to the 
Action Runner so that we can look at the git tree so we can get a repro without 
having to guess every time.

Thanks,
-Uv

On Aug 2 2023, at 10:50 am, Michael Gibney <mich...@michaelgibney.net> wrote:
> Hi Uv:
>
> regarding https://lists.apache.org/thread/l8hgqwkm9vt8yhygoo2blw5j4n5gh121
> Would it be possible for Crave, instead of using `git format-patch` /
> `git am`, to instead generate/apply a simple single diff/patch, via
> `git diff <merge-base>..<pr_head> > file.patch` / `git apply
> file.patch`?
>
> This would address a couple of problems, and should be more efficient
> on IO as well (which is I gather the motivation for the patch-based
> approach in the first place).
>
> In some cases (e.g., https://github.com/apache/solr/pull/1351), Crave
> fails to apply patches even when the source PR branch is completely
> up-to-date (has merged in the latest commit from `main`). The
> workaround of rebasing is not ideal, in that it necessitates
> force-pushes, which are considered an anti-pattern in some contexts,
> and can generally be an inconvenience for transparency and
> collaborative development.
>
> But separately relevant: Crave doesn't care about the commit history
> -- it's only using patches to optimize IO bandwidth. So generating a
> single diff between the merge-base and the PR head should iiuc be
> exactly what you want, and would be guaranteed to always apply
> cleanly.
>
> Would you mind sharing how you're deriving the "merge-base" commit,
> and the command you're using (I assume, `git format-patch`?) to
> generate the patch files? Having just merged `main` into a PR branch,
> the merge-base of the PR branch and `main` should be: `main`.
>
> Thank you for your consideration!
> Michael
>

Reply via email to