Hi All,

In case it is helpful to others, I wanted to summarize the high-level workflow that we have been following at MathWorks to manage our mathworks/arrow fork of apache/arrow.
We use the mathworks/arrow fork as a "staging" area where we can experiment, perform preliminary code review, and prepare pull requests that will eventually be shared with the upstream apache/arrow project. Although mathworks/arrow has multiple contributors, the steps we have been following should more or less work for personal forks too.

In general, mathworks/arrow follows the "Branch and Pull" workflow described here:

http://www.goring.org/resources/project-management.html

Although we are still refining this workflow based on experience and feedback, it seems to work pretty well for our purposes. The basic process we follow is described below:

1. Clone the mathworks/arrow fork to your local machine.

   $ git clone https://github.com/mathworks/arrow.git

2. Set up a remote to point to the upstream apache/arrow repository.

   $ cd arrow/
   $ git remote add apache https://github.com/apache/arrow.git

3. Sync the mathworks/arrow:master branch with the upstream apache/arrow:master branch.

   $ git pull --ff-only apache master
   $ git push

4. Create a new feature branch off the up-to-date mathworks/arrow:master branch. This branch will eventually be used for making a pull request against the upstream apache/arrow:master branch. All pull requests should be associated with an existing Apache JIRA issue, and we recommend naming your feature branch after that issue. For example, for an Apache JIRA issue named ARROW-1234, you could name your branch arrow_1234.

   $ git checkout -b arrow_1234
   $ git push --set-upstream origin arrow_1234

5. If mathworks/arrow:master moves ahead (because it is re-synced with the upstream apache/arrow:master branch), then rebase the feature branch onto the mathworks/arrow:master branch. This ensures that any commits made on the feature branch sit "on top" of the latest mathworks/arrow:master commit history.

   $ git rebase master

NOTE: We use a "work in progress" branch, as detailed in steps 6 and 8, to manage preliminary code reviews in mathworks/arrow. These steps may not be necessary if you are managing a personal fork.

6. Create a "work in progress" branch named <feature>_wip. This branch will be used as a "staging" area for iterative feature development.

   $ git checkout -b arrow_1234_wip
   $ git push --set-upstream origin arrow_1234_wip

7. Do work on the <feature>_wip branch. Commit as little or as often as you like. Push the changes to mathworks/arrow:<feature>_wip as often as you feel is appropriate.

   $ git commit -am "Fix issue with ..."
   $ git push

8. When you are ready, create a preliminary code review by making a pull request from the <feature>_wip branch to the corresponding <feature> branch. Once the code review is complete, accept the pull request into the <feature> branch.

9. Once the <feature> branch is finalized, make a pull request from the mathworks/arrow:<feature> branch against the upstream apache/arrow:master branch. Collaborate with the rest of the Arrow community to address feedback on your changes.

10. If you see any CI failures, inspect the Travis CI and/or AppVeyor logs to determine whether the failures are caused by your changes or by unrelated recent commits to apache/arrow:master. If the failures are caused by your changes, make the necessary fixes on your local checkout of the mathworks/arrow:<feature> branch. Pushing this branch to mathworks/arrow will re-run the CI jobs.

11. If CI failures that appear unrelated to your pull request continue to occur, add a comment to your pull request that mentions this, and wait for the CI build of apache/arrow:master to start passing again (the status of the CI build is displayed as a badge on the Arrow README page).
    Once the master CI build of Arrow is passing again, rebase your <feature> branch onto the upstream apache/arrow:master branch (this will also automatically re-run CI):

    $ git checkout master
    $ git pull --ff-only apache master
    $ git push
    $ git checkout arrow_1234
    $ git rebase master
    $ git push --force

NOTE: Wes has pointed out that GitHub will not notify reviewers if you force push commits (with git push --force). Whenever you force push, add a comment to your pull request to inform reviewers of the new changes.

---

We would welcome any feedback on this workflow, and if others think it would be useful, I would be happy to contribute a generic version of these process details to the Apache Arrow contribution guidelines.

Best Regards,

Kevin Gurney

-----Original Message-----
From: Ravindra Pindikura <ravin...@dremio.com>
Sent: Wednesday, January 30, 2019 8:52 PM
To: dev@arrow.apache.org
Subject: Re: Git workflow question

Ok. Thanks, wes.

> On Jan 30, 2019, at 8:43 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Ravindra,
>
> On Wed, Jan 30, 2019 at 12:00 AM Ravindra Pindikura <ravin...@dremio.com> wrote:
>>
>>
>>> On Jan 30, 2019, at 11:05 AM, Andy Grove <andygrov...@gmail.com> wrote:
>>>
>>> Got it. Thanks for the clarification.
>>>
>>> On Tue, Jan 29, 2019 at 10:30 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>
>>>> hi Andy,
>>>>
>>>> yes, in this project I recommend never using "git merge". Merge
>>>> commits just make branches harder to maintain when master is not
>>>> using "merge" for merging patches.
>>>>
>>>> It is semantically simpler in the case of conflicts with master to
>>>> use "git rebase -i" to combine your changes into a single commit,
>>>> then "git rebase master" and resolve the conflicts then.
>>
>> Here’s the workflow that I use :
>>
>> git fetch upstream
>> git log -> count my local commits, and remember it as ‘X'
>> git rebase -i HEAD~x
>> git rebase upstream/master
>> git push -f
>>
>>
>> I’m not able to avoid the ‘-f’ in the last step. But, Wes had recommended
>> that we avoid the force option. Is there a better way to do this ?
>
> You do have to force-push after rebasing.
>
> I did write an e-mail about force-pushing where notifications are
> concerned. So let me revise my thoughts about it:
>
> * If you need to rebase, rebase. If you expect a contributor to look
> at your rebased PR, please comment to say that the PR has been updated
> because GitHub does not send email notifications for force pushes
> * If you don't *need* to rebase (i.e. there aren't any upstream
> patches you need), then it's OK to leave as is or keep pushing commits
> to the branch
>
> As we are not using Gerrit or similar code review tool, there is no
> squash-and-rebase requirement. The e-mail that I wrote was to let
> contributors know that there is an extra communication requirement
> when you force-push if you want your PR reviewed
>
>>
>> Thanks & regards,
>> Ravindra,
>>
>>>>
>>>> A linear commit history, with all patches landing in master as
>>>> single commits, significantly eases downstream users who may be
>>>> cherry picking fixes into maintenance branches. The alternative --
>>>> trying to sift the changes you want out of a tangled web of merge
>>>> commits -- would be utter madness.
>>>>
>>>> - Wes
>>>>
>>>> On Tue, Jan 29, 2019 at 11:20 PM Andy Grove <andygrov...@gmail.com> wrote:
>>>>>
>>>>> I've been struggling a bit with the workflow and I think I see
>>>>> what I'm doing wrong now but wanted to confirm.
>>>>>
>>>>> I've been running the following to keep my fork up to date:
>>>>>
>>>>> git checkout master
>>>>> git fetch upstream
>>>>> git merge upstream/master
>>>>> git push origin
>>>>>
>>>>> And then to update my branch I have been doing:
>>>>>
>>>>> git checkout ARROW-nnnn
>>>>> git merge master
>>>>> git push origin
>>>>>
>>>>> This generally has worked but sometimes I seem to pick up random
>>>>> commits on my branch.
>>>>>
>>>>> Reading the github fork workflow docs again it looks like I should
>>>>> have been running "git rebase master" instead of "git merge master" ?
>>>>>
>>>>> Is that the only mistake I'm making?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Andy.
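
P.S. For anyone who would like to rehearse the sync-and-rebase part of the workflow (steps 1-5 and 11) without touching a real fork, here is a minimal sketch. It is only an illustration under stated assumptions: the bare repositories "upstream.git" and "fork.git" and the "seed" clone are hypothetical local stand-ins for apache/arrow, mathworks/arrow, and other contributors, and the file names and commit messages are made up.

```shell
#!/bin/sh
# Sketch only: rehearse steps 1-5 and 11 against throwaway local repositories.
# "upstream.git" and "fork.git" are hypothetical stand-ins for apache/arrow
# and mathworks/arrow; nothing here touches the network.
set -eu

# Fixed identity so commits work on machines without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# Two bare "hosted" repositories whose default branch is named master.
git init -q --bare upstream.git
git --git-dir=upstream.git symbolic-ref HEAD refs/heads/master
git init -q --bare fork.git
git --git-dir=fork.git symbolic-ref HEAD refs/heads/master

# "seed" plays the role of other contributors landing commits upstream.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "initial" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git -C seed push -q "$work/upstream.git" master
git -C seed push -q "$work/fork.git" master

# Steps 1-2: clone the fork and add the upstream remote.
git clone -q "$work/fork.git" arrow
cd arrow
git remote add apache "$work/upstream.git"

# Step 4: create the feature branch and publish it to the fork.
git checkout -q -b arrow_1234
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "ARROW-1234: Add feature"
git push -q --set-upstream origin arrow_1234

# Meanwhile, an unrelated commit lands on upstream master.
echo "upstream change" > "$work/seed/NOTES"
git -C "$work/seed" add NOTES
git -C "$work/seed" commit -q -m "Unrelated upstream change"
git -C "$work/seed" push -q "$work/upstream.git" master

# Steps 3 and 11: fast-forward the fork's master to upstream master.
git checkout -q master
git pull -q --ff-only apache master
git push -q

# Steps 5 and 11: replay the feature commit on top of the updated master,
# then force-push because the rebase rewrote the branch history.
git checkout -q arrow_1234
git rebase -q master
git push -q --force

# The feature commit should now sit directly on top of master, with no merges.
git --no-pager log --oneline -n 3
```

Against a real fork you would skip the setup section, use the GitHub URLs from steps 1 and 2 in place of the local paths, and remember to comment on the pull request after the final force push.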