Hi All,

In case it is helpful for others, I wanted to summarize the high-level workflow 
that we have been following at MathWorks to manage our mathworks/arrow fork of 
apache/arrow.

We use the mathworks/arrow fork as a sort of "staging" area, where we can 
experiment, perform preliminary code review, and prepare pull requests that 
will eventually be shared with the upstream apache/arrow project. Although we 
have multiple contributors to the mathworks/arrow fork, the steps we have been 
following should more or less work for personal forks too.

In general, mathworks/arrow follows the "Branch and Pull" workflow described 
here: http://www.goring.org/resources/project-management.html

We are still refining this workflow based on experience and feedback, but it 
seems to work well for our purposes.

The basic process we follow is described below:

1. Clone the mathworks/arrow fork to your local machine.

$ git clone https://github.com/mathworks/arrow.git

2. Set up a remote to point to the upstream apache/arrow repository.

$ cd arrow/
$ git remote add apache https://github.com/apache/arrow.git

3. Sync the mathworks/arrow:master branch with the upstream apache/arrow:master 
branch.

$ git pull --ff-only apache master
$ git push
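To rehearse steps 1 to 3 before touching GitHub, here is a minimal sketch that runs them in a throwaway directory. It assumes local bare repositories (apache.git and mathworks.git, hypothetical names) standing in for the real apache/arrow and mathworks/arrow, and it uses empty commits in place of real changes.

```shell
#!/bin/sh
# Sandbox rehearsal of steps 1-3. Assumption: local bare repositories stand in
# for the GitHub-hosted apache/arrow and mathworks/arrow, so nothing is pushed
# to the real projects.
set -eu
sandbox=$(mktemp -d)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1   # ignore personal git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git config --global init.defaultBranch master
cd "$sandbox"

# Stand-in for apache/arrow: an upstream with one commit on master.
git init -q seed
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed apache.git

# Stand-in for mathworks/arrow: the fork, created from upstream at this point.
git clone -q --bare apache.git mathworks.git

# Upstream moves ahead after the fork was created.
git -C seed commit -q --allow-empty -m "ARROW-1000: upstream change"
git -C seed push -q "$sandbox/apache.git" master

# Step 1: clone the fork.
git clone -q "$sandbox/mathworks.git" arrow
cd arrow

# Step 2: add a remote pointing at upstream.
git remote add apache "$sandbox/apache.git"

# Step 3: fast-forward the local master to upstream, then publish it to the fork.
git pull -q --ff-only apache master
git push -q
```

If the fork's master ever picks up commits of its own, --ff-only makes the pull fail instead of creating a merge commit, which is a useful early warning that the fork has drifted from upstream.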

4. Create a new feature branch off the up-to-date mathworks/arrow:master 
branch. This branch will eventually be used for making a pull request against 
the upstream apache/arrow:master branch. All pull requests should be associated 
with an existing Apache JIRA issue, and we recommend naming your feature branch 
after that issue. For example, for an Apache JIRA issue named ARROW-1234, you 
could name your branch arrow_1234.

$ git checkout -b arrow_1234
$ git push --set-upstream origin arrow_1234
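If you create many of these branches, the naming convention can be scripted. The function below is a hypothetical helper (jira_branch_name is not part of any Arrow tooling), a sketch assuming JIRA keys of the form PROJECT-NUMBER.

```shell
#!/bin/sh
# Hypothetical helper, not part of Arrow's tooling: turn an Apache JIRA key
# such as ARROW-1234 into the branch name suggested in step 4 (arrow_1234).
jira_branch_name() {
    # Reject anything that does not look like PROJECT-NUMBER.
    if ! printf '%s\n' "$1" | grep -Eq '^[A-Z][A-Z0-9]*-[0-9]+$'; then
        echo "not a JIRA key: $1" >&2
        return 1
    fi
    # Lower-case the project key and replace the hyphen with an underscore.
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-/_/g'
}

# Usage, from the up-to-date master of your clone:
#   branch=$(jira_branch_name ARROW-1234) || exit 1
#   git checkout -b "$branch"
#   git push --set-upstream origin "$branch"
```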

5. If mathworks/arrow:master moves ahead (because it has been re-synced with 
the upstream apache/arrow:master branch), rebase the feature branch onto the 
mathworks/arrow:master branch. This ensures that the commits on the feature 
branch are developed "on top" of the latest mathworks/arrow:master commit 
history.

$ git checkout arrow_1234
$ git rebase master
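To see what step 5 does to the history, the sketch below replays it in a scratch repository (an assumption for illustration, not a real arrow clone): master gains a commit after arrow_1234 was branched, and the rebase moves the feature commit on top of it without creating a merge commit.

```shell
#!/bin/sh
# Illustration of step 5 in a throwaway repository (assumption: not a real
# arrow clone). master moves ahead after the feature branch is created, and
# "git rebase master" replays the feature commit on top of the new master.
set -eu
sandbox=$(mktemp -d)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git config --global init.defaultBranch master
git init -q "$sandbox/arrow"
cd "$sandbox/arrow"

echo base > README
git add README
git commit -q -m "Initial commit"

# Step 4: feature branch created from master, with one commit of work.
git checkout -q -b arrow_1234
echo feature > feature.txt
git add feature.txt
git commit -q -m "ARROW-1234: add feature"

# Meanwhile master is re-synced with upstream and moves ahead.
git checkout -q master
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "ARROW-1000: upstream change"

# Step 5: replay the feature commits on top of the updated master.
git checkout -q arrow_1234
git rebase -q master
```

If arrow_1234 was already pushed with commits that the rebase rewrote, a plain git push will be rejected afterwards; that is where the force push from step 11 comes in.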

NOTE: We use the "work in progress" branch described in steps 6 through 8 for 
managing preliminary code reviews in mathworks/arrow. These steps may not be 
necessary if you are managing a personal fork.

6. Create a "work in progress" branch named <feature>_wip (where wip = "work in 
progress"). This branch will be used as a "staging" area for doing iterative 
feature development.

$ git checkout -b arrow_1234_wip
$ git push --set-upstream origin arrow_1234_wip

7. Do work on the <feature>_wip branch. Commit as little or as often as you 
like. Push the changes to mathworks/arrow:<feature>_wip as often as you feel is 
appropriate.

$ git commit -am "Fix issue with ..."
$ git push

8. When you are ready, create a preliminary code review by making a pull 
request from the <feature>_wip branch to the corresponding <feature> branch. 
Once the code review is complete, accept the pull request into the <feature> 
branch.
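Steps 6 to 8 can be rehearsed the same way. In the sketch below, a local bare repository (mathworks.git, an assumption) plays the fork. The last command lists the commits on arrow_1234_wip that are not yet on arrow_1234, which should roughly match what a wip-to-feature pull request would propose for review.

```shell
#!/bin/sh
# Rehearsal of steps 6-8 (assumption: a local bare repository stands in for
# the mathworks/arrow fork on GitHub).
set -eu
sandbox=$(mktemp -d)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git config --global init.defaultBranch master
git init -q --bare "$sandbox/mathworks.git"
git init -q "$sandbox/arrow"
cd "$sandbox/arrow"
git remote add origin "$sandbox/mathworks.git"
git commit -q --allow-empty -m "Initial commit"
git push -q --set-upstream origin master

# Step 4: feature branch, published to the fork.
git checkout -q -b arrow_1234
git push -q --set-upstream origin arrow_1234

# Step 6: work-in-progress branch off the feature branch.
git checkout -q -b arrow_1234_wip
git push -q --set-upstream origin arrow_1234_wip

# Step 7: iterate, committing and pushing as often as you like.
echo draft > notes.txt
git add notes.txt
git commit -q -m "ARROW-1234: first draft"
echo revised > notes.txt
git commit -q -am "ARROW-1234: address review comment"
git push -q

# Before step 8: commits a wip -> feature pull request would contain.
git log --oneline origin/arrow_1234..origin/arrow_1234_wip
```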

9. Once the <feature> branch is finalized, make a pull request from the 
mathworks/arrow:<feature> branch against the upstream apache/arrow:master 
branch. Collaborate with the rest of the Arrow community to address feedback on 
your changes.

10. If you see any CI failures, inspect the Travis CI and/or AppVeyor logs to 
determine whether the failures are occurring due to your changes or some 
unrelated recent commits to apache/arrow:master. If the failures are occurring 
due to your changes, you can make any necessary changes on your local clone of 
the mathworks/arrow:<feature> branch. Pushing this branch to mathworks/arrow 
will re-run CI jobs.

11. If CI failures that appear unrelated to your pull request continue to 
occur, add a comment to your pull request noting this and wait for the CI build 
of apache/arrow:master to start passing again (the status of the CI build is 
displayed as a badge on the Arrow README page). Once the master CI build of 
Arrow is passing again, rebase your <feature> branch onto the upstream 
apache/arrow:master branch (pushing the rebased branch will also re-run CI):

$ git checkout master
$ git pull --ff-only apache master
$ git push
$ git checkout arrow_1234
$ git rebase master
$ git push --force

NOTE: Wes has pointed out that GitHub will not notify reviewers if you force 
push commits (with git push --force). You should add a comment to your pull 
request whenever force pushing to inform reviewers of any new changes.
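Because mathworks/arrow has several contributors, a plain force push can silently discard a colleague's commits on a shared feature branch. One option, not part of the workflow above, is git's --force-with-lease, which refuses to overwrite the remote branch if it has moved since you last fetched it. The sketch below assumes two local clones ("me" and "colleague", hypothetical names) of a bare repository standing in for the fork.

```shell
#!/bin/sh
# Sketch: --force-with-lease protecting a colleague's push (assumption: local
# bare repository in place of mathworks/arrow; clone names are hypothetical).
set -eu
sandbox=$(mktemp -d)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git config --global init.defaultBranch master
git init -q --bare "$sandbox/mathworks.git"

# Seed the fork with master and a feature branch.
git init -q "$sandbox/seed"
cd "$sandbox/seed"
git commit -q --allow-empty -m "Initial commit"
git checkout -q -b arrow_1234
git commit -q --allow-empty -m "ARROW-1234: first pass"
git push -q "$sandbox/mathworks.git" master arrow_1234

# Two contributors clone the fork.
git clone -q "$sandbox/mathworks.git" "$sandbox/me"
git clone -q "$sandbox/mathworks.git" "$sandbox/colleague"

# The colleague pushes a fix that "me" has not fetched yet.
cd "$sandbox/colleague"
git checkout -q arrow_1234
git commit -q --allow-empty -m "ARROW-1234: colleague's fix"
git push -q

# "me" rewrites the branch (as a rebase would) and tries to publish it.
cd "$sandbox/me"
git checkout -q arrow_1234
git commit -q --amend --allow-empty -m "ARROW-1234: first pass (reworded)"
if git push -q --force-with-lease 2>/dev/null; then
    lease_result=accepted
else
    lease_result=rejected   # the colleague's commit would have been lost
fi
echo "force-with-lease: $lease_result"
```

A plain git push --force would have succeeded here and overwritten the colleague's commit. Note that running git fetch refreshes the remote-tracking branch, after which the lease check passes, so look at what you fetched before pushing.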

---

We would welcome any feedback on this workflow, and if others think it would be 
useful, I would be happy to contribute a generic version of these process 
details to the Apache Arrow contribution guidelines.

Best Regards,

Kevin Gurney

-----Original Message-----
From: Ravindra Pindikura <ravin...@dremio.com> 
Sent: Wednesday, January 30, 2019 8:52 PM
To: dev@arrow.apache.org
Subject: Re: Git workflow question

OK. Thanks, Wes.

> On Jan 30, 2019, at 8:43 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> 
> hi Ravindra,
> 
> On Wed, Jan 30, 2019 at 12:00 AM Ravindra Pindikura <ravin...@dremio.com> wrote:
>>
>>> On Jan 30, 2019, at 11:05 AM, Andy Grove <andygrov...@gmail.com> wrote:
>>> 
>>> Got it. Thanks for the clarification.
>>> 
>>> On Tue, Jan 29, 2019 at 10:30 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>> 
>>>> hi Andy,
>>>> 
>>>> yes, in this project I recommend never using "git merge". Merge 
>>>> commits just make branches harder to maintain when master is not 
>>>> using "merge" for merging patches.
>>>> 
>>>> It is semantically simpler in the case of conflicts with master to 
>>>> use "git rebase -i" to combine your changes into a single commit, 
>>>> then "git rebase master" and resolve the conflicts then.
>> 
>> Here’s the workflow that I use :
>> 
>> git fetch upstream
>> git log -> count my local commits, and remember it as ‘X’
>> git rebase -i HEAD~X
>> git rebase upstream/master
>> git push -f
>> 
>> 
>> I’m not able to avoid the ‘-f’ in the last step. But, Wes had recommended 
>> that we avoid the force option. Is there a better way to do this ?
> 
> You do have to force-push after rebasing.
> 
> I did write an e-mail about force-pushing where notifications are 
> concerned. So let me revise my thoughts about it:
> 
> * If you need to rebase, rebase. If you expect a contributor to look 
> at your rebased PR, please comment to say that the PR has been updated 
> because GitHub does not send email notifications for force pushes.
> * If you don't *need* to rebase (i.e. there aren't any upstream 
> patches you need), then it's OK to leave as is or keep pushing commits 
> to the branch
> 
> As we are not using Gerrit or similar code review tool, there is no 
> squash-and-rebase requirement. The e-mail that I wrote was to let 
> contributors know that there is an extra communication requirement 
> when you force-push if you want your PR reviewed.
> 
>> 
>> Thanks & regards,
>> Ravindra,
>> 
>>>> 
>>>> A linear commit history, with all patches landing in master as 
>>>> single commits, significantly eases downstream users who may be 
>>>> cherry picking fixes into maintenance branches. The alternative -- 
>>>> trying to sift the changes you want out of a tangled web of merge 
>>>> commits -- would be utter madness.
>>>> 
>>>> - Wes
>>>> 
>>>> On Tue, Jan 29, 2019 at 11:20 PM Andy Grove <andygrov...@gmail.com> wrote:
>>>>> 
>>>>> I've been struggling a bit with the workflow and I think I see 
>>>>> what I'm doing wrong now but wanted to confirm.
>>>>> 
>>>>> I've been running the following to keep my fork up to date:
>>>>> 
>>>>> git checkout master
>>>>> git fetch upstream
>>>>> git merge upstream/master
>>>>> git push origin
>>>>> 
>>>>> And then to update my branch I have been doing:
>>>>> 
>>>>> git checkout ARROW-nnnn
>>>>> git merge master
>>>>> git push origin
>>>>> 
>>>>> This generally has worked but sometimes I seem to pick up random
>>>>> commits on my branch.
>>>>> 
>>>>> Reading the github fork workflow docs again it looks like I should 
>>>>> have been running "git rebase master" instead of "git merge master" ?
>>>>> 
>>>>> Is that the only mistake I'm making?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Andy.
