Re: [DISCUSS] git rebase vs. git merge for branch development
+1, I agree with supporting git-merge-based workflows for large branch merges.

I have experienced the pain of rebasing the entire HDFS-7285 branch (just for verification, though), and I found that even a one-line change in trunk to core files (e.g. FSNameSystem.java, BlockManager.java) makes it hard to rebase many commits in the branch.

One main problem with git-rebase, as I have experienced it: if we need to retain the same commits, all conflicts have to be resolved by the one person doing the rebase, since 'git-rebase' has to be executed on a single machine, and there is a fair chance of mishandling conflicts and causing problems. The person doing the rebase may not be very familiar with the conflicting code. In these kinds of situations, I think it's very hard to find out what was the original code and what came from conflict resolution once the rebase is done.

IMO, it's fair to go with periodic merges from trunk into the branch. Even though there will be a few conflicts, these may not be as problematic as rebase conflicts.

Regarding merging to branch-2: it needs a few more conflict resolutions compared to trunk, but probably not too many, as trunk and branch-2 are moving in parallel, at least in terms of features and fixes (~90%, I would say).

Regards,
Vinay

On Tue, Aug 18, 2015 at 6:12 AM, Sangjin Lee sj...@apache.org wrote:

I also think allowing merges as a way to uprev with trunk would be a good idea. AFAIK, git rebase works well when your branch is short-lived and contains a fairly small number of commits, but doesn't work so well if your branch is large. Also, the cost of rebase will only go up as time goes on. On the other hand, git merge has a pretty decent chance of succeeding, especially if you merge the trunk often.

My 2 cents.
Sangjin

On Mon, Aug 17, 2015 at 1:18 PM, Jing Zhao jing.apa...@gmail.com wrote:

I think we should allow merge-based workflows.
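To illustrate the periodic merge from trunk described above, here is a minimal sketch in a throwaway repository. The branch name `feature`, the file names, and the commit messages are assumptions made up for the example; only `trunk` comes from the thread. It shows that merging trunk into the branch adds a single merge commit and leaves the existing branch commits untouched.

```shell
set -eu
# Fixed identity so the sketch runs without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk   # name the initial branch "trunk"

echo "base" > core.txt
git add core.txt
git commit -q -m "trunk: initial commit"

# Hypothetical feature branch with a few commits.
git checkout -q -b feature
for i in 1 2 3; do
  echo "feature change $i" > "feature-$i.txt"
  git add "feature-$i.txt"
  git commit -q -m "feature: change $i"
done
before=$(git rev-parse HEAD)             # branch tip before the uprev

# Meanwhile, trunk moves on.
git checkout -q trunk
echo "trunk change" >> core.txt
git commit -q -am "trunk: touch a core file"

# Periodic uprev: merge trunk into the feature branch.
git checkout -q feature
git merge -q --no-edit trunk

# The old tip is still part of the history; nothing was rewritten.
git merge-base --is-ancestor "$before" HEAD
git --no-pager log --oneline --graph
```

Any conflicts would be resolved once, in the merge commit, rather than commit by commit as in a rebase.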
Re: [DISCUSS] git rebase vs. git merge for branch development
Sounds like we have a lot of support for also allowing merge workflows. Let me draft a proper proposal and go through the [DISCUSS] and [VOTE] process.

One thing I think we should amend from the previous [VOTE] is using git merge --no-ff rather than rebase --onto for branch-to-trunk integration, since it makes reverting the branch easier. Also, using git merge rather than a squashed commit for the branch-2 backport, as Vinay said.

In the meantime, I think it's okay for ongoing feature branch development like HDFS-7285 to start using merge rather than rebase. I haven't seen any objections to merge yet.

On Tue, Aug 18, 2015 at 1:39 AM, Vinayakumar B vinayakum...@apache.org wrote:

+1, I agree with supporting git-merge-based workflows for large branch merges.

I have experienced the pain of rebasing the entire HDFS-7285 branch (just for verification, though), and I found that even a one-line change in trunk to core files (e.g. FSNameSystem.java, BlockManager.java) makes it hard to rebase many commits in the branch.

One main problem with git-rebase, as I have experienced it: if we need to retain the same commits, all conflicts have to be resolved by the one person doing the rebase, since 'git-rebase' has to be executed on a single machine, and there is a fair chance of mishandling conflicts and causing problems. The person doing the rebase may not be very familiar with the conflicting code. In these kinds of situations, I think it's very hard to find out what was the original code and what came from conflict resolution once the rebase is done.

IMO, it's fair to go with periodic merges from trunk into the branch. Even though there will be a few conflicts, these may not be as problematic as rebase conflicts.

Regarding merging to branch-2: it needs a few more conflict resolutions compared to trunk, but probably not too many, as trunk and branch-2 are moving in parallel, at least in terms of features and fixes (~90%, I would say).
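As a rough sketch of why `git merge --no-ff` makes reverting the branch easier: the whole feature lands on trunk as one merge commit, and that commit can be undone with a single `git revert -m 1`. The branch name `feature` and the file names below are assumptions for the example, not anything from the Hadoop repository.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk

echo "base" > a.txt
git add a.txt
git commit -q -m "trunk: initial commit"

# Hypothetical feature branch with two commits.
git checkout -q -b feature
for i in 1 2; do
  echo "feature $i" > "f$i.txt"
  git add "f$i.txt"
  git commit -q -m "feature: add f$i.txt"
done

# Integrate into trunk. --no-ff forces a merge commit even though
# trunk has not moved and a fast-forward would have been possible.
git checkout -q trunk
git merge -q --no-ff --no-edit feature
merge_commit=$(git rev-parse HEAD)

# Back out the whole branch in one step. "-m 1" tells revert to keep
# the first parent (trunk before the merge) as the mainline.
git revert --no-edit -m 1 "$merge_commit"

git --no-pager log --oneline --graph
```

With a rebased or cherry-picked range there is no single commit representing the branch, so a revert would have to cover each commit in the range.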
Re: [DISCUSS] git rebase vs. git merge for branch development
One other (long shot) option might be to do git cherry-picks of all new *trunk* commits into the feature branch when you uprev. But I'm not sure if that will be a sustainable practice, given the number of commits that are happening on the trunk. Unless you're upreving very often (e.g. daily), this could also get out of hand. On Tue, Aug 18, 2015 at 11:33 AM, Andrew Wang andrew.w...@cloudera.com wrote: Sounds like we have a lot of support for also allowing merge workflows. Let me draft a proper proposal and go through the [DISCUSS] and [VOTE] process. One thing I think we should amend from the previous [VOTE] is using git merge --no-ff rather than rebase --onto for branch - trunk integration, since it makes reverting the branch easier. Also using git merge rather than a squashed commit for the branch-2 backport as Vinay said. In the meantime, I think it's okay for ongoing feature branch development like HDFS-7285 to start using merge rather than rebase. Haven't seen any objections to merge yet. On Tue, Aug 18, 2015 at 1:39 AM, Vinayakumar B vinayakum...@apache.org wrote: +1, I agree with the support for git-merge based workflows for large branch merge. I have experienced the pain of re-basing the entire branch HDFS-7285, just for verification though, and I found even a line change in trunk in core files ( ex: FSNameSystem.java, BlockManager.java ) makes it hard to rebase many commits in the branch. One main problem, as I have experienced, with git-rebase is, If we need to retain same commits, All conflicts should be resolved by the same person who is doing the rebase, as 'git-rebase' should be executed in same machine and there is a fair chance of miss-handling conflicts and causing problem. The person doing rebase may not be very familiar with the conflicted code. In these kind of situations, I think its very hard to find out what was the original code and what is conflicted code, once the rebase is done. 
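For reference, the cherry-pick option mentioned at the top of this message could look roughly like the sketch below. The `last_sync` bookkeeping variable, the branch name `feature`, and the file names are assumptions for illustration; the thread does not describe how the already-picked position would be tracked.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk

echo "base" > core.txt
git add core.txt
git commit -q -m "trunk: initial commit"

# Remember the trunk commit the feature branch is in sync with.
last_sync=$(git rev-parse trunk)

git checkout -q -b feature
echo "feature" > f1.txt
git add f1.txt
git commit -q -m "feature: add f1.txt"

# New commits arrive on trunk.
git checkout -q trunk
for i in 1 2; do
  echo "trunk $i" > "t$i.txt"
  git add "t$i.txt"
  git commit -q -m "trunk: add t$i.txt"
done

# Uprev by cherry-picking every trunk commit since the last sync.
git checkout -q feature
git cherry-pick "$last_sync..trunk"
last_sync=$(git rev-parse trunk)   # must be tracked by hand for the next uprev

git --no-pager log --oneline
```

The picked commits are copies with new hashes, and the sync point has to be tracked manually, which is part of why this may not scale with the commit rate on trunk.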
Re: [DISCUSS] git rebase vs. git merge for branch development
I also think allowing merges as a way to uprev with trunk would be a good idea. AFAIK, git rebase works well when your branch is short-lived and contains a fairly small number of commits, but doesn't work so well if your branch is large. Also, the cost of rebase will only go up as time goes. On the other hand, git merge has a pretty decent chance to succeed, especially more so if you merge the trunk often. My 2 cents. Sangjin On Mon, Aug 17, 2015 at 1:18 PM, Jing Zhao jing.apa...@gmail.com wrote: I think we should allow merge-based workflows. I worked and am working in several big feature branches, including HDFS-2802 (100 subtasks) and HDFS-7285 (currently already 200 subtasks), and tried both the merge-based and rebase-based workflows. When the feature change becomes big, the rebase will become a big pain, considering a small change in trunk can cause conflicts for rebasing large number of commits in the feature branch. Using git merge to merge trunk changes into the feature branch is much easier in this case. Thanks, -Jing On Mon, Aug 17, 2015 at 12:17 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hi all, I've thought about this topic more over the last week, and felt I should play devil's advocate for a merge workflow. A few comments: - The issue of merges polluting history is mainly an issue when using a github PR workflow, which results in one merge per PR. Clearly this is not okay, but a separate issue from feature branches. We only have a handful of merge commits per feature branch. - The issue of changes hiding in merge commits can happen when resolving rebase conflicts too, except it's harder to track. Right now neither go through code review, which is sketchy. We probably should review these too, and it's easier to review a single merge commit vs. an entire rebased branch. Merge is also a more natural way of integrating changes from trunk, since you just resolve all conflicts at once at the end. 
Re: [DISCUSS] git rebase vs. git merge for branch development
I haven't done a big piece of work in the ASF code repo since the migration to git, though I did in the svn era.

Currently, with private git repos:

- anyone gets SCM control of their source
- you can commit for your own reasons (about to make a change, want a private Jenkins run, ...) and gain from having many small check-ins. More succinctly: if you aren't checking in your work 2+ times a day, why not?
- rebasing is a painful necessity on personal, private branches to keep the final patch to Hadoop git a single diff

With the private git process that's the de facto standard, we lose history anyway. I know what I've done, and somewhere there's a tag in my own GitHub repo of my work to create a JIRA. But we don't always need that entire history of trying to debug Kerberos, a typo in an exception, and other stuff that accrues during the work.

I therefore think that I'm in favour of big squash commits. What we could do is extend that with a policy of:

1. Tag the final commit used to make the patch, something like tag_HADOOP-8192. The tag ensures that the history isn't gc'd.
2. Delete the branch (keeps the number of branches down).
3. In the JIRA, include the name of the tag and the git commit number in the comments. Someone curious can rebuild that history
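The three-step policy above can be sketched as follows in a scratch repository. The tag name tag_HADOOP-8192 is the one suggested in the message; the working branch name `HADOOP-8192-work` and the file names are assumptions for the example.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk

echo "base" > core.txt
git add core.txt
git commit -q -m "trunk: initial commit"

# Hypothetical private working branch with many small check-ins.
git checkout -q -b HADOOP-8192-work
for i in 1 2 3; do
  echo "step $i" > "step-$i.txt"
  git add "step-$i.txt"
  git commit -q -m "wip: step $i"
done
final=$(git rev-parse HEAD)

# 1. Tag the final commit used to make the patch.
git tag tag_HADOOP-8192 "$final"

# 2. Delete the branch; the tag keeps the commits reachable.
git checkout -q trunk
git branch -q -D HADOOP-8192-work

# 3. The details that would go into the JIRA comment.
echo "tag: tag_HADOOP-8192  commit: $final"

# Someone curious can still walk the full history from the tag.
git --no-pager log --oneline trunk..tag_HADOOP-8192
```

In a shared setup the tag would also have to be pushed to a repository others can reach, since tags are not pushed by default.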
Re: [DISCUSS] git rebase vs. git merge for branch development
Just to be clear, are we discussing the process of uprev'ing the feature development branch with the latest from trunk from time to time, or making the final merge of the feature branch onto trunk?

On Mon, Aug 17, 2015 at 10:21 AM, Steve Loughran ste...@hortonworks.com wrote:

I haven't done a big piece of work in the ASF code repo since the migration to git, though I did in the svn era.

Currently, with private git repos:

- anyone gets SCM control of their source
- you can commit for your own reasons (about to make a change, want a private Jenkins run, ...) and gain from having many small check-ins. More succinctly: if you aren't checking in your work 2+ times a day, why not?
- rebasing is a painful necessity on personal, private branches to keep the final patch to Hadoop git a single diff

With the private git process that's the de facto standard, we lose history anyway. I know what I've done, and somewhere there's a tag in my own GitHub repo of my work to create a JIRA. But we don't always need that entire history of trying to debug Kerberos, a typo in an exception, and other stuff that accrues during the work.

I therefore think that I'm in favour of big squash commits. What we could do is extend that with a policy of:

1. Tag the final commit used to make the patch, something like tag_HADOOP-8192. The tag ensures that the history isn't gc'd.
2. Delete the branch (keeps the number of branches down).
3. In the JIRA, include the name of the tag and the git commit number in the comments. Someone curious can rebuild that history
Re: [DISCUSS] git rebase vs. git merge for branch development
@Sangjin, I believe this is covered by the [VOTE] I linked to above, key excerpt being: 3. Force-push on feature-branches is allowed. Before pulling in a feature, the feature-branch should be rebased on latest trunk and the changes applied to trunk through git rebase --onto or git cherry-pick commit-range. This specifies that the last uprev final integration of the branch into trunk happen with rebase. It doesn't say anything about the periodic uprev's, but it'd be very strange to merge periodically and then rebase once at the end. So I take it to mean doing periodic uprevs with rebase too. On Mon, Aug 17, 2015 at 11:23 AM, Sangjin Lee sj...@apache.org wrote: Just to be clear, are we discussing the process of uprev'ing the feature development branch with the latest from the trunk from time to time, or making the final merge of the feature branch onto the trunk? On Mon, Aug 17, 2015 at 10:21 AM, Steve Loughran ste...@hortonworks.com wrote: I haven't done a bit piece of work in the ASF code repo since the migration to git; though I have done it in the svn era. Currently with private git repos -anyone gets SCM control of their source -you can commit for your own reasons (about to make a change, want a private jenkins run, ...) and gain from having many small checkins. More succinctly: if you aren't checking in your work 2+ times a day —why not? -rebasing a painful necessity on personal, private branches to keep the final patch to hadoop git a single diff With the private git process that's the defacto standard, we lose history anyway. I know what I've done and somewhere there's a tag in my own github repo of my work to create a JIRA. But we don't always need that entire history of trying to debug kerberos, typo in exception, and other stuff that accrues during the work. I think therefore that I'm in favour of big squash commits. What we could do is extend that with a policy of 1. tag the final commit used to make the patch, something like tag_HADOOP-8192. 
The tag ensures that the history isn't gc'd 2. Delete the branch (keeps the #of branches down) 3. In the JIRA, include the name of the tag and the git commit number in the comments. Someone curious can rebuild that history
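As a rough illustration of the [VOTE] excerpt quoted in this message (rebase the feature branch on latest trunk, then apply the changes to trunk via a commit range), here is a minimal sketch. The branch name `feature`, the `fork` and `old_tip` variables, and the file names are assumptions; the excerpt itself only names `git rebase --onto` and `git cherry-pick` with a commit range.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk

echo "base" > core.txt
git add core.txt
git commit -q -m "trunk: initial commit"
fork=$(git rev-parse trunk)          # where the feature branch forks off

git checkout -q -b feature
for i in 1 2; do
  echo "feature $i" > "f$i.txt"
  git add "f$i.txt"
  git commit -q -m "feature: add f$i.txt"
done
old_tip=$(git rev-parse feature)

# Trunk moves on after the fork.
git checkout -q trunk
echo "trunk" > t1.txt
git add t1.txt
git commit -q -m "trunk: add t1.txt"

# Replay the feature commits (fork..feature) on top of latest trunk.
# This rewrites the branch, hence the force-push allowance in the vote.
git rebase -q --onto trunk "$fork" feature

# Apply the rebased commits to trunk as a commit range.
git checkout -q trunk
git cherry-pick trunk..feature

git --no-pager log --oneline --graph
```

The result is a linear trunk history with no merge commit, at the cost of new hashes for every feature commit each time the branch is rebased.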
Re: [DISCUSS] git rebase vs. git merge for branch development
Thanks for the clarification Andrew. So is the proposal on the table squashing commits (on the feature branch) when we rebase the feature branch with the latest from trunk? How would the process work? A simple schematic example might be helpful in understanding the proposal. If the feature branch was pushed to the remote repo, then squashing commits (i.e. rewriting commits) could become tricky, right? Thanks in advance. On Mon, Aug 17, 2015 at 11:33 AM, Andrew Wang andrew.w...@cloudera.com wrote: @Sangjin, I believe this is covered by the [VOTE] I linked to above, key excerpt being: 3. Force-push on feature-branches is allowed. Before pulling in a feature, the feature-branch should be rebased on latest trunk and the changes applied to trunk through git rebase --onto or git cherry-pick commit-range. This specifies that the last uprev final integration of the branch into trunk happen with rebase. It doesn't say anything about the periodic uprev's, but it'd be very strange to merge periodically and then rebase once at the end. So I take it to mean doing periodic uprevs with rebase too. On Mon, Aug 17, 2015 at 11:23 AM, Sangjin Lee sj...@apache.org wrote: Just to be clear, are we discussing the process of uprev'ing the feature development branch with the latest from the trunk from time to time, or making the final merge of the feature branch onto the trunk? On Mon, Aug 17, 2015 at 10:21 AM, Steve Loughran ste...@hortonworks.com wrote: I haven't done a bit piece of work in the ASF code repo since the migration to git; though I have done it in the svn era. Currently with private git repos -anyone gets SCM control of their source -you can commit for your own reasons (about to make a change, want a private jenkins run, ...) and gain from having many small checkins. More succinctly: if you aren't checking in your work 2+ times a day —why not? 
-rebasing a painful necessity on personal, private branches to keep the final patch to hadoop git a single diff With the private git process that's the defacto standard, we lose history anyway. I know what I've done and somewhere there's a tag in my own github repo of my work to create a JIRA. But we don't always need that entire history of trying to debug kerberos, typo in exception, and other stuff that accrues during the work. I think therefore that I'm in favour of big squash commits. What we could do is extend that with a policy of 1. tag the final commit used to make the patch, something like tag_HADOOP-8192. The tag ensures that the history isn't gc'd 2. Delete the branch (keeps the #of branches down) 3. In the JIRA, include the name of the tag and the git commit number in the comments. Someone curious can rebuild that history
Re: [DISCUSS] git rebase vs. git merge for branch development
Hi all,

I've thought about this topic more over the last week, and felt I should play devil's advocate for a merge workflow. A few comments:

- The issue of merges polluting history is mainly an issue when using a GitHub PR workflow, which results in one merge per PR. Clearly this is not okay, but it is a separate issue from feature branches. We only have a handful of merge commits per feature branch.
- The issue of changes hiding in merge commits can happen when resolving rebase conflicts too, except it's harder to track. Right now neither goes through code review, which is sketchy. We probably should review these too, and it's easier to review a single merge commit vs. an entire rebased branch. Merge is also a more natural way of integrating changes from trunk, since you just resolve all conflicts at once at the end.
- Merge gives us a linear history on the branch but worse history on trunk/branch-2. Rebase has worse history on the branch but a linear history on trunk/branch-2. This means for quick/small feature branches that don't have a lot of conflicts, rebase is preferred. For large features with lots of conflicts, merge is preferred. This is basically what we're running into on HDFS-7285.
- Rebase also comes with increased coordination costs, since public history is being rewritten. This is again okay for smaller efforts (where there are fewer contributors), but more painful with bigger ones. There have been a number of HDFS-7285 branches created basically as a result of rebase, with corresponding JIRA discussions about where to commit things.
- The issue of a single squashed commit for the branch-2 backport is arguably an issue with how we structure our branches. If release branches forked off of trunk rather than branch-2, we wouldn't have this problem. We could require branch-2 integration to also happen via git merge. Or we kick trunk out to a feature branch based off of branch-2. Or we shrug and keep the status quo.
I'd definitely appreciate commentary from others who've worked on feature branches in git, even in communities outside of Hadoop. If there is support for allowing merge-based workflows in addition to rebase, we'd need to kick off a [VOTE] thread since the last [VOTE] only allows rebase. Best, Andrew On Mon, Aug 17, 2015 at 11:33 AM, Andrew Wang andrew.w...@cloudera.com wrote: @Sangjin, I believe this is covered by the [VOTE] I linked to above, key excerpt being: 3. Force-push on feature-branches is allowed. Before pulling in a feature, the feature-branch should be rebased on latest trunk and the changes applied to trunk through git rebase --onto or git cherry-pick commit-range. This specifies that the last uprev final integration of the branch into trunk happen with rebase. It doesn't say anything about the periodic uprev's, but it'd be very strange to merge periodically and then rebase once at the end. So I take it to mean doing periodic uprevs with rebase too. On Mon, Aug 17, 2015 at 11:23 AM, Sangjin Lee sj...@apache.org wrote: Just to be clear, are we discussing the process of uprev'ing the feature development branch with the latest from the trunk from time to time, or making the final merge of the feature branch onto the trunk? On Mon, Aug 17, 2015 at 10:21 AM, Steve Loughran ste...@hortonworks.com wrote: I haven't done a bit piece of work in the ASF code repo since the migration to git; though I have done it in the svn era. Currently with private git repos -anyone gets SCM control of their source -you can commit for your own reasons (about to make a change, want a private jenkins run, ...) and gain from having many small checkins. More succinctly: if you aren't checking in your work 2+ times a day —why not? -rebasing a painful necessity on personal, private branches to keep the final patch to hadoop git a single diff With the private git process that's the defacto standard, we lose history anyway. 
I know what I've done and somewhere there's a tag in my own github repo of my work to create a JIRA. But we don't always need that entire history of trying to debug kerberos, typo in exception, and other stuff that accrues during the work. I think therefore that I'm in favour of big squash commits. What we could do is extend that with a policy of 1. tag the final commit used to make the patch, something like tag_HADOOP-8192. The tag ensures that the history isn't gc'd 2. Delete the branch (keeps the #of branches down) 3. In the JIRA, include the name of the tag and the git commit number in the comments. Someone curious can rebuild that history
Re: [DISCUSS] git rebase vs. git merge for branch development
I think we should allow merge-based workflows. I worked and am working in several big feature branches, including HDFS-2802 (100 subtasks) and HDFS-7285 (currently already 200 subtasks), and tried both the merge-based and rebase-based workflows. When the feature change becomes big, the rebase becomes a big pain, considering a small change in trunk can cause conflicts when rebasing a large number of commits in the feature branch. Using git merge to merge trunk changes into the feature branch is much easier in this case.

Thanks,
-Jing

On Mon, Aug 17, 2015 at 12:17 PM, Andrew Wang andrew.w...@cloudera.com wrote:

Hi all,

I've thought about this topic more over the last week, and felt I should play devil's advocate for a merge workflow. A few comments:

- The issue of merges polluting history is mainly an issue when using a github PR workflow, which results in one merge per PR. Clearly this is not okay, but a separate issue from feature branches. We only have a handful of merge commits per feature branch.
- The issue of changes hiding in merge commits can happen when resolving rebase conflicts too, except it's harder to track. Right now neither goes through code review, which is sketchy. We probably should review these too, and it's easier to review a single merge commit vs. an entire rebased branch. Merge is also a more natural way of integrating changes from trunk, since you just resolve all conflicts at once at the end.
- Merge gives us a linear history on the branch but worse history on trunk/branch-2. Rebase has worse history on the branch but a linear history on trunk/branch-2. This means for quick/small feature branches that don't have a lot of conflicts, rebase is preferred. For large features with lots of conflicts, merge is preferred. This is basically what we're running into on HDFS-7285.
- Rebase also comes with increased coordination costs, since public history is being rewritten. This is again okay for smaller efforts (where there are fewer contributors), but more painful with bigger ones. There have been a number of HDFS-7285 branches created basically as a result of rebase, with corresponding JIRA discussions about where to commit things.
- The issue of a single squashed commit for the branch-2 backport is arguably an issue with how we structure our branches. If release branches forked off of trunk rather than branch-2, we wouldn't have this problem. We could require branch-2 integration to also happen via git merge. Or we kick trunk out to a feature branch based off of branch-2. Or we shrug and keep the status quo.

I'd definitely appreciate commentary from others who've worked on feature branches in git, even in communities outside of Hadoop. If there is support for allowing merge-based workflows in addition to rebase, we'd need to kick off a [VOTE] thread since the last [VOTE] only allows rebase.

Best,
Andrew

On Mon, Aug 17, 2015 at 11:33 AM, Andrew Wang andrew.w...@cloudera.com wrote:

@Sangjin, I believe this is covered by the [VOTE] I linked to above, key excerpt being:

3. Force-push on feature-branches is allowed. Before pulling in a feature, the feature-branch should be rebased on latest trunk and the changes applied to trunk through git rebase --onto or git cherry-pick commit-range.

This specifies that the last uprev, i.e. the final integration of the branch into trunk, happen with rebase. It doesn't say anything about the periodic uprevs, but it'd be very strange to merge periodically and then rebase once at the end. So I take it to mean doing periodic uprevs with rebase too.

On Mon, Aug 17, 2015 at 11:23 AM, Sangjin Lee sj...@apache.org wrote:

Just to be clear, are we discussing the process of uprev'ing the feature development branch with the latest from the trunk from time to time, or making the final merge of the feature branch onto the trunk?
On Mon, Aug 17, 2015 at 10:21 AM, Steve Loughran ste...@hortonworks.com wrote:

I haven't done a big piece of work in the ASF code repo since the migration to git; though I have done it in the svn era.

Currently with private git repos
- anyone gets SCM control of their source
- you can commit for your own reasons (about to make a change, want a private jenkins run, ...) and gain from having many small checkins. More succinctly: if you aren't checking in your work 2+ times a day, why not?
- rebasing is a painful necessity on personal, private branches to keep the final patch to hadoop git a single diff

With the private git process that's the de facto standard, we lose history anyway. I know what I've done and somewhere there's a tag in my own github repo of my work to create a JIRA. But we don't always need that entire history of trying to debug kerberos, typo in exception, and other stuff that accrues
Re: [DISCUSS] git rebase vs. git merge for branch development
I prefer Proposal #1 as well. Squashing some of the commits seems a major improvement over our previous model of a single commit for the entire branch.

On Tue, Aug 11, 2015 at 2:19 PM, Andrew Wang andrew.w...@cloudera.com wrote:

Hi all,

We are currently working on a pretty substantial new feature in a branch over at HDFS-7285. As the # of commits has grown, running `git rebase` and fixing conflicts in the 180+ commits has become untenable. As you may recall, we voted to use a rebase workflow when we did the switch from SVN to git a year ago [1].

I'm aware of two proposals right now:

Proposal 1: Squash some of the commits to make rebase easier. Oftentimes, intermediate commits are made to code that gets changed again later, and thus don't end up in HEAD. Fixing conflicts in these intermediate commits is a waste of time, especially with 180 commits. I run into this issue even with my local feature branches, and thus squash. The downside is that squashing loses some of the development history, since now multiple JIRAs are combined into a single commit. There are some ways to mitigate this: the old branch with the full history can be left in place, and the squashed commits can reference the JIRAs that have been squashed together.

Proposal 2: Allow merge-based workflows too. This is what we were doing in the SVN days. Periodically merge trunk to the branch, resulting in merge commits to resolve conflicts. When the branch is ready, merge it back to trunk.

I read through the discussion thread [2] where we decided to go with rebase. The concerns were that merge commits pollute history, which was an issue for HBase and I believe Spark. Merge commits are not associated with a single JIRA or commit, and fixes are sometimes hidden in merge commits. This makes backports harder. Merge-based workflows also squash the history when backporting to a branch. In the SVN merge-based days, backporting to branch-2 was typically done as a single squashed commit. With a rebase workflow, it's possible to rebase the branch against branch-2 and get the same history as trunk.

My mild preference is for Proposal #1 since it results in a clean linear history in both trunk and branch-2, but it has to be understood that squashing is sometimes a required part of a rebase workflow. If the core issue with squashing is maintaining development history, I think it's satisfied by keeping old branches around and referencing the squashed JIRAs. Welcome other thoughts here too.

Best,
Andrew

[1]: http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201408.mbox/%3CCALwhT94Y64M9keY25Ry_QOLUSZQT29tJQ95twsoa8xXrcNTxpQ%40mail.gmail.com%3E
[2]: http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201408.mbox/%3CCALwhT97bM36X6-3%3DcCUwaAKxZ80jfZwuf53BTR7TbWwV5e%2BXkA%40mail.gmail.com%3E
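For readers who want to see what Proposal 2 in the quoted message looks like in practice, here is a minimal sketch in a scratch repository. The branch name `feature`, the file names, and the use of `--no-ff` for the final merge are assumptions made for illustration, not something the thread prescribes.

```shell
#!/bin/sh
# Sketch of a merge-based workflow: periodically merge trunk into the
# feature branch, then merge the branch back when it is ready.
# All branch and file names are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > core.txt
git add core.txt
git commit -q -m "base"
trunk=$(git rev-parse --abbrev-ref HEAD)

# Feature branch with several commits (stand-ins for subtasks).
git checkout -q -b feature
for i in 1 2 3; do
  echo "feature $i" > "feature$i.txt"
  git add "feature$i.txt"
  git commit -q -m "feature subtask $i"
done

# Meanwhile, trunk moves on.
git checkout -q "$trunk"
echo "trunk change" >> core.txt
git commit -q -am "trunk change"

# Periodic uprev: one merge commit brings trunk into the branch.
# Any conflicts would be resolved once, in this commit.
git checkout -q feature
git merge -q --no-edit "$trunk"

# When the branch is ready, merge it back to trunk.
# --no-ff (an assumption here) forces an explicit merge commit.
git checkout -q "$trunk"
git merge -q --no-ff --no-edit feature

git log --oneline --graph
```

The feature commits keep their original ids throughout, which is why nobody has to force-push or coordinate a history rewrite. The cost is the two merge commits that now appear in trunk's history.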
[DISCUSS] git rebase vs. git merge for branch development
Hi all,

We are currently working on a pretty substantial new feature in a branch over at HDFS-7285. As the # of commits has grown, running `git rebase` and fixing conflicts in the 180+ commits has become untenable. As you may recall, we voted to use a rebase workflow when we did the switch from SVN to git a year ago [1].

I'm aware of two proposals right now:

Proposal 1: Squash some of the commits to make rebase easier. Oftentimes, intermediate commits are made to code that gets changed again later, and thus don't end up in HEAD. Fixing conflicts in these intermediate commits is a waste of time, especially with 180 commits. I run into this issue even with my local feature branches, and thus squash. The downside is that squashing loses some of the development history, since now multiple JIRAs are combined into a single commit. There are some ways to mitigate this: the old branch with the full history can be left in place, and the squashed commits can reference the JIRAs that have been squashed together.

Proposal 2: Allow merge-based workflows too. This is what we were doing in the SVN days. Periodically merge trunk to the branch, resulting in merge commits to resolve conflicts. When the branch is ready, merge it back to trunk.

I read through the discussion thread [2] where we decided to go with rebase. The concerns were that merge commits pollute history, which was an issue for HBase and I believe Spark. Merge commits are not associated with a single JIRA or commit, and fixes are sometimes hidden in merge commits. This makes backports harder. Merge-based workflows also squash the history when backporting to a branch. In the SVN merge-based days, backporting to branch-2 was typically done as a single squashed commit. With a rebase workflow, it's possible to rebase the branch against branch-2 and get the same history as trunk.

My mild preference is for Proposal #1 since it results in a clean linear history in both trunk and branch-2, but it has to be understood that squashing is sometimes a required part of a rebase workflow. If the core issue with squashing is maintaining development history, I think it's satisfied by keeping old branches around and referencing the squashed JIRAs. Welcome other thoughts here too.

Best,
Andrew

[1]: http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201408.mbox/%3CCALwhT94Y64M9keY25Ry_QOLUSZQT29tJQ95twsoa8xXrcNTxpQ%40mail.gmail.com%3E
[2]: http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201408.mbox/%3CCALwhT97bM36X6-3%3DcCUwaAKxZ80jfZwuf53BTR7TbWwV5e%2BXkA%40mail.gmail.com%3E
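Proposal 1, together with the mitigation of keeping the old branch around, might look roughly like the sketch below. The JIRA ids (JIRA-1, JIRA-2), branch names, and file names are hypothetical, and `git reset --soft` is just one of several ways to squash (an interactive rebase is another).

```shell
#!/bin/sh
# Sketch of squash-then-rebase: collapse intermediate commits so the
# rebase replays one commit instead of many.
# JIRA ids, branch names, and file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > file.txt
git add file.txt
git commit -q -m "base"
trunk=$(git rev-parse --abbrev-ref HEAD)

# Two feature commits; the second rewrites what the first added.
git checkout -q -b feature
echo "first attempt" > impl.txt
git add impl.txt
git commit -q -m "JIRA-1. First attempt"
echo "rewritten" > impl.txt
git commit -q -am "JIRA-2. Rewrite implementation"

# Mitigation: leave the full history in place on a separate branch.
git branch feature-full-history

# Squash: move the branch pointer back to the fork point while keeping
# the final tree staged, then commit once, naming the squashed JIRAs.
git reset -q --soft "$(git merge-base "$trunk" feature)"
git commit -q -m "Squash of JIRA-1 and JIRA-2"

# Trunk moves on; the rebase now has a single commit to replay.
git checkout -q "$trunk"
echo "trunk change" >> file.txt
git commit -q -am "trunk change"
git checkout -q feature
git rebase -q "$trunk"

git log --oneline
```

The intermediate "first attempt" state never has to be conflict-resolved during the rebase, since it no longer exists on `feature`. It remains inspectable on `feature-full-history`.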