Sounds reasonable to me. Jon
On Tue, Jan 31, 2017, 2:03 PM James Sirota <jsir...@apache.org> wrote: > Is everyone ok with modifying the document with the following? I can make > the changes and put it up for a vote. > > Thanks, > James > > 31.01.2017, 08:46, "Nick Allen" <n...@nickallen.org>: > > I am in agreement with what I am reading. Something along the lines of > the > > following. > > > > - Best effort should be made to create a pull request that can be > > attributed to a single author. > > - If the work of multiple authors cannot be split into separate pull > > requests for justifiable reasons, then it is the author's > responsibility to > > create a pull request with the minimal number of commits required to > > provide appropriate attribution to multiple authors. > > - Each commit should be attributed to a single author and follow the > > commit message guidelines identifying the issue associated with the > change. > > - When merging the pull request, the committer will not squash these > > commits so that attribution to each author can be maintained in the > > revision history. > > > > On Tue, Jan 31, 2017 at 9:42 AM, Otto Fowler <ottobackwa...@gmail.com> > > wrote: > > > >> Just to be clear - I’m not pushing for this, I was more curious about > how > >> to do it if we wanted to. My main interest is that the process we > agree to > >> fosters and encourages collaboration rather than make it such a pain > that > >> contributors think twice before helping out on large PR efforts. > >> > >> How did we do it for the Storm efforts? > >> > >> On January 31, 2017 at 09:21:17, Casey Stella (ceste...@gmail.com) > wrote: > >> > >> To my reading, that script only notes additional authors, it does not > >> create separate commits for those authors. I feel that we should give > >> authors the ability to construct pull requests in such a way that > >> attribution in the form of github statistics are preserved. > >> > >> Also, and this is for the mentors, does the zookeeper methodology > abide by > >> the requirement of having the commit log be a place of record for > >> authorization? > >> > >> Casey > >> > >> On Tue, Jan 31, 2017 at 9:16 AM, Otto Fowler <ottobackwa...@gmail.com> > >> wrote: > >> > >> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/ > >> > Merging+Github+Pull+Requests > >> > > >> > > >> > On January 31, 2017 at 09:04:38, Casey Stella (ceste...@gmail.com) > >> wrote: > >> > > >> > The problem is that a single commit cannot have multiple authors and > it's > >> > difficult to construct a minimal set of commits for multiple authors > in > >> the > >> > same PR by the committers. I would be in favor of not automating this > >> and, > >> > rather, insisting that the pull request either be split up into > multiple > >> > PRs or have the individual PR's commits squashed and named > appropriately > >> by > >> > the authors. > >> > > >> > For reference, here's how Apache Apex does this. Lines item 7 and 8 > under > >> > "Opening a Pull Request" at > https://apex.apache.org/contributing.html : > >> > > >> > 1. > >> > > >> > After all review is complete, combine all new commits into one > squashed > >> > commit except when there are multiple contributors, and include the > Jira > >> > number in the commit message. There are several ways to squash > >> > commits, but here > >> > is one explanation from git-scm.com > >> > <https://git-scm.com/book/en/v2/Git-Tools-Rewriting- > >> > History#Squashing-Commits> > >> > and > >> > a simple example is illustrated below: > >> > > >> > If tracking upstream/master then run git rebase -i. Else run git > rebase > >> > -i upstream/master. This command opens the text editor which lists > the > >> > multiple commits: > >> > > >> > pick 67cd79b change1 > >> > pick 6f98905 change2 > >> > > >> > # Rebase e13748b..3463fbf onto e13748b (2 command(s)) > >> > # > >> > # Commands: > >> > # p, pick = use commit > >> > # r, reword = use commit, but edit the commit message > >> > # e, edit = use commit, but stop for amending > >> > # s, squash = use commit, but meld into previous commit > >> > # f, fixup = like "squash", but discard this commit's log message > >> > # x, exec = run command (the rest of the line) using shell > >> > # > >> > # These lines can be re-ordered; they are executed from top to > bottom. > >> > > >> > Squash 'change2' to 'change1' and save. > >> > > >> > pick 67cd79b change1 > >> > squash 6f98905 change2 > >> > > >> > 2. If there are multiple contributors in a pull request preserve > >> > individual attributions. Try to squash the commits to the minimum > number > >> of > >> > commits required to preserve attribution and the contribution to > still be > >> > functionally correct. > >> > > >> > Their language is similar to the ones proposed by Dave and I like it. > >> This > >> > is, I consider, a special case and we should avoid trying to > automate it, > >> > but rather delegate it to meatspace. Thoughts? > >> > > >> > On Tue, Jan 31, 2017 at 5:58 AM, zeo...@gmail.com <zeo...@gmail.com> > >> > wrote: > >> > > >> > > I thought the idea would be to maintain the commit mapping to > >> individuals > >> > > so things like the GitHub statistics are accurate regarding # of > >> > commits, # > >> > > of lines affected +/-, # of contributors, etc. > >> > > > >> > > Jon > >> > > > >> > > On Tue, Jan 31, 2017, 12:20 AM Otto Fowler < > ottobackwa...@gmail.com> > >> > > wrote: > >> > > > >> > > > I mean multi-author of course > >> > > > > >> > > > > >> > > > On January 31, 2017 at 00:19:17, Otto Fowler ( > >> ottobackwa...@gmail.com) > >> > > > wrote: > >> > > > > >> > > > So, if we use a script like this…. > >> > > > > >> > > > import sys > >> > > > import requests > >> > > > import json > >> > > > from sets import Set > >> > > > > >> > > > if len(sys.argv) == 1: > >> > > > print "Must pass in pull request number" > >> > > > exit(1) > >> > > > > >> > > > next_url = 'https://api.github.com/repos/ > >> > apache/incubator-metron/pulls/' > >> > > > + sys.argv[1] + '/commits' > >> > > > authors = Set() > >> > > > while next_url: > >> > > > response = requests.get(next_url) > >> > > > if response.status_code == 200: > >> > > > m = json.loads(response.content) > >> > > > for c in m: > >> > > > authors.add(c["commit"]["author"]["name"] + " <" + > >> > > > c["commit"]["author"]["email"] + ">") > >> > > > if 'next' in response.links: > >> > > > next_url = response.links['next']['url'] > >> > > > else: > >> > > > next_url = '' > >> > > > > >> > > > for author in authors: > >> > > > print author > >> > > > > >> > > > > >> > > > and call it from our bash scripts ( this is a test script, would > need > >> > > error > >> > > > check in the real thing ) > >> > > > > >> > > > 1 #!/bin/local/bin bash > >> > > > > >> > > > 2 > >> > > > > >> > > > 3 my_array=() > >> > > > > >> > > > 4 while IFS= read -r line; do > >> > > > > >> > > > 5 my_array+=( "$line" ) > >> > > > > >> > > > 6 done < <( python test.py 316 ) > >> > > > > >> > > > 7 > >> > > > > >> > > > 8 for element in "${my_array[@]}" > >> > > > > >> > > > 9 do > >> > > > > >> > > > 10 echo "${element}" > >> > > > > >> > > > 11 done > >> > > > > >> > > > I get the following output for a multi commit PR like #316 > >> > > > > >> > > > *➜* *ottofowler**@**Winterfell * *~/src/github/forks * bash > test.sh > >> > > > > >> > > > JJ <jjmey...@gmail.com> > >> > > > > >> > > > merrimanr <merrim...@gmail.com> > >> > > > > >> > > > rmerriman <rmerri...@hortonworks.com> > >> > > > > >> > > > > >> > > > Would putting that information in the commit message be good > enough? > >> > > The > >> > > > idea that we integrate this with Nick’s scripts. > >> > > > > >> > > > > >> > > > On January 30, 2017 at 07:00:35, Otto Fowler ( > >> ottobackwa...@gmail.com) > >> > > > wrote: > >> > > > > >> > > > Maybe using something like this? > >> > > > https://developer.github.com/v3/pulls/#list-commits-on-a- > >> pull-request > >> > > > > >> > > > > >> > > > On January 28, 2017 at 09:52:16, Otto Fowler ( > >> ottobackwa...@gmail.com) > >> > > > wrote: > >> > > > > >> > > > Would it be possible to get all the authors out of the pr before > >> > > squashing > >> > > > and change out commit messages? Maybe using an extended version > of > >> > > nick's > >> > > > scripts? > >> > > > > >> > > > On January 27, 2017 at 19:02:54, Casey Stella ( > ceste...@gmail.com) > >> > > wrote: > >> > > > > >> > > > > Having thought about it a minute longer, maybe you're right, > Dave. > >> I > >> > > like > >> > > > > this. > >> > > > > > >> > > > > I'd add one sentence more: > >> > > > > If the contribution does not have a single author, please ask > that > >> > the > >> > > > > contribution be squashed by the author into the minimum number > of > >> > > commits > >> > > > > required to preserve correct attribution each following the > same > >> > > standard > >> > > > > naming convention of: JIRA Description (e.g. METRON-123: Foo > all > >> the > >> > > > Bars). > >> > > > > The committer should not squash commits in this instance so > that > >> > > > ownership > >> > > > > is maintained. > >> > > > > > >> > > > > On Fri, Jan 27, 2017 at 6:55 PM, Casey Stella < > ceste...@gmail.com> > >> > > > wrote: > >> > > > > > >> > > > > Yeah, this is sort of the reason I was going for multiple PRs > >> rather > >> > > than > >> > > > > having contributors do their own squashing and commits. I > recognize > >> > > that > >> > > > > multiple PRs are difficult to review. > >> > > > > > >> > > > > That being said, I don't know that the actual commits are > required > >> to > >> > > > have > >> > > > > all of a person's commit. I imagined the process being: > >> > > > > * The contributors squashes their commits owned by one of them > >> > > > > * The others add an empty commit for each of the contributors > >> > > > > * The committer then just does a straight pull rather than a > >> squashed > >> > > > pull. > >> > > > > > >> > > > > My understanding is that we need to keep a record of who > >> contributed > >> > to > >> > > > > what JIRA not that the actual commits reflect the actual code > >> > > contributed > >> > > > > (which can be quite difficult to disentangle, I imagine). > Please > >> > > correct > >> > > > > me if I'm wrong here, though. > >> > > > > > >> > > > > I'm being prescriptive because I'm worried that "do the right > >> thing" > >> > > will > >> > > > > result in us not doing the right thing, sadly. > >> > > > > > >> > > > > Thoughts? > >> > > > > > >> > > > > On Fri, Jan 27, 2017 at 6:03 PM, David Lyle < > dlyle65...@gmail.com> > >> > > > wrote: > >> > > > > > >> > > > > We may be trying to be too prescriptive, it won't always be > >> possible > >> > to > >> > > > > have a single commit per author (interleaved changes). > >> > > > > > >> > > > > Maybe > >> > > > > > >> > > > > If the contribution does not have a single author, please ask > that > >> > the > >> > > > > contribution be squashed by the author into the minimum number > of > >> > > commits > >> > > > > required to preserve correct attribution each following the > same > >> > > standard > >> > > > > naming convention of: JIRA Description (e.g. METRON-123: Foo > all > >> the > >> > > > > Bars). > >> > > > > > >> > > > > ? > >> > > > > > >> > > > > I really don't know- the real message is, do the right thing to > >> make > >> > > sure > >> > > > > everyone's contribution is duly recorded. So the commit log > would > >> be > >> > > > > something like > >> > > > > Author: 1 METRON-123 Foo all the bars - foo'd some bars > >> > > > > Author: 2 METRON-123 Foo all the bars - helped for a better foo > >> > > > > Author: 1 METRON-123 Fool all the bars - But, there was this > little > >> > > deal > >> > > > > > >> > > > > I think it's hard to cover all the cases, so maybe a statement > of > >> the > >> > > > > principle we're trying to maintain? > >> > > > > > >> > > > > -D... > >> > > > > > >> > > > > > >> > > > > On Fri, Jan 27, 2017 at 5:50 PM, Casey Stella < > ceste...@gmail.com> > >> > > > wrote: > >> > > > > > >> > > > > So, then the wording would be: > >> > > > > > >> > > > > If the contribution does not have a single author, please ask > that > >> > the > >> > > > > contribution be squashed by the author into one commit per > author > >> > > > > each following the same standard naming convention of: JIRA > >> > > > > Description (e.g. METRON-123: Foo all the Bars). > >> > > > > > >> > > > > Fair or did I get it wrong? > >> > > > > > >> > > > > On Fri, Jan 27, 2017 at 5:47 PM, David Lyle < > dlyle65...@gmail.com> > >> > > > > > >> > > > > wrote: > >> > > > > > >> > > > > > >> > > > > The bolded didn't come through, but I'm guessing it's here: > >> > > > > > >> > > > > *If the contribution does not have a single author, please ask > that > >> > > > > > >> > > > > the > >> > > > > > >> > > > > contribution be separated into multiple separate pull requests > (one > >> > > > > > >> > > > > per > >> > > > > > >> > > > > author with their contributions) that may have (non-cyclical) > >> > > > > > >> > > > > dependencies, > >> > > > > > >> > > > > each following the same standard naming convention of JIRA: > JIRA > >> > > > > Description (e.g. METRON-123: Foo all the Bars).* > >> > > > > > >> > > > > I'd recommend against multiple PRs. Creates all kinds of > >> difficulties > >> > > > > reviewing and testing. The changeset should be reviewed and > tested > >> as > >> > > > > > >> > > > > a > >> > > > > > >> > > > > single unit. If there is more than one author, simply don't > squash. > >> > > > > > >> > > > > -D... > >> > > > > > >> > > > > > >> > > > > On Fri, Jan 27, 2017 at 5:44 PM, Casey Stella < > ceste...@gmail.com> > >> > > > > > >> > > > > wrote: > >> > > > > > >> > > > > > >> > > > > It is important for every contribution to attribute the > author, our > >> > > > > > >> > > > > git > >> > > > > > >> > > > > log > >> > > > > > >> > > > > is a legal papertrail for attribution. As a consequence of > that, I > >> > > > > recommend the Development Guidelines section 1.2 be amended to > the > >> > > > > following (bolded change): > >> > > > > > >> > > > > Everyone is encouraged to review open pull requests. We only > ask > >> > > > > > >> > > > > that > >> > > > > > >> > > > > you > >> > > > > > >> > > > > try and think carefully, ask questions and are excellent to one > >> > > > > > >> > > > > another. > >> > > > > > >> > > > > Code review is our opportunity to share knowledge, design > ideas and > >> > > > > > >> > > > > make > >> > > > > > >> > > > > friends. The instructions on how to checkout a patch for review > >> > > > > > >> > > > > can be > >> > > > > > >> > > > > found here <https://github.com/nickwallen/metron-commit-stuff > >. > >> > > > > > >> > > > > When reviewing a patch try to keep each of these concepts in > mind: > >> > > > > > >> > > > > > >> > > > > > >> > > > > - Is the proposed change being made in the correct place? Is > it a > >> > > > > > >> > > > > fix > >> > > > > > >> > > > > in > >> > > > > > >> > > > > a backend when it should be in the primitives? In Kafka vs > >> > > > > > >> > > > > Storm? > >> > > > > > >> > > > > - What is the change being proposed? Is it based on Community > >> > > > > recognized issues? > >> > > > > - Do we want this feature or is the bug they’re fixing really a > >> > > > > > >> > > > > bug? > >> > > > > > >> > > > > - Does the change do what the author claims? > >> > > > > - Are there sufficient tests? > >> > > > > - Has it been documented? > >> > > > > - Will this change introduce new bugs? > >> > > > > - *Does this contribution have a single author?* > >> > > > > > >> > > > > > >> > > > > *If the contribution does not have a single author, please ask > that > >> > > > > > >> > > > > the > >> > > > > > >> > > > > contribution be separated into multiple separate pull requests > (one > >> > > > > > >> > > > > per > >> > > > > > >> > > > > author with their contributions) that may have (non-cyclical) > >> > > > > > >> > > > > dependencies, > >> > > > > > >> > > > > each following the same standard naming convention of JIRA: > JIRA > >> > > > > Description (e.g. METRON-123: Foo all the Bars).* > >> > > > > > >> > > > > Also, please review if the submitter correctly flagged impacts > to > >> > > > > > >> > > > > the > >> > > > > > >> > > > > following (if exist): > >> > > > > > >> > > > > - System Configuration Changes > >> > > > > - Metron Configuration > >> > > > > - Metron Component Configuration (sensors, etc) > >> > > > > - Tech Stack Configuration (Storm, Hbase, etc) > >> > > > > - Environmental Changes > >> > > > > - Helper Shell Scripts > >> > > > > - RPM Packaging > >> > > > > - Ansible, Ambari, AWS, Docker > >> > > > > - Documentation Impacts > >> > > > > - Changes to Wiki Documentation > >> > > > > - Revisions in Tutorials > >> > > > > - Developer Guide > >> > > > > - Expansions in readme's > >> > > > > - Changes to System Interfaces > >> > > > > - Stellar Shell > >> > > > > - REST APIs > >> > > > > - Etc... > >> > > > > > >> > > > > > >> > > > > Thoughts? Better way to handle this scenario while still > >> > > > > > >> > > > > maintaining > >> > > > > > >> > > > > unambiguous authorship? > >> > > > > > >> > > > > Casey > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > >> > > -- > >> > > > >> > > Jon > >> > > > >> > > Sent from my mobile device > >> > > > >> > > >> > > > > > -- > > Nick Allen <n...@nickallen.org> > > ------------------- > Thank you, > > James Sirota > PPMC- Apache Metron (Incubating) > jsirota AT apache DOT org > -- Jon Sent from my mobile device