Hi Andrew,

Sorry for increasing the scope of this topic.

As a reviewer of your PR https://github.com/apache/arrow/pull/9598, I am
not sure that we can really keep following "The Apache Way" with the
PR's approach. The workflow proposed by the PR creates an associated
JIRA issue for a PR when the PR is merged. I am not sure that we can
ensure "make plans in the open" on the single point of truth platform
(JIRA) with this workflow. My concern is that it only satisfies "we have
an associated JIRA issue for each non-trivial PR".

(I think that https://github.com/apache/arrow/pull/9576, which the PR
refers to, is a trivial PR. I don't think that we need an associated
JIRA issue for it.)

I haven't read all of the discussion yet, but I hope that I can get some
insight from it.

Thanks all.

Thanks,
--
kou

In <CAFhtnRw=thytfqekkxz_o95z8wvbyxec_ncwkgpyotc9znv...@mail.gmail.com>
  "Re: Requirements on JIRA usage in Apache Arrow" on Tue, 2 Mar 2021 16:57:37 -0500,
  Andrew Lamb <al...@influxdata.com> wrote:

> To be clear, I have no objection to JIRA and I like the point that a
> little friction on contribution may encourage more structured
> contribution and planning.
>
> As Neal mentioned, my original goal with this thread was to understand
> any non-technical reason that prevented automating the creation of JIRA
> tickets from github PRs before I went and coded such a thing (plug:
> proposed change to merge_pr.py is here[1] if anyone wants to review it)
>
> I did not intend to open a philosophical conversation on changing
> communication patterns and behavior in the Arrow community at large,
> though reading the viewpoints on this thread has been most enlightening.
> Thank you all for the time you have taken to share your views.
>
> Andrew
>
> [1] https://github.com/apache/arrow/pull/9598
>
>
> On Tue, Mar 2, 2021 at 3:01 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> With regards to the actual work of merging patches. I have merged
>> 3,234 patches in this project [1], so I think this qualifies me to
>> have an opinion about this.
>> I don't think that the merge tool is problematic -- I use a little
>> bash helper function [2] which cuts down my work merging a patch down
>> to just a few keystrokes to the point where I hardly think about it
>> anymore. I store my Jira username and password in environment
>> variables -- they aren't Bitcoin private keys so I think if they get
>> compromised it's not a big deal. I believe it's worth the effort for
>> every committer to streamline their process so they aren't having to
>> jump through a bunch of hoops every time they merge a patch.
>>
>> As far as creating Jira issues for PR's that don't have them:
>>
>> * It seems like it should be straightforward to create a GitHub
>>   actions bot to create a Jira issue and paste the link into the
>>   GitHub PR so you don't have to use the web interface at all.
>> * The one-time cost of asking a contributor to create an account and
>>   assign themselves to the issue is still going to be present, but the
>>   cost of this seems low to me. The project has had around 600 unique
>>   contributors in 5 years. Some of the administrative labor of adding
>>   users to roles could in theory be automated, but since it takes
>>   30-60 seconds to do this, is it worth the automation effort to
>>   reduce this down to ~10 seconds or so?
>> * We should really be encouraging contributors to be good citizens
>>   with respect to issue hygiene. If they care enough to write a PR but
>>   not enough to create an issue about it, is that good?
>> * Spark uses the "[MINOR]" tag for PRs that don't merit an entry in
>>   the changelog, I'd be fine with adopting that formally and
>>   implementing it in the merge tool
>>
>> I could just be numb to the drudgery at this point but looking back on
>> 5 years of work and thousands of issues I don't feel it's negatively
>> impacted me, but I'm probably not a typical case.
>> (Note: I automatically filter all Jira mail that doesn't have
>> "mentioned you" in it into a separate folder and find it helpful to be
>> able to search through it in gmail)
>>
>> That said, I'm open to migrating away from Jira ONLY if everyone else
>> is committed to holding contributors to some reasonable standards of
>> professionalism when it comes to communication. The worry I have is
>> that we end up with a long queue of PRs having little context and not
>> much planning / organization. This project has been successful not
>> haphazardly/anarchically but precisely because of deliberate planning
>> and organization over a long period of time, so I think we should try
>> to respect the process that has gotten us to where we are now. What
>> may seem like bureaucracy serves a valuable function to keep people
>> organized and informed.
>>
>> - Wes
>>
>> [1]: https://gist.github.com/wesm/7058fd833861afc0d3306cdabe5b0a90
>> [2]: https://gist.github.com/wesm/17bc4cb8e7a6e5a715cb6de46d2e01e9
>>
>> On Tue, Mar 2, 2021 at 1:44 PM Neal Richardson
>> <neal.p.richard...@gmail.com> wrote:
>> >
>> > A few thoughts:
>> >
>> > * Given the cost of switching issue trackers (even if we were to
>> >   identify one we thought was better), I think it's extremely
>> >   unlikely that we would abandon JIRA. So rather than dumping on
>> >   JIRA (an easy target, of course), we should focus on how we can
>> >   make the workflows we have smoother.
>> >
>> > * Our workflow assumes that a JIRA issue exists before a pull
>> >   request does, so it's awkward when you do it the other way, but we
>> >   can add automation to make it better. Perhaps this is exactly what
>> >   Andrew Lamb is working on (the original impetus for this
>> >   discussion thread), but suppose for example we had a PR comment
>> >   bot action that would create a JIRA issue from a pull request and
>> >   ideally then rename the PR to match.
>> >   Our current bot that checks that every PR has a JIRA issue would
>> >   just suggest that you make the magic comment on the PR to create
>> >   an issue.
>> >
>> > * FWIW for merging PRs, I just keep a terminal window open with the
>> >   python virtualenv active, and merging a PR just means hitting the
>> >   up arrow and changing the PR number from the previous command. You
>> >   can set your credentials in env vars if you find it burdensome to
>> >   retype them. As someone who merges lots of patches, I find this
>> >   easy. It's less easy than clicking a merge button in the web
>> >   browser, but the extra checks and confirmations it does have
>> >   prevented me from merging bad code before, so I think the extra
>> >   friction actually serves a purpose here.
>> >
>> > Neal
>> >
>> > On Tue, Mar 2, 2021 at 11:21 AM Weston Pace <weston.p...@gmail.com>
>> > wrote:
>> >
>> > > It also seems like we're describing two different issues. The
>> > > first, a barrier to entry for new development. The second,
>> > > overhead imposed on an active developer. I'm personally not so
>> > > worried about the overhead imposed, perhaps because I can't write
>> > > code that fast anyways, so I'll stay out of that discussion.
>> > >
>> > > I think the barrier to entry is not so much "I don't know which
>> > > issue tracker to use" or "I have to follow a bunch of steps" as it
>> > > is "I'm pretty sure I can improve this but not 100% sure and I
>> > > don't want to look like a fool and this is a huge code base and
>> > > I'd need a lot of help getting started and I don't want to burden
>> > > people."
>> > >
>> > > Also, I would challenge the fact that people born after the year
>> > > 2000 are cognitively identical :) Someone born after 2000 treats
>> > > email the same way people born after 1970 treat phone calls. Most
>> > > gen Z I work with see gmail as an authentication tool and not a
>> > > communication tool.
>> > >
>> > > I think Fernando's point about informal discussion is a good one.
>> > > I don't think Github is the tool you'd want for this anyways. We
>> > > have Zulip but it is not advertised (e.g.
>> > > https://arrow.apache.org/community/). It's also heavily
>> > > developer-centric and not user-centric at the moment. If we want
>> > > something like that I'd be willing to help with the management /
>> > > answering questions as I'm able.
>> > >
>> > > On Tue, Mar 2, 2021 at 9:11 AM Jorge Cardoso Leitão
>> > > <jorgecarlei...@gmail.com> wrote:
>> > > >
>> > > > Hi Antoine,
>> > > >
>> > > > > Can you expand a bit on this? In particular, which aspects of
>> > > > > using JIRA feel bureaucratic? Is it the requirement to create
>> > > > > a new issue for each PR? Or is it other concerns (such as the
>> > > > > UI for entering or searching issues)?
>> > > >
>> > > > First of all, thank you for taking my concerns and actively
>> > > > trying to understand them.
>> > > >
>> > > > It is advantageous for everyone to have small, focused PRs, as
>> > > > they are easy to review and can narrow the discussion to a
>> > > > single problem. This also makes it easy for new contributors to
>> > > > start. For this to work, we need a system on which the work
>> > > > needed to create and merge PRs must be small in absolute terms,
>> > > > as the "meat" of the PR may be small. *This* IMO is not working.
>> > > > As a flavour, below is the usual process for a situation on
>> > > > which while working on an issue, I found a side issue and need
>> > > > for PR it:
>> > > >
>> > > > 1. create the PR on github with the fix
>> > > > 2. got an email from github that a bot commented that I must put
>> > > >    the JIRA issue. Got it, I forgot about that...
>> > > > 3. go to JIRA, if not logged in, log in (3 clicks + some
>> > > >    password manager stuff, and be redirected to a random page)
>> > > > 4. press "create issue"
>> > > > 5. Fill content:
>> > > >    * type
>> > > >    * Summary (do not forget to add the component to the title)
>> > > >    * component
>> > > >    * assign myself
>> > > >    * description(*)
>> > > > 6. press create. A small popup on JIRA will show that it was
>> > > >    created. It is really difficult to copy-paste the issue
>> > > >    number from the pop up:
>> > > > 7. Press on it before it disappears so that I can easily
>> > > >    copy-paste its number. I need to be fast, though: if it
>> > > >    disappears before I press on it, I will need to find it,
>> > > >    which is a story on itself.
>> > > > 8. go back to github and modify the title
>> > > >
>> > > > (*) I already wrote a description on the PR using markdown,
>> > > > which is when I was first thinking about the PR itself. JIRA
>> > > > does not support markdown, so I can't copy-paste. I now need to
>> > > > fight with the "visual" editor, or remember what the notation is
>> > > > for the text. I also need to remember that {{{ }}}, not
>> > > > backticks, is for code. I will likely leave that one empty.
>> > > >
>> > > > I am adding hiccups above because they do happen due to the
>> > > > mental workload involved, even for someone quite proficient at
>> > > > this.
>> > > >
>> > > > Let's now assume that all is done and we can merge it. Let's
>> > > > assume for simplicity that the PR was done by a contributor that
>> > > > is already a member of JIRA and already has a contributor role
>> > > > on JIRA (the easy case).
>> > > >
>> > > > 1. Open a terminal and navigate to a clone of the arrow project
>> > > >    that already has a Python venv on it
>> > > > 2. run `source venv/bin/activate` (our script has some external
>> > > >    requirements, thus we need this)
>> > > > 3. run `dev/merge_pr.py PR number`. What was the PR number
>> > > >    again?
>> > > > 4. Go to github and copy the PR number
>> > > > 5. paste on the terminal and press enter
>> > > > 6. Now type my username from JIRA
>> > > > 7. Now my password.
>> > > >    I store all my passwords on a password manager with a
>> > > >    browser extension and often work from different VMs via ssh.
>> > > >    Thus, I go to JIRA on the browser, click on the extension and
>> > > >    copy password
>> > > > 8. paste on my terminal and press enter
>> > > > 9. Assuming that no conflicts arise, press enter/yes 2 or 3
>> > > >    times and it is merged and pushed. Great!
>> > > > 10. When updating the JIRA issue, I noticed on the terminal that
>> > > >     the component is missing. Dam...
>> > > > 11. press on the JIRA link on the terminal (and a possible new
>> > > >     login) and add the necessary components
>> > > > 12. press enter on the terminal. Now we are done.
>> > > > 13. Go to github and thank the contributor for the great work.
>> > > >
>> > > > IMO, these flows are large. They represent about the same time I
>> > > > would need to create the small fix, a test, and PR it, including
>> > > > the PR description.
>> > > >
>> > > > Maybe some people have better flows than I do, but my
>> > > > understanding from other committers is that these steps are more
>> > > > or less representative of a good day (i.e. no merge conflicts,
>> > > > master passing, etc).
>> > > >
>> > > > A corollary problem is that this is not something only on
>> > > > committers / PMCs' plate. Let's now go through the other side: I
>> > > > am a brand new contributor.
>> > > >
>> > > > 1. create the PR on github because I found something to fix and
>> > > >    fixed it
>> > > > 2. got an email by a bot that I must follow some convention.
>> > > > 3. okk, let me try to fix the title. Oh, I need an issue in
>> > > >    JIRA...
>> > > > 3. go to JIRA and there is no "create issue" option...
>> > > > 4. somehow I figured that I need to create an account.
>> > > > 5. There is no option to create a new account...
>> > > > 6. Try pressing log in and see if it sends me somewhere. Now
>> > > >    there is a small button "Sign up". Finally, let's go.
>> > > > 6. create an account: no SSO: I need write my name, create a new
>> > > >    password on my password manager and wait for an email.
>> > > > 6. wait for verification email and validate
>> > > > 7. press create issue
>> > > >    * type
>> > > >    * Summary (no idea I need to put the component on the title)
>> > > >    * Priority (no idea)
>> > > >    * Due date (no idea)
>> > > >    * component (no idea, let me search some keywords and see if
>> > > >      something matches: perfect, Rust)
>> > > >    * affected version (put latest)
>> > > >    * Add description. Can't use markdown, so some struggle here
>> > > >      as copy-pasting code usually does not work. Need to learn
>> > > >      how to use this.
>> > > >    * More 5 fields or so, that I have no idea if it is expected
>> > > >      of me to fill or not.
>> > > > 9. press create
>> > > > 10. Pop up shows up: great, now I have the issue... wait, I
>> > > >     forgot, why I was creating the issue again? Pop up
>> > > >     disappears.
>> > > > 11. Ah, right, the issue number. Dam, how do I find it now? The
>> > > >     pop up is gone.
>> > > > 12. Spend an arbitrary amount of time finding the issue....
>> > > >     finally, I found it.
>> > > > 13. Copy it over to the github, edit the title on github.
>> > > > 14. Oh, I just received two emails from JIRA: someone already
>> > > >     commented on it?!?!
>> > > > 15. Open email, and after some digging about it I conclude that:
>> > > >     * the first email is telling me that the title of the JIRA
>> > > >       has been updated
>> > > >     * the second email is telling me that there is now a PR
>> > > >       associated with the issue. Well, that is a bummer...
>> > > > 16. That bot comment has not disappeared from github. Am I done?
>> > > > 17. (some time later): oh, another email from JIRA: after 1m
>> > > >     reading the diff: ah, someone added some [DataFusion] to the
>> > > >     title...
>> > > >
>> > > > Again, I am adding some clues of what I perceive to be a state
>> > > > of mind of the person going through this flow, just to point out
>> > > > that IMO we are talking about no small barrier here. I am also
>> > > > assuming an infinite willpower.
>> > > >
>> > > > I do not find JIRA appealing, but I do not find it bad either. I
>> > > > do think that the setup we have puts too much load on everyone
>> > > > and in my previous email I tried to express that, for what is
>> > > > worth, that has caused me to significantly reduce my
>> > > > contributions. I also tried formulating what I see as the root
>> > > > cause for the current status quo (availability of information
>> > > > from the project); I hope this one helps to clarify what I meant
>> > > > with "too much".
>> > > >
>> > > > > In my experience, discussion on JIRA is about the issue itself
>> > > > > (for example diagnosing a bug or discussing a feature), then
>> > > > > discussion on the PR is about the implementation. JIRA
>> > > > > discussions are generally readable by users (and indeed, users
>> > > > > often participate) while PR discussions are really for
>> > > > > developers of the project.
>> > > >
>> > > > In my experience (from Rust alone), little discussion happens on
>> > > > JIRA. Either on the PRs, google docs, or mailing list. Only one
>> > > > of them supports markdown, though, which I consider a basic
>> > > > requirement for any product for developers.
>> > > >
>> > > > > FWIW, I've set up a mail filter that sends all "work logged"
>> > > > > automated mail to the trashbin. I agree it's unfortunate that
>> > > > > developers have to do that. I also have other qualms with the
>> > > > > Apache JIRA configuration, such as the fact that "labels"
>> > > > > (keywords) are shared between all projects, so there is
>> > > > > essentially a million of them with no effort at taxonomy.
>> > > >
>> > > > I also struggle with both. I tried at some point adding
>> > > > "beginner friendly" labels to issues that were so, but there are
>> > > > 5 variants of that label; if I do not know which one to pick,
>> > > > how can I expect a *beginner* to know which one to pick?
>> > > >
>> > > > Best,
>> > > > Jorge
>> > > >
>> > > > On Tue, Mar 2, 2021 at 10:26 AM Antoine Pitrou
>> > > > <anto...@python.org> wrote:
>> > > >
>> > > > >
>> > > > > Hi Jorge,
>> > > > >
>> > > > > On Tue, 2 Mar 2021 08:55:03 +0100
>> > > > > Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
>> > > > > > Hi,
>> > > > > >
>> > > > > > FWIW, the amount of bureaucracy that goes into JIRA is a
>> > > > > > major contributing factor for the reduction of my time
>> > > > > > commitment to this project by 80%+.
>> > > > >
>> > > > > Can you expand a bit on this? In particular, which aspects of
>> > > > > using JIRA feel bureaucratic? Is it the requirement to create
>> > > > > a new issue for each PR? Or is it other concerns (such as the
>> > > > > UI for entering or searching issues)?
>> > > > >
>> > > > > I can't say I like JIRA myself, but at least it provides the
>> > > > > classification and navigation features that I would expect
>> > > > > from an issue tracker. The Github issue tracker AFAIK is
>> > > > > rudimentary and not really practical when a project has
>> > > > > accumulated many issues (but they may have changed this
>> > > > > recently).
>> > > > >
>> > > > > > The major challenge is that most discussions happen where
>> > > > > > PRs are created and seen, which is on github, but JIRA and
>> > > > > > mailing list is used for other types of decisions. In this
>> > > > > > model, how do we preserve curated information about the
>> > > > > > decision process while at the same time leverage both JIRA
>> > > > > > and github's capabilities?
>> > > > >
>> > > > > In my experience, discussion on JIRA is about the issue itself
>> > > > > (for example diagnosing a bug or discussing a feature), then
>> > > > > discussion on the PR is about the implementation. JIRA
>> > > > > discussions are generally readable by users (and indeed, users
>> > > > > often participate) while PR discussions are really for
>> > > > > developers of the project.
>> > > > >
>> > > > > > OTOH, asking contributors to create a jira account and
>> > > > > > committers to add the person as contributor, as well as the
>> > > > > > email spam and the merge process is a large barrier.
>> > > > >
>> > > > > FWIW, I've set up a mail filter that sends all "work logged"
>> > > > > automated mail to the trashbin. I agree it's unfortunate that
>> > > > > developers have to do that. I also have other qualms with the
>> > > > > Apache JIRA configuration, such as the fact that "labels"
>> > > > > (keywords) are shared between all projects, so there is
>> > > > > essentially a million of them with no effort at taxonomy.
>> > > > >
>> > > > > > IMO the foundation could be clearer wrt to what does it mean
>> > > > > > with information being preserved and available (e.g. on
>> > > > > > apache servers?) and if yes, follow it through by hosting
>> > > > > > all their projects on their own github / gitlab / whatever,
>> > > > > > where issues and PRs are on the same platform, and offer SSO
>> > > > > > for contributors as a way to prove identity across the
>> > > > > > system. But that is also a complex operation with a lot of
>> > > > > > unknowns...
>> > > > >
>> > > > > From what I see of the ASF's velocity, I wouldn't expect such
>> > > > > a large breakthrough in the short future.
>> > > > >
>> > > > > (this is not trying to badmouth the ASF, just a pragmatic
>> > > > > evaluation)
>> > > > >
>> > > > > Regards
>> > > > >
>> > > > > Antoine.