Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060

I might not have explained it well. Sorry for the misunderstanding, and thank you @rxin for helping me clarify my points. It sounds like many of you think this backport is fine. I am not against this specific PR; we do not need to revert it, just improve the documentation. That should be fine, although I would still personally prefer adding the configuration.

As I said in the original PR https://github.com/apache/spark/pull/21007 that was merged to master, let me make two points here too:

- PR descriptions become part of the commit log, so we need to be very careful before merging a PR. In the past, I also missed a few when I did the merge. To be honest, I am not sure how native English speakers read it, but the first paragraph scared me when I read the PR commit log. @srowen WDYT?
```
This PR proposes to add collect to a query executor as an action.
```
- Document behavior changes that are visible to external users/developers. In Spark 2.3, we started to enforce this in every merged PR, and I believe many of you received similar comments in previous PRs. This PR should also update the migration guide. @HyukjinKwon Do you agree?

Before we finalize the backport policy, here are my inputs on the whitelist of what we can backport:

- Critical/important bug fixes and security fixes.
- Regression fixes.
- PRs that do not touch production code, such as test-only patches, documentation fixes, and log message fixes.

Avoid backporting PRs that contain:

- New features.
- Minor bug fixes/improvements that have external behavior changes.
- Code refactoring.
- Code changes with high/medium risk.

In the OSS community, I believe no committer will be fired just because we merged or introduced a bug, right? If a user's application fails after an upgrade, we normally blame the user, or say the bug was just accidentally introduced.
However, this was not acceptable on my first team. Let me share what I experienced: various customer incidents in my related product teams.

- One director was demoted (almost fired) due to a bad release. She is a very nice lady, and we really like her. That release had many cool features, but the quality was not well controlled, and many customers were unwilling to upgrade.
- There was a famous system upgrade failure a few years ago. The whole system became very slow after the upgrade, and it took tens of hours to recover. A few days later, the GM went to the customer site and was blamed for the whole day. Multiple architects and VPs were forced to write apology letters, and the customer planned to sue us. On the customer side, the CTO was later fired, and the upgrade accident was even on the national TV news because it affected many people.
- A few directors were on call with me for 10+ nights to resolve a data corruption issue for a Japanese customer. The client teams ran multiple systems at the same time to reproduce the issue. After a few weeks, it was finally resolved by reading the memory dump; the root cause was a code merge from one branch to another many years earlier.

If all of these people believe Spark is the best product in Big Data, we need to be more conservative. Our decisions could affect many people.

This is not the first time I have argued with other committers/contributors about PR quality. In one previous PR, I left almost 100 comments just because the documentation was not accurate. If my comments above offend anyone, I apologize. Everyone has a different understanding of software development because we have different work experience. The whole community has already done a wonderful job compared with other open source projects, but I still believe we can do better, right? Let us formalize the backport policy and enforce it in each release.