Hi devs, I know we're busy getting Spark 3.0 out, but I think this topic is worth discussing at any time, and it would be better to resolve it sooner rather than later.
In the page "Contributing to Spark", we describe the guideline for "affects version" as "For Bugs, assign at least one version that is known to exhibit the problem or need the change". To me, that sentence clearly describes the minimal requirement for affects version:

* For the Bug type, assign one valid version
* For other types, there's no requirement

However, I'm seeing requests that go beyond that requirement, which makes me think there might be a different understanding of the sentence. There may be more, but to summarize such requests:

1) Set affects version to the same version as the master branch for improvements/new features
2) Check older versions to fill in affects version for bugs

I don't see any point in doing 1). It might give some context if we didn't update the affects version afterwards (so that it records which version was considered when the JIRA issue was filed), but we also update the affects version when we bump the master branch, so it is no longer informational: the version would always be the same as the master branch.

I agree it's ideal to do 2), but I think the reason the guide doesn't enforce it is that checking old versions takes considerable effort (sometimes even more than the original work). Suppose the happy case: we have a UT to verify the bugfix, which fails without the patch and passes with the patch. To check older versions we have to check out the tag, apply the UT, rebuild, and run the UT to verify, which is quite time-consuming. And what if there's a conflict? That's still the happy case; in the worse case (there's no such UT) we would have to do manual E2E verification, at which point I would give up.

There should be some balance/threshold, and it should be something the community has consensus on. I'd like to hear everyone's thoughts on this.

Thanks,
Jungtaek Lim (HeartSaVioR)