+1 for Sean's concerns and questions.

Bests,
Dongjoon.
On Fri, Mar 6, 2020 at 3:14 PM Sean Owen <sro...@gmail.com> wrote:

> This thread established some good general principles, illustrated by a few
> good examples. It didn't draw specific conclusions about what to add back,
> which is why it wasn't at all controversial. What it means in specific
> cases is where there may be disagreement, and that harder question hasn't
> been addressed.
>
> The reverts I have seen so far seemed like the obvious ones, but yes,
> there are several more going on now, some pretty broad. I am not even sure
> what all of them are. In addition to the ones below, there is
> https://github.com/apache/spark/pull/27839. Would it be too much overhead
> to post to this thread any changes that one believes are endorsed by these
> principles, and perhaps by a stricter interpretation of them now? It's
> important enough that we should get any data points or input, and now.
> (We're obviously not going to debate each one.) A draft PR, or several,
> actually sounds like a good vehicle for that -- as long as people know
> about them!
>
> Also, is there any usage data available to share? Many arguments turn on
> "commonly used", but can we know that more concretely?
>
> Otherwise I think we'll back into implementing personal interpretations of
> general principles, which is arguably the issue in the first place, even
> when everyone believes in good faith in the same principles.
>
>
> On Fri, Mar 6, 2020 at 1:08 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Hi, All.
>>
>> Recently, reverting PRs seems to be spreading like the *well-known*
>> virus. Can we finalize this thread first, before making unofficial
>> personal decisions? Technically, this thread was not a vote, and our
>> website doesn't have a clear policy yet.
>>
>> https://github.com/apache/spark/pull/27821
>> [SPARK-25908][SQL][FOLLOW-UP] Add Back Multiple Removed APIs
>> ==> This technically reverts most of SPARK-25908.
>>
>> https://github.com/apache/spark/pull/27835
>> Revert "[SPARK-25457][SQL] IntegralDivide returns data type of the
>> operands"
>>
>> https://github.com/apache/spark/pull/27834
>> Revert "[SPARK-24640][SQL] Return `NULL` from `size(NULL)` by default"
>>
>> Bests,
>> Dongjoon.
>>
>> On Thu, Mar 5, 2020 at 9:08 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Hi, All.
>>>
>>> There is an ongoing PR from Xiao referencing this email:
>>>
>>> https://github.com/apache/spark/pull/27821
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On Fri, Feb 28, 2020 at 11:20 AM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> On Fri, Feb 28, 2020 at 12:03 PM Holden Karau <hol...@pigscanfly.ca>
>>>> wrote:
>>>>
>>>> >> 1. Could you estimate how many revert commits are required in
>>>> >> `branch-3.0` for the new rubric?
>>>>
>>>> Fair question about what actual change this implies for 3.0. So far it
>>>> seems like some targeted, quite reasonable reverts. I don't think
>>>> anyone is suggesting reverting loads of changes.
>>>>
>>>> >> 2. Are you going to revert all removed test cases for the
>>>> >> deprecated ones?
>>>> > This is a good point; making sure we keep the tests as well is
>>>> > important (worse than removing a deprecated API is shipping it
>>>> > broken).
>>>>
>>>> (I'd say, yes, of course! That seems consistent with what is happening
>>>> now.)
>>>>
>>>> >> 3. Does it make any delay for the Apache Spark 3.0.0 release?
>>>> >> (I believe it was previously scheduled for June, before Spark
>>>> >> Summit 2020.)
>>>> > I think if we need to delay to make a better release this is OK,
>>>> > especially given that our current preview releases are available to
>>>> > gather community feedback.
>>>>
>>>> Of course these things block 3.0 -- all the more reason to keep them
>>>> specific and targeted -- but nothing so far seems inconsistent with
>>>> finishing in a month or two.
>>>>
>>>> >> Although there was a discussion already, I want to make sure about
>>>> >> the following tough parts.
>>>> >> 4. We are not going to add Scala 2.11 APIs back, right?
>>>> > I hope not.
>>>>
>>>> >> 5. We are not going to support Python 2.x in Apache Spark 3.1+,
>>>> >> right?
>>>> > I think doing that would be bad; it's already end-of-life elsewhere.
>>>>
>>>> Yeah, this is an important subtext -- the valuable principles here
>>>> could be interpreted in many different ways depending on how much you
>>>> weight the value of keeping APIs for compatibility vs. the value of
>>>> simplifying Spark and pushing users to newer APIs more forcibly.
>>>> They're all judgment calls, based on necessarily limited data about
>>>> the universe of users. We can only go on rare direct user feedback, on
>>>> feedback perhaps from vendors as proxies for a subset of users, and
>>>> the general good-faith judgment of committers who have lived Spark for
>>>> years.
>>>>
>>>> My specific interpretation is that the standard is (correctly)
>>>> tightening going forward, and retroactively a bit for 3.0. But I do
>>>> not think anyone is advocating the logical extreme of, for example,
>>>> maintaining Scala 2.11 compatibility indefinitely. I think that falls
>>>> out readily from the rubric here: maintaining 2.11 compatibility is
>>>> really quite painful if you ever support 2.13 too, for example.
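
For readers skimming the archive, the `size(NULL)` revert discussed above (PR 27834) is about which default a SQL `size()` call should use for a null input: the legacy Hive-compatible `-1`, or SQL-style `NULL`. A minimal sketch of the two candidate defaults, in plain Python rather than Spark source -- the function name and the flag (modeled loosely on the `spark.sql.legacy.sizeOfNull` configuration) are illustrative, not Spark's actual implementation:

```python
def size_of(collection, legacy_size_of_null=True):
    """Toy model of SQL size(): length of an array/map, with a
    configurable result for a NULL (None) input."""
    if collection is None:
        # Legacy Hive-compatible behavior returns -1 for NULL input;
        # the changed default (the subject of SPARK-24640) returns NULL.
        return -1 if legacy_size_of_null else None
    return len(collection)

print(size_of([1, 2, 3]))                            # 3 either way
print(size_of(None))                                 # -1 (legacy default)
print(size_of(None, legacy_size_of_null=False))      # None (changed default)
```

The thread's question is not which behavior is "right" in isolation, but whether flipping such a default inside the 3.0 window is worth the compatibility cost.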