Hi Daniel, I completely agree with wanting to consolidate, but I am very concerned about people needing to change their existing code. As Ping and Felix also said, I worry that people ignore deprecation warnings, especially since the original code authors don't really look at logs or other pipeline artifacts unless something is broken.
With respect to the options you outlined, I am personally leaning towards (4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48 has added it in main) and take List[Dataset] in `schedule_interval`". I could be convinced with (3), since I think that "timetable" is relatively recent and have not seen as much usage for this invocation yet. I am definitely curious about other perspectives on this as well. I have a hard time with option (1) of deprecating "schedule interval", because of the historical nature of it and the code changes that people may have to make to deal with it. Thanks for raising this topic and the discussion, Vikram On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote: > I agree with those (including myself) who have suggested that it would be > great to have a deprecation fix tool. > > I also think that UI notifications would be valuable. Another option > would be to provide a less intrusive indicator somewhere (like when chrome > lets you know you need to update), and then you click on it and it takes > you to a page that lists out all the deprecation warnings. > > > On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote: > >> I know we are distracting a bit from the main subject, but I think it's >> worth it. >> >> I think this is a fact of life that people don't remove deprecations. But >> we have a choice of "not doing anything about it" and complaining. Or doing >> something that might improve it. >> >> I usually err on the side of "doing something" when I see a problem. I am >> a doer :). >> >> And I have a possibly controversial proposal. I got the inspiration from >> node.js old versions. Whenever an old version of Node is being used, you >> have a 20 seconds delay artificially added and information that in order to >> remove those 20 seconds you need to upgrade. >> >> Why don't we do that for some really old/serious deprecations ? Maybe not >> 20 seconds. Maybe adding an unremovable RED warning in the UI if you have >> any deprecation unhandled. Something that is annoying enough that >> will cause people to make an action but not intrusive enough to break >> things. >> >> I think this is a very basic psychological trait: >> >> If it does not cost the user anything to leave the deprecations, there >> will be a big number of people who will do that. >> We need to make sure that deprecations are costing our users "a little" >> and that they are aware that they can remove the cost (and know how - this >> is super important that we only show such a message with clear instructions >> on what to do). >> >> Such a message might be annoying in many ways: >> >> * it can obstruct significant part of the screen >> * It can be flashy and had very annoying colours (think bold pink fonts >> on flash-green background for example) >> * It can not allow to do anything before the user closes it with a REALLY >> small 'x' that it is difficult to point at (think cookie banners). And when >> they click they can be greeted with "You know - you can actually remove >> this warning by doing THIS". >> * and we have probably more options - our UX specialists might have some >> good ideas here >> >> For anyone who complains about "people do not act on deprecation." - yeah >> this is true. >> But one universal truth there is that if we won't act, this will change >> nothing in the behaviour of our users - this is 100% certain. >> >> If we do something - it might improve it. We will not get a 100% success >> for sure, but we can at least do something to improve it. >> >> WDYT? >> >> J. >> >> >> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall <felue...@pm.me.invalid> >> wrote: >> >>> Hi Ping, >>> >>> I had this idea about auto-fixing deprecation warnings too when I >>> started the airflint project. >>> I totally agree with what you said that often people ignore these >>> warnings and therefore a cli tool would come in very handy. It could look >>> similar to the airflint project and also be based on the python refactor >>> package, but I would keep it out of this discussion for now. >>> >>> Regarding the schedule arg, I don’t have a strong opinion there but >>> also tend to having one schedule arg, because of simplicity for the dag >>> author. >>> >>> Best, >>> Felix >>> >>> >>> >>> Sent from Proton Mail for iOS >>> >>> >>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pin...@umich.edu> wrote: >>> >>> Hi Daniel, >>> >>> +1 for ‘schedule' and thanks for bringing this up. >>> >>> I agree with Vikram, we should be very careful about deprecating >>> existing params even if we have warnings around it. Not sure if this is a >>> general case, but I notice that people ignore warnings all the time until >>> it starts to break. Thus I am thinking if airflow can have a cli tool to >>> help users to batch update these deprecating params just like python's >>> futurize tool. This can greatly ease the burden of upgrading the airflow >>> version. >>> >>> >>> Thanks, >>> >>> Ping >>> >>> >>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish >>> <daniel.stand...@astronomer.io.invalid> wrote: >>> >>>> In case you forgot about us @Vikram, gentle nudge >>>> >>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish < >>>> daniel.stand...@astronomer.io> wrote: >>>> >>>>> Well, this is a discussion thread, so let's discuss! >>>>> >>>>> Vikram, what kind of implications are you thinking of? Maybe you could >>>>> provide a little more detail about your concerns? >>>>> >>>>> There are certainly alternatives, and of course each of them has >>>>> tradeoffs. >>>>> >>>>> Here are the options I see: >>>>> >>>>> 1. consolidate to schedule (deprecate schedule_interval and timetable; >>>>> the proposal under discussion in this thread) >>>>> 2. do nothing (keep 3 different scheduling params) >>>>> 3. consolidate to schedule_interval (deprecate timetable) >>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has added >>>>> it in main) and take List[Dataset] in `schedule_interval` >>>>> >>>>> Which would you propose we go with? >>>>> >>>>> If we have a veto against 1 (and again, we're in hypothetical >>>>> territory since this is still a discussion thread), my vote would >>>>> definitely be to go with 3. >>>>> >>>>> The param `schedule_interval` as catch all is not *sooo *bad, but >>>>> `schedule` definitely seems better. From a new user's perspective, having >>>>> misleading names is confusing, and can be a negative when looking at >>>>> adoption. And for a tenured user, it can nag at you. And dealing with >>>>> deprecations as an Airflow user, well it's sort of part of the job, though >>>>> it can be bothersome, certainly. >>>>> >>>>> Just to look at / compare feasibility. >>>>> >>>>> Deprecating schedule_interval, it would be more prominent than some >>>>> other deprecations because it's a DAG param. That said, I think it's >>>>> pretty >>>>> straightforward to handle. For one, it's at the cardinality of DAG and not >>>>> *task*, so there is less to change with than some other deprecations. >>>>> And being kwargs-only, it's an easy find and replace that could be >>>>> automated with an `upgrade-check` script. It's very easy to detect, either >>>>> to warn about, or ultimately, after upgrade, to fail the DAG parse (which >>>>> forces user to fix). Perhaps we could even experiment with a deprecation >>>>> fix tool, so that users could get help fixing deprecations, not >>>>> necessarily >>>>> in the context of an upgrade, since we're not looking at 3.0 any time >>>>> soon. >>>>> >>>>> We could compare with execution_date, if execution_date is to be >>>>> removed in 3.0. I think the deprecation of execution_date will be a more >>>>> difficult thing for users to handle. For one, potentially higher >>>>> cardinality, since it is used in tasks. But also, the use of >>>>> execution_date >>>>> can be harder to detect, since it can be in templates and elsewhere in >>>>> custom operators. *(With schedule_interval, since it's a top-level >>>>> dag kwarg, it will be easier.)* And execution_date was replaced with >>>>> `logical_date` for similar reasons -- avoiding user confusion. >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka >>>>> <vik...@astronomer.io.invalid> wrote: >>>>> >>>>>> -1 from me. >>>>>> >>>>>> Though I agree in principle with the idea of consolidation, I don't >>>>>> think we should be doing this yet until we understand the implications >>>>>> completely. >>>>>> I am really not in favor of deprecation of the existing params, >>>>>> unless there is really no alternative. >>>>>> >>>>>> >>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish >>>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>>> >>>>>>> So far, seems all in favor. >>>>>>> >>>>>>> I will just highlight, in case it's not clear.... when we release >>>>>>> this (presumably 2.4), basically every single dag in existence will >>>>>>> start >>>>>>> emitting deprecation warnings, and prior to 3.0, basically every dag in >>>>>>> existence will need to be updated. >>>>>>> >>>>>>> Thankfully, for most people, this should be an easy update since DAG >>>>>>> is kwargs only, IIRC. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi >>>>>>> <br...@astronomer.io.invalid> wrote: >>>>>>> >>>>>>>> +1 to consolidating and "schedule". >>>>>>>> >>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl >>>>>>>> <drew.h...@astronomer.io.invalid> wrote: >>>>>>>> >>>>>>>>> +1 for ‘schedule', and another +1 to the importance of >>>>>>>>> consolidating to one being more important than the name of that one >>>>>>>>> >>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell < >>>>>>>>> josh.d.f...@astronomer.io.INVALID> wrote: >>>>>>>>> >>>>>>>>> Consolidating to a single scheduling parameter is a big +1 from >>>>>>>>> me. `schedule` seems like a nice catch-all. >>>>>>>>> >>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham < >>>>>>>>> jedcunning...@apache.org> wrote: >>>>>>>>> >>>>>>>>>> +1 on moving to a single `schedule` param. >>>>>>>>>> >>>>>>>>> >>>>>>>>>