I'm not in favor of reusing `schedule_interval` for datasets (or folding `timetable` into it either), primarily because it doesn't feel like a good name to describe what it does. If we don't have the appetite to consolidate, my vote is we stick with 3 separate params. So preferably #1, if not then #2 for me.
I'm less concerned about the "everyone has to update every DAG" issue, as it only becomes a problem for Airflow 3, folks can freely ignore it until then, and when they do update it it's a simple find/replace. On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs <a.r.gibbs.massma...@gmail.com> wrote: > I'll just +1 that IMO a UI reminder is a great option. > > 2.3.0 added an alert when robots.txt was being scanned, which was great, > because I ignored it for a while, but eventually caved and looked into it. > This I would see as more of the same. I currently get loads of warnings > about things in my config file that have moved; but typically only if I > have to do a `airflow db upgrade` or `airflow dags backfill`. I see the > alert, think "I must fix those", and then the Homer effect kicks in, and > something else takes up the space in my brain. Having them as a banner in > the GUI would 100% get me to sort it sooner or later. > > On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka <vik...@astronomer.io.invalid> > wrote: > >> Hi Daniel, >> >> I completely agree with wanting to consolidate, but I am very concerned >> about people needing to change their existing code. >> As Ping and Felix also said, I worry that people ignore deprecation >> warnings, especially since the original code authors don't really look at >> logs or other pipeline artifacts unless something is broken. >> >> With respect to the options you outlined, I am personally leaning towards >> (4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48 has >> added it in main) and take List[Dataset] in `schedule_interval`". I could >> be convinced with (3), since I think that "timetable" is relatively recent >> and have not seen as much usage for this invocation yet. I am definitely >> curious about other perspectives on this as well. >> >> I have a hard time with option (1) of deprecating "schedule interval", >> because of the historical nature of it and the code changes that people may >> have to make to deal with it. >> >> Thanks for raising this topic and the discussion, >> >> Vikram >> >> >> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish >> <daniel.stand...@astronomer.io.invalid> wrote: >> >>> I agree with those (including myself) who have suggested that it would >>> be great to have a deprecation fix tool. >>> >>> I also think that UI notifications would be valuable. Another option >>> would be to provide a less intrusive indicator somewhere (like when chrome >>> lets you know you need to update), and then you click on it and it takes >>> you to a page that lists out all the deprecation warnings. >>> >>> >>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote: >>> >>>> I know we are distracting a bit from the main subject, but I think it's >>>> worth it. >>>> >>>> I think this is a fact of life that people don't remove deprecations. >>>> But we have a choice of "not doing anything about it" and complaining. Or >>>> doing something that might improve it. >>>> >>>> I usually err on the side of "doing something" when I see a problem. I >>>> am a doer :). >>>> >>>> And I have a possibly controversial proposal. I got the inspiration >>>> from node.js old versions. Whenever an old version of Node is being used, >>>> you have a 20 seconds delay artificially added and information that in >>>> order to remove those 20 seconds you need to upgrade. >>>> >>>> Why don't we do that for some really old/serious deprecations ? Maybe >>>> not 20 seconds. Maybe adding an unremovable RED warning in the UI if you >>>> have any deprecation unhandled. Something that is annoying enough that >>>> will cause people to make an action but not intrusive enough to break >>>> things. >>>> >>>> I think this is a very basic psychological trait: >>>> >>>> If it does not cost the user anything to leave the deprecations, there >>>> will be a big number of people who will do that. >>>> We need to make sure that deprecations are costing our users "a little" >>>> and that they are aware that they can remove the cost (and know how - this >>>> is super important that we only show such a message with clear instructions >>>> on what to do). >>>> >>>> Such a message might be annoying in many ways: >>>> >>>> * it can obstruct significant part of the screen >>>> * It can be flashy and had very annoying colours (think bold pink fonts >>>> on flash-green background for example) >>>> * It can not allow to do anything before the user closes it with a >>>> REALLY small 'x' that it is difficult to point at (think cookie banners). >>>> And when they click they can be greeted with "You know - you can actually >>>> remove this warning by doing THIS". >>>> * and we have probably more options - our UX specialists might have >>>> some good ideas here >>>> >>>> For anyone who complains about "people do not act on deprecation." - >>>> yeah this is true. >>>> But one universal truth there is that if we won't act, this will change >>>> nothing in the behaviour of our users - this is 100% certain. >>>> >>>> If we do something - it might improve it. We will not get a 100% >>>> success for sure, but we can at least do something to improve it. >>>> >>>> WDYT? >>>> >>>> J. >>>> >>>> >>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall <felue...@pm.me.invalid> >>>> wrote: >>>> >>>>> Hi Ping, >>>>> >>>>> I had this idea about auto-fixing deprecation warnings too when I >>>>> started the airflint project. >>>>> I totally agree with what you said that often people ignore these >>>>> warnings and therefore a cli tool would come in very handy. It could look >>>>> similar to the airflint project and also be based on the python refactor >>>>> package, but I would keep it out of this discussion for now. >>>>> >>>>> Regarding the schedule arg, I don’t have a strong opinion there but >>>>> also tend to having one schedule arg, because of simplicity for the >>>>> dag author. >>>>> >>>>> Best, >>>>> Felix >>>>> >>>>> >>>>> >>>>> Sent from Proton Mail for iOS >>>>> >>>>> >>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pin...@umich.edu> wrote: >>>>> >>>>> Hi Daniel, >>>>> >>>>> +1 for ‘schedule' and thanks for bringing this up. >>>>> >>>>> I agree with Vikram, we should be very careful about deprecating >>>>> existing params even if we have warnings around it. Not sure if this is a >>>>> general case, but I notice that people ignore warnings all the time until >>>>> it starts to break. Thus I am thinking if airflow can have a cli tool to >>>>> help users to batch update these deprecating params just like python's >>>>> futurize tool. This can greatly ease the burden of upgrading the airflow >>>>> version. >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Ping >>>>> >>>>> >>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish >>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>> >>>>>> In case you forgot about us @Vikram, gentle nudge >>>>>> >>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish < >>>>>> daniel.stand...@astronomer.io> wrote: >>>>>> >>>>>>> Well, this is a discussion thread, so let's discuss! >>>>>>> >>>>>>> Vikram, what kind of implications are you thinking of? Maybe you >>>>>>> could provide a little more detail about your concerns? >>>>>>> >>>>>>> There are certainly alternatives, and of course each of them has >>>>>>> tradeoffs. >>>>>>> >>>>>>> Here are the options I see: >>>>>>> >>>>>>> 1. consolidate to schedule (deprecate schedule_interval and >>>>>>> timetable; the proposal under discussion in this thread) >>>>>>> 2. do nothing (keep 3 different scheduling params) >>>>>>> 3. consolidate to schedule_interval (deprecate timetable) >>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has >>>>>>> added it in main) and take List[Dataset] in `schedule_interval` >>>>>>> >>>>>>> Which would you propose we go with? >>>>>>> >>>>>>> If we have a veto against 1 (and again, we're in hypothetical >>>>>>> territory since this is still a discussion thread), my vote would >>>>>>> definitely be to go with 3. >>>>>>> >>>>>>> The param `schedule_interval` as catch all is not *sooo *bad, but >>>>>>> `schedule` definitely seems better. From a new user's perspective, >>>>>>> having >>>>>>> misleading names is confusing, and can be a negative when looking at >>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with >>>>>>> deprecations as an Airflow user, well it's sort of part of the job, >>>>>>> though >>>>>>> it can be bothersome, certainly. >>>>>>> >>>>>>> Just to look at / compare feasibility. >>>>>>> >>>>>>> Deprecating schedule_interval, it would be more prominent than some >>>>>>> other deprecations because it's a DAG param. That said, I think it's >>>>>>> pretty >>>>>>> straightforward to handle. For one, it's at the cardinality of DAG and >>>>>>> not >>>>>>> *task*, so there is less to change with than some other >>>>>>> deprecations. And being kwargs-only, it's an easy find and replace that >>>>>>> could be automated with an `upgrade-check` script. It's very easy to >>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail the >>>>>>> DAG >>>>>>> parse (which forces user to fix). Perhaps we could even experiment with >>>>>>> a >>>>>>> deprecation fix tool, so that users could get help fixing deprecations, >>>>>>> not >>>>>>> necessarily in the context of an upgrade, since we're not looking at 3.0 >>>>>>> any time soon. >>>>>>> >>>>>>> We could compare with execution_date, if execution_date is to be >>>>>>> removed in 3.0. I think the deprecation of execution_date will be a more >>>>>>> difficult thing for users to handle. For one, potentially higher >>>>>>> cardinality, since it is used in tasks. But also, the use of >>>>>>> execution_date >>>>>>> can be harder to detect, since it can be in templates and elsewhere in >>>>>>> custom operators. *(With schedule_interval, since it's a top-level >>>>>>> dag kwarg, it will be easier.)* And execution_date was replaced >>>>>>> with `logical_date` for similar reasons -- avoiding user confusion. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka >>>>>>> <vik...@astronomer.io.invalid> wrote: >>>>>>> >>>>>>>> -1 from me. >>>>>>>> >>>>>>>> Though I agree in principle with the idea of consolidation, I don't >>>>>>>> think we should be doing this yet until we understand the implications >>>>>>>> completely. >>>>>>>> I am really not in favor of deprecation of the existing params, >>>>>>>> unless there is really no alternative. >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish >>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>>>>> >>>>>>>>> So far, seems all in favor. >>>>>>>>> >>>>>>>>> I will just highlight, in case it's not clear.... when we release >>>>>>>>> this (presumably 2.4), basically every single dag in existence will >>>>>>>>> start >>>>>>>>> emitting deprecation warnings, and prior to 3.0, basically every dag >>>>>>>>> in >>>>>>>>> existence will need to be updated. >>>>>>>>> >>>>>>>>> Thankfully, for most people, this should be an easy update since >>>>>>>>> DAG is kwargs only, IIRC. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi >>>>>>>>> <br...@astronomer.io.invalid> wrote: >>>>>>>>> >>>>>>>>>> +1 to consolidating and "schedule". >>>>>>>>>> >>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl >>>>>>>>>> <drew.h...@astronomer.io.invalid> wrote: >>>>>>>>>> >>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of >>>>>>>>>>> consolidating to one being more important than the name of that one >>>>>>>>>>> >>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell < >>>>>>>>>>> josh.d.f...@astronomer.io.INVALID> wrote: >>>>>>>>>>> >>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1 from >>>>>>>>>>> me. `schedule` seems like a nice catch-all. >>>>>>>>>>> >>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham < >>>>>>>>>>> jedcunning...@apache.org> wrote: >>>>>>>>>>> >>>>>>>>>>>> +1 on moving to a single `schedule` param. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>