Fair points all. I withdraw my objection and agree with the consolidation proposal.
Vikram On Fri, Aug 5, 2022 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote: > Let me also add to why I think we should consolidate to one param:: > > * "schedule_interval" > * "schedule_on" > * "timetable" > > Imagine you are a new user coming to Airflow. > > Do you immediately recognise which one is which ? Do you see what the > difference is? > > * "Schedule_on" = "Every Sunday" is equally good as "schedule_on" > "dataset". > * "timetable" - what even is a timetable? Is it a "cron TAB" ? Or what? > > For me even if we wanted to have different names, the current set of names > for those params is just terribly ambiguous. > > Making it "schedule" and acting depending on what you pass to it is so > much better and simpler. > > J. > > > On Fri, Aug 5, 2022 at 6:57 PM Daniel Standish > <daniel.stand...@astronomer.io.invalid> wrote: > >> @Vikram, yeah, I understand that concern. Making a breaking change is >> something we shouldn't do carelessly, without consideration, without good >> reason. But I think we have a good reason here. >> >> Deprecations are somewhat regular in airflow. In 2.0 we moved a lot of >> imports. We also changed the CLI semantics. Probably there were other >> breaking changes that I'm forgetting *(granted sometimes we didn't >> remove the backward-compatibility immediately in 2.0, and that remains an >> option here)*. Probably just about every dag in existence had to be >> changed then too. And that's what a major release is for. >> >> In 3.0, it looks like we're deprecating execution_date... That's even >> bigger. So, were we to do this, (1) moving to `schedule` wouldn't be the >> only notable breaking change and (2) by comparison it's a pretty simple >> change. And if, as part of the upgrade to a major release, you have to do >> renames and such anyway, the marginal cost of one more isn't massive. >> >> I guess my take is that we can't be chained to our previous decisions >> forever. And we have to have the flexibility to adapt our interfaces so >> that they still make sense and are consistent and don't cause excessive >> eyebrow wrinkling. We have adopted semantic versioning to provide a >> mechanism for signaling when breaking changes are coming. And we can add >> tooling to help users along the way. >> >> Additionally, we can't *only *look at the burden of such a change for >> users. We must also weigh this against the burden that we maintain by >> *not* changing, i.e. by having too many params, or ill-named params. >> >> >> On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <jedcunning...@apache.org> >> wrote: >> >>> I'm not in favor of reusing `schedule_interval` for datasets (or folding >>> `timetable` into it either), primarily because it doesn't feel like a good >>> name to describe what it does. If we don't have the appetite to >>> consolidate, my vote is we stick with 3 separate params. So preferably #1, >>> if not then #2 for me. >>> >>> I'm less concerned about the "everyone has to update every DAG" issue, >>> as it only becomes a problem for Airflow 3, folks can freely ignore it >>> until then, and when they do update it it's a simple find/replace. >>> >>> >>> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs < >>> a.r.gibbs.massma...@gmail.com> wrote: >>> >>>> I'll just +1 that IMO a UI reminder is a great option. >>>> >>>> 2.3.0 added an alert when robots.txt was being scanned, which was >>>> great, because I ignored it for a while, but eventually caved and looked >>>> into it. This I would see as more of the same. I currently get loads of >>>> warnings about things in my config file that have moved; but typically only >>>> if I have to do a `airflow db upgrade` or `airflow dags backfill`. I see >>>> the alert, think "I must fix those", and then the Homer effect kicks in, >>>> and something else takes up the space in my brain. Having them as a banner >>>> in the GUI would 100% get me to sort it sooner or later. >>>> >>>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka <vik...@astronomer.io.invalid> >>>> wrote: >>>> >>>>> Hi Daniel, >>>>> >>>>> I completely agree with wanting to consolidate, but I am very >>>>> concerned about people needing to change their existing code. >>>>> As Ping and Felix also said, I worry that people ignore deprecation >>>>> warnings, especially since the original code authors don't really look at >>>>> logs or other pipeline artifacts unless something is broken. >>>>> >>>>> With respect to the options you outlined, I am personally leaning >>>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on >>>>> (AIP-48 >>>>> has added it in main) and take List[Dataset] in `schedule_interval`". I >>>>> could be convinced with (3), since I think that "timetable" is relatively >>>>> recent and have not seen as much usage for this invocation yet. I am >>>>> definitely curious about other perspectives on this as well. >>>>> >>>>> I have a hard time with option (1) of deprecating "schedule interval", >>>>> because of the historical nature of it and the code changes that people >>>>> may >>>>> have to make to deal with it. >>>>> >>>>> Thanks for raising this topic and the discussion, >>>>> >>>>> Vikram >>>>> >>>>> >>>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish >>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>> >>>>>> I agree with those (including myself) who have suggested that it >>>>>> would be great to have a deprecation fix tool. >>>>>> >>>>>> I also think that UI notifications would be valuable. Another option >>>>>> would be to provide a less intrusive indicator somewhere (like when >>>>>> chrome >>>>>> lets you know you need to update), and then you click on it and it takes >>>>>> you to a page that lists out all the deprecation warnings. >>>>>> >>>>>> >>>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote: >>>>>> >>>>>>> I know we are distracting a bit from the main subject, but I think >>>>>>> it's worth it. >>>>>>> >>>>>>> I think this is a fact of life that people don't remove >>>>>>> deprecations. But we have a choice of "not doing anything about it" and >>>>>>> complaining. Or doing something that might improve it. >>>>>>> >>>>>>> I usually err on the side of "doing something" when I see a problem. >>>>>>> I am a doer :). >>>>>>> >>>>>>> And I have a possibly controversial proposal. I got the inspiration >>>>>>> from node.js old versions. Whenever an old version of Node is being >>>>>>> used, >>>>>>> you have a 20 seconds delay artificially added and information that in >>>>>>> order to remove those 20 seconds you need to upgrade. >>>>>>> >>>>>>> Why don't we do that for some really old/serious deprecations ? >>>>>>> Maybe not 20 seconds. Maybe adding an unremovable RED warning in the UI >>>>>>> if >>>>>>> you have any deprecation unhandled. Something that is annoying enough >>>>>>> that >>>>>>> will cause people to make an action but not intrusive enough to break >>>>>>> things. >>>>>>> >>>>>>> I think this is a very basic psychological trait: >>>>>>> >>>>>>> If it does not cost the user anything to leave the deprecations, >>>>>>> there will be a big number of people who will do that. >>>>>>> We need to make sure that deprecations are costing our users "a >>>>>>> little" and that they are aware that they can remove the cost (and know >>>>>>> how >>>>>>> - this is super important that we only show such a message with clear >>>>>>> instructions on what to do). >>>>>>> >>>>>>> Such a message might be annoying in many ways: >>>>>>> >>>>>>> * it can obstruct significant part of the screen >>>>>>> * It can be flashy and had very annoying colours (think bold pink >>>>>>> fonts on flash-green background for example) >>>>>>> * It can not allow to do anything before the user closes it with a >>>>>>> REALLY small 'x' that it is difficult to point at (think cookie >>>>>>> banners). >>>>>>> And when they click they can be greeted with "You know - you can >>>>>>> actually >>>>>>> remove this warning by doing THIS". >>>>>>> * and we have probably more options - our UX specialists might have >>>>>>> some good ideas here >>>>>>> >>>>>>> For anyone who complains about "people do not act on deprecation." - >>>>>>> yeah this is true. >>>>>>> But one universal truth there is that if we won't act, this will >>>>>>> change nothing in the behaviour of our users - this is 100% certain. >>>>>>> >>>>>>> If we do something - it might improve it. We will not get a 100% >>>>>>> success for sure, but we can at least do something to improve it. >>>>>>> >>>>>>> WDYT? >>>>>>> >>>>>>> J. >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall >>>>>>> <felue...@pm.me.invalid> wrote: >>>>>>> >>>>>>>> Hi Ping, >>>>>>>> >>>>>>>> I had this idea about auto-fixing deprecation warnings too when I >>>>>>>> started the airflint project. >>>>>>>> I totally agree with what you said that often people ignore these >>>>>>>> warnings and therefore a cli tool would come in very handy. It could >>>>>>>> look >>>>>>>> similar to the airflint project and also be based on the python >>>>>>>> refactor >>>>>>>> package, but I would keep it out of this discussion for now. >>>>>>>> >>>>>>>> Regarding the schedule arg, I don’t have a strong opinion there but >>>>>>>> also tend to having one schedule arg, because of simplicity for >>>>>>>> the dag author. >>>>>>>> >>>>>>>> Best, >>>>>>>> Felix >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Sent from Proton Mail for iOS >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pin...@umich.edu> wrote: >>>>>>>> >>>>>>>> Hi Daniel, >>>>>>>> >>>>>>>> +1 for ‘schedule' and thanks for bringing this up. >>>>>>>> >>>>>>>> I agree with Vikram, we should be very careful about deprecating >>>>>>>> existing params even if we have warnings around it. Not sure if this >>>>>>>> is a >>>>>>>> general case, but I notice that people ignore warnings all the time >>>>>>>> until >>>>>>>> it starts to break. Thus I am thinking if airflow can have a cli tool >>>>>>>> to >>>>>>>> help users to batch update these deprecating params just like python's >>>>>>>> futurize tool. This can greatly ease the burden of upgrading the >>>>>>>> airflow >>>>>>>> version. >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Ping >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish >>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>>>>> >>>>>>>>> In case you forgot about us @Vikram, gentle nudge >>>>>>>>> >>>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish < >>>>>>>>> daniel.stand...@astronomer.io> wrote: >>>>>>>>> >>>>>>>>>> Well, this is a discussion thread, so let's discuss! >>>>>>>>>> >>>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe you >>>>>>>>>> could provide a little more detail about your concerns? >>>>>>>>>> >>>>>>>>>> There are certainly alternatives, and of course each of them has >>>>>>>>>> tradeoffs. >>>>>>>>>> >>>>>>>>>> Here are the options I see: >>>>>>>>>> >>>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and >>>>>>>>>> timetable; the proposal under discussion in this thread) >>>>>>>>>> 2. do nothing (keep 3 different scheduling params) >>>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable) >>>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has >>>>>>>>>> added it in main) and take List[Dataset] in `schedule_interval` >>>>>>>>>> >>>>>>>>>> Which would you propose we go with? >>>>>>>>>> >>>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical >>>>>>>>>> territory since this is still a discussion thread), my vote would >>>>>>>>>> definitely be to go with 3. >>>>>>>>>> >>>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad, >>>>>>>>>> but `schedule` definitely seems better. From a new user's >>>>>>>>>> perspective, >>>>>>>>>> having misleading names is confusing, and can be a negative when >>>>>>>>>> looking at >>>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with >>>>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, >>>>>>>>>> though >>>>>>>>>> it can be bothersome, certainly. >>>>>>>>>> >>>>>>>>>> Just to look at / compare feasibility. >>>>>>>>>> >>>>>>>>>> Deprecating schedule_interval, it would be more prominent than >>>>>>>>>> some other deprecations because it's a DAG param. That said, I think >>>>>>>>>> it's >>>>>>>>>> pretty straightforward to handle. For one, it's at the cardinality >>>>>>>>>> of DAG >>>>>>>>>> and not *task*, so there is less to change with than some other >>>>>>>>>> deprecations. And being kwargs-only, it's an easy find and replace >>>>>>>>>> that >>>>>>>>>> could be automated with an `upgrade-check` script. It's very easy to >>>>>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail >>>>>>>>>> the DAG >>>>>>>>>> parse (which forces user to fix). Perhaps we could even experiment >>>>>>>>>> with a >>>>>>>>>> deprecation fix tool, so that users could get help fixing >>>>>>>>>> deprecations, not >>>>>>>>>> necessarily in the context of an upgrade, since we're not looking at >>>>>>>>>> 3.0 >>>>>>>>>> any time soon. >>>>>>>>>> >>>>>>>>>> We could compare with execution_date, if execution_date is to be >>>>>>>>>> removed in 3.0. I think the deprecation of execution_date will be a >>>>>>>>>> more >>>>>>>>>> difficult thing for users to handle. For one, potentially higher >>>>>>>>>> cardinality, since it is used in tasks. But also, the use of >>>>>>>>>> execution_date >>>>>>>>>> can be harder to detect, since it can be in templates and elsewhere >>>>>>>>>> in >>>>>>>>>> custom operators. *(With schedule_interval, since it's a >>>>>>>>>> top-level dag kwarg, it will be easier.)* And execution_date was >>>>>>>>>> replaced with `logical_date` for similar reasons -- avoiding user >>>>>>>>>> confusion. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka >>>>>>>>>> <vik...@astronomer.io.invalid> wrote: >>>>>>>>>> >>>>>>>>>>> -1 from me. >>>>>>>>>>> >>>>>>>>>>> Though I agree in principle with the idea of consolidation, I >>>>>>>>>>> don't think we should be doing this yet until we understand the >>>>>>>>>>> implications completely. >>>>>>>>>>> I am really not in favor of deprecation of the existing params, >>>>>>>>>>> unless there is really no alternative. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish >>>>>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>>>>>>>> >>>>>>>>>>>> So far, seems all in favor. >>>>>>>>>>>> >>>>>>>>>>>> I will just highlight, in case it's not clear.... when we >>>>>>>>>>>> release this (presumably 2.4), basically every single dag in >>>>>>>>>>>> existence will >>>>>>>>>>>> start emitting deprecation warnings, and prior to 3.0, basically >>>>>>>>>>>> every dag >>>>>>>>>>>> in existence will need to be updated. >>>>>>>>>>>> >>>>>>>>>>>> Thankfully, for most people, this should be an easy update >>>>>>>>>>>> since DAG is kwargs only, IIRC. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi >>>>>>>>>>>> <br...@astronomer.io.invalid> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> +1 to consolidating and "schedule". >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl >>>>>>>>>>>>> <drew.h...@astronomer.io.invalid> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of >>>>>>>>>>>>>> consolidating to one being more important than the name of that >>>>>>>>>>>>>> one >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell < >>>>>>>>>>>>>> josh.d.f...@astronomer.io.INVALID> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1 >>>>>>>>>>>>>> from me. `schedule` seems like a nice catch-all. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham < >>>>>>>>>>>>>> jedcunning...@apache.org> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> +1 on moving to a single `schedule` param. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>