One question: are we deprecating the existing parameters straight away or leaving it without a warning for one release? (I think someone talked about that at one point)
On 6 August 2022 19:46:02 BST, Daniel Standish <daniel.stand...@astronomer.io.INVALID> wrote: >Well, thanks for forcing us to make the case :) > >I'll initiate a lazy consensus "vote" > > >On Fri, Aug 5, 2022 at 1:01 PM Vikram Koka <vik...@astronomer.io.invalid> >wrote: > >> Fair points all. >> I withdraw my objection and agree with the consolidation proposal. >> >> Vikram >> >> >> On Fri, Aug 5, 2022 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote: >> >>> Let me also add to why I think we should consolidate to one param:: >>> >>> * "schedule_interval" >>> * "schedule_on" >>> * "timetable" >>> >>> Imagine you are a new user coming to Airflow. >>> >>> Do you immediately recognise which one is which ? Do you see what the >>> difference is? >>> >>> * "Schedule_on" = "Every Sunday" is equally good as "schedule_on" >>> "dataset". >>> * "timetable" - what even is a timetable? Is it a "cron TAB" ? Or what? >>> >>> For me even if we wanted to have different names, the current set of >>> names for those params is just terribly ambiguous. >>> >>> Making it "schedule" and acting depending on what you pass to it is so >>> much better and simpler. >>> >>> J. >>> >>> >>> On Fri, Aug 5, 2022 at 6:57 PM Daniel Standish >>> <daniel.stand...@astronomer.io.invalid> wrote: >>> >>>> @Vikram, yeah, I understand that concern. Making a breaking change is >>>> something we shouldn't do carelessly, without consideration, without good >>>> reason. But I think we have a good reason here. >>>> >>>> Deprecations are somewhat regular in airflow. In 2.0 we moved a lot of >>>> imports. We also changed the CLI semantics. Probably there were other >>>> breaking changes that I'm forgetting *(granted sometimes we didn't >>>> remove the backward-compatibility immediately in 2.0, and that remains an >>>> option here)*. Probably just about every dag in existence had to be >>>> changed then too. And that's what a major release is for. >>>> >>>> In 3.0, it looks like we're deprecating execution_date... That's even >>>> bigger. So, were we to do this, (1) moving to `schedule` wouldn't be the >>>> only notable breaking change and (2) by comparison it's a pretty simple >>>> change. And if, as part of the upgrade to a major release, you have to do >>>> renames and such anyway, the marginal cost of one more isn't massive. >>>> >>>> I guess my take is that we can't be chained to our previous decisions >>>> forever. And we have to have the flexibility to adapt our interfaces so >>>> that they still make sense and are consistent and don't cause excessive >>>> eyebrow wrinkling. We have adopted semantic versioning to provide a >>>> mechanism for signaling when breaking changes are coming. And we can add >>>> tooling to help users along the way. >>>> >>>> Additionally, we can't *only *look at the burden of such a change for >>>> users. We must also weigh this against the burden that we maintain by >>>> *not* changing, i.e. by having too many params, or ill-named params. >>>> >>>> >>>> On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <jedcunning...@apache.org> >>>> wrote: >>>> >>>>> I'm not in favor of reusing `schedule_interval` for datasets (or >>>>> folding `timetable` into it either), primarily because it doesn't feel >>>>> like >>>>> a good name to describe what it does. If we don't have the appetite to >>>>> consolidate, my vote is we stick with 3 separate params. So preferably #1, >>>>> if not then #2 for me. >>>>> >>>>> I'm less concerned about the "everyone has to update every DAG" issue, >>>>> as it only becomes a problem for Airflow 3, folks can freely ignore it >>>>> until then, and when they do update it it's a simple find/replace. >>>>> >>>>> >>>>> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs < >>>>> a.r.gibbs.massma...@gmail.com> wrote: >>>>> >>>>>> I'll just +1 that IMO a UI reminder is a great option. >>>>>> >>>>>> 2.3.0 added an alert when robots.txt was being scanned, which was >>>>>> great, because I ignored it for a while, but eventually caved and looked >>>>>> into it. This I would see as more of the same. I currently get loads of >>>>>> warnings about things in my config file that have moved; but typically >>>>>> only >>>>>> if I have to do a `airflow db upgrade` or `airflow dags backfill`. I see >>>>>> the alert, think "I must fix those", and then the Homer effect kicks in, >>>>>> and something else takes up the space in my brain. Having them as a >>>>>> banner >>>>>> in the GUI would 100% get me to sort it sooner or later. >>>>>> >>>>>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka >>>>>> <vik...@astronomer.io.invalid> wrote: >>>>>> >>>>>>> Hi Daniel, >>>>>>> >>>>>>> I completely agree with wanting to consolidate, but I am very >>>>>>> concerned about people needing to change their existing code. >>>>>>> As Ping and Felix also said, I worry that people ignore deprecation >>>>>>> warnings, especially since the original code authors don't really look >>>>>>> at >>>>>>> logs or other pipeline artifacts unless something is broken. >>>>>>> >>>>>>> With respect to the options you outlined, I am personally leaning >>>>>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on >>>>>>> (AIP-48 >>>>>>> has added it in main) and take List[Dataset] in `schedule_interval`". I >>>>>>> could be convinced with (3), since I think that "timetable" is >>>>>>> relatively >>>>>>> recent and have not seen as much usage for this invocation yet. I am >>>>>>> definitely curious about other perspectives on this as well. >>>>>>> >>>>>>> I have a hard time with option (1) of deprecating "schedule >>>>>>> interval", because of the historical nature of it and the code changes >>>>>>> that >>>>>>> people may have to make to deal with it. >>>>>>> >>>>>>> Thanks for raising this topic and the discussion, >>>>>>> >>>>>>> Vikram >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish >>>>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>>>> >>>>>>>> I agree with those (including myself) who have suggested that it >>>>>>>> would be great to have a deprecation fix tool. >>>>>>>> >>>>>>>> I also think that UI notifications would be valuable. Another >>>>>>>> option would be to provide a less intrusive indicator somewhere (like >>>>>>>> when >>>>>>>> chrome lets you know you need to update), and then you click on it and >>>>>>>> it >>>>>>>> takes you to a page that lists out all the deprecation warnings. >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I know we are distracting a bit from the main subject, but I think >>>>>>>>> it's worth it. >>>>>>>>> >>>>>>>>> I think this is a fact of life that people don't remove >>>>>>>>> deprecations. But we have a choice of "not doing anything about it" >>>>>>>>> and >>>>>>>>> complaining. Or doing something that might improve it. >>>>>>>>> >>>>>>>>> I usually err on the side of "doing something" when I see a >>>>>>>>> problem. I am a doer :). >>>>>>>>> >>>>>>>>> And I have a possibly controversial proposal. I got the inspiration >>>>>>>>> from node.js old versions. Whenever an old version of Node is being >>>>>>>>> used, >>>>>>>>> you have a 20 seconds delay artificially added and information that in >>>>>>>>> order to remove those 20 seconds you need to upgrade. >>>>>>>>> >>>>>>>>> Why don't we do that for some really old/serious deprecations ? >>>>>>>>> Maybe not 20 seconds. Maybe adding an unremovable RED warning in the >>>>>>>>> UI if >>>>>>>>> you have any deprecation unhandled. Something that is annoying enough >>>>>>>>> that >>>>>>>>> will cause people to make an action but not intrusive enough to break >>>>>>>>> things. >>>>>>>>> >>>>>>>>> I think this is a very basic psychological trait: >>>>>>>>> >>>>>>>>> If it does not cost the user anything to leave the deprecations, >>>>>>>>> there will be a big number of people who will do that. >>>>>>>>> We need to make sure that deprecations are costing our users "a >>>>>>>>> little" and that they are aware that they can remove the cost (and >>>>>>>>> know how >>>>>>>>> - this is super important that we only show such a message with clear >>>>>>>>> instructions on what to do). >>>>>>>>> >>>>>>>>> Such a message might be annoying in many ways: >>>>>>>>> >>>>>>>>> * it can obstruct significant part of the screen >>>>>>>>> * It can be flashy and had very annoying colours (think bold pink >>>>>>>>> fonts on flash-green background for example) >>>>>>>>> * It can not allow to do anything before the user closes it with a >>>>>>>>> REALLY small 'x' that it is difficult to point at (think cookie >>>>>>>>> banners). >>>>>>>>> And when they click they can be greeted with "You know - you can >>>>>>>>> actually >>>>>>>>> remove this warning by doing THIS". >>>>>>>>> * and we have probably more options - our UX specialists might have >>>>>>>>> some good ideas here >>>>>>>>> >>>>>>>>> For anyone who complains about "people do not act on deprecation." >>>>>>>>> - yeah this is true. >>>>>>>>> But one universal truth there is that if we won't act, this will >>>>>>>>> change nothing in the behaviour of our users - this is 100% certain. >>>>>>>>> >>>>>>>>> If we do something - it might improve it. We will not get a 100% >>>>>>>>> success for sure, but we can at least do something to improve it. >>>>>>>>> >>>>>>>>> WDYT? >>>>>>>>> >>>>>>>>> J. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall >>>>>>>>> <felue...@pm.me.invalid> wrote: >>>>>>>>> >>>>>>>>>> Hi Ping, >>>>>>>>>> >>>>>>>>>> I had this idea about auto-fixing deprecation warnings too when I >>>>>>>>>> started the airflint project. >>>>>>>>>> I totally agree with what you said that often people ignore these >>>>>>>>>> warnings and therefore a cli tool would come in very handy. It could >>>>>>>>>> look >>>>>>>>>> similar to the airflint project and also be based on the python >>>>>>>>>> refactor >>>>>>>>>> package, but I would keep it out of this discussion for now. >>>>>>>>>> >>>>>>>>>> Regarding the schedule arg, I don’t have a strong opinion there >>>>>>>>>> but also tend to having one schedule arg, because of simplicity >>>>>>>>>> for the dag author. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Felix >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Sent from Proton Mail for iOS >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pin...@umich.edu> wrote: >>>>>>>>>> >>>>>>>>>> Hi Daniel, >>>>>>>>>> >>>>>>>>>> +1 for ‘schedule' and thanks for bringing this up. >>>>>>>>>> >>>>>>>>>> I agree with Vikram, we should be very careful about deprecating >>>>>>>>>> existing params even if we have warnings around it. Not sure if this >>>>>>>>>> is a >>>>>>>>>> general case, but I notice that people ignore warnings all the time >>>>>>>>>> until >>>>>>>>>> it starts to break. Thus I am thinking if airflow can have a cli >>>>>>>>>> tool to >>>>>>>>>> help users to batch update these deprecating params just like >>>>>>>>>> python's >>>>>>>>>> futurize tool. This can greatly ease the burden of upgrading the >>>>>>>>>> airflow >>>>>>>>>> version. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Ping >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish >>>>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>>>>>>> >>>>>>>>>>> In case you forgot about us @Vikram, gentle nudge >>>>>>>>>>> >>>>>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish < >>>>>>>>>>> daniel.stand...@astronomer.io> wrote: >>>>>>>>>>> >>>>>>>>>>>> Well, this is a discussion thread, so let's discuss! >>>>>>>>>>>> >>>>>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe you >>>>>>>>>>>> could provide a little more detail about your concerns? >>>>>>>>>>>> >>>>>>>>>>>> There are certainly alternatives, and of course each of them has >>>>>>>>>>>> tradeoffs. >>>>>>>>>>>> >>>>>>>>>>>> Here are the options I see: >>>>>>>>>>>> >>>>>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and >>>>>>>>>>>> timetable; the proposal under discussion in this thread) >>>>>>>>>>>> 2. do nothing (keep 3 different scheduling params) >>>>>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable) >>>>>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has >>>>>>>>>>>> added it in main) and take List[Dataset] in `schedule_interval` >>>>>>>>>>>> >>>>>>>>>>>> Which would you propose we go with? >>>>>>>>>>>> >>>>>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical >>>>>>>>>>>> territory since this is still a discussion thread), my vote would >>>>>>>>>>>> definitely be to go with 3. >>>>>>>>>>>> >>>>>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad, >>>>>>>>>>>> but `schedule` definitely seems better. From a new user's >>>>>>>>>>>> perspective, >>>>>>>>>>>> having misleading names is confusing, and can be a negative when >>>>>>>>>>>> looking at >>>>>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing >>>>>>>>>>>> with >>>>>>>>>>>> deprecations as an Airflow user, well it's sort of part of the >>>>>>>>>>>> job, though >>>>>>>>>>>> it can be bothersome, certainly. >>>>>>>>>>>> >>>>>>>>>>>> Just to look at / compare feasibility. >>>>>>>>>>>> >>>>>>>>>>>> Deprecating schedule_interval, it would be more prominent than >>>>>>>>>>>> some other deprecations because it's a DAG param. That said, I >>>>>>>>>>>> think it's >>>>>>>>>>>> pretty straightforward to handle. For one, it's at the cardinality >>>>>>>>>>>> of DAG >>>>>>>>>>>> and not *task*, so there is less to change with than some other >>>>>>>>>>>> deprecations. And being kwargs-only, it's an easy find and replace >>>>>>>>>>>> that >>>>>>>>>>>> could be automated with an `upgrade-check` script. It's very easy >>>>>>>>>>>> to >>>>>>>>>>>> detect, either to warn about, or ultimately, after upgrade, to >>>>>>>>>>>> fail the DAG >>>>>>>>>>>> parse (which forces user to fix). Perhaps we could even experiment >>>>>>>>>>>> with a >>>>>>>>>>>> deprecation fix tool, so that users could get help fixing >>>>>>>>>>>> deprecations, not >>>>>>>>>>>> necessarily in the context of an upgrade, since we're not looking >>>>>>>>>>>> at 3.0 >>>>>>>>>>>> any time soon. >>>>>>>>>>>> >>>>>>>>>>>> We could compare with execution_date, if execution_date is to be >>>>>>>>>>>> removed in 3.0. I think the deprecation of execution_date will be >>>>>>>>>>>> a more >>>>>>>>>>>> difficult thing for users to handle. For one, potentially higher >>>>>>>>>>>> cardinality, since it is used in tasks. But also, the use of >>>>>>>>>>>> execution_date >>>>>>>>>>>> can be harder to detect, since it can be in templates and >>>>>>>>>>>> elsewhere in >>>>>>>>>>>> custom operators. *(With schedule_interval, since it's a >>>>>>>>>>>> top-level dag kwarg, it will be easier.)* And execution_date >>>>>>>>>>>> was replaced with `logical_date` for similar reasons -- avoiding >>>>>>>>>>>> user >>>>>>>>>>>> confusion. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka >>>>>>>>>>>> <vik...@astronomer.io.invalid> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> -1 from me. >>>>>>>>>>>>> >>>>>>>>>>>>> Though I agree in principle with the idea of consolidation, I >>>>>>>>>>>>> don't think we should be doing this yet until we understand the >>>>>>>>>>>>> implications completely. >>>>>>>>>>>>> I am really not in favor of deprecation of the existing params, >>>>>>>>>>>>> unless there is really no alternative. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish >>>>>>>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> So far, seems all in favor. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I will just highlight, in case it's not clear.... when we >>>>>>>>>>>>>> release this (presumably 2.4), basically every single dag in >>>>>>>>>>>>>> existence will >>>>>>>>>>>>>> start emitting deprecation warnings, and prior to 3.0, basically >>>>>>>>>>>>>> every dag >>>>>>>>>>>>>> in existence will need to be updated. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thankfully, for most people, this should be an easy update >>>>>>>>>>>>>> since DAG is kwargs only, IIRC. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi >>>>>>>>>>>>>> <br...@astronomer.io.invalid> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> +1 to consolidating and "schedule". >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl >>>>>>>>>>>>>>> <drew.h...@astronomer.io.invalid> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of >>>>>>>>>>>>>>>> consolidating to one being more important than the name of >>>>>>>>>>>>>>>> that one >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell < >>>>>>>>>>>>>>>> josh.d.f...@astronomer.io.INVALID> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1 >>>>>>>>>>>>>>>> from me. `schedule` seems like a nice catch-all. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham < >>>>>>>>>>>>>>>> jedcunning...@apache.org> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> +1 on moving to a single `schedule` param. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>