Fair points all.
I withdraw my objection and agree with the consolidation proposal.

Vikram


On Fri, Aug 5, 2022 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Let me also add to why I think we should consolidate to one param::
>
> * "schedule_interval"
> * "schedule_on"
> * "timetable"
>
> Imagine you are a new user coming to Airflow.
>
> Do you immediately recognise which one is which ? Do you see what the
> difference is?
>
> * "Schedule_on" = "Every Sunday" is equally good as "schedule_on"
> "dataset".
> * "timetable" - what even is a timetable? Is it a "cron TAB" ? Or what?
>
> For me even if we wanted to have different names, the current set of names
> for those params is just terribly ambiguous.
>
> Making it "schedule" and acting depending on what you pass to it is so
> much better and simpler.
>
> J.
>
>
> On Fri, Aug 5, 2022 at 6:57 PM Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
>
>> @Vikram, yeah, I understand that concern.  Making a breaking change is
>> something we shouldn't do carelessly, without consideration, without good
>> reason.  But I think we have a good reason here.
>>
>> Deprecations are somewhat regular in airflow.  In 2.0 we moved a lot of
>> imports.  We also changed the CLI semantics.  Probably there were other
>> breaking changes that I'm forgetting *(granted sometimes we didn't
>> remove the backward-compatibility immediately in 2.0, and that remains an
>> option here)*.  Probably just about every dag in existence had to be
>> changed then too.  And that's what a major release is for.
>>
>> In 3.0, it looks like we're deprecating execution_date...  That's even
>> bigger.  So, were we to do this, (1) moving to `schedule` wouldn't be the
>> only notable breaking change and (2) by comparison it's a pretty simple
>> change.  And if, as part of the upgrade to a major release, you have to do
>> renames and such anyway, the marginal cost of one more isn't massive.
>>
>> I guess my take is that we can't be chained to our previous decisions
>> forever.  And we have to have the flexibility to adapt our interfaces so
>> that they still make sense and are consistent and don't cause excessive
>> eyebrow wrinkling.  We have adopted semantic versioning to provide a
>> mechanism for signaling when breaking changes are coming.  And we can add
>> tooling to help users along the way.
>>
>> Additionally, we can't *only *look at the burden of such a change for
>> users. We must also weigh this against the burden that we maintain by
>> *not* changing, i.e. by having too many params, or ill-named params.
>>
>>
>> On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <jedcunning...@apache.org>
>> wrote:
>>
>>> I'm not in favor of reusing `schedule_interval` for datasets (or folding
>>> `timetable` into it either), primarily because it doesn't feel like a good
>>> name to describe what it does. If we don't have the appetite to
>>> consolidate, my vote is we stick with 3 separate params. So preferably #1,
>>> if not then #2 for me.
>>>
>>> I'm less concerned about the "everyone has to update every DAG" issue,
>>> as it only becomes a problem for Airflow 3, folks can freely ignore it
>>> until then, and when they do update it it's a simple find/replace.
>>>
>>>
>>> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs <
>>> a.r.gibbs.massma...@gmail.com> wrote:
>>>
>>>> I'll just +1 that IMO a UI reminder is a great option.
>>>>
>>>> 2.3.0 added an alert when robots.txt was being scanned, which was
>>>> great, because I ignored it for a while, but eventually caved and looked
>>>> into it. This I would see as more of the same. I currently get loads of
>>>> warnings about things in my config file that have moved; but typically only
>>>> if I have to do a `airflow db upgrade` or `airflow dags backfill`. I see
>>>> the alert, think "I must fix those", and then the Homer effect kicks in,
>>>> and something else takes up the space in my brain. Having them as a banner
>>>> in the GUI would 100% get me to sort it sooner or later.
>>>>
>>>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka <vik...@astronomer.io.invalid>
>>>> wrote:
>>>>
>>>>> Hi Daniel,
>>>>>
>>>>> I completely agree with wanting to consolidate, but I am very
>>>>> concerned about people needing to change their existing code.
>>>>> As Ping and Felix also said, I worry that people ignore deprecation
>>>>> warnings, especially since the original code authors don't really look at
>>>>> logs or other pipeline artifacts unless something is broken.
>>>>>
>>>>> With respect to the options you outlined, I am personally leaning
>>>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on 
>>>>> (AIP-48
>>>>> has added it in main) and take List[Dataset] in `schedule_interval`". I
>>>>> could be convinced with (3), since I think that "timetable" is relatively
>>>>> recent and have not seen as much usage for this invocation yet. I am
>>>>> definitely curious about other perspectives on this as well.
>>>>>
>>>>> I have a hard time with option (1) of deprecating "schedule interval",
>>>>> because of the historical nature of it and the code changes that people 
>>>>> may
>>>>> have to make to deal with it.
>>>>>
>>>>> Thanks for raising this topic and the discussion,
>>>>>
>>>>> Vikram
>>>>>
>>>>>
>>>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
>>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> I agree with those (including myself) who have suggested that it
>>>>>> would be great to have a deprecation fix tool.
>>>>>>
>>>>>> I also think that UI notifications would be valuable.  Another option
>>>>>> would be to provide a less intrusive indicator somewhere (like when 
>>>>>> chrome
>>>>>> lets you know you need to update), and then you click on it and it takes
>>>>>> you to a page that lists out all the deprecation warnings.
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>
>>>>>>> I know we are distracting a bit from the main subject, but I think
>>>>>>> it's worth it.
>>>>>>>
>>>>>>> I think this is a fact of life that people don't remove
>>>>>>> deprecations. But we have a choice of "not doing anything about it" and
>>>>>>> complaining. Or doing something that might improve it.
>>>>>>>
>>>>>>> I usually err on the side of "doing something" when I see a problem.
>>>>>>> I am a doer :).
>>>>>>>
>>>>>>> And I have a possibly controversial proposal. I got the inspiration
>>>>>>> from node.js old versions. Whenever an old version of Node is being 
>>>>>>> used,
>>>>>>> you have a 20 seconds delay artificially added and information that in
>>>>>>> order to remove those 20 seconds you need to upgrade.
>>>>>>>
>>>>>>> Why don't we do that for some really old/serious deprecations ?
>>>>>>> Maybe not 20 seconds. Maybe adding an unremovable RED warning in the UI 
>>>>>>> if
>>>>>>> you have any deprecation unhandled. Something that is annoying enough 
>>>>>>> that
>>>>>>> will cause people to make an action but not intrusive enough to break
>>>>>>> things.
>>>>>>>
>>>>>>> I think this is a very basic psychological trait:
>>>>>>>
>>>>>>> If it does not cost the user anything to leave the deprecations,
>>>>>>> there will be a big number of people who will do that.
>>>>>>> We need to make sure that deprecations are costing our users "a
>>>>>>> little" and that they are aware that they can remove the cost (and know 
>>>>>>> how
>>>>>>> - this is super important that we only show such a message with clear
>>>>>>> instructions on what to do).
>>>>>>>
>>>>>>> Such a message might be annoying in many ways:
>>>>>>>
>>>>>>> * it can obstruct significant part of the screen
>>>>>>> * It can be flashy and had very annoying colours (think bold pink
>>>>>>> fonts on flash-green background for example)
>>>>>>> * It can not allow to do anything before the user closes it with a
>>>>>>> REALLY small 'x' that it is difficult to point at (think cookie 
>>>>>>> banners).
>>>>>>> And when they click they can be greeted with "You know - you can 
>>>>>>> actually
>>>>>>> remove this warning by doing THIS".
>>>>>>> * and we have probably more options - our UX specialists might have
>>>>>>> some good ideas here
>>>>>>>
>>>>>>> For anyone who complains about "people do not act on deprecation." -
>>>>>>> yeah this is true.
>>>>>>> But one universal truth there is that if we won't act, this will
>>>>>>> change nothing in the behaviour of our users - this is 100% certain.
>>>>>>>
>>>>>>> If we do something - it might improve it. We will not get a 100%
>>>>>>> success for sure, but we can at least do something to improve it.
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall
>>>>>>> <felue...@pm.me.invalid> wrote:
>>>>>>>
>>>>>>>> Hi Ping,
>>>>>>>>
>>>>>>>> I had this idea about auto-fixing deprecation warnings too when I
>>>>>>>> started the airflint project.
>>>>>>>> I totally agree with what you said that often people ignore these
>>>>>>>> warnings and therefore a cli tool would come in very handy. It could 
>>>>>>>> look
>>>>>>>> similar to the airflint project and also be based on the python 
>>>>>>>> refactor
>>>>>>>> package, but I would keep it out of this discussion for now.
>>>>>>>>
>>>>>>>> Regarding the schedule arg, I don’t have a strong opinion there but
>>>>>>>> also tend to having one schedule arg, because of simplicity for
>>>>>>>> the dag author.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Felix
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Sent from Proton Mail for iOS
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pin...@umich.edu> wrote:
>>>>>>>>
>>>>>>>> Hi Daniel,
>>>>>>>>
>>>>>>>> +1 for ‘schedule' and thanks for bringing this up.
>>>>>>>>
>>>>>>>> I agree with Vikram, we should be very careful about deprecating
>>>>>>>> existing params even if we have warnings around it. Not sure if this 
>>>>>>>> is a
>>>>>>>> general case, but I notice that people ignore warnings all the time 
>>>>>>>> until
>>>>>>>> it starts to break. Thus I am thinking if airflow can have a cli tool 
>>>>>>>> to
>>>>>>>> help users to batch update these deprecating params just like python's
>>>>>>>> futurize tool. This can greatly ease the burden of upgrading the 
>>>>>>>> airflow
>>>>>>>> version.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Ping
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> In case you forgot about us @Vikram, gentle nudge
>>>>>>>>>
>>>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>>>>>>> daniel.stand...@astronomer.io> wrote:
>>>>>>>>>
>>>>>>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>>>>>>
>>>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe you
>>>>>>>>>> could provide a little more detail about your concerns?
>>>>>>>>>>
>>>>>>>>>> There are certainly alternatives, and of course each of them has
>>>>>>>>>> tradeoffs.
>>>>>>>>>>
>>>>>>>>>> Here are the options I see:
>>>>>>>>>>
>>>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and
>>>>>>>>>> timetable; the proposal under discussion in this thread)
>>>>>>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has
>>>>>>>>>> added it in main) and take List[Dataset] in `schedule_interval`
>>>>>>>>>>
>>>>>>>>>> Which would you propose we go with?
>>>>>>>>>>
>>>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>>>>>>> territory since this is still a discussion thread), my vote would
>>>>>>>>>> definitely be to go with 3.
>>>>>>>>>>
>>>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad,
>>>>>>>>>> but `schedule` definitely seems better. From a new user's 
>>>>>>>>>> perspective,
>>>>>>>>>> having misleading names is confusing, and can be a negative when 
>>>>>>>>>> looking at
>>>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, 
>>>>>>>>>> though
>>>>>>>>>> it can be bothersome, certainly.
>>>>>>>>>>
>>>>>>>>>> Just to look at / compare feasibility.
>>>>>>>>>>
>>>>>>>>>> Deprecating schedule_interval, it would be more prominent than
>>>>>>>>>> some other deprecations because it's a DAG param. That said, I think 
>>>>>>>>>> it's
>>>>>>>>>> pretty straightforward to handle. For one, it's at the cardinality 
>>>>>>>>>> of DAG
>>>>>>>>>> and not *task*, so there is less to change with than some other
>>>>>>>>>> deprecations. And being kwargs-only, it's an easy find and replace 
>>>>>>>>>> that
>>>>>>>>>> could be automated with an `upgrade-check` script. It's very easy to
>>>>>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail 
>>>>>>>>>> the DAG
>>>>>>>>>> parse (which forces user to fix). Perhaps we could even experiment 
>>>>>>>>>> with a
>>>>>>>>>> deprecation fix tool, so that users could get help fixing 
>>>>>>>>>> deprecations, not
>>>>>>>>>> necessarily in the context of an upgrade, since we're not looking at 
>>>>>>>>>> 3.0
>>>>>>>>>> any time soon.
>>>>>>>>>>
>>>>>>>>>> We could compare with execution_date, if execution_date is to be
>>>>>>>>>> removed in 3.0. I think the deprecation of execution_date will be a 
>>>>>>>>>> more
>>>>>>>>>> difficult thing for users to handle. For one, potentially higher
>>>>>>>>>> cardinality, since it is used in tasks. But also, the use of 
>>>>>>>>>> execution_date
>>>>>>>>>> can be harder to detect, since it can be in templates and elsewhere 
>>>>>>>>>> in
>>>>>>>>>> custom operators. *(With schedule_interval, since it's a
>>>>>>>>>> top-level dag kwarg, it will be easier.)* And execution_date was
>>>>>>>>>> replaced with `logical_date` for similar reasons -- avoiding user 
>>>>>>>>>> confusion.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>>>>>>> <vik...@astronomer.io.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> -1 from me.
>>>>>>>>>>>
>>>>>>>>>>> Though I agree in principle with the idea of consolidation, I
>>>>>>>>>>> don't think we should be doing this yet until we understand the
>>>>>>>>>>> implications completely.
>>>>>>>>>>> I am really not in favor of deprecation of the existing params,
>>>>>>>>>>> unless there is really no alternative.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> So far, seems all in favor.
>>>>>>>>>>>>
>>>>>>>>>>>> I will just highlight, in case it's not clear.... when we
>>>>>>>>>>>> release this (presumably 2.4), basically every single dag in 
>>>>>>>>>>>> existence will
>>>>>>>>>>>> start emitting deprecation warnings, and prior to 3.0, basically 
>>>>>>>>>>>> every dag
>>>>>>>>>>>> in existence will need to be updated.
>>>>>>>>>>>>
>>>>>>>>>>>> Thankfully, for most people, this should be an easy update
>>>>>>>>>>>> since DAG is kwargs only, IIRC.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>>>>>>> <drew.h...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>>>>>>> consolidating to one being more important than the name of that 
>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>>>>>>> josh.d.f...@astronomer.io.INVALID> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1
>>>>>>>>>>>>>> from me. `schedule` seems like a nice catch-all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>>>>>>> jedcunning...@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>

Reply via email to