Hi Ping,

I had this idea about auto-fixing deprecation warnings too when I started the 
airflint project.
I totally agree with what you said that often people ignore these warnings and 
therefore a cli tool would come in very handy. It could look similar to the 
airflint project and also be based on the python refactor package, but I would 
keep it out of this discussion for now.

Regarding the schedule arg, I don’t have a strong opinion there but also tend 
to having one schedule arg, because of simplicity for the dag author.

Best,
Felix

Sent from Proton Mail for iOS

On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pin...@umich.edu> wrote:

> Hi Daniel,
>
> +1 for ‘schedule' and thanks for bringing this up.
>
> I agree with Vikram, we should be very careful about deprecating existing 
> params even if we have warnings around it. Not sure if this is a general 
> case, but I notice that people ignore warnings all the time until it starts 
> to break. Thus I am thinking if airflow can have a cli tool to help users to 
> batch update these deprecating params just like python's futurize tool. This 
> can greatly ease the burden of upgrading the airflow version.
>
> Thanks,
>
> Ping
>
> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish 
> <daniel.stand...@astronomer.io.invalid> wrote:
>
>> In case you forgot about us @Vikram, gentle nudge
>>
>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish 
>> <daniel.stand...@astronomer.io> wrote:
>>
>>> Well, this is a discussion thread, so let's discuss!
>>>
>>> Vikram, what kind of implications are you thinking of? Maybe you could 
>>> provide a little more detail about your concerns?
>>>
>>> There are certainly alternatives, and of course each of them has tradeoffs.
>>>
>>> Here are the options I see:
>>>
>>> 1. consolidate to schedule (deprecate schedule_interval and timetable; the 
>>> proposal under discussion in this thread)
>>> 2. do nothing (keep 3 different scheduling params)
>>> 3. consolidate to schedule_interval (deprecate timetable)
>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has added it 
>>> in main) and take List[Dataset] in `schedule_interval`
>>>
>>> Which would you propose we go with?
>>>
>>> If we have a veto against 1 (and again, we're in hypothetical territory 
>>> since this is still a discussion thread), my vote would definitely be to go 
>>> with 3.
>>>
>>> The param `schedule_interval` as catch all is not sooo bad, but `schedule` 
>>> definitely seems better. From a new user's perspective, having misleading 
>>> names is confusing, and can be a negative when looking at adoption. And for 
>>> a tenured user, it can nag at you. And dealing with deprecations as an 
>>> Airflow user, well it's sort of part of the job, though it can be 
>>> bothersome, certainly.
>>>
>>> Just to look at / compare feasibility.
>>>
>>> Deprecating schedule_interval, it would be more prominent than some other 
>>> deprecations because it's a DAG param. That said, I think it's pretty 
>>> straightforward to handle. For one, it's at the cardinality of DAG and not 
>>> task, so there is less to change with than some other deprecations. And 
>>> being kwargs-only, it's an easy find and replace that could be automated 
>>> with an `upgrade-check` script. It's very easy to detect, either to warn 
>>> about, or ultimately, after upgrade, to fail the DAG parse (which forces 
>>> user to fix). Perhaps we could even experiment with a deprecation fix tool, 
>>> so that users could get help fixing deprecations, not necessarily in the 
>>> context of an upgrade, since we're not looking at 3.0 any time soon.
>>>
>>> We could compare with execution_date, if execution_date is to be removed in 
>>> 3.0. I think the deprecation of execution_date will be a more difficult 
>>> thing for users to handle. For one, potentially higher cardinality, since 
>>> it is used in tasks. But also, the use of execution_date can be harder to 
>>> detect, since it can be in templates and elsewhere in custom operators. 
>>> (With schedule_interval, since it's a top-level dag kwarg, it will be 
>>> easier.) And execution_date was replaced with `logical_date` for similar 
>>> reasons -- avoiding user confusion.
>>>
>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka <vik...@astronomer.io.invalid> 
>>> wrote:
>>>
>>>> -1 from me.
>>>>
>>>> Though I agree in principle with the idea of consolidation, I don't think 
>>>> we should be doing this yet until we understand the implications 
>>>> completely.
>>>> I am really not in favor of deprecation of the existing params, unless 
>>>> there is really no alternative.
>>>>
>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish 
>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>>
>>>>> So far, seems all in favor.
>>>>>
>>>>> I will just highlight, in case it's not clear.... when we release this 
>>>>> (presumably 2.4), basically every single dag in existence will start 
>>>>> emitting deprecation warnings, and prior to 3.0, basically every dag in 
>>>>> existence will need to be updated.
>>>>>
>>>>> Thankfully, for most people, this should be an easy update since DAG is 
>>>>> kwargs only, IIRC.
>>>>>
>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi <br...@astronomer.io.invalid> 
>>>>> wrote:
>>>>>
>>>>>> +1 to consolidating and "schedule".
>>>>>>
>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl 
>>>>>> <drew.h...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> +1 for ‘schedule', and another +1 to the importance of consolidating to 
>>>>>>> one being more important than the name of that one
>>>>>>>
>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell 
>>>>>>>> <josh.d.f...@astronomer.io.INVALID> wrote:
>>>>>>>>
>>>>>>>> Consolidating to a single scheduling parameter is a big +1 from me. 
>>>>>>>> `schedule` seems like a nice catch-all.
>>>>>>>>
>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham 
>>>>>>>> <jedcunning...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> +1 on moving to a single `schedule` param.

Reply via email to