Very much agree with TP! On Sat, May 15, 2021 at 6:06 AM TP Chung <[email protected]> wrote:
> I feel you are sort of operating on a different level of abstraction from > AIP-39. While it is true that Airflow does generally take a declarative > approach for scheduling currently (which is a good thing and should be > continued), AIP-39 is more about providing a foundation so richer things > can be declared. Its design does not preclude declarative things to be > implemented, much like how most of Python is procedural in the first place, > but that did not prevent Airflow from having a declarative interface. > > Timetable does not take away much of the declarative possibility, since we > can easily have something like > > DAG( > ... > timetable=CalendarTimeTable( > calendar=…, > execution_plan=…, > ), > ) > > that implements what you want. But the nice thing about this extra > abstraction is it keeps doors open for things that might not work well for > calendars. You may argue those are uncommon cases, but what prompted AIP-39 > in the first place were uncommon cases not considered (or intentionally > ignored for simplicity) by the original Airflow implementation in the first > place. AIP-39 does well providing a good foundation for most flexibility > without sacrificing much of the declarative goodness (if at all; it’s > arguable the TimeTable class is actually an improvement for explicitness). > > TP > > > On 14 May 2021, at 04:59, Malthe <[email protected]> wrote: > > When it comes to scheduling, Airflow does take a rather declarative > approach I would say, but it is certainly correct that it very much stops > there. > > I appreciate the arguments favoring a more object-oriented design, but I > do think that adding a couple of additional scheduling options could go a > very long way in terms of providing that extra bit of scheduling > flexibility – while preserving the "scripting ergonomics". > > The current proposal leaves most of the interesting use-cases on the table > rather than aiming to show that the abstraction actually meets the > requirements. > > Cheers > > On Thu, 13 May 2021 at 15:01, Kaxil Naik <[email protected]> wrote: > >> And also the proposed items with Timetables are more "extensible" too -- >> Users can develop some classes for their own use and create a library for >> reusing it. >> >> Using arguments like you are proposing @malthe -- it can be difficult to >> understand on all the "related" arguments to understand the scheduling / >> schedule_interval. >> >> On Thu, May 13, 2021 at 3:46 PM Jarek Potiuk <[email protected]> wrote: >> >>> I much more on Ash's proposal with this one. I think we do not want to >>> optimize for a number of changes but instead we want to make sure what we >>> come up with is an easy and natural to use long-term solution. Even if it >>> departs a lot from what we have now, a few months (or maybe years) after >>> it becomes mainstream nobody will remember the "old way" hopefully. >>> >>> The idea of "pythonic" pluggable schedule rather than "declarative" way >>> goes perfectly in-line with the whole concept of Airflow where DAGs are >>> defined as Python code rather than declaratively. So making schedules >>> follow the same approach seems very natural for anyone who uses Airflow. >>> >>> >>> J. >>> >>> >>> On Thu, May 13, 2021 at 9:27 AM Malthe <[email protected]> wrote: >>> >>>> I'm a bit late to the discussion, but it might be interesting to >>>> explore a more simple approach, cutting back on the number of changes. >>>> >>>> As a general remark, "execution_date" may be a slightly difficult >>>> concept to grasp, but changing it now is perhaps counterproductive. >>>> >>>> From the AIP examples: >>>> >>>> 1. The MON-FRI problem could be handled using an optional keyword >>>> argument "execution_interval" which defaults to `None` (meaning automatic – >>>> this is the current behavior). But instead a `timedelta` could be provided, >>>> i.e. `timedelta(days=1)`. >>>> >>>> 2. This could easily be supported >>>> <https://github.com/kiorky/croniter/pull/46#issuecomment-838544908> in >>>> croniter. >>>> >>>> 3. The trading days only (or business days, etc) problem is handled in >>>> other systems using a calendar option. It might be that an optional keyword >>>> argument "calendar" could be used to automatically skip days that are not >>>> included in the calendar – the default calendar would include all days. If >>>> `calendar` is some calendar object, `~calendar` could be the inverse, >>>> allowing a DAG to be easily scheduled on holidays. A requirement to >>>> schedule on the nth day of the calendar (e.g. 10th business day of the >>>> month) could be implemented using a derived calendar >>>> `calendar.nth_day_of_month(10)` which would further filter down the number >>>> of included days based on an existing calendar. >>>> >>>> 4. The cron- vs data scheduling seems to come down to whether the dag >>>> run is kicked off at the start of the period or immediately after. This >>>> could be handled using an optional keyword argument "execution_plan" which >>>> defaults to INTERVAL_END (the current behavior), but can be optionally set >>>> to INTERVAL_START. The "execution_date" column then remains unchanged, but >>>> the actual dag run time will be vary according to which execution plan was >>>> specified. >>>> >>>> Cheers >>>> >>>> On Fri, 12 Mar 2021 at 07:02, Xinbin Huang <[email protected]> >>>> wrote: >>>> >>>>> I agree with Tomek. >>>>> >>>>> TBH, *Timetable *to me does seem to be a complex concept, and I can't >>>>> quite understand what it is at first sight. >>>>> >>>>> I think *Schedule *does convey the message better - consider >>>>> the sentence: "*the Scheduler arranges jobs to run based on some >>>>> _______**.*" Here, *"schedules" *seem to fit better than >>>>> *"timetables"* >>>>> >>>>> As for search results on schedule vs scheduler, I wonder if >>>>> reorganizing the docs to have `schedule` in the top section and have >>>>> `scheduler` under operation::architecture will help with the search >>>>> result? >>>>> (I don't know much about SEO) >>>>> >>>>> Nevertheless, the naming shouldn't be a blocker for this feature to >>>>> move forward. >>>>> >>>>> Best >>>>> Bin >>>>> >>>>> On Thu, Mar 11, 2021 at 4:31 PM Tomasz Urbaszek <[email protected]> >>>>> wrote: >>>>> >>>>>> > Timetable is a synonym of ’Schedule’ >>>>>> >>>>>> I'm not a native speaker and I don't get it easily as a synonym. >>>>>> >>>>>> To be honest the "timetable" sounds like a complex piece of software. >>>>>> Personally I experienced that people use schedule and >>>>>> schedule_interval interchangeably. Additionally, schedule being more >>>>>> linked >>>>>> to scheduler imho is an advantage because it suggests some connection >>>>>> between these two. >>>>>> >>>>>> I feel that by introducing timetable we will bring yet more >>>>>> complexity to airflow vocabulary. And I personally would treat it as yet >>>>>> another moving part of an already complex system. >>>>>> >>>>>> I think we can move forward with this feature. We renamed "functional >>>>>> DAGs" to "Taskflow API" so, naming is not a blocker. If we can't get >>>>>> consensus we can always ask the users - they will use the feature. >>>>>> >>>>>> Best, >>>>>> Tomek >>>>>> >>>>>> >>>>>> On Fri, 12 Mar 2021 at 01:15, James Timmins < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> *Timetable vs Schedule* >>>>>>> Re Timetable. I agree that if this was a greenfield project, it >>>>>>> might make sense to use Schedule. But as it stands, we need to find the >>>>>>> right balance between the most specific name and being sufficiently >>>>>>> unique >>>>>>> that it’s easy to work with in code and, perhaps most importantly, easy >>>>>>> to >>>>>>> find when searching on Google and in the Airflow Docs. >>>>>>> >>>>>>> There are more than 10,000 references to `schedule*` in the Airflow >>>>>>> codebase. `schedule` and `scheduler` are also identical to most search >>>>>>> engines/libraries, since they have the same stem, `schedule`. This means >>>>>>> that when a user Googles `Airflow Schedule`, they will get back >>>>>>> intermixed >>>>>>> results of the Schedule class and the Scheduler. >>>>>>> >>>>>>> Timetable is a synonym of ’Schedule’, so it passes the accuracy >>>>>>> test, won’t ever be ambiguous in code, and is distinct in search >>>>>>> results. >>>>>>> >>>>>>> *Should "interval-less DAGs” have data_interval_start and end >>>>>>> available in the context?* >>>>>>> I think they should be present so it’s consistent across DAGs. Let’s >>>>>>> not make users think too hard about what values are available in what >>>>>>> context. What if someone sets the interval to 0? What if sometimes the >>>>>>> interval is 0, and sometimes it’s 1 hour? Rather than changing the rules >>>>>>> depending on usage, it’s easiest to have one rule that the users can >>>>>>> depend >>>>>>> upon. >>>>>>> >>>>>>> *Re set_context_variables()* >>>>>>> What context is being defined here? The code comment says "Update or >>>>>>> set new context variables to become available in task templates and >>>>>>> operators.” The Timetable seems like the wrong place for variables that >>>>>>> will get passed into task templates/operators, unless this is actually a >>>>>>> way to pass Airflow macros into the Timetable context. In which case I >>>>>>> fully support this. If not, we may want to add this functionality. >>>>>>> >>>>>>> James >>>>>>> On Mar 11, 2021, 9:40 AM -0800, Kaxil Naik <[email protected]>, >>>>>>> wrote: >>>>>>> >>>>>>> Yup I have no strong opinions on either so happy to keep it >>>>>>> TimeTable or if there is another suggestion. >>>>>>> >>>>>>> Regards, >>>>>>> Kaxil >>>>>>> >>>>>>> On Thu, Mar 11, 2021 at 5:00 PM James Timmins < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Respectfully, I strongly disagree with the renaming of Timetable to >>>>>>>> Schedule. Schedule and Scheduler aren't meaningfully different, which >>>>>>>> can >>>>>>>> lead to a lot of confusion. Even as a native English speaker, and >>>>>>>> someone >>>>>>>> who works on Airflow full time, I routinely need to ask for >>>>>>>> clarification >>>>>>>> about what schedule-related concept someone is referring to. I foresee >>>>>>>> Schedule and Scheduler as two distinct yet closely related concepts >>>>>>>> becoming a major source of confusion. >>>>>>>> >>>>>>>> If folks dislike Timetable, we could certainly change to something >>>>>>>> else, but let's not use something so similar to existing Airflow >>>>>>>> classes. >>>>>>>> >>>>>>>> -James >>>>>>>> >>>>>>>> On Thu, Mar 11, 2021 at 2:13 AM Ash Berlin-Taylor <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Summary of changes so far on the AIP: >>>>>>>>> >>>>>>>>> My proposed rename of DagRun.execution_date is now >>>>>>>>> DagRun.schedule_date (previously I had proposed run_date. Thanks >>>>>>>>> dstandish!) >>>>>>>>> >>>>>>>>> Timetable classes are renamed to Schedule classes (CronSchedule >>>>>>>>> etc), similarly the DAG argument is now schedule (reminder: >>>>>>>>> schedule_interval will not be removed or deprecated, and will still >>>>>>>>> be the >>>>>>>>> way to use "simple" expressions) >>>>>>>>> >>>>>>>>> -ash >>>>>>>>> >>>>>>>>> On Wed, 10 Mar, 2021 at 14:15, Ash Berlin-Taylor <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Could change Timetable To Schedule -- that would mean the DAG arg >>>>>>>>> becomes `schedule=CronSchedule(...)` -- a bit close to the current >>>>>>>>> `schedule_interval` but I think clear enough difference. >>>>>>>>> >>>>>>>>> I do like the name but my one worry with "schedule" is that >>>>>>>>> Scheduler and Schedule are very similar, and might be be confused >>>>>>>>> with each >>>>>>>>> other for non-native English speakers? (I defer to others' judgment >>>>>>>>> here, >>>>>>>>> as this is not something I can experience myself.) >>>>>>>>> >>>>>>>>> @Kevin Yang <[email protected]> @Daniel Standish >>>>>>>>> <[email protected]> any final input on this AIP? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, 9 Mar, 2021 at 16:59, Kaxil Naik <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi Ash and all, >>>>>>>>> >>>>>>>>> >>>>>>>>> What do people think of this? Worth it? Too complex to reason >>>>>>>>>> about what context variables might exist as a result? >>>>>>>>> >>>>>>>>> >>>>>>>>> I think I wouldn't worry about it right now or maybe not as part >>>>>>>>> of this AIP. Currently, in one of the Github Issue, a user mentioned >>>>>>>>> that >>>>>>>>> it is not straightforward to know what is inside the context >>>>>>>>> dictionary- >>>>>>>>> https://github.com/apache/airflow/issues/14396. So maybe we can >>>>>>>>> tackle this issue separately once the AbstractTimetable is built. >>>>>>>>> >>>>>>>>> Should "interval-less DAGs" (ones using "CronTimetable" in my >>>>>>>>>> proposal vs "DataTimetable") have data_interval_start and end >>>>>>>>>> available in >>>>>>>>>> the context? >>>>>>>>> >>>>>>>>> >>>>>>>>> hmm.. I would say No but then it contradicts my suggestion to >>>>>>>>> remove context dict from this AIP. If we are going to use it in >>>>>>>>> scheduler >>>>>>>>> then yes, where data_interval_start = data_interval_end from >>>>>>>>> CronTimetable. >>>>>>>>> >>>>>>>>> Does anyone have any better names than TimeDeltaTimetable, >>>>>>>>>> DataTimetable, and CronTimetable? (We can probably change these >>>>>>>>>> names right >>>>>>>>>> up until release, so not important to get this correct *now*.) >>>>>>>>> >>>>>>>>> >>>>>>>>> No strong opinion here. Just an alternate suggestion can >>>>>>>>> be TimeDeltaSchedule, DataSchedule and CronSchedule >>>>>>>>> >>>>>>>>> >>>>>>>>>> Should I try to roll AIP-30 >>>>>>>>>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-30%3A+State+persistence> >>>>>>>>>> in >>>>>>>>>> to this, or should we make that a future addition? (My vote is for >>>>>>>>>> future >>>>>>>>>> addition) >>>>>>>>> >>>>>>>>> >>>>>>>>> I would vote for Future addition too. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Kaxil >>>>>>>>> >>>>>>>>> On Sat, Mar 6, 2021 at 11:05 AM Ash Berlin-Taylor <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I think, yes, AIP-35 or something like it would happily co-exist >>>>>>>>>> with this proposal. >>>>>>>>>> >>>>>>>>>> @Daniel <[email protected]> and I have been discussing this a >>>>>>>>>> bit on Slack, and one of the questions he asked was if the concept of >>>>>>>>>> data_interval should be moved from DagRun as James and I suggested >>>>>>>>>> down on >>>>>>>>>> to the individual task: >>>>>>>>>> >>>>>>>>>> suppose i have a new dag hitting 5 api endpoints and pulling data >>>>>>>>>> to s3. suppose that yesterday 4 of them succeeded but one failed. >>>>>>>>>> today, 4 >>>>>>>>>> of them should pull from yesterday. but the one that failed should >>>>>>>>>> pull >>>>>>>>>> from 2 days back. so even though these normally have the same >>>>>>>>>> interval, >>>>>>>>>> today they should not. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> My view on this is two fold: one, this should primarily be >>>>>>>>>> handled by retries on the task, and secondly, having different >>>>>>>>>> TaskIstances >>>>>>>>>> in the same DagRun have different data intervals would be much >>>>>>>>>> harder to >>>>>>>>>> reason about/design the UI around, so for those reasons I still think >>>>>>>>>> interval should be a DagRun-level concept. >>>>>>>>>> >>>>>>>>>> (He has a stalled AIP-30 where he proposed something to address >>>>>>>>>> this kind of "watermark" case, which we might pick up next after >>>>>>>>>> this AIP >>>>>>>>>> is complete) >>>>>>>>>> >>>>>>>>>> One thing we might want to do is extend the interface of >>>>>>>>>> AbstractTimetable to be able to add/update parameters in the context >>>>>>>>>> dict, >>>>>>>>>> so the interface could become this: >>>>>>>>>> >>>>>>>>>> class AbstractTimetable(ABC): >>>>>>>>>> @abstractmethod >>>>>>>>>> def next_dagrun_info( >>>>>>>>>> date_last_automated_dagrun: Optional[pendulum.DateTime], >>>>>>>>>> >>>>>>>>>> session: Session, >>>>>>>>>> ) -> Optional[DagRunInfo]: >>>>>>>>>> """ >>>>>>>>>> Get information about the next DagRun of this dag after >>>>>>>>>> ``date_last_automated_dagrun`` -- the >>>>>>>>>> execution date, and the earliest it could be scheduled >>>>>>>>>> >>>>>>>>>> :param date_last_automated_dagrun: The >>>>>>>>>> max(execution_date) of existing >>>>>>>>>> "automated" DagRuns for this dag (scheduled or >>>>>>>>>> backfill, but not >>>>>>>>>> manual) >>>>>>>>>> """ >>>>>>>>>> >>>>>>>>>> @abstractmethod >>>>>>>>>> def set_context_variables(self, dagrun: DagRun, context: Dict >>>>>>>>>> [str, Any]) -> None: >>>>>>>>>> """ >>>>>>>>>> Update or set new context variables to become available >>>>>>>>>> in task templates and operators. >>>>>>>>>> """ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> What do people think of this? Worth it? Too complex to reason >>>>>>>>>> about what context variables might exist as a result? >>>>>>>>>> >>>>>>>>>> *Outstanding question*: >>>>>>>>>> >>>>>>>>>> - Should "interval-less DAGs" (ones using "CronTimetable" in >>>>>>>>>> my proposal vs "DataTimetable") have data_interval_start and end >>>>>>>>>> available >>>>>>>>>> in the context? >>>>>>>>>> - Does anyone have any better names than TimeDeltaTimetable, >>>>>>>>>> DataTimetable, and CronTimetable? (We can probably change these >>>>>>>>>> names right >>>>>>>>>> up until release, so not important to get this correct *now*.) >>>>>>>>>> - Should I try to roll AIP-30 >>>>>>>>>> >>>>>>>>>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-30%3A+State+persistence> >>>>>>>>>> in to this, or should we make that a future addition? (My vote is >>>>>>>>>> for >>>>>>>>>> future addition) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I'd like to start voting on this AIP next week (probably on >>>>>>>>>> Tuesday) as I think this will be a powerful feature that eases >>>>>>>>>> confusing to >>>>>>>>>> new users. >>>>>>>>>> >>>>>>>>>> -Ash >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, 2 Mar, 2021 at 23:05, Alex Inhert <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Is this AIP going to co-exist with AIP-35 "Add Signal Based >>>>>>>>>> Scheduling To Airflow"? >>>>>>>>>> I think streaming was also discussed there (though it wasn't >>>>>>>>>> really the use case). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 02.03.2021, 22:10, "Ash Berlin-Taylor" <[email protected]>: >>>>>>>>>> >>>>>>>>>> Hi Kevin, >>>>>>>>>> >>>>>>>>>> Interesting idea. My original idea was actually for >>>>>>>>>> "interval-less DAGs" (i.e. ones where it's just "run at this time") >>>>>>>>>> would >>>>>>>>>> not have data_interval_start or end, but (while drafting the AIP) we >>>>>>>>>> decided that it was probably "easier" if those values were always >>>>>>>>>> datetimes. >>>>>>>>>> >>>>>>>>>> That said, I think having the DB model have those values be >>>>>>>>>> nullable would future proof it without needing another migration to >>>>>>>>>> change >>>>>>>>>> it. Do you think this is worth doing now? >>>>>>>>>> >>>>>>>>>> I haven't (yet! It's on my list) spent any significant time >>>>>>>>>> thinking about how to make Airflow play nicely with streaming jobs. >>>>>>>>>> If >>>>>>>>>> anyone else has ideas here please share them >>>>>>>>>> >>>>>>>>>> -ash >>>>>>>>>> >>>>>>>>>> On Sat, 27 Feb, 2021 at 16:09, Kevin Yang <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi Ash and James, >>>>>>>>>> >>>>>>>>>> This is an exciting move. What do you think about using this >>>>>>>>>> opportunity to extend Airflow's support to streaming like use cases? >>>>>>>>>> I.e >>>>>>>>>> DAGs/tasks that want to run forever like a service. For such use >>>>>>>>>> cases, >>>>>>>>>> schedule interval might not be meaningful, then do we want to make >>>>>>>>>> the date >>>>>>>>>> interval param optional to DagRun and task instances? That sounds >>>>>>>>>> like a >>>>>>>>>> pretty major change to the underlying model of Airflow, but this AIP >>>>>>>>>> is so >>>>>>>>>> far the best opportunity I saw that can level up Airflow's support >>>>>>>>>> for >>>>>>>>>> streaming/service use cases. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Kevin Y >>>>>>>>>> >>>>>>>>>> On Fri, Feb 26, 2021 at 8:56 AM Daniel Standish < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>> Very excited to see this proposal come through and love the >>>>>>>>>> direction this has gone. >>>>>>>>>> >>>>>>>>>> Couple comments... >>>>>>>>>> >>>>>>>>>> *Tree view / Data completeness view* >>>>>>>>>> >>>>>>>>>> When you design your tasks with the canonical idempotence >>>>>>>>>> pattern, the tree view shows you both data completeness and task >>>>>>>>>> execution >>>>>>>>>> history (success / failure etc). >>>>>>>>>> >>>>>>>>>> When you don't use that pattern (which is my general preference), >>>>>>>>>> tree view is only task execution history. >>>>>>>>>> >>>>>>>>>> This change has the potential to unlock a data completeness view >>>>>>>>>> for canonical tasks. It's possible that the "data completeness >>>>>>>>>> view" can >>>>>>>>>> simply be the tree view. I.e. somehow it can use these new classes >>>>>>>>>> to know >>>>>>>>>> what data was successfully filled and what data wasn't. >>>>>>>>>> >>>>>>>>>> To the extent we like the idea of either extending / plugging / >>>>>>>>>> modifying tree view, or adding a distinct data completeness view, we >>>>>>>>>> might >>>>>>>>>> want to anticipate the needs of that in this change. And maybe no >>>>>>>>>> alteration to the proposal would be needed but just want to throw >>>>>>>>>> the idea >>>>>>>>>> out there. >>>>>>>>>> >>>>>>>>>> *Watermark workflow / incremental processing* >>>>>>>>>> >>>>>>>>>> A common pattern in data warehousing is pulling data >>>>>>>>>> incrementally from a source. >>>>>>>>>> >>>>>>>>>> A standard way to achieve this is at the start of the task, >>>>>>>>>> select max `updated_at` in source table and hold on to that value >>>>>>>>>> for a >>>>>>>>>> minute. This is your tentative new high watermark. >>>>>>>>>> Now it's time to pull your data. If your task ran before, grab >>>>>>>>>> last high watermark. If not, use initial load value. >>>>>>>>>> If successful, update high watermark. >>>>>>>>>> >>>>>>>>>> On my team we implemented this with a stateful tasks / stateful >>>>>>>>>> processes concept (there's a dormant draft AIP here >>>>>>>>>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-30%3A+State+persistence>) >>>>>>>>>> and a WatermarkOperator that handled the boilerplate*. >>>>>>>>>> >>>>>>>>>> Again here, I don't have a specific suggestion at this moment. >>>>>>>>>> But I wanted to articulate this workflow because it is common and it >>>>>>>>>> wasn't >>>>>>>>>> immediately obvious to me in reading AIP-39 how I would use it to >>>>>>>>>> implement >>>>>>>>>> it. >>>>>>>>>> >>>>>>>>>> AIP-39 makes airflow more data-aware. So if it can support this >>>>>>>>>> kind of workflow that's great. @Ash Berlin-Taylor >>>>>>>>>> <[email protected]> do you have thoughts on how it might be >>>>>>>>>> compatible with this kind of thing as is? >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> * The base operator is designed so that Subclasses only need to >>>>>>>>>> implement two methods: >>>>>>>>>> - `get_high_watermark`: produce the tentative new high >>>>>>>>>> watermark >>>>>>>>>> ' `watermark_execute`: analogous to implementing poke in a >>>>>>>>>> sensor, this is where your work is done. `execute` is left to the >>>>>>>>>> base >>>>>>>>>> class, and it orchestrates (1) getting last high watermark or inital >>>>>>>>>> load >>>>>>>>>> value and (2) updating new high watermark if job successful. >>>>>>>>>> >>>>>>>>>> >>> >>> -- >>> +48 660 796 129 >>> >> > -- +48 660 796 129
