On Sat, 25 Jan 2020 at 21:20, Jarek Potiuk <[email protected]> wrote:
> My interpretation of it is that - on a very high level - Serverless approach
> is not very good when there is quite a state to be shared between tasks.

Well, serverless does not mean that all state is torn down between executions, only that it can be. Often there is data cached on the function host, and/or multiple function executions per instance setup.

> [...] There are also limitations
> when it comes to task running time - it is not uncommon in Airflow that a
> task can run for many hours. Both limitations make it not very well suited
> for a serverless approach IMHO.

That depends on the platform. A serverless function is not necessarily required to finish within a certain amount of time (although this changes the pricing considerably).

In any case, if a task takes longer than a couple of minutes, it could be offloaded to a beefier computing system (for example Beam, Spark, or perhaps just a container instance or VM that can be started on demand). In this scenario, Airflow is more of an orchestrator than a worker.

cheers
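
P.S. To make the orchestrator-versus-worker point a bit more concrete, here is a rough sketch of the pattern in plain Python. It is not Airflow code: FakeComputeBackend, submit, status and orchestrate are all made-up names for illustration, and the backend is just an in-memory stand-in for whatever system (Beam, Spark, a VM) would do the heavy lifting. The task itself only submits the job and polls for completion, so it holds almost no state and does almost no work.

```python
import itertools
import time


class FakeComputeBackend:
    """Hypothetical stand-in for Beam, Spark, or an on-demand VM/container."""

    def __init__(self, polls_until_done=3):
        self._polls_until_done = polls_until_done
        self._jobs = {}
        self._ids = itertools.count(1)

    def submit(self, payload):
        # A real backend would start the long-running job here.
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {"payload": payload, "polls": 0}
        return job_id

    def status(self, job_id):
        # Pretend the job finishes after a fixed number of status checks.
        job = self._jobs[job_id]
        job["polls"] += 1
        return "DONE" if job["polls"] >= self._polls_until_done else "RUNNING"


def orchestrate(backend, payload, poll_interval=0.0, max_polls=100):
    """Submit the heavy work elsewhere, then only poll for completion."""
    job_id = backend.submit(payload)
    for _ in range(max_polls):
        if backend.status(job_id) == "DONE":
            return job_id
        time.sleep(poll_interval)
    raise TimeoutError(f"{job_id} did not finish after {max_polls} polls")
```

Since each poll is short and independent, the polling step could in principle run as a sequence of brief function invocations rather than one long-lived process, which is what makes the split attractive on a serverless platform.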
