Hi Stefan,

AFAIK there isn't a more efficient way of doing this. DAGs that rely on a lot of sensors experience the same issue. The only way I can think of right now is updating the task state directly in the database, but then you need to know what you are doing. I can imagine this would be feasible using an AWS Lambda function. Hope this helps.
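To give a rough idea, something along these lines could mark the waiting sensor task as finished from the Airflow side. This is an untested sketch; the dag_id, task_id and execution_date values are placeholders you'd fill in for your own DAG:

from datetime import datetime

from airflow import settings
from airflow.models import TaskInstance
from airflow.utils.state import State

# Placeholder: the execution date of the DAG run you want to unblock.
execution_date = datetime(2018, 5, 26)

# Look up the sensor's task instance in the metadata database.
session = settings.Session()
ti = session.query(TaskInstance).filter(
    TaskInstance.dag_id == 'my_dag',
    TaskInstance.task_id == 'wait_for_processing',
    TaskInstance.execution_date == execution_date,
).one()

# Mark the task as successful so the scheduler continues the DAG run.
ti.state = State.SUCCESS
session.commit()
session.close()

A Lambda function wouldn't have the Airflow package available, so there you'd presumably issue the equivalent UPDATE against the task_instance table of the metadata database directly. Again, be careful: bypassing the scheduler like this is at your own risk.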
Cheers, Fokko

2018-05-26 17:50 GMT+02:00 Stefan Seelmann <m...@stefan-seelmann.de>:
> Hello,
>
> I have a DAG (externally triggered) where some processing is done at an
> external system (EC2 instance). The processing is started by an Airflow
> task (via HTTP request). The DAG should only continue once that
> processing is completed. In a first naive implementation I created a
> sensor that gets the progress (via HTTP request) and only if status is
> "finished" returns true and the DAG run continues. That works but...
>
> ... the external processing can take hours or days, and during that time
> a worker is occupied which does nothing but HTTP GET and sleep. There
> will be hundreds of DAG runs in parallel which means hundreds of workers
> are occupied.
>
> I looked into other operators that do computation on external systems
> (ECSOperator, AWSBatchOperator) but they also follow that pattern and
> just wait/sleep.
>
> So I want to ask if there is a more efficient way to build such a
> workflow with Airflow?
>
> Kind Regards,
> Stefan