Hi Sibabrata -

I discussed your use case with some of our data engineers, and we might
have some recommendations on how to better execute this. Let us know if
you want to chat.

-Ry
Airflow Committer + Founder/CTO of Astronomer

On Mon, Sep 7, 2020 at 4:30 AM Sibabrata Pattanaik (spattana)
<[email protected]> wrote:

> Hello Team,
>
> We are currently using Airflow 1.10.10 for data ingestion.
>
> In our DAG, we create tasks dynamically based on data volume: if the data
> volume is high, the number of parallel tasks increases, and if it is low,
> the number of parallel tasks decreases in the next run.
> Because every DAG run updates the same table, we set
> 'wait_for_downstream' to True to maintain data consistency and to make
> sure the next run does not start while the previous run is in progress or
> has failed.
>
> In this scenario we are seeing one issue: if the previous DAG run has
> fewer tasks than the current one because of dynamic task creation,
> the current DAG run stays in a waiting state. The current run waits for
> the newly generated tasks, which do not exist in the previous DAG run,
> to reach a completed state in that previous run. As soon as we manually
> mark those tasks as completed in the previous DAG run, the current DAG
> run starts running.
>
> Let me know if you have any workaround for this scenario.
>
> Thanks
> Sibabrata Pattanaik
> --------------------------
> [email protected]
> VOIP 84260416
> +91 80  44260416
> --------------------------
>
>
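
The waiting behaviour described in the quoted message can be illustrated
with a small model. This is only a sketch in plain Python, not Airflow's
actual dependency code: the function name blocked_tasks and the task ids
ingest_0 .. ingest_3 are hypothetical, and the model assumes a task may
start only if a task with the same id succeeded in the previous run. It
ignores the extra downstream check that 'wait_for_downstream' adds.

```python
# Simplified model of a "previous run must have succeeded" check.
# NOT Airflow's implementation; all names here are hypothetical.

SUCCESS = "success"


def blocked_tasks(previous_run, current_task_ids):
    """Return the task ids of the current run that cannot start.

    previous_run maps task_id -> state for the previous DAG run.
    A task counts as blocked when the previous run has no record of
    it, or when that record is not in the success state.
    """
    return sorted(
        task_id
        for task_id in current_task_ids
        if previous_run.get(task_id) != SUCCESS
    )


# Previous run: low data volume, so only two parallel tasks existed.
previous_run = {"ingest_0": SUCCESS, "ingest_1": SUCCESS}

# Current run: higher volume, so four parallel tasks were generated.
current_task_ids = ["ingest_0", "ingest_1", "ingest_2", "ingest_3"]

# The two new tasks have no counterpart in the previous run.
print(blocked_tasks(previous_run, current_task_ids))
# -> ['ingest_2', 'ingest_3']

# Manually marking them as successful in the previous run unblocks them.
previous_run.update({"ingest_2": SUCCESS, "ingest_3": SUCCESS})
print(blocked_tasks(previous_run, current_task_ids))
# -> []
```

Under this model the blocked list is empty only when every task id in the
current run also has a successful record in the previous run, which matches
the manual fix reported above.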
