Max,

Thank you for the quick response. It is very helpful and gives me great
material for my investigations!

Thanks again,
Stéphane


> On Jun 11, 2018, at 3:11 PM, Maxime Beauchemin <maximebeauche...@gmail.com> 
> wrote:
> 
> DagBag import timeouts happen when people do more than just "configuration
> as code" in their module scope (say doing actual compute in module scope,
> which is a no-no). They may also happen if you read things from flimsy
> external systems that may introduce delays. Say you read pipeline
> configuration from Zookeeper or from a database or network drive and
> somehow that operation is timing out.
> 
> Also, with Airflow (at the moment) you are responsible for synchronizing the
> pipeline definitions (DAGS_FOLDER) on all machines across the cluster. If
> they are not in sync you'll have problems with symptoms that may look like
> "dag_id not found". That happens when the scheduler is aware of DAGs that
> workers may not be aware of.
> 
> Max
> 
> On Mon, Jun 11, 2018 at 12:42 PM Stephane Bonneaud <steph...@fathomhealth.co>
> wrote:
> 
>> Hi there,
>> 
>> We’re using Airflow in our startup and it’s been great in many ways,
>> thanks for the work you guys are doing!
>> 
>> Unfortunately, we’re hitting a bunch of issues: ops timing out and DAGs
>> failing for unclear reasons, either with no logs or with the following
>> error: "airflow.exceptions.AirflowException: dag_id could not be found".
>> This seems to happen when enough DAGs are running at the same time, though
>> it can also happen, more rarely, at other times. The best way to reproduce
>> the error with our setup is to run enough DAGs at once. Most of the time,
>> clearing the failed DAG run or ops and letting the DAG re-run is
>> enough to fix the problem.
>> 
>> I found resources pointing to the dagbag_import_timeout setting, e.g.:
>> https://stackoverflow.com/questions/43235130/airflow-dag-id-could-not-be-found
>> I did play with that parameter, and with other parameters as well. They do
>> seem to help, i.e., I can run more DAGs at once, but
>>        (1) if I run enough DAGs at once, I still see ops and DAGs
>> failing, so the problem is not fixed;
>>        (2) more importantly, I don’t fully understand the problem. I have
>> some ideas on what is happening, but maybe I’m totally wrong?
>> 
>> Any recommendations on how I should investigate that?
>> 
>> Thank you very much!
>> Have a nice rest of the day,
>> Stéphane
>> http://stephanebonneaud.com
>> 
>> 
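
Max's point about module-scope work suggests one way to investigate: time how long each file in DAGS_FOLDER takes to import, since importing a file runs everything at its module scope. The following is only a rough sketch under stated assumptions, not how Airflow itself measures anything. The helper names time_module_imports and slow_files are hypothetical, the threshold is whatever you choose to compare against your dagbag_import_timeout, and it only looks at top-level .py files.

```python
import importlib.util
import sys
import time
from pathlib import Path


def time_module_imports(folder):
    """Return {file name: seconds spent executing that file's module scope}.

    Hypothetical helper: it imports each top-level .py file in `folder`
    on its own and measures how long the import takes.
    """
    timings = {}
    for path in sorted(Path(folder).glob("*.py")):
        # Load the file under a throwaway module name; it is never
        # registered in sys.modules, so repeated runs re-execute it.
        spec = importlib.util.spec_from_file_location(
            "_probe_" + path.stem, str(path)
        )
        module = importlib.util.module_from_spec(spec)
        start = time.perf_counter()
        try:
            # Runs everything at module scope, including any slow
            # compute or calls to external systems.
            spec.loader.exec_module(module)
        except Exception as exc:
            print("%s failed to import: %r" % (path.name, exc), file=sys.stderr)
        timings[path.name] = time.perf_counter() - start
    return timings


def slow_files(timings, threshold_seconds):
    """Names of files whose import took longer than the threshold."""
    return sorted(
        name for name, secs in timings.items() if secs > threshold_seconds
    )
```

Calling slow_files(time_module_imports("/path/to/dags"), 5.0) on a worker, ideally while the cluster is under load, would list the files worth looking at first. The path and the 5-second threshold are placeholders.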

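For Max's second point, that DAGS_FOLDER must be in sync on every machine, one simple check is to compute a fingerprint of the folder on the scheduler and on each worker and compare the results. This is a minimal sketch, not an Airflow feature: the function name dags_folder_fingerprint is hypothetical, and it assumes that only .py files matter and that files live at the same relative paths on every machine.

```python
import hashlib
from pathlib import Path


def dags_folder_fingerprint(folder):
    """Return a SHA-256 hex digest over the relative paths and contents
    of all .py files under `folder` (hypothetical helper)."""
    root = Path(folder)
    # Sort by relative POSIX path so the order is the same on every machine.
    rel_paths = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.py") if p.is_file()
    )
    digest = hashlib.sha256()
    for rel in rel_paths:
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")  # separator between path and contents
        digest.update((root / rel).read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()
```

If the digest printed on a worker differs from the one on the scheduler, the two machines do not hold the same pipeline definitions, which matches the situation Max describes where the scheduler knows about DAGs that a worker does not.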