Got it! I don’t think it’s this last case, but I’ll keep an eye out for it 
anyway.
Really, thanks again, I appreciate the help! I’ll let you know what I find if 
it seems like it may be of use to you.

Stéphane


> On Jun 11, 2018, at 3:31 PM, Maxime Beauchemin <maximebeauche...@gmail.com> 
> wrote:
> 
> One more thing: one of your workers may be missing a dependency required
> for a specific DAG. For example, you read configuration from Zookeeper in
> the DAG file, and one worker is missing the Zookeeper client Python lib
> while the scheduler has it. You can imagine that the scheduler will send
> the job over to that worker, and the worker can't interpret the DAG file.
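> 
> A hypothetical sketch of that failure mode (the DAG id, host, and znode
> path below are all made up):
> 
>     # dags/zk_config_dag.py
>     from datetime import datetime
>     from airflow import DAG
>     from airflow.operators.dummy_operator import DummyOperator
>     from kazoo.client import KazooClient  # ImportError on a worker without kazoo
> 
>     # Config is read at parse time: fine on the scheduler, which has the
>     # lib, but a worker without kazoo can't even import this file to
>     # resolve the dag_id.
>     zk = KazooClient(hosts="zk.example.com:2181")
>     zk.start()
>     schedule, _ = zk.get("/pipelines/zk_config_dag/schedule")
>     zk.stop()
> 
>     dag = DAG("zk_config_dag", start_date=datetime(2018, 1, 1),
>               schedule_interval=schedule.decode())
>     DummyOperator(task_id="noop", dag=dag)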
> 
> 
> On Mon, Jun 11, 2018 at 3:22 PM Stephane Bonneaud <steph...@fathomhealth.co>
> wrote:
> 
>> Max,
>> 
>> Thank you for the quick response, that is very helpful and great material
>> for my investigations!
>> 
>> Thanks again,
>> Stéphane
>> 
>> 
>>> On Jun 11, 2018, at 3:11 PM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>>> 
>>> DagBag import timeouts happen when people do more than just "configuration
>>> as code" in their module scope (say, doing actual compute in module scope,
>>> which is a no-no). They may also happen if you read things from flimsy
>>> external systems that can introduce delays. Say you read pipeline
>>> configuration from Zookeeper, a database, or a network drive, and somehow
>>> that operation times out.
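>>> 
>>> As a minimal sketch (hypothetical names throughout), the difference
>>> between compute in module scope and "configuration as code":
>>> 
>>>     # dags/scope_example.py
>>>     from datetime import datetime
>>>     from airflow import DAG
>>>     from airflow.operators.python_operator import PythonOperator
>>> 
>>>     dag = DAG("scope_example", start_date=datetime(2018, 1, 1))
>>> 
>>>     # Bad: this would run on every DagBag parse, on the scheduler and
>>>     # on every worker, and can blow past dagbag_import_timeout:
>>>     #     rows = scan_whole_table()   # hypothetical slow call
>>> 
>>>     # Good: module scope stays declarative; the slow call only runs
>>>     # when the task itself executes.
>>>     def scan():
>>>         from myproject.db import scan_whole_table  # hypothetical helper
>>>         return len(scan_whole_table())
>>> 
>>>     PythonOperator(task_id="scan", python_callable=scan, dag=dag)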
>>> 
>>> Also, with Airflow (at the moment) you are responsible for synchronizing
>>> the pipeline definitions (DAGS_FOLDER) on all machines across the cluster.
>>> If they are not in sync, you'll have problems with symptoms that may look
>>> like "dag_id not found". That happens when the scheduler is aware of DAGs
>>> that workers may not be aware of.
>>> 
>>> Max
>>> 
>>> On Mon, Jun 11, 2018 at 12:42 PM Stephane Bonneaud <steph...@fathomhealth.co> wrote:
>>> 
>>>> Hi there,
>>>> 
>>>> We’re using Airflow in our startup and it’s been great in many ways,
>>>> thanks for the work you guys are doing!
>>>> 
>>>> Unfortunately, we’re hitting a bunch of issues: ops timing out and DAGs
>>>> failing for unclear reasons, either with no logs or with the following
>>>> error: "airflow.exceptions.AirflowException: dag_id could not be found".
>>>> This seems to happen when enough DAGs are running at the same time, though
>>>> it can also happen more rarely here and there; the most reliable way to
>>>> reproduce the error with our setup is to run enough DAGs at once. Most of
>>>> the time, clearing the failed DAG run or ops and letting the DAG re-run is
>>>> enough to fix the problem.
>>>> 
>>>> I found resources pointing to the dagbag_import_timeout parameter, e.g.,
>>>> https://stackoverflow.com/questions/43235130/airflow-dag-id-could-not-be-found
>>>> I did play with that parameter, and with other parameters as well. They
>>>> do seem to help, i.e., I can run more DAGs at once, but:
>>>>       (1) if I run enough DAGs at once, I still see ops and DAGs
>>>> failing, so the problem is not fixed;
>>>>       (2) more importantly, I don’t fully understand the problem. I have
>>>> some ideas on what is happening, but maybe I’m totally wrong?
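>>>> 
>>>> (For reference, the knob I’ve been tuning lives in airflow.cfg under
>>>> [core]; the value below is just illustrative, not a recommendation:)
>>>> 
>>>>     [core]
>>>>     # max seconds allowed for importing one DAG file into the DagBag
>>>>     dagbag_import_timeout = 60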
>>>> 
>>>> Any recommendations on how I should investigate that?
>>>> 
>>>> Thank you very much!
>>>> Have a nice rest of the day,
>>>> Stéphane
>>>> http://stephanebonneaud.com
>>>> 
>>>> 
>> 
>> 
