Chen, take a look at the `processor_poll_interval` and `min_file_process_interval` options in the Airflow configuration. But still, I would strongly recommend removing any top-level code from your DAGs (a sketch of that refactor is at the bottom of this message, after the quoted thread).
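For illustration, here is a minimal airflow.cfg sketch. The values are assumptions, and option names and defaults vary across Airflow versions, so check the configuration reference for your release:

    [scheduler]
    # How long the scheduler waits between scheduling loops, in seconds.
    processor_poll_interval = 60

    # Minimum number of seconds between re-parses of the same DAG file.
    # Raising this (e.g. to 5 minutes) makes top-level DAG code run less often.
    min_file_process_interval = 300

Note that raising these intervals also delays how quickly the scheduler picks up changes to your DAG files, so it's a trade-off rather than a fix for top-level code.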
Cheers,
Tomek

On Sun, Oct 4, 2020 at 6:45 PM Chen Michaeli <penp...@gmail.com> wrote:
>
> Hi, a quick follow-up.
>
> Is there a parameter I can configure to alter that behavior?
>
> Say I want a specific DAG/all DAGs to be parsed every X minutes instead of
> the default few seconds?
>
> Thanks again :)
>
> On Tue, Sep 22, 2020, 20:15 Tomasz Urbaszek
> <turbas...@apache.org> wrote:
>>
>> The DAG is parsed every few seconds (by the scheduler). This means that any
>> top-level code is executed every few seconds. So if you request an
>> external API or database at DAG level (not in an operator), the
>> request will be sent quite often, and that's definitely not the expected
>> behavior :)
>>
>> Cheers,
>> Tomek
>>
>> On Mon, Sep 21, 2020 at 11:23 PM Chen Michaeli <penp...@gmail.com> wrote:
>>>
>>> Hello, I am using Apache Airflow for fun and experience, and it is great!
>>> I hope I was meant to send questions to this address; please correct me if
>>> I'm wrong.
>>>
>>> I was wondering why I shouldn't let the DAG itself do any data gathering.
>>>
>>> For example, and for the sake of simplicity, I have a pipeline that reads a
>>> file name from an S3 bucket and then stores it in a MySQL table.
>>>
>>> Normally I would use one sensor or operator to get the file name, and then
>>> a second operator to store it in MySQL (while, for example, using XCom to
>>> communicate the name between them).
>>>
>>> I understand this might be the preferred course of action, and that is what
>>> I currently do!
>>> However, what I don't understand is why I can't just get the file name
>>> within the DAG itself.
>>> Why is it considered bad practice to do any data-related processing
>>> or gathering in the DAG?
>>>
>>> I can use the AWS API to easily retrieve the file name and store it in a
>>> regular Python "global" variable. Then I will only need one operator that
>>> takes this file name and stores it in MySQL.
>>>
>>> Each time the DAG is parsed for execution, my code that uses the AWS
>>> API will run again and provide me with a new file name.
>>>
>>> Am I missing something?
>>>
>>> Thank you very much, this has gotten me so curious!
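And here is the refactor sketch I promised above, for the S3-to-MySQL pipeline you described. It is not a drop-in implementation: the bucket name, DAG id, and task ids are placeholders, it assumes Airflow 2-style imports and boto3, and the MySQL insert is left as a stub.

    # Sketch only; "my-bucket" and the task/DAG ids are placeholders.
    from datetime import datetime

    import boto3
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # BAD (anti-pattern): top-level code like the line below would run on
    # every scheduler parse of this file, i.e. every few seconds:
    # file_name = boto3.client("s3").list_objects_v2(Bucket="my-bucket")["Contents"][0]["Key"]

    def get_file_name(**context):
        # GOOD: this S3 call runs only when the task itself executes.
        s3 = boto3.client("s3")
        objects = s3.list_objects_v2(Bucket="my-bucket")
        key = objects["Contents"][0]["Key"]
        return key  # the return value is pushed to XCom automatically

    def store_in_mysql(**context):
        # Pull the file name produced by the upstream task via XCom.
        file_name = context["ti"].xcom_pull(task_ids="get_file_name")
        # ... insert file_name into MySQL here, e.g. with MySqlHook ...

    with DAG(
        dag_id="s3_to_mysql_example",
        start_date=datetime(2020, 10, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        get_name = PythonOperator(task_id="get_file_name", python_callable=get_file_name)
        store = PythonOperator(task_id="store_in_mysql", python_callable=store_in_mysql)
        get_name >> store

The key difference from your "global variable" version is that the S3 call lives inside a task callable, so it runs once per scheduled DAG run instead of on every parse, and the file name travels to the second task via XCom rather than module-level state.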