Hi, a quick follow-up. Is there a parameter I can configure to alter that behavior?
Say I want a specific DAG, or all DAGs, to be parsed every X minutes instead of the default few seconds? Thanks again :)

On Tue, Sep 22, 2020 at 20:15 Tomasz Urbaszek <turbas...@apache.org> wrote:

> The DAG is parsed every few seconds (by the scheduler). This means that any
> top-level code is executed every few seconds. So if you call an
> external API or database at DAG level (not in an operator), the
> request will be sent quite often, and that's definitely not the expected
> behavior :)
>
> Cheers,
> Tomek
>
> On Mon, Sep 21, 2020 at 11:23 PM Chen Michaeli <penp...@gmail.com> wrote:
>
>> Hello, I am using Apache Airflow for fun and experience, and it is
>> great!
>> I hope this is the right address for questions; please correct me
>> if I'm wrong.
>>
>> I was wondering why I shouldn't let the DAG itself do any data gathering.
>>
>> For example, and for the sake of simplicity, say I have a pipeline that
>> reads a file name from an S3 bucket and then stores it in a MySQL table.
>>
>> Normally I would use one sensor or operator to get the file name, and
>> then a second operator to store it in MySQL (using, for example, XCom
>> to pass the name between them).
>>
>> I understand this might be the preferred course of action, and that is
>> what I currently do!
>> However, what I don't understand is why I can't just get the file name
>> within the DAG itself.
>> Why is it considered bad practice to do any data-related processing or
>> gathering in the DAG?
>>
>> I can use the AWS API to easily retrieve the file name and store it in a
>> regular Python "global" variable. Then I would only need one operator that
>> takes this file name and stores it in MySQL.
>>
>> Each time the DAG is parsed for execution, my code that uses the AWS
>> API will run again and provide me with a new file name.
>>
>> Am I missing something?
>>
>> Thank you very much, this has gotten me so curious!
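[Editor's note] Regarding the follow-up question: the scheduler's parsing cadence is configurable via settings in the `[scheduler]` section of `airflow.cfg` (or the matching environment variables). A sketch with illustrative values, assuming a recent Airflow version; the exact defaults vary by release, so check the configuration reference for your version:

```ini
[scheduler]
# Minimum number of seconds between re-parses of the same DAG file.
min_file_process_interval = 300
# Seconds between scans of the DAGs folder for new or deleted files.
dag_dir_list_interval = 300
```

Note these are global settings; there is no built-in per-DAG parse interval.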
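[Editor's illustration] The "top-level code runs on every parse" point in Tomek's reply can be demonstrated in plain Python, without Airflow installed. The sketch below simulates the scheduler re-executing a DAG file several times: the module-level fetch (standing in for the AWS API call) runs on every parse, while the fetch inside the task callable runs only when the task actually executes. All names here (`fetch_file_name`, `store_in_mysql`, etc.) are hypothetical stand-ins, not Airflow APIs.

```python
# Counters for how often each kind of code runs.
calls = {"top_level": 0, "inside_task": 0}

def fetch_file_name():
    # Stands in for an AWS API call made at DAG (module) level.
    calls["top_level"] += 1
    return "some_file.csv"

def fetch_inside_task():
    # Stands in for the same call made inside an operator's callable.
    calls["inside_task"] += 1
    return "some_file.csv"

# Pretend this string is the DAG file the scheduler keeps re-parsing.
dag_file = """
file_name = fetch_file_name()      # top level: runs on EVERY parse

def store_in_mysql():
    name = fetch_inside_task()     # runs only when the task executes
"""

helpers = {"fetch_file_name": fetch_file_name,
           "fetch_inside_task": fetch_inside_task}

# The scheduler parses (re-executes) the whole file repeatedly...
for _ in range(5):
    namespace = dict(helpers)
    exec(dag_file, namespace)

# ...but the task callable runs only when the task is scheduled, here once.
namespace["store_in_mysql"]()

print(calls)  # → {'top_level': 5, 'inside_task': 1}
```

The asymmetry in the final counts is exactly why expensive calls belong inside operators (with XCom to pass results between tasks) rather than at DAG level.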