Chen, take a look at the `processor_poll_interval` and
`min_file_process_interval` options in the Airflow configuration. But
I would still strongly recommend removing any code that executes at
the top level of your DAG files.
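
For example, in airflow.cfg (the values below are only illustrative;
check the configuration reference for your Airflow version's defaults):

    [scheduler]
    # seconds the scheduler's main loop sleeps between runs
    processor_poll_interval = 1
    # minimum number of seconds between re-parses of the same DAG file
    min_file_process_interval = 300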

Cheers,
Tomek


On Sun, Oct 4, 2020 at 6:45 PM Chen Michaeli <penp...@gmail.com> wrote:
>
> Hi, a quick follow-up.
>
> Is there a parameter I can configure to alter that behavior?
>
> Say I want a specific DAG/all DAGs to be parsed every X minutes instead of 
> the default few seconds?
>
> Thanks again :)
>
> On Tue, Sep 22, 2020, 20:15 Tomasz Urbaszek <turbas...@apache.org> wrote:
>>
>> The DAG is parsed every few seconds (by the scheduler). That means any 
>> top-level code is executed every few seconds. So if you call an external 
>> API or database at the DAG level (not inside an operator), the request 
>> will be sent quite often, and that's definitely not the expected 
>> behavior :)
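>>
>> For example, something like this (a made-up illustration, the URL is a
>> placeholder):
>>
>>     import requests
>>     from airflow import DAG
>>     from airflow.utils.dates import days_ago
>>
>>     # BAD: this call runs every time the scheduler parses this file,
>>     # i.e. every few seconds, not only when the DAG actually runs.
>>     latest = requests.get("https://example.com/latest").json()
>>
>>     dag = DAG("example_dag", start_date=days_ago(1), schedule_interval=None)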
>>
>> Cheers,
>> Tomek
>>
>> On Mon, Sep 21, 2020 at 11:23 PM Chen Michaeli <penp...@gmail.com> wrote:
>>>
>>> Hello, I am using Apache Airflow for fun and to gain experience, and it is 
>>> great! I hope this is the right address for questions; please correct me if 
>>> I'm wrong.
>>>
>>> I was wondering: why shouldn't I let the DAG itself do any data gathering?
>>>
>>> For example, and for the sake of simplicity, I have a pipeline that reads a 
>>> file name from an S3 bucket and then stores it in a MySQL table.
>>>
>>> Normally I would use one sensor or operator to get the file name, and then 
>>> a second operator to store it in MySQL (using XCom, for example, to 
>>> communicate the name between them).
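>>>
>>> Roughly like this (a simplified sketch with 1.10-style imports; the DAG 
>>> id and the returned key are made up):
>>>
>>>     from airflow import DAG
>>>     from airflow.operators.python_operator import PythonOperator
>>>     from airflow.utils.dates import days_ago
>>>
>>>     def get_file_name(**context):
>>>         # in reality I'd list the bucket with boto3 and return the newest
>>>         # key; the return value is pushed to XCom automatically
>>>         return "some/key.csv"
>>>
>>>     def store_in_mysql(**context):
>>>         file_name = context["ti"].xcom_pull(task_ids="get_file_name")
>>>         print("would INSERT %s into the MySQL table here" % file_name)
>>>
>>>     with DAG("s3_to_mysql", start_date=days_ago(1),
>>>              schedule_interval="@daily") as dag:
>>>         get = PythonOperator(task_id="get_file_name",
>>>                              python_callable=get_file_name,
>>>                              provide_context=True)
>>>         store = PythonOperator(task_id="store_in_mysql",
>>>                                python_callable=store_in_mysql,
>>>                                provide_context=True)
>>>         get >> store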
>>>
>>> I understand this might be the preferred course of action, and it is what 
>>> I currently do!
>>> However, what I don't understand is why I can't just get the file name 
>>> within the DAG itself.
>>> Why is it considered bad practice to do any data-related processing 
>>> or gathering in the DAG?
>>>
>>> I can use the AWS API to easily retrieve the file name and store it in a 
>>> regular Python "global" variable. Then I would only need one operator that 
>>> takes this file name and stores it in MySQL.
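>>>
>>> Something like this at the top of the DAG file (a rough sketch, the 
>>> bucket name is made up):
>>>
>>>     import boto3
>>>
>>>     # runs at parse time, i.e. every time the scheduler re-reads this
>>>     # file (and raises a KeyError if the bucket is empty)
>>>     s3 = boto3.client("s3")
>>>     objects = s3.list_objects_v2(Bucket="my-bucket")["Contents"]
>>>     FILE_NAME = max(objects, key=lambda o: o["LastModified"])["Key"]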
>>>
>>> Each time the DAG is parsed for execution, my code that uses the AWS 
>>> API will run again and provide me with a new file name.
>>>
>>> Am I missing something?
>>>
>>> Thank you very much, this has gotten me so curious!
