Re: Recommended backend metastore for Airflow

2018-12-10 Thread ramandumcs
Thanks Ash. We are trying to run 1,000 concurrent DAGs and are facing scalability issues with MySQL, so we are exploring the other backend stores, PostgreSQL and MSSQL. Any recommendation on Airflow config (heartbeat interval, pool size, etc.) to support this much workload? Thanks, Raman Gupta On
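For context, the settings being asked about live in airflow.cfg under `[core]` and `[scheduler]`; a sketch of the relevant keys (the values here are illustrative, not recommendations):

```ini
[core]
# Upper bound on task instances running across the whole installation
parallelism = 1024

[scheduler]
# How often (seconds) task jobs and the scheduler heartbeat to the DB;
# raising these reduces metastore write pressure at high DAG counts
job_heartbeat_sec = 10
scheduler_heartbeat_sec = 10
```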

Recommended backend metastore for Airflow

2018-12-10 Thread ramandumcs
Hi All, It seems that Airflow supports MySQL, PostgreSQL and MSSQL as backend stores. Any recommendation on using one over the others? We are expecting to run 1,000(s) of concurrent DAGs, which would generate heavy load on the backend store. Any pointer on this would be useful. Thanks, Raman Gupta

Passing Custom Env variables while launching k8 Pod

2018-10-29 Thread ramandumcs
Hi All, Is there a way to provide environment variables while launching a K8s pod through the Kubernetes Executor? We need to pass some environment variables which are referred to inside our Airflow operator. So can we provide custom environment variables to the docker run command while launching the task pod? Currently it seems that it
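For reference, Airflow 1.10+ added a dedicated airflow.cfg section for exactly this; a sketch, with placeholder variable names and values:

```ini
[kubernetes_environment_variables]
# Each key = value pair below is injected as an env var into every
# worker pod launched by the KubernetesExecutor
MY_SERVICE_URL = https://internal.example.com
MY_ENV = production
```

Per-task overrides via an operator's `executor_config` argument also appeared around the same release, if a single cluster-wide set of variables is not enough.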

SqlAlchemy Pool config parameters to minimize connectivity issue impact

2018-09-26 Thread ramandumcs
Hi All, We are observing that sometimes DAG tasks fail because of connectivity issues with the MySQL server. So are there any recommended settings for MySQL pool-related parameters, like sql_alchemy_pool_size = 5 and sql_alchemy_pool_recycle = 3600, to minimise the impact of connectivity issues?
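The usual advice is to keep the recycle interval comfortably below whatever idle timeout applies on the MySQL side (the server's wait_timeout, or any proxy/firewall idle timeout in between), so Airflow retires connections before the server drops them. An illustrative airflow.cfg fragment:

```ini
[core]
# Persistent connections kept in the SQLAlchemy pool
sql_alchemy_pool_size = 5
# Recycle connections (seconds) before any server-side idle timeout
# can close them underneath the pool
sql_alchemy_pool_recycle = 1800
```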

Re: Delay between tasks executions issue

2018-09-11 Thread ramandumcs
We use max_threads = number of scheduler cores. On 2018/09/11 09:49:53, Chandu Kavar wrote: > Thanks Raman, > > Understood. > > We have around 500 DAGs. What value do you suggest for max_threads? > > On Tue, Sep 11, 2018, 5:44 PM ramandu...@gmail.com > wrote: > > > Hi Chandu, > > How many

Re: Delay between tasks executions issue

2018-09-11 Thread ramandumcs
Hi Chandu, How many DAG files are there on the scheduler? As per my understanding, the scheduler processes each file to trigger any DAG/task run. It spawns a number of processes equal to the "max_threads" count to parallelize file processing. So you can try increasing the Airflow config's max_threads

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-07 Thread ramandumcs
Yeah, we are seeing the scheduler becoming a bottleneck as the number of DAG files increases, since the scheduler can scale vertically but not horizontally. We are trying multiple independent Airflow setups and are distributing the load between them. But managing this many Airflow clusters is becoming a

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-06 Thread ramandumcs
Hi, We have a requirement to scale to run 1,000(s) of concurrent DAGs. With the Celery executor we observed that the Airflow worker sometimes gets stuck if the connection to Redis/MySQL breaks (https://github.com/celery/celery/issues/3932, https://github.com/celery/celery/issues/4457). Currently we are using

Re: Task is Stuck in Up_For_Retry

2018-08-21 Thread ramandumcs
Hi All, As per the http://docs.sqlalchemy.org/en/latest/core/connections.html link, a DB engine is not portable across process boundaries: "For a multiple-process application that uses the os.fork system call, or for example the Python multiprocessing module, it’s usually required that a separate
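The pattern the SQLAlchemy docs describe can be sketched without SQLAlchemy itself: keep one connection (or engine) per process, keyed by PID, so a forked child builds its own handle instead of reusing the parent's. A minimal illustration using the stdlib sqlite3 module as a stand-in for the real engine:

```python
import os
import sqlite3

# One connection per process: after os.fork() the child sees a new PID
# and therefore opens its own connection rather than inheriting the
# parent's socket/file handles.
_connections = {}

def get_connection(dsn=":memory:"):
    pid = os.getpid()
    if pid not in _connections:
        _connections[pid] = sqlite3.connect(dsn)
    return _connections[pid]

# Within a single process the same connection object is reused.
conn_a = get_connection()
conn_b = get_connection()
assert conn_a is conn_b
```

With SQLAlchemy the equivalent move is to call `engine.dispose()` in (or recreate the engine for) each child process; the PID-keyed cache above is just one way to make that automatic.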

Re: Airflow Meetup in India

2018-08-21 Thread ramandumcs
Thanks Sumit, We work for Adobe and will try to have it in Bangalore based on the participation. Would it be possible for you to share the poll link? Thanks, Raman Gupta On 2018/08/21 06:38:37, Sumit Maheshwari wrote: > Hi Raman, > > Folks from Qubole certainly join/talk if the meetup is

Airflow Meetup in India

2018-08-20 Thread ramandumcs
Hi All, Do we have Airflow meetup(s) in India? We are based out of India and are using Apache Airflow as an orchestration engine to author and manage 1,000(s) of multi-step workflows. We would be interested in joining/conducting an Airflow meetup in India. Thanks, Raman Gupta

Re: Task is Stuck in Up_For_Retry

2018-08-17 Thread ramandumcs
We are getting logs like: {local_executor.py:43} INFO - LocalWorker running airflow run {models.py:1595} ERROR - Executor reports task instance %s finished (%s) although the task says its %s. Was the task killed externally? {models.py:1616} INFO - Marking task as UP_FOR_RETRY It seems that

Re: Task is Stuck in Up_For_Retry

2018-08-17 Thread ramandumcs
Thanks Taylor, We are getting this issue even after a restart. We are observing that the task instance state transitions from scheduled -> queued -> up_for_retry, and the DAG gets stuck in the up_for_retry state. Behind the scenes the executor keeps retrying the DAG's task, exceeding the max retry limit. In

Task is Stuck in Up_For_Retry

2018-08-16 Thread ramandumcs
Hi All, We are using Airflow 1.9 in Local Executor mode. Intermittently we are observing that tasks get stuck in "up_for_retry" mode and are retried again and again, exceeding their configured max retries count. For example, we have configured max retries as 2, but a task is retried 15

Scheduler not honouring non_pooled_task_slot_count config

2018-07-09 Thread ramandumcs
We are using Airflow version 1.9 with the Celery executor, and we are observing that the Airflow scheduler is not honouring the "non_pooled_task_slot_count" config. We are using the default setting, which is 128, but we can schedule and run >128 tasks concurrently. From the code it seems that

Scheduler crashed due to mysql connectivity errors

2018-06-29 Thread ramandumcs
Hi All, We are using Airflow 1.9 and are observing scheduler crashes due to MySQL connectivity-related issues, like: "scheduler is crashed because of OperationalError: (_mysql_exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at:

Re: Concurrency Settings for Celery Executor

2018-06-21 Thread ramandumcs
Thanks George. On 2018/06/20 20:06:38, George Leslie-Waksman wrote: > "celeryd_concurrency" and "parallelism" serve different purposes > > "celeryd_concurrency" determines how many worker subprocesses will be spun > up on an airflow worker instance (how many concurrent tasks per machine) > >
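In other words, both limits apply at once, one globally and one per worker machine; an illustrative airflow.cfg sketch (values are examples only):

```ini
[core]
# Global cap on concurrently running task instances across the cluster
parallelism = 64

[celery]
# Worker subprocesses, i.e. concurrent tasks, per Celery worker machine
celeryd_concurrency = 16
```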

Concurrency Settings for Celery Executor

2018-06-13 Thread ramandumcs
Hi All, There seem to be a couple of settings in airflow.cfg which control the number of tasks that can run in parallel on Airflow ("parallelism" and "celeryd_concurrency"). In the case of the CeleryExecutor, which one is honoured? Do we need to set both, or would setting only celeryd_concurrency work?

Re: Single Airflow Instance Vs Multiple Airflow Instance

2018-06-08 Thread ramandumcs
Thanks Kevin, I am specifically interested in scheduler settings like scheduler_zombie_task_threshold and max_tis_per_query. We are expecting load in terms of 1,000(s) of concurrent DAGs, so any Airflow setting which might help us achieve this target would be useful. There will be 1,000(s) of local

Re: Single Airflow Instance Vs Multiple Airflow Instance

2018-06-07 Thread ramandumcs
We have a similar use case where we need to support multiple teams and the expected load is 1,000(s) of active TIs. We are exploring setting up a separate Airflow cluster for each team and scaling that cluster horizontally through the Celery executor. @Ruiquin, could you please share some details on the Airflow

Re: Disable Processing of DAG file

2018-05-30 Thread ramandumcs
Thanks Maxime, we have 100(s) of DAGs with schedule set to @once, with new DAGs continually coming into the system. The scheduler processes each and every DAG inside the local DAG folder. Each DAG file takes around 400 milliseconds to process, and we have set max_threads to 8 (as we have an 8-core machine). i.e.

Disable Processing of DAG file

2018-05-28 Thread ramandumcs
Hi All, We have a use case where there would be 100(s) of DAG files with schedule set to "@once". Currently it seems that the scheduler processes each and every file and creates a DAG object. Is there a way or a config to tell the scheduler to stop processing certain files? Thanks, Raman Gupta
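One mechanism worth checking (if your Airflow version supports it) is the .airflowignore file: a plain-text file placed in the DAG folder whose lines are treated as unanchored regexes, and any file whose path matches a line is skipped by the scheduler's file processor. A sketch with hypothetical paths:

```text
# .airflowignore at the root of the DAG folder
archived_dags/
one_off_.*\.py
```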

Scheduler gets slow with increasing number of Dags

2018-05-23 Thread ramandumcs
Hi All, We have a use case where there are 100s of DAGs in the scheduler's local DAG folder, but at a time only ~50 DAGs are active (the other DAGs are disabled). New DAGs keep being added to the local DAG folder. We are observing that the scheduler takes a lot of time (around 20 minutes) in picking

Dags getting failed after 24 hours

2018-05-21 Thread ramandumcs
Hi All, We have a long-running DAG which is expected to take around 48 hours, but we are observing that it gets killed by the Airflow scheduler after ~24 hours. We are not setting any DAG/task execution timeout explicitly. Is there any default timeout value that gets used? We are using the LocalExecutor

Tasks are marked as zombie

2018-05-11 Thread ramandumcs
Hi All, We are facing an issue in which tasks are marked as zombies. We are running Airflow in LocalExecutor mode, and there is not much load on the machine or on the MySQL server. There is one config which seems to change this behaviour: # Local task jobs periodically heartbeat to the DB. If the job
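The config block being quoted is the heartbeat one; the relevant airflow.cfg keys (values illustrative) are:

```ini
[scheduler]
# Local task jobs heartbeat to the DB at this interval (seconds)
job_heartbeat_sec = 5
# A task whose job has not heartbeat within this many seconds is
# considered a zombie and failed by the scheduler
scheduler_zombie_task_threshold = 300
```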

Airflow Scheduler Crashing

2018-05-09 Thread ramandumcs
Hi All, We are running Airflow version 1.9 in LocalExecutor mode. We are observing that the scheduler crashes after a few hours with the below stack logs (seems to be an issue with the MySQL connection; is there any fix or workaround for this?) Traceback (most recent call last): File

Template for Operator' Json Argument

2018-05-05 Thread ramandumcs
Hi All, We have implemented a custom operator which is derived from BaseOperator. The custom operator takes a JSON argument. Some fields of this JSON are strings and others are integers. We need to templatise the string fields only, and not the integers. But on doing this we are getting the error
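One workaround is to walk the JSON argument ourselves and apply the template engine only to str leaves, leaving integers untouched. A sketch (not Airflow's own rendering code) with a stand-in render function:

```python
def render_strings(value, render):
    # Recursively apply `render` to str leaves only; ints, floats,
    # bools and None pass through unchanged.
    if isinstance(value, str):
        return render(value)
    if isinstance(value, dict):
        return {k: render_strings(v, render) for k, v in value.items()}
    if isinstance(value, list):
        return [render_strings(v, render) for v in value]
    return value

# Stand-in for Jinja rendering against a fixed context
fake_render = lambda s: s.replace("{{ name }}", "raman")

doc = {"greeting": "hello {{ name }}", "retries": 3,
       "steps": ["{{ name }}-step", 42]}
rendered = render_strings(doc, fake_render)
```

In a real operator, `render` would be the operator's Jinja rendering call; only the traversal logic above is the point of the sketch.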

Configuration passing through airflow cli (trigger_dag)

2018-05-02 Thread ramandumcs
Hi, I need to pass certain arguments to my custom operator at run time. It seems that the Airflow CLI's trigger_dag command supports passing conf at run time, which is referred to in the operator's execute function through the {{ dag_run.conf['name'] }} template. But I am not able to read the "name" conf
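For reference, instead of going through a template the operator can read conf directly off the context; sketched here with a fake context dict standing in for Airflow's (class names and the "name" key are illustrative):

```python
class MyOperator:
    # Simplified stand-in for a BaseOperator subclass: execute()
    # receives the task context, and a triggered run's payload sits
    # on context["dag_run"].conf (None if no conf was passed).
    def execute(self, context):
        conf = context["dag_run"].conf or {}
        return conf.get("name", "default")

# Fake dag_run standing in for the real DagRun model object
class FakeDagRun:
    conf = {"name": "raman"}

result = MyOperator().execute({"dag_run": FakeDagRun()})
```

The matching trigger would be something like `airflow trigger_dag my_dag --conf '{"name": "raman"}'` (DAG id and payload hypothetical).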

Task get stuck in up_for_retry state

2018-04-23 Thread ramandumcs
Hi, I am providing "retries" and "retry_delay" as arguments to one of my operators in the DAG, but the corresponding task gets stuck in the "up_for_retry" state and is not being retried by the scheduler. retry_delay is set to timedelta(seconds=5), so it should get retried after 5 seconds. Thanks,

Re: Cancel a Running dag

2018-04-18 Thread ramandumcs
We are exploring the following approach for DAG cancellation. Please let us know if you see any issue with it: 1) Set/create the XCom variable "cancel": "true". It would be set out of band by updating the XCom table in the metadata store. 2) Operators would have code to periodically check for
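Step 2 can be sketched in plain Python; the flag lookup below stands in for an XCom pull of the "cancel" key (our own convention, not an established Airflow cancellation API):

```python
import time

class CancellableWork:
    """Toy long-running operator body that polls a cancel flag.

    `check_cancel` stands in for an XCom lookup (e.g. pulling a
    "cancel" key written out of band to the metadata DB).
    """
    def __init__(self, check_cancel, poll_interval=0.01):
        self.check_cancel = check_cancel
        self.poll_interval = poll_interval

    def run(self, steps):
        done = 0
        for _ in range(steps):
            if self.check_cancel():
                return ("cancelled", done)
            # ... one unit of real work would happen here ...
            done += 1
            time.sleep(self.poll_interval)
        return ("finished", done)

# The flag flips to True after 3 checks, simulating an out-of-band cancel
state = {"calls": 0}
def flag():
    state["calls"] += 1
    return state["calls"] > 3

status, completed = CancellableWork(flag).run(steps=10)
```

One caveat with the XCom approach: clearing or rerunning the task has to also clear the flag, otherwise the next run cancels itself immediately.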

Re: Benchmarking of Airflow Scheduler with Celery Executor

2018-04-13 Thread ramandumcs
Thanks Ry, Just wondering if there is any approximate number for the concurrent tasks a scheduler can run on, say, a 16 GB RAM, 8-core machine. If that's already been done it would be useful. We did some benchmarking with the local executor and observed that each TaskInstance was taking ~100 MB of memory, so

Re: Cancel a Running dag

2018-04-13 Thread ramandumcs
Thanks Bolke, Will it become part of the Airflow 1.10 release? Is there any tentative timeline for the same? -Raman Gupta On 2018/04/12 19:19:07, Bolke de Bruin wrote: > This is now fixed in master. Clearing tasks will now properly terminate a > running task. If you pause the dag

Re: Cancel a Running dag

2018-04-12 Thread ramandumcs
Thanks Laura, We are using the CeleryExecutor. Just wondering if marking the TaskInstances as failed in the metadata store would also work. -Raman On 2018/04/12 16:27:00, Laura Lorenz wrote: > I use the CeleryExecutor and have used a mix of `celery control` and >

Benchmarking of Airflow Scheduler with Celery Executor

2018-04-12 Thread ramandumcs
Hi All, We have a requirement to run 10k(s) of concurrent tasks, and we are exploring Airflow's Celery executor for this. Horizontal scaling of worker nodes seems possible, but there can only be one active scheduler. So will the Airflow scheduler be able to handle this many concurrent tasks? Is there any

Cancel a Running dag

2018-04-12 Thread ramandumcs
Hi All, We have a use case to cancel an already running DAG. Is there any recommended way to do so? Thanks, Raman

Airflow Scalability with Local Executor

2018-03-28 Thread ramandumcs
Hi All, We have a use case to support 1,000 concurrent DAGs. These DAGs would have a couple of HTTP tasks which would submit jobs to external services. Each DAG could run for a couple of hours. The HTTP tasks periodically check (with sleep 20) the job status. We tried running 1,000 such