[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430385#comment-17430385
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-----------------------------------------

jledru-redoute commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-946459118


   Hello,
   We are still on version 1.10.12 managed by Cloud Composer, but we intend to move to Airflow 2 quite quickly.
   However, it seems this issue is not really resolved in version 2 either.
   We are experiencing it not every day, but quite often, and always on the same DAGs. Those DAGs are dynamically generated by a single Python file in Airflow that scans conf files. The file generally takes around 12s to parse, so I don't think parse time is the issue. It looks like this:
   ```
   import glob
   import json
   from string import ascii_uppercase

   # DAG_PARAMS, AUDIENCES_TYPE and create_dag() are defined elsewhere in this file.
   for country in DAG_PARAMS['countries']:

       for audience_type in AUDIENCES_TYPE:

           # get the audience conf files used to generate the dags
           conf_files = glob.glob(
               f"/home/airflow/gcs/data/CAM/{ country['country_code'] }"
               f"/COMPOSER_PARAM_SOURCES/{ audience_type['type'] }/*")

           audiences_list = []

           for conf_file in conf_files:
               with open(conf_file, 'rb') as conf:
                   audiences_list.append(json.loads(conf.read().decode("UTF-8")))

           # one dag per first letter of the audience category code
           for letter in ascii_uppercase:
               dag_aud_list = [
                   aud for aud in audiences_list
                   if aud["CATEG_CODE"][0] == letter]

               if dag_aud_list:
                   dag = create_dag(audience_type, country, dag_aud_list)
                   globals()[
                       f"{ audience_type['type'] }_{ country['country_code'] }"
                       f"_{ letter }_dag"] = dag
   ```
   I understand this is not really recommended (though what is the recommended approach for this type of DAG?), but that is the way it is done.
   For now it generates around 10 DAGs with approximately 35 init sensors in reschedule mode, poking every 20 minutes.
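   For illustration, a minimal sketch of what such a sensor in reschedule mode, poking every 20 minutes, could look like; this assumes Airflow 2 and its `PythonSensor`, and the DAG id, task id and callable are hypothetical, not the reporter's actual code:
   ```
   # Minimal sketch, assuming Airflow 2; names below are hypothetical.
   from datetime import datetime

   from airflow import DAG
   from airflow.sensors.python import PythonSensor


   def _source_files_ready() -> bool:
       # Hypothetical check; return True once the upstream data is available.
       return True


   with DAG(
       dag_id="example_reschedule_sensor",
       start_date=datetime(2021, 1, 1),
       schedule_interval="@daily",
       catchup=False,
   ) as dag:
       wait_for_source_files = PythonSensor(
           task_id="wait_for_source_files",
           python_callable=_source_files_ready,
           mode="reschedule",      # release the worker slot between pokes
           poke_interval=20 * 60,  # re-check every 20 minutes
           timeout=6 * 60 * 60,    # hypothetical overall deadline
       )
   ```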
   The worker machines are n1-standard-4, with worker_concurrency set to 24.
   So yesterday, out of 35 Celery tasks due to be rescheduled, 32 were rescheduled on the same worker (there are 3 workers) at roughly the same time (I'm not sure how to check whether the worker concurrency was respected, but I doubt it), causing 17 of them to fail with this specific issue...
   If I understand correctly, setting worker_autoscale to "4,2" (while keeping worker_concurrency at 24) would resolve the issue?
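   For reference, a minimal sketch of the relevant `[celery]` settings in airflow.cfg (on Cloud Composer these would typically be applied through the environment's Airflow configuration overrides, where permitted, rather than by editing the file). Note that when `worker_autoscale` is set, the worker ignores `worker_concurrency`, so "4,2" would cap each worker at 4 concurrent task slots instead of 24:
   ```
   [celery]
   # Fixed pool size per worker; used only when worker_autoscale is not set.
   worker_concurrency = 24

   # max,min worker processes; when this is set, worker_concurrency is ignored,
   # so each worker would run at most 4 tasks at a time.
   worker_autoscale = 4,2
   ```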
   Thanks,


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Thousands of Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5071
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, scheduler
>    Affects Versions: 1.10.3
>            Reporter: msempere
>            Priority: Critical
>             Fix For: 1.10.12
>
>         Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails, because the 
> flag to send email in case of failure is set to True.-
> I have Airflow set up to use Celery and Redis as the backend queue service.
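For context, a minimal sketch of what a CeleryExecutor deployment backed by Redis typically looks like in airflow.cfg; the connection strings below are placeholders, not the reporter's actual values:
```
[core]
executor = CeleryExecutor

[celery]
# Placeholder connection strings, not the reporter's actual values.
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://airflow:airflow@localhost:5432/airflow
```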



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
