[ https://issues.apache.org/jira/browse/AIRFLOW-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998428#comment-16998428 ]
ASF subversion and git services commented on AIRFLOW-5931:
----------------------------------------------------------

Commit 5a14b9925567618dae38e5774b7f77e954e214d7 in airflow's branch refs/heads/v1-10-test from Ash Berlin-Taylor
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=5a14b99 ]

[AIRFLOW-5931] Use os.fork when appropriate to speed up task execution. (#6627)

* [AIRFLOW-5931] Use os.fork when appropriate to speed up task execution.

Rather than running a fresh Python interpreter, which then has to re-load
all of Airflow and its dependencies, we should use os.fork when it is
available/suitable, which should speed up task running, especially for
short-lived tasks.

I've profiled this and it took the task duration (as measured by the
`duration` column in the TI table) from an average of 14.063s down to
just 0.932s!

* Allow `reap_process_group` to kill processes even when the "group
  leader" has already exited.

* Don't re-initialize JSON/stdout logging ElasticSearch inside forked
  processes

Most of the time we will run the "raw" task in a forked subprocess (the
only time we don't is when we use impersonation) that will have the
logging already configured. So if the EsTaskHandler has already been
configured we don't want to "re"configure it -- otherwise it will
disable JSON output for the actual task!

> Spawning new python interpreter for every task slow
> ---------------------------------------------------
>
>                 Key: AIRFLOW-5931
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5931
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: executors, worker
>    Affects Versions: 2.0.0
>            Reporter: Ash Berlin-Taylor
>            Assignee: Ash Berlin-Taylor
>            Priority: Major
>             Fix For: 1.10.7
>
>
> There are a number of places in the Executors and Task Runners where we
> spawn a whole new Python interpreter.
> My profiling has shown that this is slow. Rather than running a fresh
> Python interpreter, which then has to re-load all of Airflow and its
> dependencies, we should use {{os.fork}} when it is available/suitable,
> which should speed up task running, especially for short-lived tasks.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
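
For readers unfamiliar with the trade-off described in the commit message, here is a minimal sketch of the two approaches. It is not the code from the commit. The function names `run_task_forked` and `run_task_fresh_interpreter` are hypothetical and chosen for illustration only, and the sketch assumes a POSIX platform where `os.fork` exists.

```python
import os
import subprocess
import sys


def run_task_forked(task):
    """Run ``task`` (a zero-argument callable) in a forked child process.

    Hypothetical helper, not Airflow's implementation. Returns the child's
    exit code, or the negated signal number if a signal killed it.
    """
    pid = os.fork()
    if pid == 0:
        # Child: every module the parent imported is already loaded here,
        # so nothing has to be re-imported before the task runs.
        code = 1
        try:
            task()
            code = 0
        except BaseException:
            code = 1
        finally:
            # os._exit ends the child immediately, without running the
            # parent's atexit handlers or flushing its inherited buffers.
            os._exit(code)
    # Parent: block until the child finishes, then decode the wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


def run_task_fresh_interpreter(source):
    """Run Python ``source`` in a brand-new interpreter (the slower path).

    The new process starts from scratch and must import everything again.
    """
    return subprocess.call([sys.executable, "-c", source])


if __name__ == "__main__":
    # hasattr(os, "fork") is one simple way to check availability.
    if hasattr(os, "fork"):
        print("forked exit code:", run_task_forked(lambda: None))
    print("fresh interpreter exit code:", run_task_fresh_interpreter("pass"))
```

The forked child inherits the parent's state, including already-configured logging, which is why the commit also avoids reconfiguring the EsTaskHandler in forked processes.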