[ https://issues.apache.org/jira/browse/AIRFLOW-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912096#comment-16912096 ]
ASF subversion and git services commented on AIRFLOW-5100:
----------------------------------------------------------
Commit 17159f41aeb130c465f2a390cd982794908006ad in airflow's branch
refs/heads/v1-10-test from Jonathan Lange
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=17159f4 ]
[AIRFLOW-5100] Respect safe_mode configuration setting when parsing DAG files
(#5757)
The scheduler calls `list_py_file_paths` to find DAGs to schedule. It does so
without passing any parameters other than the directory. This means that
it *won't* discover DAGs that are missing the words "airflow" and "DAG" even
if DAG_DISCOVERY_SAFE_MODE is disabled.
Since `list_py_file_paths` will refer to the configuration if
`include_examples` is not provided, it makes sense to have the same behaviour
for `safe_mode`.
(cherry picked from commit c4a9d8b92adfcbbde32974e06cc34675954aae93)
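
The fallback described in the commit message can be sketched in isolation.
This is a minimal illustration, not Airflow's code: the `conf` object below
is a plain `configparser` stand-in for Airflow's configuration, and
`resolve_discovery_options` is a hypothetical helper name. Only the section
and option names are taken from the patch.

```python
import configparser

# Stand-in for Airflow's global `conf` object (assumption: illustrative only).
# The values here simulate a deployment with safe mode disabled.
conf = configparser.ConfigParser()
conf.read_dict({
    "core": {
        "DAG_DISCOVERY_SAFE_MODE": "False",
        "LOAD_EXAMPLES": "True",
    }
})


def resolve_discovery_options(safe_mode=None, include_examples=None):
    """Hypothetical helper: read any option left as None from configuration.

    Using None as the default (instead of True) lets a caller that passes
    only a directory, as the scheduler does, still pick up the configured
    value.
    """
    if safe_mode is None:
        safe_mode = conf.getboolean("core", "DAG_DISCOVERY_SAFE_MODE")
    if include_examples is None:
        include_examples = conf.getboolean("core", "LOAD_EXAMPLES")
    return safe_mode, include_examples
```

With a hard-coded `safe_mode=True` default, the first branch would never run
for a caller that omits the argument, which matches the behaviour reported in
the issue.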
> Airflow scheduler does not respect safe mode setting
> ----------------------------------------------------
>
> Key: AIRFLOW-5100
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5100
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 1.10.3
> Reporter: Jonathan Lange
> Priority: Major
> Fix For: 1.10.5
>
>
> We recently disabled safe mode in our Airflow 1.10.3 deployment and then
> removed some needless comments from our DAGs that mentioned "airflow" and
> "DAG".
> After deploying (and after several days!), we found that although these DAGs
> still appeared in the UI, they were not running. They didn't have "squares"
> in the tree view indicating that they should be run.
> We restored the words "airflow" and "DAG" to these jobs, and they were
> scheduled again.
> After digging into the code, it looks like the {{SchedulerJob}} calls
> {{list_py_file_paths}} without specifying {{safe_mode}}, and
> {{list_py_file_paths}} defaults to {{safe_mode=True}}, rather than consulting
> the configuration as it does for {{include_examples}}:
> [https://github.com/apache/airflow/blob/master/airflow/jobs/scheduler_job.py#L1278]
> [https://github.com/apache/airflow/blob/master/airflow/utils/dag_processing.py#L291-L304]
> I suggest the following change, to make the behaviour of
> {{list_py_file_paths}} more consistent with itself:
> {code:python}
> modified airflow/utils/dag_processing.py
> @@ -287,7 +287,7 @@ def correct_maybe_zipped(fileloc):
>  COMMENT_PATTERN = re.compile(r"\s*#.*")
>
>
> -def list_py_file_paths(directory, safe_mode=True,
> +def list_py_file_paths(directory, safe_mode=None,
>                         include_examples=None):
>      """
>      Traverse a directory and look for Python files.
> @@ -299,6 +299,8 @@ def list_py_file_paths(directory, safe_mode=True,
>      :return: a list of paths to Python files in the specified directory
>      :rtype: list[unicode]
>      """
> +    if safe_mode is None:
> +        safe_mode = conf.getboolean('core', 'DAG_DISCOVERY_SAFE_MODE')
>      if include_examples is None:
>          include_examples = conf.getboolean('core', 'LOAD_EXAMPLES')
>      file_paths = []
> {code}
> I tried to find a way to write tests for this, but I couldn't figure it out.
> I sort of expected a function that looked at a bunch of files and returned a
> collection of DAGs, but I couldn't find one, and I couldn't really follow the
> design of {{DagFileProcessorAgent}} and friends.
>
> I haven't tried to produce a minimal example of this error, and have not
> confirmed that the above patch fixes the problem.
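
One way the reported behaviour could be reproduced in a test, without
touching the scheduler, is to exercise a file lister against a temporary
directory. The sketch below does not call Airflow: `might_contain_dag`,
`list_candidate_files` and `demo` are hypothetical stand-ins, and it assumes
the safe-mode check is a plain substring test for both "DAG" and "airflow",
as the report describes.

```python
import os
import tempfile


def might_contain_dag(path, safe_mode):
    """Simplified stand-in for the safe-mode content check (assumption)."""
    if not safe_mode:
        # With safe mode off, every Python file is a candidate.
        return True
    with open(path, "rb") as f:
        content = f.read()
    # Safe mode keeps only files that mention both words.
    return all(word in content for word in (b"DAG", b"airflow"))


def list_candidate_files(directory, safe_mode):
    """Hypothetical, simplified lister; not Airflow's list_py_file_paths."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            if name.endswith(".py") and might_contain_dag(path, safe_mode):
                found.append(name)
    return found


def demo():
    """Create two files and list them with safe mode on and off."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "with_words.py"), "w") as f:
            f.write("from airflow import DAG\n")
        with open(os.path.join(d, "without_words.py"), "w") as f:
            f.write("from mylib import build\nbuild()\n")
        return list_candidate_files(d, True), list_candidate_files(d, False)
```

A test against the real function would follow the same shape: write the two
files, set the configuration option, call the lister with only the directory,
and assert that the file without the two words is returned when safe mode is
disabled.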
--
This message was sent by Atlassian Jira
(v8.3.2#803003)