[
https://issues.apache.org/jira/browse/AIRFLOW-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902722#comment-16902722
]
ASF GitHub Bot commented on AIRFLOW-5100:
-----------------------------------------
jml commented on pull request #5757: [AIRFLOW-5100] Use safe_mode configuration
setting by default
URL: https://github.com/apache/airflow/pull/5757
The scheduler calls `list_py_file_paths` to find DAGs to schedule. It does so
without passing any parameters other than the directory. This means that
it *won't* discover DAGs that are missing the words "airflow" and "DAG" even
if DAG_DISCOVERY_SAFE_MODE is disabled.
Since `list_py_file_paths` will refer to the configuration if
`include_examples` is not provided, it makes sense to have the same behaviour
for `safe_mode`.
Make sure you have checked _all_ steps below.
### Jira
- [x] My PR addresses the following [Airflow
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
### Description
See above. There are no UI changes.
### Tests
I stared at the code a long time and couldn't figure out a good place to
test.
- `list_py_file_paths` doesn't have direct tests
- adding a test for this particular branch might be OK, but feels a bit
like a [change detector
test](https://testing.googleblog.com/2015/01/testing-on-toilet-change-detector-tests.html)
- If you can point me to a context manager to temporarily override a
configuration setting, I could probably write some direct tests that use
`tests/dags/` as the source of DAGs
- `SchedulerJob` is actually exhibiting the bug, and is where the behaviour
needs to be correct
- There's no obvious API to exercise that would let me test this
- I could theoretically extract a `_make_processor_agent` from `_execute`,
and assert things about its return value, but I don't want spend time on that
without a core contributor approving it first
- I would still need a context manager to temporarily override
configuration settings
### Commits
- [x] My commits all reference Jira issues in their subject lines, and I
have squashed multiple commits if they address the same issue. In addition, my
commits follow the guidelines from "[How to write a good git commit
message](http://chris.beams.io/posts/git-commit/)"
### Documentation
- [x] In case of new functionality, my PR adds documentation that describes
how to use it.
- All the public functions and the classes in the PR contain docstrings
that explain what it does
- If you implement backwards incompatible changes, please leave a note in
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so
we can assign it to a appropriate release
### Code Quality
- [x] Passes `flake8`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Airflow scheduler does not respect safe mode setting
> ----------------------------------------------------
>
> Key: AIRFLOW-5100
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5100
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 1.10.3
> Reporter: Jonathan Lange
> Priority: Major
>
> We recently disabled safe mode in our Airflow 1.10.3 deployment and then
> removed some needless comments from our DAGs that mentioned "airflow" and
> "DAG".
> After deploying (and after several days!), we found that although these DAGs
> still appeared in the UI, they were not running. They didn't have "squares"
> in the tree view indicating that they should be run.
> We restored the words "airflow" and "DAG" to these jobs, and they were
> scheduled again.
> After digging into the code, it looks like the {{SchedulerJob}} calls
> {{list_py_file_paths}} without specifying {{safe_mode}}, and
> {{list_py_file_paths}} defaults to {{safe_mode=True}}, rather than consulting
> the configuration as it does for {{include_examples}}:
> [https://github.com/apache/airflow/blob/master/airflow/jobs/scheduler_job.py#L1278]
> [https://github.com/apache/airflow/blob/master/airflow/utils/dag_processing.py#L291-L304]
> I suggest the following change, to make the behaviour of
> {{list_py_file_paths}} more consistent with itself:
> {code:python}
> modified airflow/utils/dag_processing.py
> @@ -287,7 +287,7 @@ def correct_maybe_zipped(fileloc):
> COMMENT_PATTERN = re.compile(r"\s*#.*")
>
>
> -def list_py_file_paths(directory, safe_mode=True,
> +def list_py_file_paths(directory, safe_mode=None,
> include_examples=None):
> """
> Traverse a directory and look for Python files.
> @@ -299,6 +299,8 @@ def list_py_file_paths(directory, safe_mode=True,
> :return: a list of paths to Python files in the specified directory
> :rtype: list[unicode]
> """
> + if safe_mode is None:
> + safe_mode = conf.getboolean('core', 'DAG_DISCOVERY_SAFE_MODE')
> if include_examples is None:
> include_examples = conf.getboolean('core', 'LOAD_EXAMPLES')
> file_paths = []
> {code}
> I tried to find a way to write tests for this, but I couldn't figure it out.
> I sort of expected a function that looked at a bunch of files and returned a
> collection of DAGs, but I couldn't find it, and couldn't really get the theme
> behind {{DagFileProcessorAgent}} and friends.
>
> I haven't tried to produce a minimal example of this error, and have not
> confirmed that the above patch fixes the problem.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)