ephraimbuddy commented on code in PR #50371:
URL: https://github.com/apache/airflow/pull/50371#discussion_r2095162188
##########
airflow-core/src/airflow/dag_processing/processor.py:
##########
@@ -94,8 +96,35 @@ def _parse_file_entrypoint():
         comms_decoder.send_request(log, result)
 
 
+def _pre_import_airflow_modules(file_path: str, log: FilteringBoundLogger) -> None:
+    """
+    Pre-import Airflow modules found in the given file.
+
+    This prevents modules from being re-imported in each processing process,
+    saving CPU time and memory.
+
+    Args:
+        file_path: Path to the file to scan for imports
+        log: Logger instance to use for warnings
+
+    parsing_pre_import_modules:
+        default value is True
+    """
+    if not conf.getboolean("scheduler", "parsing_pre_import_modules", fallback=True):
+        return
+
+    for module in iter_airflow_imports(file_path):
+        try:
+            importlib.import_module(module)
+        except ModuleNotFoundError as e:
+            log.warning("Error when trying to pre-import module '%s' found in %s: %s", module, file_path, e)
+
+
 def _parse_file(msg: DagFileParseRequest, log: FilteringBoundLogger) -> DagFileParsingResult | None:
     # TODO: Set known_pool names on DagBag!
+
+    _pre_import_airflow_modules(msg.file, log)

Review Comment:
   Doing this per file seems wrong. It should happen once, at the start of the DAG processor. Have you benchmarked this against the current DAG processor?
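For context, a minimal sketch of what the suggested alternative could look like, assuming the DAG processor has a place to run code once in the parent process before per-file workers are forked. The `pre_import_common_modules` function and the `COMMON_AIRFLOW_MODULES` list are hypothetical illustrations; only `conf.getboolean` and the `parsing_pre_import_modules` config key come from the PR itself:

```python
# Hypothetical sketch: pre-import once at DAG processor startup (parent
# process) so that forked per-file workers inherit the modules via
# copy-on-write, rather than re-importing them in every child.
import importlib

from airflow.configuration import conf
from structlog.typing import FilteringBoundLogger

# Illustrative list; a startup-time variant would need to decide up front
# which modules are worth importing eagerly, since it can no longer scan
# each DAG file individually.
COMMON_AIRFLOW_MODULES = ["airflow.models", "airflow.sdk"]


def pre_import_common_modules(log: FilteringBoundLogger) -> None:
    """Import commonly used Airflow modules once, before any workers fork."""
    if not conf.getboolean("scheduler", "parsing_pre_import_modules", fallback=True):
        return
    for module in COMMON_AIRFLOW_MODULES:
        try:
            importlib.import_module(module)  # cached in sys.modules for all forks
        except ModuleNotFoundError as e:
            log.warning("Could not pre-import module '%s': %s", module, e)
```

The trade-off: the per-file approach in the PR tailors the import set to each DAG file via `iter_airflow_imports`, but pays the scan-and-import cost in every parsing subprocess; a startup-time variant does the work exactly once at the cost of that per-file precision.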