dwreeves opened a new pull request, #62365:
URL: https://github.com/apache/airflow/pull/62365
<!-- SPDX-License-Identifier: Apache-2.0
https://www.apache.org/licenses/LICENSE-2.0 -->
This PR lazy-loads Snowflake modules to improve performance when importing
from `airflow.providers.snowflake`.
Because `SnowflakeSqlApiHook` is a subclass of `SnowflakeHook`, and the
operators module imports `SnowflakeSqlApiHook` at module level, this improves
the load time of effectively every import from `airflow.providers.snowflake`
and puts significantly less stress on the scheduler when parsing DAGs.
See the linked issue for the full benchmarking code. TL;DR: the cost of
importing Snowflake modules is fairly significant:
```
─────────────────────────── Snowflake Hook — Import Cost Benchmark ───────────────────────────
Python: 3.13.1 (main, Jan 14 2025, 23:48:54) [Clang 19.1.6 ]
Runs per scenario: 10 | also benchmark w/o .pyc: no
avg ms (±stdev) | avg RSS MB — 10 runs each
╭───────────────────────────────┬───────────────────┬──────────────────╮
│ Scenario │ Time w/ .pyc (ms) │ RSS w/ .pyc (MB) │
├───────────────────────────────┼───────────────────┼──────────────────┤
│ No Snowflake, no SQLAlchemy │ 60 ±2 │ 17.2 │
│ No Snowflake, yes SQLAlchemy │ 121 ±1 (+61) │ 31.0 (+13.9) │
│ Yes Snowflake, yes SQLAlchemy │ 291 ±5 (+231) │ 59.8 (+42.6) │
╰───────────────────────────────┴───────────────────┴──────────────────╯
(+N) = delta vs first scenario | ±N = stdev across runs
```
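A cold-import benchmark like the one above can be approximated by timing `import <module>` in a fresh interpreter per run, so that `sys.modules` caching and `.pyc` state don't leak between samples. This is a minimal sketch, not the benchmark script from the linked issue:

```python
# Minimal sketch: average wall-clock time to cold-import a module in a
# fresh subprocess (not the benchmark script from the linked issue).
import statistics
import subprocess
import sys
import time


def time_import(module: str, runs: int = 5) -> float:
    """Return the mean milliseconds to `import <module>` in a new process."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            check=True,
        )
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.mean(samples)
```

Note this includes interpreter startup overhead in every sample; the deltas between scenarios (as in the table above) are more meaningful than the absolute numbers.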
The one thing I am unclear about is how to cover this with tests. I have
[written code
before](https://github.com/ewels/rich-click/blob/main/tests/test_no_rich_imports.py)
that asserts a module is *not* imported, and I see this done in some places in
Airflow (see
[\[1\]](https://github.com/apache/airflow/blob/762ad78d7d89beb9786a5cabad9f7cb9d42c880a/airflow-core/tests/unit/serialization/test_serialized_objects.py#L4),
[\[2\]](https://github.com/apache/airflow/blob/762ad78d7d89beb9786a5cabad9f7cb9d42c880a/shared/secrets_masker/tests/secrets_masker/test_secrets_masker.py#L136)),
but I'm not sure whether that's standard practice or whether it makes sense here.
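A test in that spirit can be sketched as follows: run the import in a fresh interpreter and fail if the "heavy" module shows up in `sys.modules` as a side effect. The helper name and the modules used are illustrative, not Airflow's:

```python
# Hedged sketch of a "module was not eagerly imported" test, similar in
# spirit to the rich-click test linked above. Names here are illustrative.
import subprocess
import sys


def assert_import_does_not_load(importing: str, forbidden: str) -> None:
    """Import `importing` in a fresh interpreter; fail the test if
    `forbidden` ends up in sys.modules as a side effect."""
    code = (
        f"import sys; import {importing}; "
        f"sys.exit(1 if {forbidden!r} in sys.modules else 0)"
    )
    result = subprocess.run([sys.executable, "-c", code])
    assert result.returncode == 0, (
        f"importing {importing} eagerly loaded {forbidden}"
    )
```

Running it in a subprocess matters: in the current process another test may already have imported the forbidden module, which would make an in-process `sys.modules` check flaky.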
* closes: #62362
---
##### Was generative AI tooling used to co-author this PR?
- [ ] Yes (please specify the tool below)
(No AI code is present in this commit. However, I did use code generated by
Claude to help benchmark the performance improvement.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]