dwreeves opened a new pull request, #62365:
URL: https://github.com/apache/airflow/pull/62365

    <!-- SPDX-License-Identifier: Apache-2.0
         https://www.apache.org/licenses/LICENSE-2.0 -->
   
   This PR lazy-loads Snowflake modules to improve performance when importing 
from `airflow.providers.snowflake`.
   
   Because `SnowflakeSqlApiHook` is a subclass of `SnowflakeHook`, and the 
operators module imports the `SnowflakeSqlApiHook` globally, this improves the 
load time of effectively every import from `airflow.providers.snowflake` and 
puts significantly less stress on the scheduler instance when parsing the DAG.
   
   See linked issue for full benchmarking code. TLDR, the cost of importing 
Snowflake modules is fairly significant:
   
   ```
   ─────────────────────────── Snowflake Hook — Import Cost Benchmark 
───────────────────────────
     Python: 3.13.1 (main, Jan 14 2025, 23:48:54) [Clang 19.1.6 ]
     Runs per scenario: 10  |  also benchmark w/o .pyc: no
   
               avg ms (±stdev)  |  avg RSS MB  —  10 runs each             
   ╭───────────────────────────────┬───────────────────┬──────────────────╮
   │ Scenario                      │ Time w/ .pyc (ms) │ RSS w/ .pyc (MB) │
   ├───────────────────────────────┼───────────────────┼──────────────────┤
   │ No Snowflake, no SQLAlchemy   │             60 ±2 │             17.2 │
   │ No Snowflake, yes SQLAlchemy  │     121 ±1  (+61) │    31.0  (+13.9) │
   │ Yes Snowflake, yes SQLAlchemy │    291 ±5  (+231) │    59.8  (+42.6) │
   ╰───────────────────────────────┴───────────────────┴──────────────────╯
     (+N) = delta vs first scenario  |  ±N = stdev across runs
   ```
   
   The one thing I am unclear about is covering this with tests. I have 
[written code 
before](https://github.com/ewels/rich-click/blob/main/tests/test_no_rich_imports.py)
 which asserts the non-import of a module, and I also see this in some cases in 
Airflow (see 
[\[1\]](https://github.com/apache/airflow/blob/762ad78d7d89beb9786a5cabad9f7cb9d42c880a/airflow-core/tests/unit/serialization/test_serialized_objects.py#L4),
 
[\[2\]](https://github.com/apache/airflow/blob/762ad78d7d89beb9786a5cabad9f7cb9d42c880a/shared/secrets_masker/tests/secrets_masker/test_secrets_masker.py#L136)),
 however I'm not sure if that's done generally or if it makes sense here.
   
   * closes: #62362
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [ ] Yes (please specify the tool below)
   
   (No AI code is present in this commit. However, I did use code generated by 
Claude to help benchmark the performance improvement.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to