potiuk commented on issue #67515:
URL: https://github.com/apache/airflow/issues/67515#issuecomment-4559994135

   One more thing you can learn from for the next time you use Claude 
without understanding what it produces. 
   
   The basic problem with your script (the one you Clauded) is that it measures 
importing **all** modules from all the packages in the provider, which is 
completely nuts. No wonder you get Google as the biggest offender: it 
has the largest number of modules and packages. This is **NOT** what 
happens in Airflow when a Dag is parsed and a Task is executed. Not even close.
   
   These are the Google provider stats:
   
   ```
   Google provider (providers/google/src/):
   - 49 packages (directories with __init__.py)
   - 279 modules (.py files excluding __init__.py)
   - 328 total .py files
   ```
   
   Yes... If you import 279 modules (which your AI-generated script does), it 
can take a LOT of time.
   
   But if your Dag does:
   
   ```
   from airflow.providers.google.cloud.operators.dataproc import DataprocStartClusterOperator
   ```
   
   It will load very few of those modules: only the ones needed to get the right 
types, validate them, and import the classes that are used as the internal 
representation when constructing the objects.
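
   A rough way to check this yourself is to list the modules that a single import statement actually pulls in. This is only a minimal sketch, not something from the Airflow codebase: `modules_loaded_by` is a hypothetical helper name, and it assumes the statement prints nothing to stdout while importing. It runs the statement in a fresh interpreter so that modules cached by an earlier import do not hide anything.

```python
import subprocess
import sys

# Code executed in the child interpreter: snapshot sys.modules, run the
# statement passed as the first argument, then print what was newly loaded.
_PROBE = """
import sys
before = set(sys.modules)
exec(sys.argv[1])
for name in sorted(set(sys.modules) - before):
    print(name)
"""


def modules_loaded_by(statement: str) -> list[str]:
    """Return the names of modules newly loaded by `statement`.

    The statement runs in a fresh Python process, so the result is not
    affected by whatever the current process has already imported.
    """
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, statement],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


# Example with a stdlib module; replace the string with the exact import
# line from your Dag file (Airflow and the provider must be installed).
# print(len(modules_loaded_by("import json")))
```

   Comparing that list for the one import your Dag uses against the 279 modules in the provider shows how far apart the two scenarios are.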
   
   Your measurements do not measure the savings you can get here: all those 
imports are most likely needed anyway to create DataprocStartClusterOperator.
   
   Your measurements actually show something different. They do not even 
show the effect of making some imports lazy, because you have not checked 
which of those imports actually **can** be made lazy.
   
   Your reports basically show the savings you get when comparing "loading all 
modules and all packages from the providers" vs. "not loading them at all". 
Neither is close to anything realistic: it is not what Airflow does 
currently, and it is not what you can achieve with lazy imports. 
   
   So I suggest you go back to the drawing board and ask your Claude to 
generate a real measurement if you want to advocate for the lazy-loading idea 
for providers.
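
   As a starting point, a more realistic measurement times only the import your Dag really uses, in a fresh interpreter each time. Again this is a sketch under assumptions: `time_import` is a hypothetical helper name, the median of a few runs is my choice of statistic, and it says nothing yet about which imports could be made lazy.

```python
import statistics
import subprocess
import sys

# Code executed in the child interpreter: time a single statement and
# print the elapsed wall-clock seconds as the last line of output.
_TIMER = """
import sys, time
start = time.perf_counter()
exec(sys.argv[1])
print(time.perf_counter() - start)
"""


def time_import(statement: str, repeats: int = 5) -> float:
    """Return the median time in seconds that `statement` takes to run.

    Each repeat uses a new Python process, so every run starts without
    the modules cached by the previous one.
    """
    samples = []
    for _ in range(repeats):
        result = subprocess.run(
            [sys.executable, "-c", _TIMER, statement],
            capture_output=True,
            text=True,
            check=True,
        )
        # The timing is the last line, even if the import printed something.
        samples.append(float(result.stdout.strip().splitlines()[-1]))
    return statistics.median(samples)


# Example with a stdlib module; replace the string with the exact import
# line from your Dag file (Airflow and the provider must be installed).
# print(time_import("import json"))
```

   For a per-module breakdown of the same import, CPython's `python -X importtime -c "<import statement>"` prints the time spent in each module it loads.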
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
