wjddn279 commented on PR #65943:
URL: https://github.com/apache/airflow/pull/65943#issuecomment-4439175172

   @diogosilva30 
   Interesting. I read through your analysis and it looks like a correct 
explanation.
   
   I have a question — **how often does this error occur? Does the user's code 
change frequently?**
   
   If I understand correctly, the issue is that among the multiple threads 
running in the edge worker, if a fork is performed while another thread (one 
not performing the fork) is in the middle of an import, it can cause problems 
in the import system. If that's the case, the problem would arise when a new 
module is being imported in another thread. 
   
   Even with lazy loading, the edge worker's import footprint is fairly fixed, 
so I'm curious whether new modules are actually loaded that often. A module 
that has already been imported once should no longer be a source of the 
problem, so I would expect the frequency to decrease gradually over time.
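
To illustrate the point about already-imported modules: a minimal sketch, assuming the set of modules is known up front. `preload` and the module names are hypothetical placeholders, not anything from Airflow or the edge worker. It only shows that a second import of the same module is a `sys.modules` cache hit and loads nothing new.

```python
import importlib
import sys


def preload(module_names):
    """Import each module that is not yet cached; return the names newly loaded."""
    newly_loaded = []
    for name in module_names:
        if name not in sys.modules:
            # First import: this is the window in which the import machinery is busy.
            importlib.import_module(name)
            newly_loaded.append(name)
    return newly_loaded


if __name__ == "__main__":
    names = ["json", "colorsys"]  # placeholder module names
    print("first pass loaded:", preload(names))
    print("second pass loaded:", preload(names))  # nothing left to load
```

If that holds for the edge worker, only the first import of each module would be exposed to a concurrent fork.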
   
   Applying the same approach that exists in Celery seems like a good idea. 
However, the trade-offs should be carefully understood. With Airflow, simply 
importing the airflow module loads roughly 100 MB of libraries. The existing 
fork approach significantly reduces PSS through copy-on-write (COW), whereas a 
fresh interpreter per execution shares nothing with the parent, so memory grows 
linearly with the number of concurrent executions. Slower startup is an 
additional downside.
   
   Below is the PSS usage measured when each child process does nothing but 
`import airflow.sdk.execution_time.execute_workload`:
   
   ```
   === A. subprocess.Popen (fresh interpreter, no sharing) ===
   parent pid=99  RSS=118.4 MiB  PSS=98.3 MiB
        pid    RSS MiB    PSS MiB   Private MiB
        100      117.9       97.8          95.8
        101      117.9       97.8          95.8
        102      117.9       97.8          95.8
   
   === B. multiprocessing.Process (fork, COW with parent) ===
   parent pid=99  RSS=118.5 MiB  PSS=30.3 MiB
        pid    RSS MiB    PSS MiB   Private MiB
        110       99.7       12.7           4.0
        111       99.7       12.7           4.0
        112       99.7       12.7           4.0
   
   ```
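
For anyone who wants to reproduce numbers of this kind, here is a rough sketch of one way to read RSS, PSS and private memory per process on Linux. It assumes `/proc/<pid>/smaps_rollup` is available (Linux only) and is not necessarily how the figures above were collected. `parse_smaps_rollup`, `summarize` and `memory_mib` are hypothetical helper names.

```python
import os
import re

# Matches lines such as "Pss:                1234 kB" in smaps_rollup.
_FIELD = re.compile(r"^(\w+):[ \t]+(\d+)[ \t]+kB[ \t]*$", re.MULTILINE)


def parse_smaps_rollup(text):
    """Return {field_name: KiB} for every 'Name: <n> kB' line in the text."""
    return {name: int(kib) for name, kib in _FIELD.findall(text)}


def summarize(fields):
    """Convert the fields of interest from KiB to MiB."""
    private_kib = fields.get("Private_Clean", 0) + fields.get("Private_Dirty", 0)
    return {
        "rss": fields.get("Rss", 0) / 1024,
        "pss": fields.get("Pss", 0) / 1024,
        "private": private_kib / 1024,
    }


def memory_mib(pid):
    """Read and summarize memory usage for one process (Linux only)."""
    with open(f"/proc/{pid}/smaps_rollup") as f:
        return summarize(parse_smaps_rollup(f.read()))


if __name__ == "__main__":
    path = f"/proc/{os.getpid()}/smaps_rollup"
    if os.path.exists(path):  # skip quietly on non-Linux systems
        print(memory_mib(os.getpid()))
```

PSS divides each shared page among the processes mapping it, so summing the PSS column (parent included) approximates the total footprint: about 391.7 MiB for case A versus about 68.4 MiB for case B in the output above.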
   
   As Jens mentioned, the problem is obvious enough that someone should have 
reported it by now, so it's a bit curious that nobody has.


