Re: [PR] Solution for #19968, Python SDK just stages up-to-date versions of the required files [beam]

via GitHub Tue, 07 Oct 2025 22:26:53 -0700


ksobrenat32 commented on code in PR #36249:
URL: https://github.com/apache/beam/pull/36249#discussion_r2412490524



##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -780,7 +785,12 @@ def _populate_requirements_cache(
             platform_tag
         ])
       _LOGGER.info('Executing command: %s', cmd_args)
-      processes.check_output(cmd_args, stderr=processes.STDOUT)
+      output = processes.check_output(cmd_args, stderr=subprocess.STDOUT)
+      downloaded_packages = []
+      for line in output.decode('utf-8').split('\n'):

Review Comment:
   My initial understanding, as reflected in the current implementation, was 
that we only needed to stage newly downloaded or updated packages. I had 
assumed that since packages in the local cache are already available, staging 
them would be redundant.
   
   My logic is as follows:
   
   1. Identify all packages required by the requirements.txt file.
   2. Identify all other required PyPI packages.
   3. Download any of these packages that are not already in the cache.
   4. Stage only the newly downloaded packages.
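
   Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `split_pip_download_output` is hypothetical, and it assumes `pip download` marks new files with a `Saved <path>` line and cached ones with a `File was already downloaded <path>` line, which may vary between pip versions.

```python
import os

# Line prefixes assumed to appear in `pip download` output (an assumption,
# not a documented, stable interface).
SAVED_PREFIX = 'Saved '
CACHED_PREFIX = 'File was already downloaded '


def split_pip_download_output(output: bytes):
  """Hypothetical helper: split pip output into (newly saved, already cached).

  Returns two lists of file base names, in the order they appear.
  """
  saved, cached = [], []
  for raw_line in output.decode('utf-8').splitlines():
    line = raw_line.strip()
    if line.startswith(SAVED_PREFIX):
      saved.append(os.path.basename(line[len(SAVED_PREFIX):].strip()))
    elif line.startswith(CACHED_PREFIX):
      cached.append(os.path.basename(line[len(CACHED_PREFIX):].strip()))
  return saved, cached


# Made-up sample output for illustration only.
sample_output = (
    b'Collecting numpy\n'
    b'  Downloading numpy-1.26.4.tar.gz\n'
    b'  Saved /tmp/cache/numpy-1.26.4.tar.gz\n'
    b'Collecting six\n'
    b'  File was already downloaded /tmp/cache/six-1.16.0.tar.gz\n'
    b'Successfully downloaded numpy six\n')

saved, cached = split_pip_download_output(sample_output)
```

   With this split, staging only `saved` matches the logic described above, while staging `saved + cached` would cover every file the requirements resolve to.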
   
   You've correctly pointed out that this means cached packages aren't staged. 
To make sure I get the fix right, could you help me understand the downstream 
process and why it's necessary to stage the cached packages as well?



