ksobrenat32 commented on code in PR #36249:
URL: https://github.com/apache/beam/pull/36249#discussion_r2412490524
##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -780,7 +785,12 @@ def _populate_requirements_cache(
platform_tag
])
_LOGGER.info('Executing command: %s', cmd_args)
- processes.check_output(cmd_args, stderr=processes.STDOUT)
+ output = processes.check_output(cmd_args, stderr=subprocess.STDOUT)
+ downloaded_packages = []
+ for line in output.decode('utf-8').split('\n'):
Review Comment:
My initial understanding, as reflected in the current implementation, was
that we only needed to stage newly downloaded or updated packages. I had
assumed that since packages in the local cache are already available, staging
them would be redundant.
My logic is as follows:
1. Identify all packages required by the requirements.txt file.
2. Identify all other required PyPI packages.
3. Download any of these packages that are not already in the cache.
4. Stage only the newly downloaded packages.
You've correctly pointed out that this means cached packages aren't staged.
To make sure I get the fix right, could you help me understand the downstream
process and why it's necessary to stage the cached packages as well?
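
For reference, the distinction in question can be sketched as follows. This is only an illustration, not the code in the PR: the function names `split_pip_download_output` and `packages_to_stage` are hypothetical, and the `Saved` and `File was already downloaded` line prefixes are assumptions about what `pip download` typically logs, which may differ between pip versions.

```python
import os

# Assumed log prefixes from `pip download`; these may vary by pip version.
SAVED_PREFIX = 'Saved '
CACHED_PREFIX = 'File was already downloaded '


def split_pip_download_output(output):
    """Split pip output into (newly_saved, already_cached) file names."""
    saved, cached = [], []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith(SAVED_PREFIX):
            # Package fetched during this run.
            saved.append(os.path.basename(line[len(SAVED_PREFIX):].strip()))
        elif line.startswith(CACHED_PREFIX):
            # Package that was already present in the local cache directory.
            cached.append(os.path.basename(line[len(CACHED_PREFIX):].strip()))
    return saved, cached


def packages_to_stage(output, stage_cached=False):
    """Return file names to stage, optionally including cached packages."""
    saved, cached = split_pip_download_output(output)
    return saved + cached if stage_cached else saved
```

With `stage_cached=False` this mirrors step 4 above (only newly downloaded packages are staged); with `stage_cached=True` it also returns the packages that pip reported as already present.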
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.