Mathdee opened a new pull request, #37379:
URL: https://github.com/apache/beam/pull/37379
**Issue:**
As described in #37370, running pipelines with large options on Dataflow
fails with `fork/exec /usr/local/bin/python: argument list too long`.
This occurs because the bootloader passes the complete JSON config via the
`PIPELINE_OPTIONS` environment variable, which exceeds the OS `ARG_MAX` limit.
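To make the failure mode concrete, here is a minimal sketch of how one might estimate whether a given options payload is at risk. It is not part of this PR; the helper names `env_payload_size` and `arg_max` are hypothetical, and the 2 MiB fallback is an assumption for platforms where `os.sysconf` is unavailable. Note that `ARG_MAX` bounds the combined size of arguments and environment passed to `exec`, and some kernels also cap each individual string at a lower value, so this comparison is only a rough indicator.

```python
import json
import os


def env_payload_size(options):
    """Bytes the options would occupy as one PIPELINE_OPTIONS=<json> entry."""
    entry = "PIPELINE_OPTIONS=" + json.dumps(options)
    # Each environment entry is a NUL-terminated string, hence the +1.
    return len(entry.encode("utf-8")) + 1


def arg_max(default=2 * 1024 * 1024):
    """Best-effort lookup of the exec argument/environment size limit."""
    try:
        value = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # Assumption: fall back to a common Linux default when unavailable.
        return default
    return value if value > 0 else default


if __name__ == "__main__":
    # Hypothetical oversized option: a long list of file paths.
    options = {"files_to_stage": ["gs://bucket/file_%d.jar" % i for i in range(50000)]}
    size = env_payload_size(options)
    print("payload bytes:", size, "limit:", arg_max())
```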
**The Fix:**
This change applies the pattern already used in the Go SDK (Issue #27839, Commit
e31e885) to the Python SDK.
1. **boot.go (file):** Writes the pipeline options to a temp file
(`pipeline_options.json`) and sets the `PIPELINE_OPTIONS_FILE` environment
variable.
2. **sdk_worker_main.py (file):** Checks for `PIPELINE_OPTIONS_FILE` and, if
it is set, loads the options from that file.
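The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names `write_options_file` and `load_pipeline_options` are hypothetical, the writer side is shown in Python only for symmetry (the actual bootloader is `boot.go`), `PIPELINE_OPTIONS_FILE` is assumed to hold the path of the written file, and the fallback to `PIPELINE_OPTIONS` when the file variable is absent is an assumption about the intended behavior.

```python
import json
import os
import tempfile


def write_options_file(options, directory=None):
    """Bootloader side (illustrative): persist options and return the path."""
    directory = directory or tempfile.gettempdir()
    path = os.path.join(directory, "pipeline_options.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(options, f)
    return path


def load_pipeline_options(environ=None):
    """Worker side (illustrative): prefer the file, else the env variable."""
    environ = os.environ if environ is None else environ

    # File-based loading takes priority when the variable is set.
    path = environ.get("PIPELINE_OPTIONS_FILE")
    if path:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)

    # Assumed fallback: the original inline JSON in PIPELINE_OPTIONS.
    raw = environ.get("PIPELINE_OPTIONS")
    return json.loads(raw) if raw else {}
```

Passing `environ` explicitly keeps the loader easy to unit test: a test can supply a plain dict containing both variables and check that the file contents win.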
**Outcome:**
* Fixes #37370
* Feature parity with the Java and Go SDKs for handling large pipeline options
* Unit tests verify that file-based loading takes priority.
-----
- [X] Mention the appropriate issue in your description (for example:
`addresses #123`), if applicable. This will automatically add a link to the
pull request in the issue. If you would like the issue to automatically close
on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).