kunwp1 opened a new issue, #5547:
URL: https://github.com/apache/texera/issues/5547
### Task Summary
The JVM launches each Python worker in `PythonWorkflowWorker` by building a
long list of **positional** command-line arguments, and
`texera_run_python_worker.py` unpacks them positionally (`(…, a, b, c, …) =
sys.argv`, then forwards them into `StorageConfig.initialize(...)`). That list
has grown to around 20 arguments.
Because the two sides agree only by **index**, adding, removing, or
reordering one argument means editing both in lockstep. If they ever drift,
arguments are silently misassigned (a value lands in the wrong field) instead
of failing loudly.
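To illustrate the failure mode, here is a minimal sketch using a hypothetical three-field subset (`endpoint`, `bucket`, `largeBinaryBaseUri`; the real list and its order differ). The sending side's order has changed but the receiving side still unpacks the old order. Because both sides still agree on the count, nothing raises, and two values quietly land in each other's fields.

```python
# Hypothetical three-field subset; the real list has around 20 entries.
# This is the field order on the sending (JVM) side after someone reordered it.
FIELDS_JVM_SIDE = ["bucket", "endpoint", "largeBinaryBaseUri"]


def build_argv(config: dict[str, str]) -> list[str]:
    """Mimics the JVM building positional args in its own field order."""
    return ["texera_run_python_worker.py"] + [config[name] for name in FIELDS_JVM_SIDE]


def unpack_argv(argv: list[str]) -> dict[str, str]:
    """Mimics the Python worker unpacking in the old, unchanged order."""
    _, endpoint, bucket, large_binary_base_uri = argv
    return {"endpoint": endpoint, "bucket": bucket, "largeBinaryBaseUri": large_binary_base_uri}


config = {"endpoint": "http://minio:9000", "bucket": "texera", "largeBinaryBaseUri": "s3://texera/lb"}
received = unpack_argv(build_argv(config))
# No exception is raised, but endpoint and bucket have swapped values.
print(received["endpoint"])  # texera
print(received["bucket"])    # http://minio:9000
```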
This surfaced in review of #5280, which added the 20th positional argument (the
large-binary base URI). That PR follows the existing convention and is fine
as-is; this issue is a maintainability/robustness follow-up, not a bug in it.
**Root cause:** worker startup config is passed by argv position, with no
names.
```
Before: JVM Seq(a1, a2, …, a20) ──by position──▶ py (a1, …, a20) = sys.argv
        add/reorder one → silent misalignment if the two sides drift

After:  JVM {"endpoint": …, "largeBinaryBaseUri": …} ──by name──▶ py cfg["largeBinaryBaseUri"]
        add a field → no positional coupling; a missing/renamed key fails clearly
```
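A minimal sketch of the receiving side of the JSON option in the "After" line, assuming the JVM serializes the config map into one JSON string passed as the worker's only argument. `load_worker_config` and `REQUIRED_KEYS` are hypothetical names, and only two of the keys are shown.

```python
import json

# Hypothetical key names; the real set would mirror the ~20 values passed today.
REQUIRED_KEYS = ("endpoint", "largeBinaryBaseUri")


def load_worker_config(argv: list[str]) -> dict:
    """Read the worker config from one JSON object in argv[1], checking keys by name."""
    if len(argv) != 2:
        raise ValueError(f"expected one JSON config argument, got {len(argv) - 1}")
    cfg = json.loads(argv[1])
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        # Fails loudly instead of silently shifting values into the wrong fields.
        raise ValueError(f"worker config is missing required keys: {missing}")
    return cfg


# Simulates what the JVM would pass; key order inside the object does not matter.
argv = [
    "texera_run_python_worker.py",
    json.dumps({"largeBinaryBaseUri": "s3://texera/lb", "endpoint": "http://minio:9000"}),
]
cfg = load_worker_config(argv)
print(cfg["largeBinaryBaseUri"])  # s3://texera/lb
```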
**Proposed:** pass startup config by name — e.g. a single JSON object, or
`argparse` `--key value` flags — so the two sides agree by key, and a missing
field raises a clear error.
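For the `argparse` alternative, a minimal sketch assuming each current positional value becomes its own required flag. The flag names below are hypothetical and only two are shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag names; each current positional argument would get one flag.
    parser = argparse.ArgumentParser(prog="texera_run_python_worker.py")
    parser.add_argument("--endpoint", required=True)
    parser.add_argument("--large-binary-base-uri", required=True)
    return parser


# Flags may arrive in any order because argparse matches them by name.
# Dashes in flag names become underscores in attribute names.
args = build_parser().parse_args(
    ["--large-binary-base-uri", "s3://texera/lb", "--endpoint", "http://minio:9000"]
)
print(args.endpoint)               # http://minio:9000
print(args.large_binary_base_uri)  # s3://texera/lb
```

Omitting a required flag, e.g. `parse_args(["--endpoint", "http://minio:9000"])`, makes `argparse` report `the following arguments are required: --large-binary-base-uri` and exit with status 2. Compared with a single JSON object, flags keep the command line human-readable but need one `add_argument` call per field.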
**Affected:**
- `amber/src/main/scala/org/apache/texera/amber/engine/architecture/pythonworker/PythonWorkflowWorker.scala` (builds the arg list)
- `amber/src/main/python/texera_run_python_worker.py` (unpacks `sys.argv`)
- `amber/src/main/python/core/storage/storage_config.py` (`StorageConfig.initialize` positional params)
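On the `StorageConfig.initialize` side, one way to remove the positional coupling is to make its parameters keyword-only. The class below is a hypothetical stand-in, not the real `storage_config.py`; its fields and the classmethod shape are assumptions.

```python
class StorageConfig:
    """Hypothetical stand-in for core/storage/storage_config.py; the real fields differ."""

    endpoint = None
    large_binary_base_uri = None

    @classmethod
    def initialize(cls, *, endpoint: str, large_binary_base_uri: str) -> None:
        # The bare "*" makes every following parameter keyword-only,
        # so each call site has to name the value it is passing.
        cls.endpoint = endpoint
        cls.large_binary_base_uri = large_binary_base_uri


# Works: values are matched by name, so argument order is irrelevant.
StorageConfig.initialize(large_binary_base_uri="s3://texera/lb", endpoint="http://minio:9000")

# Fails loudly: positional values are rejected instead of being misassigned.
try:
    StorageConfig.initialize("http://minio:9000", "s3://texera/lb")
except TypeError as err:
    print(err)
```

A missing or misspelled keyword likewise raises `TypeError` naming the offending parameter.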
### Task Type
- [x] Refactor / Cleanup
- [ ] DevOps / Deployment / CI
- [ ] Testing / QA
- [ ] Documentation
- [ ] Performance
- [ ] Other