kunwp1 opened a new issue, #5547:
URL: https://github.com/apache/texera/issues/5547

   ### Task Summary
   
   The JVM launches each Python worker in `PythonWorkflowWorker` by building a 
long list of **positional** command-line arguments, and 
`texera_run_python_worker.py` unpacks them positionally (`(…, a, b, c, …) = 
sys.argv`, then forwards them into `StorageConfig.initialize(...)`). That list 
has grown to around 20 arguments.
   
   Because the two sides agree only by **index**, adding, removing, or 
reordering one argument means editing both in lockstep. If they ever drift, 
arguments are silently misassigned (a value lands in the wrong field) instead 
of failing loudly.
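
To make the failure mode concrete, here is a minimal sketch with a hypothetical three-field argv (the real list has around 20 entries, and the names `host`, `port`, and `base_uri` are illustrative, not taken from the Texera code). When the sender swaps two arguments, unpacking still succeeds and the values land in the wrong fields:

```python
# Hypothetical three-field example; the field names are assumptions for illustration.
def unpack_positional(argv):
    # Mirrors the "(…, a, b, c, …) = sys.argv" style of positional unpacking.
    _script, host, port, base_uri = argv
    return {"host": host, "port": port, "base_uri": base_uri}


# Sender and receiver agree on the order.
ok = unpack_positional(["worker.py", "localhost", "5005", "s3://bucket"])

# Sender swaps the last two arguments; the arity is unchanged, so nothing raises.
drifted = unpack_positional(["worker.py", "localhost", "s3://bucket", "5005"])

print(ok["port"])       # the intended port value
print(drifted["port"])  # a URI string sitting in the port field
```

Only a change in the number of arguments would raise here; a reorder or a same-count substitution passes through unnoticed.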
   
   This surfaced in review of #5280, which added the 20th positional argument 
(the large-binary base URI). That change follows the existing convention, so it 
is fine as-is; this is a maintainability/robustness follow-up, not a bug in that PR.
   
   **Root cause:** worker startup config is passed by argv position, with no 
names.
   
   ```
   Before:  JVM  Seq(a1, a2, …, a20)  ──by position──▶  py  (a1, …, a20) = sys.argv
            add/reorder one → silent misalignment if the two sides drift
   After:   JVM  {"endpoint": …, "largeBinaryBaseUri": …}  ──by name──▶  py  cfg["largeBinaryBaseUri"]
            add a field → no positional coupling; a missing/renamed key fails clearly
   ```
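
The "After" row could look roughly like this on the Python side. This is a sketch under assumptions: `WorkerStartupConfig` and `parse_startup_config` are hypothetical names, and only the two keys shown in the diagram are modeled; the actual field set would come from `PythonWorkflowWorker`.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class WorkerStartupConfig:
    # Hypothetical subset of fields, using the two keys from the diagram.
    endpoint: str
    largeBinaryBaseUri: str


def parse_startup_config(raw: str) -> WorkerStartupConfig:
    """Parse a JSON object and fail loudly on missing or unexpected keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("startup config must be a JSON object")
    expected = {f.name for f in fields(WorkerStartupConfig)}
    missing = expected - set(data)
    unknown = set(data) - expected
    if missing or unknown:
        raise ValueError(
            f"startup config mismatch: missing={sorted(missing)}, "
            f"unknown={sorted(unknown)}"
        )
    return WorkerStartupConfig(**data)


# Key order in the JSON does not matter; fields are matched by name.
cfg = parse_startup_config(
    '{"largeBinaryBaseUri": "s3://bucket/large", "endpoint": "localhost:5005"}'
)
print(cfg.largeBinaryBaseUri)
```

Rejecting unknown keys as well as missing ones means a rename on one side shows up as an explicit error naming both the old and the new key.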
   
   **Proposed:** pass startup config by name — e.g. a single JSON object, or 
`argparse` `--key value` flags — so the two sides agree by key, and a missing 
field raises a clear error.
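
For the `argparse` option, a minimal sketch might look like the following. The flag names `--endpoint` and `--large-binary-base-uri` are assumptions chosen for illustration; the real flags would be decided in the refactor.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="texera_run_python_worker.py")
    # Illustrative flag names only; each is required so omission is an error.
    parser.add_argument("--endpoint", required=True)
    parser.add_argument("--large-binary-base-uri", required=True)
    return parser


# Flags can arrive in any order; values are bound by name, not by position.
args = build_parser().parse_args(
    ["--large-binary-base-uri", "s3://bucket/large", "--endpoint", "localhost:5005"]
)
print(args.large_binary_base_uri)
```

With `required=True`, a missing flag makes `argparse` print a usage message naming the absent option and exit with status 2, so the worker stops before any value can be misassigned.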
   
   **Affected:**
   - `amber/src/main/scala/org/apache/texera/amber/engine/architecture/pythonworker/PythonWorkflowWorker.scala` (builds the arg list)
   - `amber/src/main/python/texera_run_python_worker.py` (unpacks `sys.argv`)
   - `amber/src/main/python/core/storage/storage_config.py` (`StorageConfig.initialize` positional params)
   
   ### Task Type
   - [x] Refactor / Cleanup
   - [ ] DevOps / Deployment / CI
   - [ ] Testing / QA
   - [ ] Documentation
   - [ ] Performance
   - [ ] Other
   

