aglinxinyuan commented on code in PR #4560:
URL: https://github.com/apache/texera/pull/4560#discussion_r3165769809
##########
amber/src/main/python/core/runnables/data_processor.py:
##########
@@ -49,20 +49,17 @@ def run(self) -> None:
         with self._context.tuple_processing_manager.context_switch_condition:
             self._context.tuple_processing_manager.context_switch_condition.wait()
         self._running.set()
-        self._switch_context()
Review Comment:
Good catch — I had the same hesitation, since `_post_switch_context_checks`
runs `_check_and_process_debug_command` and is what makes a debug command
queued during worker setup fire before any data is processed.
The post-init `_switch_context()` cannot stay as-is. With it, MainLoop's first
`_switch_context()` is consumed by the round-trip with this init switch:
DataProc wakes from the wait on line 50, sets `_running`, notifies MainLoop,
and waits again. At that point the executor for the queued first input has not
run yet, so MainLoop returns from the switch in `process_input_state`, reads
`current_output_state == None`, and silently drops the first
state/tuple/marker. That is exactly the original #4421 / #4559 bug.
Fix in `d890004c11`: replace the post-init `_switch_context()` with a direct
`_post_switch_context_checks()` call. DataProc still runs the same checks
(debug command, console messages, exception) before entering the while loop,
but does it without consuming a notify/wait round-trip:
```python
with cond:
    cond.wait()                     # initial wait, woken by MainLoop's first switch
self._running.set()
self._post_switch_context_checks()  # was self._switch_context()
while self._running.is_set():
    ...
```
This preserves the "debug command before first task" behavior you flagged
while letting the first input be processed in its own cycle. All 8 tests in
`test_main_loop.py` pass. The debug-command-before-first-data scenario isn't
covered by existing tests, so I left a comment in `run()` explaining the intent.
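For intuition, here is a toy, single-threaded model of the handoff. It is only
a sketch, not the Texera code: `ToyWorker` and all of its members are
hypothetical names, a generator `yield` stands in for `_switch_context()`, and
`next()` stands in for MainLoop's switch. It assumes MainLoop reads the output
right after one switch, as described above.

```python
from typing import Generator, Optional


class ToyWorker:
    """Hypothetical stand-in for the MainLoop/DataProcessor pair."""

    def __init__(self, switch_after_init: bool) -> None:
        self.switch_after_init = switch_after_init
        self.current_input: Optional[str] = None
        self.current_output: Optional[str] = None
        self.checks_run = 0
        # An unstarted generator models DataProc parked in its initial wait.
        self._processor = self._data_processor()

    def _post_switch_checks(self) -> None:
        # Stands in for the debug-command / console / exception checks.
        self.checks_run += 1

    def _data_processor(self) -> Generator[None, None, None]:
        if self.switch_after_init:
            # Old behavior: a full switch (hand control back, then check).
            yield
            self._post_switch_checks()
        else:
            # New behavior: run the checks without handing control back.
            self._post_switch_checks()
        while True:
            self.current_output = f"processed({self.current_input})"
            yield  # switch back to MainLoop once the output is ready
            self._post_switch_checks()

    def process_input(self, item: str) -> Optional[str]:
        # MainLoop side: queue the input, switch once, then read the output.
        self.current_input = item
        self.current_output = None
        next(self._processor)
        return self.current_output


old = ToyWorker(switch_after_init=True)
print([old.process_input(x) for x in ("a", "b")])

new = ToyWorker(switch_after_init=False)
print([new.process_input(x) for x in ("a", "b")])
```

In this model the old variant returns `None` for the first input because the
first switch is spent on the init round-trip, while the new variant runs the
checks once and then produces output for the first input in the same cycle.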
Does that match what you had in mind?