Yicong-Huang opened a new pull request, #4717:
URL: https://github.com/apache/texera/pull/4717
### What changes were proposed in this PR?
Fix the root cause of #4705: lift `ExecutorManager`'s tmp module-name
counter from a per-instance attribute to a process-wide `itertools.count(1)`,
so generated module names (`udf-vN`) never collide across instances. The `if
module_name in sys.modules:` recovery branch (clear + reload) becomes
unreachable and is removed.
```diff
class ExecutorManager:
+ _module_name_counter = itertools.count(1)
+
def __init__(self):
self.executor: Optional[Operator] = None
self.operator_module_name: Optional[str] = None
- self.executor_version: int = 0 # incremental only
def gen_module_file_name(self):
- self.executor_version += 1
- module_name = f"udf-v{self.executor_version}"
+ module_name = f"udf-v{next(ExecutorManager._module_name_counter)}"
file_name = f"{module_name}.py"
return module_name, file_name
def load_executor_definition(self, code):
module_name, file_name = self.gen_module_file_name()
with self.fs.open(file_name, "w") as file:
file.write(code)
- if module_name in sys.modules:
- executor_module = importlib.import_module(module_name)
- executor_module.__dict__.clear()
- executor_module.__dict__["__name__"] = module_name
- executor_module = importlib.reload(executor_module)
- else:
- executor_module = importlib.import_module(module_name)
+ executor_module = importlib.import_module(module_name)
self.operator_module_name = module_name
...
```
### Why this matters
In production, a Python worker is its own subprocess and holds exactly one
`ExecutorManager`; a single instance's counter is monotonic across
`update_executor` calls, so `udf-v1`, `udf-v2`, … already never collided. The
clear+reload branch was effectively test-only dead code.
But under pytest, dozens of fixtures construct fresh `ExecutorManager`s in
the same Python process. Each one started at `executor_version = 0` and
immediately tried to load `udf-v1`. The second instance found `udf-v1` already
in `sys.modules` from `test_executor_manager.py`'s `SAMPLE_OPERATOR_CODE`
(which defines `class TestOperator`). The clear+reload recovery silently failed
on Python 3.11 specifically and returned the stale `TestOperator` to whichever
spec needed `CountBatchOperator`. The visible symptom alternated between
`AttributeError: 'TestOperator' object has no attribute 'count'` and a 1-minute
pytest hang.
A process-wide counter makes module names unique across every instance in
the same interpreter, so the `if module_name in sys.modules` branch is never
taken in either production or tests, and the 3.11-specific reload path is never
exercised.
### Out of scope
- Lifecycle cleanup of `sys.path` / `sys.modules` in
`ExecutorManager.close()`. Each `udf-vN` module still survives the manager that
loaded it, but the worker process is short-lived and `/var/tmp` is cleaned by
the OS.
### Any related issues, documentation, discussions?
Closes #4705. Indirectly unblocks the recurring backport flakes seen on PRs
#4636, #4648, and others (where the 3.11 leg either errored on `TestOperator`
or hung in `DataProcessingSpec`).
### How was this PR tested?
```
sbt -no-colors -batch \
'WorkflowExecutionService / Test / testOnly \
org.apache.texera.amber.engine.architecture.managers.test_executor_manager \
org.apache.texera.amber.engine.runnables.test_main_loop'
# 23 passed
```
The combined run that previously interleaved `TestOperator` from
`test_executor_manager.py` into `test_main_loop.py`'s fixtures now passes
cleanly: every spec sees its own freshly-imported module.
`test_executor_manager.py` was updated for the new semantics:
- `test_initialization` no longer asserts `executor_version == 0` (the
attribute is gone).
- `test_accept_python_language_*_operator` now assert
`operator_module_name.startswith("udf-v")` instead of `== "udf-v1"` — the
absolute value depends on prior tests in the same session.
- `test_gen_module_file_name_increments` asserts relative monotonicity
rather than absolute values.
### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7, 1M context)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]