bobbai00 opened a new issue, #4576:
URL: https://github.com/apache/texera/issues/4576
### What happened?
When a HashJoin operator is executed through the sync execution API (`POST
/api/execution/{wid}/{cuid}/run`, used by `agent-service`), it consistently
returns an empty result. The build phase appears to finish, but the execution
then terminates before the probe phase produces any output.
Expected: HashJoin returns the joined rows, same as a normal frontend
execution.
### How to reproduce?
1. Build a workflow with a HashJoin (any two upstream sources + HashJoin).
2. Trigger execution through the agent (`executeOperator` tool) targeting
the HashJoin, or call `/api/execution/{wid}/{cuid}/run` directly with
`targetOperatorIds = [hashJoinId]`.
3. The response comes back with `success: true`, `state: "Completed"`, and
`outputTuples: 0`.
### Version
1.1.0-incubating (Pre-release/Master)
### Commit Hash (Optional)
af313e7160488f2ca5ae25bf88277b37a6bf5c08
### Possible cause
`HashJoinOpDesc.getPhysicalPlan` produces two PhysicalOps (`build`, `probe`)
sharing one logical id, separated by a blocking edge. The scheduler places them
in two regions and runs them sequentially.
`SyncExecutionResource.allTargetsCompleted` checks
`stats.operatorInfo.get(opId).operatorState == COMPLETED`, where `operatorInfo`
is produced by `WorkflowExecution.getAllRegionExecutionsStats`. That method
aggregates by `logicalOpId.id` over only the *registered* `RegionExecution`s.
Between "build region completed" and "probe region instantiated," only the
build PhysicalOp is registered, so the aggregation sees a single state and
`aggregateStates(Iterable(COMPLETED))` returns `COMPLETED` for the whole
logical operator. The sync resource then takes the `TargetResultsReady` branch,
kills the execution before the probe region runs, and reads the probe's
(still-empty) output storage.
The same race applies to any logical operator that compiles to multiple
PhysicalOps separated by a blocking edge (e.g. Aggregate). It does not surface
in normal frontend execution because the frontend waits for full workflow
termination instead of per-target completion.
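
The suspected race can be illustrated with a small standalone model. This is only a sketch under stated assumptions: every type and function below (`OpState`, `PhysicalOpStat`, `statsByLogicalId`, `targetCompleted`, `expectedPhysicalOps`) is a hypothetical stand-in rather than Texera's actual code, and the `aggregateStates` rule is a simplification of whatever the real method does. The `targetCompleted` guard is one possible direction for a fix, not something confirmed against the codebase.

```scala
// Simplified model of the race. All names are hypothetical stand-ins,
// not Texera's real classes or signatures.
object HashJoinRaceSketch {
  sealed trait OpState
  case object Uninitialized extends OpState
  case object Running extends OpState
  case object Completed extends OpState

  // Stats for one physical operator that belongs to a logical operator.
  final case class PhysicalOpStat(logicalOpId: String, physicalName: String, state: OpState)

  // Assumed rule: Completed only if every *observed* state is Completed.
  def aggregateStates(states: Iterable[OpState]): OpState =
    if (states.isEmpty) Uninitialized
    else if (states.forall(_ == Completed)) Completed
    else Running

  // Aggregates by logical id over only the physical ops registered so far.
  def statsByLogicalId(registered: Seq[PhysicalOpStat]): Map[String, OpState] =
    registered.groupBy(_.logicalOpId).map { case (id, ops) =>
      id -> aggregateStates(ops.map(_.state))
    }

  // Possible guard: trust Completed only once every physical op that the
  // logical operator compiles to has been registered.
  def targetCompleted(
      registered: Seq[PhysicalOpStat],
      expectedPhysicalOps: Map[String, Set[String]],
      target: String
  ): Boolean = {
    val seen = registered.filter(_.logicalOpId == target)
    val expected = expectedPhysicalOps.getOrElse(target, Set.empty[String])
    seen.nonEmpty &&
    expected.subsetOf(seen.map(_.physicalName).toSet) &&
    seen.forall(_.state == Completed)
  }

  def main(args: Array[String]): Unit = {
    val expected = Map("hashJoin" -> Set("build", "probe"))

    // Window between "build region completed" and "probe region instantiated".
    val afterBuild = Seq(PhysicalOpStat("hashJoin", "build", Completed))
    println(statsByLogicalId(afterBuild)("hashJoin"))        // Completed (premature)
    println(targetCompleted(afterBuild, expected, "hashJoin")) // false

    // Both regions have run to completion.
    val afterProbe = afterBuild :+ PhysicalOpStat("hashJoin", "probe", Completed)
    println(targetCompleted(afterProbe, expected, "hashJoin")) // true
  }
}
```

In this model the naive per-logical-id aggregation reports `Completed` as soon as the build op finishes, which matches the observed `state: "Completed"` with `outputTuples: 0`. The guarded check stays false until the probe op is also registered and completed.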