Yicong-Huang opened a new pull request, #4680:
URL: https://github.com/apache/texera/pull/4680
### What changes were proposed in this PR?
Remove the `Thread.sleep(1000)` workaround inside the result-reading block
of \`DataProcessingSpec\` and \`ReconfigurationSpec\`. The sleep was annotated
with a TODO calling it a workaround for "the issue of reporting `completed`
status too early" — the current engine path is synchronous end-to-end, so the
workaround is dead code.
```diff
.map(terminalOpId => {
- //TODO: remove the delay after fixing the issue of reporting
"completed" status too early.
- Thread.sleep(1000)
val uri = getResultUriByLogicalPortId(...)
...
})
```
The synchronous path that emits \`COMPLETED\`:
1. \`DataProcessor.outputOneTuple\` handles \`FinalizePort\` →
\`outputManager.closeOutputStorageWriterIfNeeded(portId)\`
2. \`closeOutputStorageWriterIfNeeded\` puts a terminate signal on the
writer thread's queue and \`join()\`s it
3. The writer thread, on terminate, calls \`IcebergTableWriter.close()\` →
\`flushBuffer()\` → \`table.newAppend().appendFile(dataFile).commit()\`
4. Only then does \`DataProcessor\` fire \`portCompleted\` to the controller
5. After all ports complete, controller emits
\`ExecutionStateUpdate(COMPLETED)\` to the client
6. Read side does \`IcebergDocument.get()\` → \`seekToUsableFile()\` →
\`table.refresh()\` before scan
### Any related issues, documentation, discussions?
Closes #4679.
### How was this PR tested?
Ran the two specs together with \`-T 1\` (sequential) on a clean local
checkout, 5 consecutive runs:
\`\`\`
sbt 'WorkflowExecutionService / Test / testOnly \\
org.apache.texera.amber.engine.e2e.DataProcessingSpec \\
org.apache.texera.amber.engine.e2e.ReconfigurationSpec'
run 1: 21/21 passed in 39.2 s
run 2: 21/21 passed in 39.5 s
run 3: 21/21 passed in 43.2 s
run 4: 21/21 passed in 37.3 s
run 5: 21/21 passed in 36.9 s
\`\`\`
Compared to the same two specs with the sleep on the same machine:
| spec | before | after | delta |
|---|---|---|---|
| DataProcessingSpec | 36.5 s | 16.9 s | -19.6 s |
| ReconfigurationSpec | 27.2 s | 18.7 s | -8.5 s |
DataProcessingSpec saves more than 16 × 1 s because the sleep was inside a
per-terminal-port \`.map(...)\` block — workflows with multiple terminal ports
paid the cost more than once per test.
### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7, 1M context)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]