aglinxinyuan commented on code in PR #4206:
URL: https://github.com/apache/texera/pull/4206#discussion_r3406925921
##########
amber/src/main/scala/org/apache/texera/amber/engine/architecture/scheduling/RegionExecutionCoordinator.scala:
##########
@@ -576,8 +576,17 @@ class RegionExecutionCoordinator(
region.getOperator(outputPortId.opId).outputPorts(outputPortId.portId)._3
val schema =
schemaOptional.getOrElse(throw new IllegalStateException("Schema is
missing"))
- DocumentFactory.createDocument(resultURI, schema)
- DocumentFactory.createDocument(stateURI, State.schema)
+ // LoopEnd operators may re-execute the region multiple times; on
+ // subsequent iterations the result/state documents already exist,
+ // and `createDocument` (overrideIfExists=true) would clobber them.
+ // Skip the create call when the document is already there.
+ val isLoopEndRegion = region.getOperators.exists(_.isLoopEnd)
Review Comment:
Both halves now in the branch.
**Rename** (landed in 3d4f15b445): `isLoopEnd` →
`reusesOutputStorageOnReExecution` on `PhysicalOp` (+
`withReusesOutputStorageOnReExecution`), the `RegionExecutionCoordinator`
guard, `LoopEndOpDesc`, and the specs.
**Test for the skip-create branch** (043ed302c3): I pulled the
create-or-reuse decision out of the private `createOutputPortStorageObjects`
into a pure companion method:
```scala
def provisionOutputDocument(uri, reuseExistingStorage, documentExists,
createDocument): Boolean
```
with the storage ops injected, so the decision is unit-testable without an
iceberg backend or a live region. `RegionOutputProvisioningSpec` pins the four
cases with a `createDocument` spy — directly the "pre-create, then check reused
not recreated" you asked for:
* **reuse + existing document → NOT recreated** (createDocument never
called) — accumulated loop output survives the re-run;
* reuse + no document yet → created (first iteration);
* no-reuse + existing → recreated/overwritten (fresh every run);
* no-reuse + none → created;
* plus: no-reuse short-circuits and never even probes `documentExists`.
On verification: I confirmed the production change compiles — the only
remaining `amber` compile errors are the pre-existing `PveManager` /
`virtual_environments` JOOQ issue from #5577 (unrelated; my local DB isn't
migrated with that table, CI builds against a fresh schema). The new spec is a
pure ScalaTest unit with no iceberg/actor dependency, so it runs in the normal
amber test job.
(For context on why these kept showing as "cited but not in the branch": the
branch has been force-rebased a few times, which dropped the earlier commits —
re-applied now on the current tip.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]