LuciferYang commented on PR #364:
URL:
https://github.com/apache/doris-spark-connector/pull/364#issuecomment-4852382470
@JNSimba Thanks for the reply. Let me address the third point directly,
because I think there's a misunderstanding about how this PR handles
`RUNNING`/`PREPARE`.
**This PR never treats a `RUNNING`/`PREPARE` (or `PRECOMMITTED`) label as
success.** A reused label is only treated as an idempotent no-op when the
backend reports `ExistingJobStatus == FINISHED`:
```java
static boolean isAlreadyCommitted(boolean labelReused, String status, String
existingJobStatus) {
return labelReused
&& LABEL_ALREADY_EXISTS_STATUS.equalsIgnoreCase(status)
&& EXISTING_JOB_FINISHED.equalsIgnoreCase(existingJobStatus);
}
```
Mapping this against FE's `LabelAlreadyUsedException`:
- `PREPARE` / `UNKNOWN` → `"RUNNING"`
- `PRECOMMITTED` → `"PRECOMMITTED"`
- `COMMITTED` / `VISIBLE` → `"FINISHED"`
So when the reused label is still `RUNNING`/`PREPARE`, `isAlreadyCommitted`
returns `false`, we fall through to `throw StreamLoadException`, and it stays a
**retriable failure** — never masked as success. The retry then waits the full
configured interval before re-issuing under the same label, giving the original
txn time to advance to `FINISHED` (→ deduped as a no-op) or to be aborted/timed
out on the backend (→ the label is released, so the same-label retry is then
accepted as a normal, non-duplicate load). If the original txn is still
unresolved when the retry budget is exhausted, the write fails loudly with that
`StreamLoadException` rather than silently double-writing. So we are **not**
relying on "reusing a label stuck in PREPARE" to succeed — that case surfaces
as a failure, exactly as it should.
On `flink #523`: it targets the same **auto-commit** path, not 2PC (its own
log shows `"TwoPhaseCommit": "false"` with `ExistingJobStatus: RUNNING` / txn
`status [PREPARE]`). It takes the opposite approach — mint a **new** label on
every retry to sidestep a stuck-label collision. The trade-off is that it gives
up dedup: a batch that already committed under the old label gets re-loaded
under the new one and written twice. This PR instead reuses the label and
treats a `FINISHED` collision as an idempotent no-op, which is what makes the
auto-commit retry exactly-once. In fact the stuck-`PREPARE` case #523 worried
about is the very `RUNNING`/`PREPARE` case handled above: here it stays a
retriable failure and resolves once the original txn advances or aborts,
instead of being renamed away. (2PC never reuses labels at all —
`onBatchFailed()` is a no-op when 2PC is enabled, since its exactly-once
guarantee comes from the precommit/commit protocol; label reuse is strictly for
the auto
-commit path.)
On abort-before-reuse: for an auto-commit stream load the connector holds no
txn id, so there's nothing to abort from the client side; and issuing an abort
against a label whose load may have already committed would be exactly the
double-write we're trying to avoid. So abort-before-reuse isn't applicable —
and, given the `FINISHED`-only guard above, isn't needed.
**On using a dedicated table for the ITCase:** `testFailoverForRetry`
already uses its own table (`initializeTable` does `DROP TABLE IF EXISTS` +
`CREATE`), so there's no cross-test label collision to fix there. What that
test exercises is the **reused-label retry path itself**: with `batch.size=1`
and `address varchar(4)`, an over-long row (`123456` / `12345678`) is rejected,
retried under the same label, and after `ALTER TABLE ... MODIFY COLUMN` widens
the column the retry lands correctly — i.e. reusing a label across a retry
doesn't corrupt the result. Giving each run a unique table wouldn't make that
any stronger; it would just sidestep the very path under test. The exactly-once
*dedup* semantics — a reused label whose prior load reports `FINISHED` is
treated as an idempotent no-op, while `RUNNING`/`PRECOMMITTED` and a
fresh-label collision are not — are covered directly by unit tests
(`alreadyCommittedOnlyForReusedLabelThatActuallyFinished`,
`definiteRejectKeepsAFreshLa
bel`, `freshLabelCollisionIsADefiniteReject`), which is the right level to
assert them deterministically.
Happy to walk through any specific state transition you're worried about.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]