LuciferYang commented on PR #364:
URL: 
https://github.com/apache/doris-spark-connector/pull/364#issuecomment-4852382470

   @JNSimba Thanks for the reply. Let me address the third point directly, 
because I think there's a misunderstanding about how this PR handles 
`RUNNING`/`PREPARE`.
   
   **This PR never treats a `RUNNING`/`PREPARE` (or `PRECOMMITTED`) label as 
success.** A reused label is only treated as an idempotent no-op when the 
backend reports `ExistingJobStatus == FINISHED`:
   
   ```java
   static boolean isAlreadyCommitted(boolean labelReused, String status, String 
existingJobStatus) {
       return labelReused
               && LABEL_ALREADY_EXISTS_STATUS.equalsIgnoreCase(status)
               && EXISTING_JOB_FINISHED.equalsIgnoreCase(existingJobStatus);
   }
   ```
   
   Mapping this against FE's `LabelAlreadyUsedException`:
   - `PREPARE` / `UNKNOWN` → `"RUNNING"`
   - `PRECOMMITTED` → `"PRECOMMITTED"`
   - `COMMITTED` / `VISIBLE` → `"FINISHED"`
   
   So when the reused label is still `RUNNING`/`PREPARE`, `isAlreadyCommitted` 
returns `false`, we fall through to `throw StreamLoadException`, and it stays a 
**retriable failure** — never masked as success. The retry then waits the full 
configured interval before re-issuing under the same label, giving the original 
txn time to advance to `FINISHED` (→ deduped as a no-op) or to be aborted/timed 
out on the backend (→ the label is released, so the same-label retry is then 
accepted as a normal, non-duplicate load). If the original txn is still 
unresolved when the retry budget is exhausted, the write fails loudly with that 
`StreamLoadException` rather than silently double-writing. So we are **not** 
relying on "reusing a label stuck in PREPARE" to succeed — that case surfaces 
as a failure, exactly as it should.
   
   On `flink #523`: it targets the same **auto-commit** path, not 2PC (its own 
log shows `"TwoPhaseCommit": "false"` with `ExistingJobStatus: RUNNING` / txn 
`status [PREPARE]`). It takes the opposite approach — mint a **new** label on 
every retry to sidestep a stuck-label collision. The trade-off is that it gives 
up dedup: a batch that already committed under the old label gets re-loaded 
under the new one and written twice. This PR instead reuses the label and 
treats a `FINISHED` collision as an idempotent no-op, which is what makes the 
auto-commit retry exactly-once. In fact the stuck-`PREPARE` case #523 worried 
about is the very `RUNNING`/`PREPARE` case handled above: here it stays a 
retriable failure and resolves once the original txn advances or aborts, 
instead of being renamed away. (2PC never reuses labels at all — 
`onBatchFailed()` is a no-op when 2PC is enabled, since its exactly-once 
guarantee comes from the precommit/commit protocol; label reuse is strictly for 
the auto
 -commit path.)
   
   On abort-before-reuse: for an auto-commit stream load the connector holds no 
txn id, so there's nothing to abort from the client side; and issuing an abort 
against a label whose load may have already committed would be exactly the 
double-write we're trying to avoid. So abort-before-reuse isn't applicable — 
and, given the `FINISHED`-only guard above, isn't needed.
   
   **On using a dedicated table for the ITCase:** `testFailoverForRetry` 
already uses its own table (`initializeTable` does `DROP TABLE IF EXISTS` + 
`CREATE`), so there's no cross-test label collision to fix there. What that 
test exercises is the **reused-label retry path itself**: with `batch.size=1` 
and `address varchar(4)`, an over-long row (`123456` / `12345678`) is rejected, 
retried under the same label, and after `ALTER TABLE ... MODIFY COLUMN` widens 
the column the retry lands correctly — i.e. reusing a label across a retry 
doesn't corrupt the result. Giving each run a unique table wouldn't make that 
any stronger; it would just sidestep the very path under test. The exactly-once 
*dedup* semantics — a reused label whose prior load reports `FINISHED` is 
treated as an idempotent no-op, while `RUNNING`/`PRECOMMITTED` and a 
fresh-label collision are not — are covered directly by unit tests 
(`alreadyCommittedOnlyForReusedLabelThatActuallyFinished`, 
`definiteRejectKeepsAFreshLa
 bel`, `freshLabelCollisionIsADefiniteReject`), which is the right level to 
assert them deterministically.
   
   Happy to walk through any specific state transition you're worried about.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to