github-actions[bot] commented on PR #61046: URL: https://github.com/apache/doris/pull/61046#issuecomment-4001683799
## Code Review Summary **PR #61046: [fix](job) fix routine load job incorrectly cancelled on FE restart after swap table** ### Overall Assessment: No issues found. The fix is correct and well-targeted. ### Review Details **Problem & Root Cause:** The PR correctly identifies that gsonPostProcess() re-parses the original CREATE ROUTINE LOAD SQL (origStmt) which contains the table name at creation time. After a REPLACE TABLE (swap) + DROP TABLE sequence, that original name no longer exists, causing checkDBTable() to throw AnalysisException and the job to be incorrectly cancelled. **Fix Analysis:** - The fix resolves the current table name from the catalog using the persisted tableId (which is stable across renames/swaps) before calling validate(). This is the correct approach since tableId is already properly maintained through swap operations. - The setTableName() call updates the field used by both checkDBTable() (line 434) and checkLoadProperties() (line 508), so both table lookups will use the corrected name. - Guard conditions (!isMultiTable && tableId != 0) are appropriate - multi-table jobs don't have a single target table, and tableId == 0 already implies multi-table mode. - The ifPresent() pattern correctly handles the case where the table has genuinely been deleted - setTableName won't be called, and validate() will properly fail with the stale name. - The defensive catch (Exception ignored) block is acceptable since getDb() and getTable() return Optionals and shouldn't throw, but it provides a safety net. - DB resolution uses dbId (not name) both in the existing code (line 1978) and the new code (line 1995), so DB rename scenarios are also handled correctly. **Test:** The regression test properly reproduces the exact scenario described in the bug: create table, create routine load, swap, drop, restart FE, then verify the job state is not CANCELLED. The test uses the docker/cluster framework with FE restart, which is the correct approach for testing gsonPostProcess() replay behavior. ### Critical Checkpoints: - **Correctness:** The fix correctly addresses the root cause by resolving table name from ID before validation. No logical errors found. - **Backward compatibility:** The fix only activates when tableId != 0 and a valid table is found. Existing behavior is preserved for all other cases. - **Thread safety:** gsonPostProcess() runs during single-threaded image replay; no concurrency concerns. - **Error handling:** Proper fallback behavior - if table lookup fails for any reason, the original validation path still runs. - **Test coverage:** Regression test directly reproduces the reported scenario. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
