yzeng1618 commented on issue #10193:
URL: https://github.com/apache/seatunnel/issues/10193#issuecomment-3659158041

   Based on the information provided above, we have roughly determined the 
position for now.
   
   - Root cause:The error comes from 
FlinkSourceSplitEnumeratorContext#getJobIdForV15, which uses reflection to dig 
from SplitEnumeratorContext into Flink’s internal SchedulerBase to read the 
JobID, relying on internal field names like operatorCoordinatorContext, 
globalFailureHandler, and arg$1. Under Flink 1.17/1.18 with 
RecreateOnResetOperatorCoordinator, these internals/initialization order 
changed, so an intermediate object becomes null, causing an NPE wrapped as 
IllegalStateException("Initialize flink job-id failed") and logged as a WARN, 
leaving the jobId as null.
   
   - When is it triggered?
   
   First run is fine; on rolling upgrade / restore from checkpoint/savepoint 
(RecreateOnResetOperatorCoordinator.resetToCheckpoint), a new 
FlinkSourceSplitEnumeratorContext is created, and the reflection chain fails to 
obtain a valid internal object in this restore path, triggering the WARN.
   
   - Possible fixes for discussion
   
   1. Keep the reflection-based approach but make every step null-safe (no 
.get() or throwing IllegalStateException); if anything is missing, simply 
return null, logging a single concise WARN stating that only event jobId is 
affected, not checkpoint/savepoint restore.
   2. In the extraction logic, detect whether OperatorCoordinator.Context class 
name contains RecreateOnResetOperatorCoordinator / QuiesceableContext etc.; if 
it is a restore/upgrade context, immediately return null and skip deeper 
reflection, while keeping the normal path unchanged. This fully avoids NPEs on 
upgrade/restore.
   @
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to