Re: Retrieving Flink job ID/YARN Id programmatically
I looked at the code, and StreamExecutionEnvironment#getStreamGraph#getJobGraph#getJobID is generating a random ID unrelated to the actual ID used. Is there any way to fetch the real ID at runtime? Use case: fetch most recent checkpoint from stable storage for automated restarts. Most recent checkpoint has form ".../checkpoints/flink_app_id/chk-123" On Thu, Nov 21, 2019 at 11:28 AM Lei Nie wrote: > > This does not get the correct id: > StreamExecutionEnvironment#getStreamGraph#getJobGraph#getJobID = > eea5abc21dd8743a4090f4a3a660f9e8 > Actual job ID (from webUI): 1357d21be640b6a3b8a86a063f4bba8a > > > > On Thu, Nov 7, 2019 at 6:56 PM vino yang wrote: > > > > Hi Lei Nie, > > > > You can use > > `StreamExecutionEnvironment#getStreamGraph#getJobGraph#getJobID` to get the > > job id. > > > > Best, > > Vino > > > > Lei Nie 于2019年11月8日周五 上午8:38写道: > >> > >> Hello, > >> I am currently executing streaming jobs via StreamExecutionEnvironment. Is > >> it possible to retrieve the Flink job ID/YARN ID within the context of a > >> job? I'd like to be able to automatically register the job such that > >> monitoring jobs can run (REST api requires for example job id). > >> > >> Thanks
Re: Retrieving Flink job ID/YARN Id programmatically
This does not get the correct id: StreamExecutionEnvironment#getStreamGraph#getJobGraph#getJobID = eea5abc21dd8743a4090f4a3a660f9e8 Actual job ID (from webUI): 1357d21be640b6a3b8a86a063f4bba8a On Thu, Nov 7, 2019 at 6:56 PM vino yang wrote: > > Hi Lei Nie, > > You can use `StreamExecutionEnvironment#getStreamGraph#getJobGraph#getJobID` > to get the job id. > > Best, > Vino > > Lei Nie 于2019年11月8日周五 上午8:38写道: >> >> Hello, >> I am currently executing streaming jobs via StreamExecutionEnvironment. Is >> it possible to retrieve the Flink job ID/YARN ID within the context of a >> job? I'd like to be able to automatically register the job such that >> monitoring jobs can run (REST api requires for example job id). >> >> Thanks
StreamingFileSink duplicate data
Hello, I would like clarification on the StreamingFileSink, thank you. >From my testing, it seems that resuming job from checkpoint does *not* also restore the rolling part counter. E.g, job may have stopped with last file: *part-6-71* But when resuming from most recent checkpoint: *part-6-89* (There is unexplained gap). This is a problem if I am having an issue with my job, and need to roll back *more than one checkpoint*. After rolling back to the 4th last checkpoint, e.g, the data will be written into *different part file names*, causing duplication. - For example, checkpoints: *chk-17, chk-18, chk-19, chk-20* Original data: *part-1-5, part-1-6, part-1-7* Rollback to *chk-17*, which writes *part-1-18*, but with the same data as *part-1-5*! This is duplicate. -- Am I correct? How to avoid this?
Retrieving Flink job ID/YARN Id programmatically
Hello, I am currently executing streaming jobs via StreamExecutionEnvironment. Is it possible to retrieve the Flink job ID/YARN ID within the context of a job? I'd like to be able to automatically register the job such that monitoring jobs can run (REST api requires for example job id). Thanks