Re: flink-stop command fails with ` Operation not found under key`
Hello Huweihua (sorry for dup email, ended up clicking reply instead of reply-all), Thanks for reaching out and having a look at it. The job managers have the same logs I shared before, another example below. {"instant":{"epochSecond":1650920964,"nanoOfSecond":64800},"thread":"flink-akka.actor.default-dispatcher-15","level":"INFO","loggerName":"org.apache.flink.runtime.jobmaster.JobMaster","message":"Triggering stop-with-savepoint for job .","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":124,"threadPriority":5} {"instant":{"epochSecond":1650920964,"nanoOfSecond":77700},"thread":"Checkpoint Timer","level":"INFO","loggerName":"org.apache.flink.runtime.checkpoint.CheckpointCoordinator","message":"Triggering checkpoint 724 (type=SAVEPOINT_SUSPEND) @ 1650920964650 for job .","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":120,"threadPriority":5} *{"instant":{"epochSecond":1650920964,"nanoOfSecond":84600},"thread":"flink-rest-server-netty-worker-thread-1","level":"ERROR","loggerName":"org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers$SavepointStatusHandler","message":"Exception occurred in REST handler: Operation not found under key: org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@382cf973","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":203,"threadPriority":5}*{"instant":{"epochSecond":1650920966,"nanoOfSecond":41700},"thread":"jobmanager-io-thread-1","level":"INFO","loggerName":"org.apache.flink.runtime.checkpoint.CheckpointCoordinator","message":"Completed checkpoint 724 for job (3268755 bytes, checkpointDuration=1443 ms, finalizationTime=324 ms).","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":105,"threadPriority":5} {"instant":{"epochSecond":1650920966,"nanoOfSecond":58100},"thread":"flink-akka.actor.default-dispatcher-16","level":"INFO","loggerName":"org.apache.flink.runtime.executiongraph.ExecutionGraph","message":"Source: line-items -> line-items-filter-valid-records -> Filter -> Map -> Filter -> Filter -> Filter -> Timestamps/Watermarks (5/8) (5ffc8a66e16b10c66a9eaef73bff538e) switched from RUNNING to FINISHED.","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":136,"threadPriority":5} . // Bunch of 'RUNNING to FINISHED.' messages {"instant":{"epochSecond":1650920967,"nanoOfSecond":95100},"thread":"flink-akka.actor.default-dispatcher-13","level":"INFO","loggerName":"org.apache.flink.runtime.executiongraph.ExecutionGraph","message":"Job JOB_NAME () switched from state RUNNING to FINISHED.","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":97,"threadPriority":5} {"instant":{"epochSecond":1650920967,"nanoOfSecond":95100},"thread":"flink-akka.actor.default-dispatcher-19","level":"INFO","loggerName":"org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager","message":"Clearing resource requirements of job ","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":139,"threadPriority":5} {"instant":{"epochSecond":1650920967,"nanoOfSecond":95100},"thread":"flink-akka.actor.default-dispatcher-13","level":"INFO","loggerName":"org.apache.flink.runtime.checkpoint.CheckpointCoordinator","message":"Stopping checkpoint coordinator for job .","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":97,"threadPriority":5} {"instant":{"epochSecond":1650920967,"nanoOfSecond":97400},"thread":"flink-akka.actor.default-dispatcher-13","level":"INFO","loggerName":"org.apache.flink.runtime.dispatcher.StandaloneDispatcher","message":"Job reached terminal state FINISHED.","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":97,"threadPriority":5} {"instant":{"epochSecond":1650920968,"nanoOfSecond":3200},"thread":"cluster-io-thread-1","level":"INFO","loggerName":"org.apache.flink.runtime.jobmanager.DefaultJobGraphStore","message":"Removed job graph from KubernetesStateHandleStore{configMapName='CONFIGMAP_NAME'}.","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":82,"threadPriority":5} {"instant":{"epochSecond":1650920968,"nanoOfSecond":3500},"thread":"flink-akka.actor.default-dispatcher-13","level":"INFO","loggerName":"org.apache.flink.runtime.jobmaster.JobMaster","message":"Stopping the JobMaster for job 'JOB_NAME'
Re: flink-stop command fails with ` Operation not found under key`
Stop-with-savepoint is an async operation, it will trigger savepoint at first, and trigger this operation in a cache. These error logs indicate that the operation is in the cache which can no longer be found. Could you provide more jobmanager.log to find when this operation is evicted from cache. > 2022年4月20日 上午12:21,Harsh Shah 写道: > > Hello Community, > > We are trying to adopt flink-stop instead of orchestrating the stop manually. > While using the command, it failed with error "Operation not found under key: > org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@9fbfc15a". > the job eventually stopped but I wanted to see how I could execute the > command successfully ! Any help or direction to debug is appreciated. > > Information: > * Flink version: 1.13.1 > * Flink on kubernetes with HA enabled. Standalone cluster. > > command: "/bin/flink stop $JOBID" > Stack trace of the command: > WARNING: Illegal reflective access by > org.apache.hadoop.security.authentication.util.KerberosUtil > (file:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar) to method > sun.security.krb5.Config.getInstance() > WARNING: Please consider reporting this to the maintainers of > org.apache.hadoop.security.authentication.util.KerberosUtil > WARNING: Use --illegal-access=warn to enable warnings of further illegal > reflective access operations > WARNING: All illegal access operations will be denied in a future release > Suspending job "" with a savepoint. > > > The program finished with the following exception: > > org.apache.flink.util.FlinkException: Could not stop with a savepoint job > "". > at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:581) > at > org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002) > at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:569) > at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1069) > at > org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Unknown Source) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at > org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) > at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) > Caused by: java.util.concurrent.ExecutionException: > org.apache.flink.runtime.rest.util.RestClientException: > [org.apache.flink.runtime.rest.NotFoundException: Operation not found under > key: > org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@9fbfc15a > at > org.apache.flink.runtime.rest.handler.async.AbstractAsynchronousOperationHandlers$StatusHandler.handleRequest(AbstractAsynchronousOperationHandlers.java:182) > at > org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers$SavepointStatusHandler.handleRequest(SavepointHandlers.java:219) > at > org.apache.flink.runtime.rest.handler.AbstractRestHandler.respondToRequest(AbstractRestHandler.java:83) > at > org.apache.flink.runtime.rest.handler.AbstractHandler.respondAsLeader(AbstractHandler.java:195) > at > org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.lambda$channelRead0$0(LeaderRetrievalHandler.java:83) > at java.base/java.util.Optional.ifPresent(Unknown Source) > at org.apache.flink.util.OptionalConsumer.ifPresent(OptionalConsumer.java:45) > at > org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.channelRead0(LeaderRetrievalHandler.java:80) > at > org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.channelRead0(LeaderRetrievalHandler.java:49) > at > org.apache.flink.shaded.netty4.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > > > Jobs manager logs: > {"instant":{"epochSecond":1650384228,"nanoOfSecond":866181000},"thread":"flink-akka.actor.default-dispatcher-2","level":"INFO","loggerName":"org.apache.flink.runtime.jobmaster.JobMaster","message":"Triggering > stop-with-savepoint for job > .","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":28,"threadPriority":5} > {"instant":{"epochSecond":1650384229,"nanoOfSecond":30779000},"thread":"flink-rest-server-netty-worker-thread-1","level":"ERROR","loggerName":"org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers$SavepointStatusHandler","message":"Exception > occurred in REST handler: Operation not found under key: >
flink-stop command fails with ` Operation not found under key`
Hello Community, We are trying to adopt flink-stop instead of orchestrating the stop manually. While using the command, it failed with error "Operation not found under key: org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@9fbfc15a". the job eventually stopped but I wanted to see how I could execute the command successfully ! Any help or direction to debug is appreciated. Information: * Flink version: 1.13.1 * Flink on kubernetes with HA enabled. Standalone cluster. *command*: "/bin/flink stop $JOBID" *Stack trace of the command:* *WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar) to method sun.security.krb5.Config.getInstance()WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtilWARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operationsWARNING: All illegal access operations will be denied in a future releaseSuspending job "" with a savepoint. The program finished with the following exception:org.apache.flink.util.FlinkException: Could not stop with a savepoint job "". at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:581) at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002) at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:569) at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1069) at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) at java.base/java.security.AccessController.doPrivileged(Native Method) at java.base/javax.security.auth.Subject.doAs(Unknown Source) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.rest.util.RestClientException: [org.apache.flink.runtime.rest.NotFoundException: Operation not found under key: org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@9fbfc15a at org.apache.flink.runtime.rest.handler.async.AbstractAsynchronousOperationHandlers$StatusHandler.handleRequest(AbstractAsynchronousOperationHandlers.java:182) at org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers$SavepointStatusHandler.handleRequest(SavepointHandlers.java:219) at org.apache.flink.runtime.rest.handler.AbstractRestHandler.respondToRequest(AbstractRestHandler.java:83) at org.apache.flink.runtime.rest.handler.AbstractHandler.respondAsLeader(AbstractHandler.java:195) at org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.lambda$channelRead0$0(LeaderRetrievalHandler.java:83) at java.base/java.util.Optional.ifPresent(Unknown Source) at org.apache.flink.util.OptionalConsumer.ifPresent(OptionalConsumer.java:45) at org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.channelRead0(LeaderRetrievalHandler.java:80) at org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.channelRead0(LeaderRetrievalHandler.java:49) at org.apache.flink.shaded.netty4.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)* *Jobs manager logs:* *{"instant":{"epochSecond":1650384228,"nanoOfSecond":866181000},"thread":"flink-akka.actor.default-dispatcher-2","level":"INFO","loggerName":"org.apache.flink.runtime.jobmaster.JobMaster","message":"Triggering stop-with-savepoint for job .","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":28,"threadPriority":5}{"instant":{"epochSecond":1650384229,"nanoOfSecond":30779000},"thread":"flink-rest-server-netty-worker-thread-1","level":"ERROR","loggerName":"org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers$SavepointStatusHandler","message":"Exception occurred in REST handler: Operation not found under key: org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@9fbfc15a","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":95,"threadPriority":5}* *... # rest of things are success*