Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?
Thank you! One last question regarding Gordon's response. When a pipeline stops consuming and shuts down cleanly without any error, and is then started again using the last committed offset in Kafka, there should be no data loss. Or am I missing something? In what scenario should I expect data loss? (I can only think of the JobManager or TaskManager getting killed before the shutdown is done.)

Best,
Tobi

On Mon, Mar 2, 2020 at 1:45 PM Piotr Nowojski wrote:

> Hi,
>
> Sorry for my previous slightly confusing response, please take a look at the response from Gordon.
>
> Piotrek
Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?
Hi,

let me refine my question: My pipeline is generated from Beam, so the Flink pipeline is a translated Beam pipeline. When I update my Apache Beam pipeline code, stopping the pipeline with a snapshot in Flink is not an option, because resuming from that snapshot would use the old representation of the Flink pipeline.

This means I am looking for a way to drain the pipeline cleanly and use the last committed offset in Kafka to resume processing after I start it again (launching it through Beam will regenerate the Flink pipeline, and it should resume at the offset where it left off, that is, the latest committed offset in Kafka).

Can this be achieved with a cancel or stop of the Flink pipeline?

Best,
Tobias

On Mon, Mar 2, 2020 at 11:09 AM Piotr Nowojski wrote:

> Hi Tobi,
>
> No, FlinkKafkaConsumer does not use the offsets committed to Kafka for recovery. The offsets to start from are stored in the checkpoint itself. Committing the offsets back to Kafka is optional and purely cosmetic from Flink's perspective, so the job will start from the correct offsets.
>
> However, if for whatever reason you restart the job from a savepoint/checkpoint that is not the latest one, this will violate exactly-once guarantees: some records will be processed and committed to the sinks twice. Committing happens on checkpoint, so if you are recovering to some previous checkpoint, there is nothing Flink can do, because some records were already committed before.
>
> Piotrek
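Piotr's point in the quoted reply above, that restoring from a checkpoint which is not the latest one writes some records to the sink twice, can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class `CheckpointReplaySketch`, its `run` method, and the in-memory "log" and "sink" are hypothetical stand-ins and do not use any Flink or Kafka API. The sink is assumed to commit everything it has received at each checkpoint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not Flink code): a source reading a log by offset, a sink committing at each checkpoint. */
public class CheckpointReplaySketch {

    /**
     * Processes the records in [startOffset, endOffset) and appends them to the sink.
     * Returns the offset that a checkpoint taken at the end of this run would store.
     */
    public static int run(List<String> log, int startOffset, int endOffset, List<String> sink) {
        for (int offset = startOffset; offset < endOffset; offset++) {
            sink.add(log.get(offset));
        }
        return endOffset;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("a", "b", "c", "d", "e", "f");
        List<String> sink = new ArrayList<>();

        int checkpoint1 = run(log, 0, 2, sink);           // checkpoint 1 stores offset 2
        int checkpoint2 = run(log, checkpoint1, 4, sink); // checkpoint 2 stores offset 4

        // Restore from the latest checkpoint: no record is written twice.
        List<String> fromLatest = new ArrayList<>(sink);
        run(log, checkpoint2, 6, fromLatest);
        System.out.println("restore from latest: " + fromLatest); // [a, b, c, d, e, f]

        // Restore from the older checkpoint: c and d were already committed and are written again.
        List<String> fromOlder = new ArrayList<>(sink);
        run(log, checkpoint1, 6, fromOlder);
        System.out.println("restore from older:  " + fromOlder);  // [a, b, c, d, c, d, e, f]
    }
}
```

In this model nothing is lost in either case; the difference is only whether records show up twice in the sink.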
Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?
Thank you Piotr!

One last question: let's assume my source is a Kafka topic. If I stop via the CLI with a savepoint in Flink 1.9, but do not use that savepoint when restarting my job, the job would continue from the last offset that has been committed in Kafka, and thus I would also not experience a loss of data in my sink. Is that correct?

Best,
Tobi

On Fri, Feb 28, 2020 at 3:17 PM Piotr Nowojski wrote:

> Yes, that's correct. There shouldn't be any data loss. Stop with savepoint is a way to make sure that, if you are stopping a job (either permanently or temporarily), all of the results are published/committed to external systems before you actually stop the job.
>
> If you just cancel/kill/crash a job, in some rare cases (if a checkpoint was completing at the time the cluster was crashing), some records might not be committed before the cancellation/kill/crash happened. Note that this doesn't mean there is data loss; those records will simply be published once you restore your job from a checkpoint. If you want to stop the job permanently, that might never happen, hence we need stop with savepoint.
>
> Piotrek
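The difference Piotr describes between cancelling a job and stopping it with a savepoint can be sketched with a toy sink that only publishes records when a checkpoint or savepoint completes. This is an illustration under simplifying assumptions, not Flink's actual sink implementation: the class `CommitOnCheckpointSketch` and all of its methods are hypothetical, and the model ignores the narrower timing window (a checkpoint completing during the crash) that Piotr mentions.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sink (not Flink code) that publishes records only when a checkpoint or savepoint completes. */
public class CommitOnCheckpointSketch {
    private final List<String> pending = new ArrayList<>();   // written, not yet visible externally
    private final List<String> committed = new ArrayList<>(); // visible in the external system

    /** Buffers a record until the next checkpoint completes. */
    public void write(String record) {
        pending.add(record);
    }

    /** Called when a checkpoint or savepoint completes: publish everything written so far. */
    public void onCheckpointComplete() {
        committed.addAll(pending);
        pending.clear();
    }

    /** Models "cancel": the job stops immediately and pending records stay unpublished. */
    public List<String> cancel() {
        return new ArrayList<>(committed);
    }

    /** Models "stop with savepoint": take a final snapshot, commit, then stop. */
    public List<String> stopWithSavepoint() {
        onCheckpointComplete();
        return new ArrayList<>(committed);
    }

    public static void main(String[] args) {
        CommitOnCheckpointSketch cancelled = new CommitOnCheckpointSketch();
        cancelled.write("a");
        cancelled.onCheckpointComplete();
        cancelled.write("b"); // written after the last checkpoint
        System.out.println("after cancel: " + cancelled.cancel()); // [a]

        CommitOnCheckpointSketch stopped = new CommitOnCheckpointSketch();
        stopped.write("a");
        stopped.onCheckpointComplete();
        stopped.write("b");
        System.out.println("after stop with savepoint: " + stopped.stopWithSavepoint()); // [a, b]
    }
}
```

In the cancelled case the record "b" is not lost in this model; it simply stays unpublished until the job is restored and processes it again, which never happens if the job is stopped permanently.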
Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?
Thank you! To make sure I understand: when I have a streaming pipeline (reading from Kafka, writing somewhere), click "cancel", and then restart the pipeline, I should not expect any data to be lost. Is that correct?

Best,
Tobias

On Fri, Feb 28, 2020 at 2:51 PM Piotr Nowojski wrote:

> Thanks for confirming that, Yadong. I've created a ticket for that [1].
>
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/FLINK-16340
>
> On 28 Feb 2020, at 14:32, Yadong Xie wrote:
>
> Hi,
>
> 1. The old stop button was removed in Flink 1.9.0 because, as far as I know, it did not work properly.
> 2. If we have the stop-with-savepoint feature, we could add it to the web UI, but it may still need some work on the REST API to support the new feature.
>
> Best,
> Yadong
>
> On Fri, Feb 28, 2020 at 8:49 PM Piotr Nowojski wrote:
>
>> Hi,
>>
>> I'm not sure. Maybe Yadong (CC) will know more, but to the best of my knowledge and research:
>>
>> 1. In Flink 1.9 we switched from the old web UI to a new one, which probably explains the difference you are seeing.
>> 2. The "Stop" button in the old web UI was not working properly. It was not stop with savepoint, as stop with savepoint is a relatively new feature.
>> 3. Now that we have stop with savepoint (it can be used from the CLI, as you wrote), we could probably expose this feature in the new UI as well, unless it's already exposed somewhere? Yadong, do you know the answer to that?
>>
>> Piotrek
Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?
Hello,

before Flink 1.9 I was able to "Stop" a streaming pipeline: after clicking that button in the web interface, it performed a clean shutdown. Now with Flink 1.9 I only see the option to cancel it.

However, using the command line

    flink stop -d 266c5b38cf9d8e61a398a0bef4a1b350

still does the trick. So the functionality is there.

Has the button been removed on purpose?

Best,
Tobias
Re: Flink 1.8: Using the RocksDB state backend causes "NoSuchMethodError" when trying to stop a pipeline
You are right, my bad. We had a company-internal Java dependency that was referring to an older version of RocksDB. I spotted it by running mvn dependency:tree while investigating with a colleague. Thank you!

On Tue, Aug 13, 2019 at 8:01 PM Yun Tang wrote:

> Hi Tobias,
>
> First of all, I think you do not need to ADD the flink-statebackend-rocksdb JAR to your Docker image's lib folder, as the flink-dist JAR in the lib folder already includes all classes of flink-statebackend-rocksdb.
>
> I think the root cause is that your user application JAR might bundle the rocksdbjni JAR in the version used by Flink 1.7 (rocksdbjni-5.7.5.jar). Flink loads classes from the user code JAR first [1], but the method org.rocksdb.ColumnFamilyHandle.getDescriptor() does not exist in rocksdbjni-5.7.5.jar; it exists only in rocksdbjni-5.17.2 (or rather frocksdbjni-5.17.2-artisans-1.0 in Flink 1.8). That's why you come across this NoSuchMethodError exception.
>
> If it is not necessary, please do not bundle the rocksdbjni package in your user code JAR, as flink-dist already provides all needed classes. Moreover, adding the dependency flink-statebackend-rocksdb_2.11 to your pom.xml should be enough, as it already includes the rocksdbjni dependency.
>
> [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#classloader-resolve-order
>
> Best,
> Yun Tang
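Besides mvn dependency:tree, one way to confirm Yun Tang's diagnosis is to ask the JVM which JAR a class was actually loaded from. The following is a minimal sketch, not part of Flink: the class name `WhichJar` and its `locate` method are hypothetical, and the result only reflects the classpath the sketch itself is run with (for example the assembled user JAR), not necessarily Flink's user-code class loader inside a running cluster.

```java
import java.security.CodeSource;

/** Prints the location (JAR or directory) a class is loaded from on the current classpath. */
public class WhichJar {

    /** Returns the code source location of the class, or a note if it cannot be determined. */
    public static String locate(String className) {
        try {
            // Load without initializing, so static initializers (e.g. native library loading) do not run.
            Class<?> clazz = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK usually have no code source.
            if (source == null || source.getLocation() == null) {
                return "(no code source, e.g. JDK class)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(class not found on this classpath)";
        }
    }

    public static void main(String[] args) {
        // The default is the class from the stack trace in this thread; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "org.rocksdb.ColumnFamilyHandle";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location is the user application JAR or an rocksdbjni-5.7.5.jar rather than the JAR shipped with Flink 1.8, the older RocksDB classes are the ones being picked up.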
Flink 1.8: Using the RocksDB state backend causes "NoSuchMethodError" when trying to stop a pipeline
Hi,

I am using Apache Beam 2.14.0 with Flink 1.8.0, and I have included the RocksDB dependency in my project's pom.xml as well as baked it into the Dockerfile like this:

    FROM flink:1.8.0-scala_2.11

    ADD --chown=flink:flink \
        http://central.maven.org/maven2/org/apache/flink/flink-statebackend-rocksdb_2.11/1.8.0/flink-statebackend-rocksdb_2.11-1.8.0.jar \
        /opt/flink/lib/flink-statebackend-rocksdb_2.11-1.8.0.jar

Everything seems to be normal up to the point when I try to stop and cleanly shut down my pipeline. I get the following error:

    java.lang.NoSuchMethodError: org.rocksdb.ColumnFamilyHandle.getDescriptor()Lorg/rocksdb/ColumnFamilyDescriptor;
        at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.addColumnFamilyOptionsToCloseLater(RocksDBOperationUtils.java:160)
        at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.dispose(RocksDBKeyedStateBackend.java:331)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator.dispose(AbstractStreamOperator.java:362)
        at org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.dispose(DoFnOperator.java:470)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.tryDisposeAllOperators(StreamTask.java:454)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:337)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
        at java.lang.Thread.run(Thread.java:748)

I can cancel my pipeline, however, and snapshotting in general works. Flink 1.7.2 with Beam 2.12.0 did not have any problem. Could it be that this is caused by the switch to FRocksDB? [0]

Best,
Tobias

[0] https://ci.apache.org/projects/flink/flink-docs-stable/release-notes/flink-1.8.html#rocksdb-version-bump-and-switch-to-frocksdb-flink-10471
Re: Flink 1.8.1: Seeing {"errors":["Not found."]} when trying to access the Jobmanagers web interface
Yes, I did, but I am using a session cluster (https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/docker.html#flink-session-cluster) to run many jobs.

On Tue, Aug 6, 2019 at 2:05 PM Ufuk Celebi wrote:

> Hey Tobias,
>
> out of curiosity: were you using the job/application cluster (as documented here: https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/docker.html#flink-job-cluster)?
>
> – Ufuk
Re: Flink 1.8.1: Seeing {"errors":["Not found."]} when trying to access the Jobmanagers web interface
I was using Apache Beam and in the lib folder I had a JAR that was using Flink 1.7 in its POM. After bumping that to 1.8 it works :)
Re: Flink 1.8.1: Seeing {"errors":["Not found."]} when trying to access the Jobmanagers web interface
It completely works when using the docker image tag 1.7.2 - I just bumped back and the web interface was there.
Flink 1.8.1: Seeing {"errors":["Not found."]} when trying to access the Jobmanagers web interface
Hello,

after upgrading the Docker image from version 1.7.2 to 1.8.1 and wiping out ZooKeeper completely, I see

    {"errors":["Not found."]}

when trying to access the web interface of Flink. I can launch jobs from the command line, and I can't spot any error in the logs (so far on level INFO). I tried adding flink-runtime-web_2.12-1.8.1.jar as a dependency to the lib folder when building the Docker container, but this did not help either.

Has anyone experienced this problem? Is my Flink config faulty, or what could be the reason?