Flink 1.17.2 planned?
Hi team, is there any information available about a bugfix release 1.17.2? That is, will there be another bugfix release of 1.17, and if so, what is the approximate timing? We are hit by https://issues.apache.org/jira/browse/FLINK-32296, which leads to wrong SQL results in some circumstances.

Kind regards,
Christian

This e-mail is from Mapp Digital Group and its international legal entities and may contain information that is confidential. If you are not the intended recipient, do not read, copy or distribute the e-mail or any attachments. Instead, please notify the sender and delete the e-mail and any attachments.
Cancel a job in status INITIALIZING
Hi, we’re running a Flink cluster in standalone/session mode. During a restart of a jobmanager, one job got stuck in status INITIALIZING. When we tried to cancel the job via the CLI, the command failed with a java.util.concurrent.TimeoutException. The only way for us to get rid of this job was to stop the jobmanagers and delete the ZooKeeper root node. Is there a better way of handling this issue? The workaround seems very unclean to me.

Kind regards,
Christian

Mapp Digital Germany GmbH with registered offices at Sandstr. 3, 80335 München. Registered with the District Court München HRB 226181. Managing Directors: Frasier, Christopher & Warren, Steve. This e-mail is from Mapp Digital and its international legal entities and may contain information that is confidential or proprietary. If you are not the intended recipient, do not read, copy or distribute the e-mail or any attachments. Instead, please notify the sender and delete the e-mail and any attachments. Please consider the environment before printing. Thank you.
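[Editor's note] Before resorting to deleting the ZooKeeper root node (which wipes the HA metadata of all jobs in the session cluster), it may be worth retrying the cancel through the REST endpoint with a longer client-side timeout. Below is a minimal sketch using Flink's RestClusterClient; the host, port, and the job ID passed as a program argument are assumptions, and whether a stuck INITIALIZING job accepts the cancel depends on the jobmanager's state.

```java
import org.apache.flink.api.common.JobID;
import org.apache.flink.client.program.rest.RestClusterClient;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;

public class CancelStuckJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setString(RestOptions.ADDRESS, "localhost"); // JobManager REST host (assumption)
        conf.setInteger(RestOptions.PORT, 8081);          // default REST port

        // "standalone" is an arbitrary cluster id for a session cluster.
        try (RestClusterClient<String> client = new RestClusterClient<>(conf, "standalone")) {
            JobID jobId = JobID.fromHexString(args[0]);   // job id from the web UI / CLI
            // Cancellation is asynchronous; block until the jobmanager acknowledges it.
            client.cancel(jobId).get();
        }
    }
}
```

The same call is exposed over plain HTTP as `PATCH /jobs/<jobid>?mode=cancel` on the JobManager REST port, which can help when the CLI itself times out.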
Re: Kafka Consumer commit error
Hi Alexander, I’ve created a Jira ticket here: https://issues.apache.org/jira/browse/FLINK-28060. Unfortunately this is causing some issues for us. I hope that with the attached demo project the root cause can be determined, as this is reproducible in Flink 1.15.0 but not in Flink 1.14.4.

Kind regards,
Christian

From: Alexander Fedulov
Date: Monday, 13 June 2022, 23:42
To: Christian Lorenz
Cc: user@flink.apache.org
Subject: Re: Kafka Consumer commit error

Hi Christian,

> thanks for the reply. We use AT_LEAST_ONCE delivery semantics in this application. Do you think this might still be related?

No, in that case Kafka transactions are not used, so it should not be relevant.

Best,
Alexander Fedulov

On Mon, Jun 13, 2022 at 3:48 PM Christian Lorenz <christian.lor...@mapp.com> wrote:

Hi Alexander, thanks for the reply. We use AT_LEAST_ONCE delivery semantics in this application. Do you think this might still be related?

Best regards,
Christian

From: Alexander Fedulov <alexan...@ververica.com>
Date: Monday, 13 June 2022, 13:06
To: user@flink.apache.org
Cc: Christian Lorenz <christian.lor...@mapp.com>
Subject: Re: Kafka Consumer commit error

Hi Christian, you should check whether the exceptions that you see after the broker is back from maintenance are the same as the ones you posted here. If you are using EXACTLY_ONCE, it could be that the later errors are caused by Kafka purging transactions that Flink attempts to commit [1].

Best,
Alexander Fedulov

[1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kafka/#fault-tolerance

On Mon, Jun 13, 2022 at 12:04 PM Martijn Visser <martijnvis...@apache.org> wrote:

Hi Christian, I would expect that after the broker comes back up and recovers completely, these error messages would disappear automagically. It should not require a restart (only time). Flink doesn't rely on Kafka's offset commits for fault tolerance.

Best regards,
Martijn

Op wo 8 jun. 2022 om 15:49 schreef Christian Lorenz <christian.lor...@mapp.com>:

Hi, we have some issues with a job using the flink-sql-connector-kafka (Flink 1.15.0, standalone cluster). If one broker is restarted, e.g. for maintenance (replication-factor=2), the taskmanagers executing the job constantly log errors on each checkpoint creation:

Failed to commit consumer offsets for checkpoint 50659
org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
Caused by: org.apache.flink.kafka.shaded.org.apache.kafka.common.errors.CoordinatorNotAvailableException: The coordinator is not available.

AFAICT the error itself is produced by the underlying Kafka consumer. Unfortunately this error cannot be reproduced on our test system. From my understanding this error might occur once, but follow-up checkpoints / Kafka commits should succeed again. Currently my only way of “fixing” the issue is to restart the taskmanagers. Is there maybe some Kafka consumer setting which would help to circumvent this?

Kind regards,
Christian
Re: Kafka Consumer commit error
Hi Alexander, thanks for the reply. We use AT_LEAST_ONCE delivery semantics in this application. Do you think this might still be related?

Best regards,
Christian

From: Alexander Fedulov
Date: Monday, 13 June 2022, 13:06
To: user@flink.apache.org
Cc: Christian Lorenz
Subject: Re: Kafka Consumer commit error

Hi Christian, you should check whether the exceptions that you see after the broker is back from maintenance are the same as the ones you posted here. If you are using EXACTLY_ONCE, it could be that the later errors are caused by Kafka purging transactions that Flink attempts to commit [1].

Best,
Alexander Fedulov

[1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kafka/#fault-tolerance
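[Editor's note] For context on the AT_LEAST_ONCE vs. EXACTLY_ONCE distinction discussed above: in the Flink 1.15 KafkaSink API, the delivery guarantee is set explicitly on the sink builder, and only EXACTLY_ONCE uses Kafka transactions. A minimal sketch (the broker address and topic name are placeholders):

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

public class SinkConfig {
    public static KafkaSink<String> buildSink() {
        return KafkaSink.<String>builder()
                .setBootstrapServers("broker-1:9092")            // placeholder
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("output-topic")                // placeholder
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                // AT_LEAST_ONCE does not open Kafka transactions, so the
                // transaction-purging scenario from [1] above does not apply.
                .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
                .build();
    }
}
```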
Re: Kafka Consumer commit error
Hi Martijn, thanks for replying. I would also expect the behavior you describe below; AFAICT it was also like this with Flink 1.14. I am aware that Flink uses checkpointing for fault tolerance, but the Kafka offsets are, for example, part of our monitoring, and the failed commits will lead to alerts. Other applications which use the Kafka client directly do not show repeated commit failures once all Kafka brokers are online again. I think this occurs both in Flink jobs using Flink's Kafka connector directly (KafkaSource) and in applications based on the Kafka SQL connector. I will try to write a small job to verify this behavior, as we also use flink-avro-confluent-registry, which makes it harder to understand the root of the issue.

Best regards,
Christian

From: Martijn Visser
Date: Monday, 13 June 2022, 12:05
To: Christian Lorenz
Cc: user@flink.apache.org
Subject: Re: Kafka Consumer commit error

Hi Christian, I would expect that after the broker comes back up and recovers completely, these error messages would disappear automagically. It should not require a restart (only time). Flink doesn't rely on Kafka's offset commits for fault tolerance.

Best regards,
Martijn
Kafka Consumer commit error
Hi, we have some issues with a job using the flink-sql-connector-kafka (Flink 1.15.0, standalone cluster). If one broker is restarted, e.g. for maintenance (replication-factor=2), the taskmanagers executing the job constantly log errors on each checkpoint creation:

Failed to commit consumer offsets for checkpoint 50659
org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
Caused by: org.apache.flink.kafka.shaded.org.apache.kafka.common.errors.CoordinatorNotAvailableException: The coordinator is not available.

AFAICT the error itself is produced by the underlying Kafka consumer. Unfortunately this error cannot be reproduced on our test system. From my understanding this error might occur once, but follow-up checkpoints / Kafka commits should succeed again. Currently my only way of “fixing” the issue is to restart the taskmanagers. Is there maybe some Kafka consumer setting which would help to circumvent this?

Kind regards,
Christian

Mapp Digital Germany GmbH with registered offices at Dachauer Str. 63, 80335 München. Registered with the District Court München HRB 226181. Managing Directors: Frasier, Christopher & Warren, Steve. This e-mail is from Mapp Digital and its international legal entities and may contain information that is confidential or proprietary. If you are not the intended recipient, do not read, copy or distribute the e-mail or any attachments. Instead, please notify the sender and delete the e-mail and any attachments. Please consider the environment before printing. Thank you.
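[Editor's note] On the closing question about a consumer setting: since Flink stores the offsets in its own checkpoints and the commit back to Kafka is only informational, the Flink 1.15 KafkaSource exposes the option `commit.offsets.on.checkpoint` to disable that commit altogether, which would silence these log messages. A minimal sketch with broker addresses, topic, and group id as placeholders:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;

public class SourceConfig {
    public static KafkaSource<String> buildSource() {
        return KafkaSource.<String>builder()
                .setBootstrapServers("broker-1:9092,broker-2:9092") // placeholders
                .setTopics("input-topic")                           // placeholder
                .setGroupId("my-consumer-group")                    // placeholder
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                // Offsets live in Flink checkpoints; committing them back to
                // Kafka is only for external visibility and can be turned off.
                .setProperty("commit.offsets.on.checkpoint", "false")
                .build();
    }
}
```

Note the trade-off: as the thread above mentions, the committed offsets feed external monitoring, so disabling the commit also removes that visibility rather than fixing the underlying coordinator error.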