[ANNOUNCE] Apache Flink CDC 3.1.1 released

2024-06-18 Posted by Qingsheng Ren
The Apache Flink community is very happy to announce the release of Apache
Flink CDC 3.1.1.

Apache Flink CDC is a distributed data integration tool for real-time and
batch data. It brings simplicity and elegance to data integration by using
YAML to describe the data movement and transformation in a data pipeline.

Please check out the release blog post for an overview of the release:
https://flink.apache.org/2024/06/18/apache-flink-cdc-3.1.1-release-announcement/

The release is available for download at:
https://flink.apache.org/downloads.html

Maven artifacts for Flink CDC can be found at:
https://search.maven.org/search?q=g:org.apache.flink%20cdc

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354763

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Regards,
Qingsheng Ren


[ANNOUNCE] Apache Flink CDC 3.1.0 released

2024-05-17 Posted by Qingsheng Ren
The Apache Flink community is very happy to announce the release of
Apache Flink CDC 3.1.0.

Apache Flink CDC is a distributed data integration tool for real-time and
batch data. It brings simplicity and elegance to data integration by using
YAML to describe the data movement and transformation in a data pipeline.

Please check out the release blog post for an overview of the release:
https://flink.apache.org/2024/05/17/apache-flink-cdc-3.1.0-release-announcement/

The release is available for download at:
https://flink.apache.org/downloads.html

Maven artifacts for Flink CDC can be found at:
https://search.maven.org/search?q=g:org.apache.flink%20cdc

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354387

We would like to thank all contributors of the Apache Flink community
who made this release possible!

Regards,
Qingsheng Ren


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Posted by Qingsheng Ren
Congratulations and big THANK YOU to everyone helping with this release!

Best,
Qingsheng

On Fri, Oct 27, 2023 at 10:18 AM Benchao Li  wrote:

> Great work, thanks everyone involved!
>
> Rui Fan <1996fan...@gmail.com> wrote on Friday, Oct 27, 2023 at 10:16:
> >
> > Thanks for the great work!
> >
> > Best,
> > Rui
> >
> > On Fri, Oct 27, 2023 at 10:03 AM Paul Lam  wrote:
> >
> > > Finally! Thanks to all!
> > >
> > > Best,
> > > Paul Lam
> > >
> > > > > On Oct 27, 2023, at 03:58, Alexander Fedulov wrote:
> > > >
> > > > Great work, thanks everyone!
> > > >
> > > > Best,
> > > > Alexander
> > > >
> > > > On Thu, 26 Oct 2023 at 21:15, Martijn Visser <
> martijnvis...@apache.org>
> > > > wrote:
> > > >
> > > >> Thank you all who have contributed!
> > > >>
> > > >> On Thu, Oct 26, 2023 at 18:41, Feng Jin wrote:
> > > >>
> > > >>> Thanks for the great work! Congratulations
> > > >>>
> > > >>>
> > > >>> Best,
> > > >>> Feng Jin
> > > >>>
> > > >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu 
> wrote:
> > > >>>
> > >  Congratulations, Well done!
> > > 
> > >  Best,
> > >  Leonard
> > > 
> > >  On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee <
> lincoln.8...@gmail.com>
> > >  wrote:
> > > 
> > > > Thanks for the great work! Congrats all!
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Jing Ge wrote on Friday, Oct 27, 2023 at 00:16:
> > > >
> > > >> The Apache Flink community is very happy to announce the release of
> > > >> Apache Flink 1.18.0, which is the first release for the Apache Flink
> > > >> 1.18 series.
> > > >>
> > > >> Apache Flink® is an open-source unified stream and batch data
> > > >> processing framework for distributed, high-performing,
> > > >> always-available, and accurate data applications.
> > > >>
> > > >> The release is available for download at:
> > > >> https://flink.apache.org/downloads.html
> > > >>
> > > >> Please check out the release blog post for an overview of the
> > > >> improvements for this release:
> > > >> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > > >>
> > > >> The full release notes are available in Jira:
> > > >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > > >>
> > > >> We would like to thank all contributors of the Apache Flink
> > > >> community who made this release possible!
> > > >>
> > > >> Best regards,
> > > >> Konstantin, Qingsheng, Sergey, and Jing
> > > >>
> --
>
> Best,
> Benchao Li
>


[ANNOUNCE] Starting with Flink 1.18 Release Sync

2023-04-03 Posted by Qingsheng Ren
Hi everyone,

To kick off the Flink 1.18 release cycle, I'm happy to share with you
that the first release sync meeting for 1.18 will happen tomorrow,
Tuesday, April 4th, at 10am (UTC+2) / 4pm (UTC+8). Feel free to join us
and share your ideas about the new release cycle!

Details of joining the release sync can be found in the 1.18 release wiki
page [1].

All contributors are invited to update the same wiki page [1] and include
features targeting the 1.18 release.

Looking forward to seeing you all in the meeting!

[1] https://cwiki.apache.org/confluence/display/FLINK/1.18+Release

Best regards,
Jing, Konstantin, Sergey and Qingsheng


Re: Can the kafka-client version requirement of the Kafka source/sink in Flink 1.15 be lowered?

2022-06-26 Posted by Qingsheng Ren
Hi,

The Kafka connector currently depends on some APIs that only exist in newer versions of 
kafka-clients, and the sink uses reflection to support exactly-once semantics. Since the Kafka 
client itself guarantees good backward compatibility, the Flink community no longer provides a 
Kafka connector built on older client versions, and adapting to Kafka 0.11, released five years 
ago, is not really practical.

Best,
Qingsheng

> On Jun 23, 2022, at 19:37, yidan zhao  wrote:
> 
> As the subject says, I'd like to ask those familiar with this: did the connector merely upgrade to a newer kafka-client and change some API usages, or does it rely on functionality that only exists in the newer client?
> Is it possible to implement it on top of an older kafka-client?
> 
> If so, I might override the implementation myself.
> The reason is that newer kafka-client versions don't work with our company's kafka, which is open-source kafka wrapped with an extra proxy layer. Accessing it with a client that is too new causes problems (0.11 is the recommended version; the newest client I got working in tests was 2.2).



Re: Data loss when Flink consumes Kafka and syncs to MongoDB in real time

2022-06-26 Posted by Qingsheng Ren
Hi,

The Flink Kafka connector commits offsets to the Kafka broker after a checkpoint completes, but 
Flink does not rely on the offsets committed to the broker for failure recovery; it restores from 
the offsets stored in the checkpoint.
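
For illustration only, here is a minimal sketch of how the Kafka side of such a pipeline is typically wired up with the FLIP-27 KafkaSource; the broker address, topic, and group id are hypothetical. The key point is that the offsets snapshotted into the checkpoint drive recovery, and the broker-committed offsets are only consulted when there is no checkpoint or savepoint to restore from:

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.kafka.clients.consumer.OffsetResetStrategy;

    public class KafkaToMongoSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Offsets are snapshotted into the checkpoint; recovery restores from here,
            // not from the offsets committed to the Kafka broker.
            env.enableCheckpointing(60_000);

            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("kafka:9092")    // hypothetical broker address
                    .setTopics("mysql-cdc-topic")         // hypothetical topic
                    .setGroupId("mongodb-sync")           // hypothetical group id
                    // Broker-committed offsets are only the *initial* position when the job
                    // starts without any checkpoint/savepoint to restore from.
                    .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
                    .print(); // stand-in for the MongoDB sink in the real job

            env.execute("kafka-to-mongodb-sketch");
        }
    }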

As for the data loss, I'd personally suggest first reproducing the problem with a small amount of 
data, and then troubleshooting step by step from the source to the sink.

Best,
Qingsheng

> On Jun 26, 2022, at 11:54, casel.chen  wrote:
> 
> mysql cdc -> kafka -> mongodb
> I wrote a Flink 1.13.2 job that consumes a whole-database MySQL change topic from kafka and syncs it to mongodb in real time, with checkpointing enabled. In practice, however, restoring from a savepoint or from groupOffsets causes data loss. How should I troubleshoot this? Code repository: https://github.com/ChenShuai1981/mysql2mongodb.git
> My MongodbSink implements CheckpointedFunction and waits in snapshotState for all child threads to finish writing to mongodb.
> 
> 
> What is the flow for committing kafka offsets after flink consumes and processes the data? When consuming from kafka we first obtain pendingOffsets; how is it ensured that these pendingOffsets are all processed before they are committed? Is there any source-code walkthrough on this part?
> 



Re: Flink-1.15.0 fails to commit offsets when consuming Kafka?

2022-06-26 Posted by Qingsheng Ren
Hi,

This is a known issue in the Apache Kafka consumer; see FLINK-28060 [1] and KAFKA-13840 [2].

[1] https://issues.apache.org/jira/browse/FLINK-28060
[2] https://issues.apache.org/jira/browse/KAFKA-13840

Best,
Qingsheng

> On Jun 27, 2022, at 09:16, RS  wrote:
> 
> Hi,
> I'd like to ask about a problem we see with Flink-1.15.0 when consuming from Kafka: offset commits fail, and some jobs seem to keep failing to commit. The data is consumed, but the offsets never change. How should this be handled?
> 
> 
> The symptoms are as follows:
> 1. The jobs show no exceptions.
> 2. Data is consumed and processed normally; data usage is not affected.
> 3. The jobs have checkpointing configured to run every few minutes; in theory offsets should be committed when a checkpoint executes.
> 4. Some jobs fail to commit their Kafka offsets while others commit normally.
> 
> 
> The WARN logs are as follows:
> 2022-06-27 01:07:42,725 INFO  
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets [] - Subtask 
> 0 checkpointing for checkpoint with id=11398 (max part counter=1).
> 2022-06-27 01:07:42,830 INFO  
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets [] - Subtask 
> 0 received completion notification for checkpoint with id=11398.
> 2022-06-27 01:07:43,820 INFO  
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets [] - Subtask 
> 0 checkpointing for checkpoint with id=11476 (max part counter=0).
> 2022-06-27 01:07:43,946 INFO  
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets [] - Subtask 
> 0 received completion notification for checkpoint with id=11476.
> 2022-06-27 01:07:45,218 INFO  
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets [] - Subtask 
> 0 checkpointing for checkpoint with id=11521 (max part counter=47).
> 2022-06-27 01:07:45,290 INFO  
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets [] - Subtask 
> 0 received completion notification for checkpoint with id=11521.
> 2022-06-27 01:07:45,521 WARN  
> org.apache.flink.connector.kafka.source.reader.KafkaSourceReader [] - Failed 
> to commit consumer offsets for checkpoint 11443
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset 
> commit failed with a retriable exception. You should retry committing the 
> latest consumed offsets.
> Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: 
> The coordinator is not available.
> 2022-06-27 01:07:45,990 WARN  
> org.apache.flink.connector.kafka.source.reader.KafkaSourceReader [] - Failed 
> to commit consumer offsets for checkpoint 11398
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset 
> commit failed with a retriable exception. You should retry committing the 
> latest consumed offsets.
> Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: 
> The coordinator is not available.
> 
> 
> Thanks~



Re: WakeupException when the Kafka source detects a partition change

2022-05-24 Posted by Qingsheng Ren
Hi,

Thanks for the feedback; this looks like a bug. Could you open a new ticket on Apache JIRA [1]?

[1] https://issues.apache.org/jira

> On May 25, 2022, at 11:35, 邹璨  wrote:
> 
> Flink version: 1.14.3
> Module: connectors/kafka
> Problem description:
> I hit a WakeupException when using dynamic Kafka partition discovery, which caused the job to fail. The exception is as follows:
> 
> 2022-05-10 15:08:03
> java.lang.RuntimeException: One or more fetchers have encountered exception
> at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:225)
> at 
> org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:169)
> at 
> org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:130)
> at 
> org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:350)
> at 
> org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)
> at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
> at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
> at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: SplitFetcher thread 0 received 
> unexpected exception while polling the records
> at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:150)
> at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:105)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ... 1 more
> Caused by: org.apache.kafka.common.errors.WakeupException
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:511)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:275)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1726)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1684)
> at 
> org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader.removeEmptySplits(KafkaPartitionSplitReader.java:315)
> at 
> org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader.handleSplitsChanges(KafkaPartitionSplitReader.java:200)
> at 
> org.apache.flink.connector.base.source.reader.fetcher.AddSplitsTask.run(AddSplitsTask.java:51)
> at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:142)
> ... 6 more
> 
> 
> 
> A quick look at the source code along the stack trace shows that when handling a partition change, the KafkaSource interrupts the consumer that is currently fetching data by calling consumer.wakeup.
> While handling the partition change it then calls consumer.position; since the consumer has already been woken up, a WakeupException is thrown at that point.
> 
> 
> 
> 
> 
> 



Re: flink jdbc connector does not support source

2022-04-10 Posted by Qingsheng Ren
Hi,

The JDBC connector does support being used as a source; the latest documentation probably just 
hasn't been translated into Chinese yet. Please refer to the English documentation [1].

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/jdbc/
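
To illustrate (not from the original thread): a minimal sketch of declaring a JDBC table and scanning it as a bounded source via the Table API. The database, table name, and credentials are hypothetical, and the MySQL JDBC driver is assumed to be on the classpath; the connector options follow the JDBC SQL connector documentation linked above.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class JdbcSourceSketch {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.newInstance().inBatchMode().build());

            // Register a JDBC-backed table; database, table name, and credentials are hypothetical.
            tEnv.executeSql(
                    "CREATE TABLE orders_src (" +
                    "  id BIGINT," +
                    "  amount DECIMAL(10, 2)," +
                    "  PRIMARY KEY (id) NOT ENFORCED" +
                    ") WITH (" +
                    "  'connector' = 'jdbc'," +
                    "  'url' = 'jdbc:mysql://localhost:3306/mydb'," +
                    "  'table-name' = 'orders'," +
                    "  'username' = 'flink'," +
                    "  'password' = 'secret'" +
                    ")");

            // Reading from the table performs a bounded scan through the JDBC source.
            tEnv.executeSql("SELECT id, amount FROM orders_src").print();
        }
    }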

Best

> On Apr 10, 2022, at 11:07, casel.chen  wrote:
> 
> I have a scenario where flink needs to do a one-off batch sync of specified tables (with different schemas) from a mysql database into hudi tables. I checked the flink jdbc 
> connector documentation on the official site [1], which says it only supports sink, not source. Does the community plan to support this? If not, how should I develop it myself? Could you give an example? Thanks!
> 
> 
> [1] 
> https://nightlies.apache.org/flink/flink-docs-release-1.13/zh/docs/connectors/datastream/jdbc/



Re: Some questions about the flink rabbitmq connector

2021-10-28 Posted by Qingsheng Ren
Hi Ken, 

Thanks for reaching out, and sorry for the confusion!

As Leonard mentioned, we definitely value every connector in the Flink community. Given that we 
have more and more connectors to maintain, and only limited people and resources focused on 
connector issues, some connectors such as Kafka and JDBC may receive more support.

I'm very glad to see that the RabbitMQ connector has a lot of active users, and we would also 
appreciate it if friends from the RabbitMQ side could help us improve the RabbitMQ connector, 
for example by migrating it to the FLIP-27 Source API and the FLIP-143 Sink API~



> On Oct 29, 2021, at 12:52 PM, Leonard Xu wrote:
> 
> Hi, Peng
> 
> There's no doubt that RabbitMQ is a good open-source community with active 
> users. 
> I understand what @renqschn means is that the Flink RabbitMQ connector is one 
> of the less-used connectors among the many connectors in the Flink project. 
> From my observation, the most-used connectors in the Flink project are Kafka, 
> Filesystem, JDBC, and so on. So please help us promote Flink in the RabbitMQ 
> community and let more RabbitMQ users know about and use the Flink RabbitMQ 
> Connector, which will give the Flink community more motivation to improve it.
> 
> Best,
> Leonard
> 
>> On Oct 29, 2021, at 11:13, Ken Peng wrote:
>> 
>> I am one of the Forum Moderators for RabbitMQ, which does have a lot of
>> active users. :)
>> If you have any questions about RMQ please subscribe to its official group
>> and ask there.
>> rabbitmq-users+subscr...@googlegroups.com
>> 
>> Regards.
>> 
>> 
>> On Fri, Oct 29, 2021 at 11:09 AM 任庆盛  wrote:
>> 
>>> Hello,
>>> 
>>> From the code, the RabbitMQ Sink indeed provides no delivery-semantics guarantee. The RabbitMQ
>>> connector currently has few community users and is maintained relatively infrequently; if you are interested, you are very welcome to contribute to the community~
>>> 
>>> 
>>> 
 On Oct 28, 2021, at 7:53 PM, wx liao wrote:
 
 Hello:
 
>>> Sorry to bother you. Our project has recently been using flink with rabbitmq on the sink side, but looking at the project source code, RMQSink does not seem to guarantee that messages are not lost: there is no use of waitForConfirm() or a confirm
>>> listener. May I ask whether the RMQSink part indeed does not guarantee at least once? Hoping for an answer, thank you.
>>> 
>>> 
> 



Re: java.lang.LinkageError: loader constraint violation

2021-10-28 Posted by Qingsheng Ren
Hello,

You can check whether the lib directory of the Flink cluster also contains Kafka-related classes; 
judging from the exception, there is a class conflict.

--
Best Regards,

Qingsheng Ren
Email: renqs...@gmail.com
On Oct 28, 2021, 10:44 AM +0800, casel.chen , wrote:
> Submitting the flink job reports the following exception. What is the root cause, and how should it be fixed?
>
>
>
> Caused by: java.lang.LinkageError: loader constraint violation: loader 
> (instance of org/apache/flink/util/ChildFirstClassLoader) previously 
> initiated loading for a different type with name 
> "org/apache/kafka/clients/consumer/ConsumerRecord"
>
> at java.lang.ClassLoader.defineClass1(Native Method)
>
> at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
>
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
>
> at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>
> at 
> org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:71)
>
> at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>
> at java.lang.Class.getDeclaredMethods0(Native Method)
>
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
>
> at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1643)
>
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:79)
>
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:520)
>
> at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:494)
>
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
>
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
>
> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>
> at 
> org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:624)
>
> at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:143)
>
> at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:69)
>
> at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:2000)
>
> at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1685)
>
> at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1668)
>
> at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1637)
>
> at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1623)


Re: Re: Re: Re: How to set up automatic offset commits for easier monitoring when checkpointing is enabled

2021-10-27 Posted by Qingsheng Ren
Hello!

If you are using the FLIP-27 based KafkaSource, you can set enable.auto.commit = true and 
auto.commit.interval.ms = {commit_interval} so that the KafkaSource automatically commits 
offsets at the specified interval. The SourceFunction-based FlinkKafkaConsumer does not support 
auto commit while checkpointing is enabled; it can only commit offsets on checkpoints.
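
As a rough sketch (not from the original reply), this is how those two consumer properties can be passed to the FLIP-27 KafkaSource; the broker address, topic, group id, and the 5-second commit interval are all hypothetical example values:

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class AutoCommitSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(10 * 60 * 1000); // large checkpoint interval, as in the question

            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("kafka:9092")   // hypothetical
                    .setTopics("events")                 // hypothetical
                    .setGroupId("lag-monitoring-demo")   // hypothetical
                    .setStartingOffsets(OffsetsInitializer.latest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    // Let the Kafka client commit offsets periodically so that external
                    // lag monitoring based on committed offsets stays up to date
                    // between the (infrequent) checkpoints.
                    .setProperty("enable.auto.commit", "true")
                    .setProperty("auto.commit.interval.ms", "5000")
                    .build();

            env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source").print();
            env.execute("auto-commit-sketch");
        }
    }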

--
Best Regards,

Qingsheng Ren
Email: renqs...@gmail.com
On Oct 27, 2021, 4:59 PM +0800, 杨浩 , wrote:
> Is there a way to stay compatible with the existing monitoring, i.e. keep the consumer group's offsets updated in real time while checkpointing is enabled?
> On 2021-10-25 21:58:28, "杨浩" wrote:
> > currentOffsets would work in theory, but the Kafka consumer-lag metric in our cloud monitoring system is based on committedOffsets
> > On 2021-10-25 10:31:12, "Caizhi Weng" wrote:
> > > Hi!
> > >
> > > Is the offset here the offset of the kafka source? There is actually no need to read the offset from checkpoints; it can be read from
> > > metrics instead, see [1].
> > >
> > > [1]
> > > https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/metrics/#kafka-connectors
> > >
> > > 杨浩 wrote on Monday, Oct 25, 2021 at 10:20 AM:
> > >
> > > > A question: with checkpointing enabled, the checkpoint interval is set quite long because the state is large. How can offsets be committed more promptly, so that it is easier to monitor the job's progress?


Re: Flink job fails to consume Kafka and cannot obtain offsets

2021-04-22 Posted by Qingsheng Ren
Hi Jacob,

Judging from the error, the Kafka consumer could not connect to the Kafka brokers. These steps may help with troubleshooting:

1. Confirm network connectivity between the Flink TaskManagers and the Kafka brokers.
2. Network connectivity between the Flink TaskManagers and the Kafka brokers does not by itself mean data can be consumed; you may need to change the Kafka broker configuration. This article [1] may help; the vast majority of Kafka connection problems are caused by the configuration issues it describes.
3. Configure Log4j to set the log level of org.apache.kafka.clients.consumer to DEBUG or TRACE to get more information in the logs for troubleshooting.

Hope this helps!

[1] 
https://www.confluent.io/blog/kafka-client-cannot-connect-to-broker-on-aws-on-docker-etc/

—
Best Regards,

Qingsheng Ren
On Apr 14, 2021, 12:13 PM +0800, Jacob <17691150...@163.com> wrote:
> There is a flink job consuming messages from a kafka topic that exists in two kafka clusters, cluster A and Cluster B. The Flink
> job consumes the topic from cluster A without any problem, but fails to start when switched to consume from cluster B.
>
> The Flink cluster is deployed with Docker in standalone mode. Cluster resources are sufficient and there are enough slots. The error log is as follows:
>
> java.lang.Exception: org.apache.kafka.common.errors.TimeoutException:
> Timeout of 6ms expired before the position for partition Test-topic-27
> could be determined
> at
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.checkThrowSourceExecutionException(SourceStreamTask.java:212)
> at
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.performDefaultAction(SourceStreamTask.java:132)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:298)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:403)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout of
> 6ms expired before the position for partition Test-topic-27 could be
> determined
>
> Searching around, almost all answers point to insufficient slots or Kafka broker load issues, but these have already been ruled out.
>
> Please advise.
>
>
>
> -
> Thanks!
> Jacob
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/


Re: Exception when using the Kafka data source from a packaged jar

2021-04-08 Posted by Qingsheng Ren
Hi,

Judging from the error, the Flink Kafka connector classes are missing from the job JAR. You can 
check whether the Flink Kafka connector classes were actually packaged into the JAR (for example, 
by listing its contents with "jar tf"); declaring the Kafka connector as a Maven POM dependency 
does not necessarily mean it gets packaged into the job JAR.

--
Best Regards,

Qingsheng Ren
Real-time Computing Department, Alibaba Cloud
Alibaba Group
Email: renqs...@gmail.com


On Apr 7, 2021, 3:27 PM +0800, 小猫爱吃鱼 <1844061...@qq.com> wrote:
> Hi,
>     While using flink-1.13, I tried to use the kafka data source.
>     
> I modified the streaming WordCount in flink-example so that it reads data from a local kafka; running it directly from IDEA works fine and produces good results. However, submitting the jar packaged with mvn directly to the locally built flink
>  binary (a locally started standalone flink) throws the following exception.
>
>
> java.lang.RuntimeException: Could not look up the main(String[]) method from 
> the class org.apache.flink.streaming.examples.wordcount.WordCount2: 
> org/apache/flink/streaming/connectors/kafka/KafkaDeserializationSchema
> at 
> org.apache.flink.client.program.PackagedProgram.hasMainMethod(PackagedProgram.java:315)
> at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:161)
> at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:65)
> at 
> org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691)
> at org.apache.flink.client.cli.CliFrontend.buildProgram(CliFrontend.java:851)
> at 
> org.apache.flink.client.cli.CliFrontend.getPackagedProgram(CliFrontend.java:271)
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
> at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
> at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
> at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/flink/streaming/connectors/kafka/KafkaDeserializationSchema
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
> at java.lang.Class.getMethod0(Class.java:3018)
> at java.lang.Class.getMethod(Class.java:1784)
> at 
> org.apache.flink.client.program.PackagedProgram.hasMainMethod(PackagedProgram.java:307)
> ... 10 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:64)
> at 
> org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:65)
> at 
> org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 16 more
>
>
> The fix I found online is to change the corresponding dependency in the pom, but I don't understand how to handle it. I found the kafka-connector dependency in the pom file of flink-example-stream, and the parent pom has no related dependency. How should I deal with this problem?
> What I can guarantee is that the pom file only contains changes for the newly added example project; no other dependencies were modified.
>
>
> Thank you very much!