Re: Does open-source Flink CEP support dynamic rule configuration?

2024-09-12 Feng Jin
Not supported yet.


https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=195730308


Best,
Feng


On Thu, Sep 12, 2024 at 10:20 AM 王凯 <2813732...@qq.com.invalid> wrote:

> Could anyone tell me whether open-source Flink CEP supports dynamic rule configuration?
>
>
>
>
> 王凯
> 2813732...@qq.com
>
>
>
>  


Re: Re: Problem using the Hive catalog

2024-07-16 Feng Jin
The example above appears to use the legacy Kafka connector options.

Please follow the documentation and use the new-style options (a DDL sketch with the new options follows the reference below):
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_catalog/#step-4-create-a-kafka-table-with-flink-sql-ddl
You also need to put the Kafka connector jar [1] into the lib directory.

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/kafka/
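
A minimal DDL sketch with the new-style Kafka options, mirroring the table from the
quoted question (the group id and startup mode are illustrative additions):

CREATE TABLE mykafka (name STRING, age INT) WITH (
  'connector' = 'kafka',
  'topic' = 'hive_sink',
  'properties.bootstrap.servers' = '10.0.15.242:9092',
  'properties.group.id' = 'test-group',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'csv'
);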

Best,
Feng

On Tue, Jul 16, 2024 at 2:11 PM Xuyang  wrote:

> In the lib directory you need to place flink-sql-connector-hive-3.1.3; that jar is the one intended for SQL jobs.
>
>
>
>
> --
>
> Best!
> Xuyang
>
>
>
>
>
> At 2024-07-16 13:40:23, "冯奇" wrote:
> >I checked the documentation and the jars listed there are all present. There is also a
> >separately downloaded dependency, flink-sql-connector-hive-3.1.3; I am not sure whether to use that one or the ones below:
> >// Flink's Hive connector
> >flink-connector-hive_2.12-1.19.1.jar
> >// Hive dependencies
> >hive-exec-3.1.0.jar
> >libfb303-0.9.3.jar // libfb303 is not packed into hive-exec in some versions, need to add it separately
> >// add antlr-runtime if you need to use hive dialect
> >antlr-runtime-3.5.2.jar
> >Jars under lib:
> >antlr-runtime-3.5.2.jar flink-table-api-java-1.19.0.jar
> flink-cdc-dist-3.0.0.jar flink-table-api-java-uber-1.19.0.jar
> flink-cdc-pipeline-connector-doris-3.1.0.jar flink-table-common-1.19.0.jar
> flink-cdc-pipeline-connector-mysql-3.1.0.jar
> flink-table-planner-loader-1.19.0.jar flink-cep-1.19.0.jar
> flink-table-runtime-1.19.0.jar flink-connector-files-1.19.0.jar
> hive-exec-3.1.2.jar flink-connector-hive_2.12-1.19.0.jar libfb303-0.9.3.jar
> flink-connector-jdbc-3.1.2-1.18.jar log4j-1.2-api-2.17.1.jar
> flink-connector-kafka-3.1.0-1.18.jar log4j-api-2.17.1.jar
> flink-csv-1.19.0.jar log4j-core-2.17.1.jar flink-dist-1.19.0.jar
> log4j-slf4j-impl-2.17.1.jar flink-json-1.19.0.jar
> mysql-connector-java-8.0.28.jar flink-scala_2.12-1.19.0.jar
> paimon-flink-1.19-0.9-20240628.002224-23.jar
> >--
> >From: Xuyang
> >Sent: Tuesday, July 16, 2024 11:43
> >To: "user-zh"
> >Subject: Re: Problem using the Hive catalog
> >Hi, could you check whether the Hive SQL connector dependency [1] has been placed in the lib directory or added via ADD JAR?
> >[1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/overview/
> <
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/overview/
> >
> >--
> > Best!
> > Xuyang
> >At 2024-07-15 17:09:45, "冯奇"  wrote:
> >>Flink SQL> USE CATALOG myhive;
> >>Flink SQL> CREATE TABLE mykafka (name String, age Int) WITH (
> >> 'connector.type' = 'kafka',
> >> 'connector.version' = 'universal',
> >> 'connector.topic' = 'hive_sink',
> >> 'connector.properties.bootstrap.servers' = '10.0.15.242:9092',
> >> 'format.type' = 'csv',
> >> 'update-mode' = 'append'
> >>);
> >>The following error is reported:
> >>[ERROR] Could not execute SQL statement. Reason:
> >>org.apache.flink.table.factories.NoMatchingTableFactoryException: Could
> not find a suitable table factory for
> 'org.apache.flink.table.factories.TableSourceFactory' in
> >>the classpath.
> >>Reason: Required context properties mismatch.
> >>The matching candidates:
> >>org.apache.flink.table.sources.CsvAppendTableSourceFactory
> >>Mismatched properties:
> >>'connector.type' expects 'filesystem', but is 'kafka'
> >>The following properties are requested:
> >>connector.properties.bootstrap.servers=10.0.15.242:9092
> >>connector.topic=hive_sink
> >>connector.type=kafka
> >>connector.version=universal
> >>format.type=csv
> >>schema.0.data-type=VARCHAR(2147483647)
> >>schema.0.name=name
> >>schema.1.data-type=INT
> >>schema.1.name=age
> >>update-mode=append
> >>The following factories have been considered:
> >>org.apache.flink.table.sources.CsvBatchTableSourceFactory
> >>org.apache.flink.table.sources.CsvAppendTableSourceFactory
>


Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Feng Jin
Congratulations!


Best,
Feng


On Thu, Mar 21, 2024 at 11:37 AM Ron liu  wrote:

> Congratulations!
>
> Best,
> Ron
>
> Jark Wu wrote on Thu, Mar 21, 2024 at 10:46:
>
> > Congratulations and welcome!
> >
> > Best,
> > Jark
> >
> > On Thu, 21 Mar 2024 at 10:35, Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Congratulations!
> > >
> > > Best,
> > > Rui
> > >
> > > On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan 
> > wrote:
> > >
> > > > Congratulations!
> > > >
> > > > Best,
> > > > Hang
> > > >
> > > > Lincoln Lee wrote on Thu, Mar 21, 2024 at 09:54:
> > > >
> > > >>
> > > >> Congrats, thanks for the great work!
> > > >>
> > > >>
> > > >> Best,
> > > >> Lincoln Lee
> > > >>
> > > >>
> > > >>> Peter Huang wrote on Wed, Mar 20, 2024 at 22:48:
> > > >>
> > > >>> Congratulations
> > > >>>
> > > >>>
> > > >>> Best Regards
> > > >>> Peter Huang
> > > >>>
> > > >>> On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang 
> > > wrote:
> > > >>>
> > > 
> > >  Congratulations
> > > 
> > > 
> > > 
> > >  Best,
> > >  Huajie Wang
> > > 
> > > 
> > > 
> > >  Leonard Xu wrote on Wed, Mar 20, 2024 at 21:36:
> > > 
> > > > Hi devs and users,
> > > >
> > > > We are thrilled to announce that the donation of Flink CDC as a
> > > > sub-project of Apache Flink has completed. We invite you to
> explore
> > > the new
> > > > resources available:
> > > >
> > > > - GitHub Repository: https://github.com/apache/flink-cdc
> > > > - Flink CDC Documentation:
> > > > https://nightlies.apache.org/flink/flink-cdc-docs-stable
> > > >
> > > > After Flink community accepted this donation[1], we have
> completed
> > > > software copyright signing, code repo migration, code cleanup,
> > > website
> > > > migration, CI migration and github issues migration etc.
> > > > Here I am particularly grateful to Hang Ruan, Zhongqaing Gong,
> > > > Qingsheng Ren, Jiabao Sun, LvYanquan, loserwang1024 and other
> > > contributors
> > > > for their contributions and help during this process!
> > > >
> > > >
> > > > For all previous contributors: The contribution process has
> > slightly
> > > > changed to align with the main Flink project. To report bugs or
> > > suggest new
> > > > features, please open tickets in
> > > > Apache Jira (https://issues.apache.org/jira).  Note that we will
> > no
> > > > longer accept GitHub issues for these purposes.
> > > >
> > > >
> > > > Welcome to explore the new repository and documentation. Your
> > > feedback
> > > > and contributions are invaluable as we continue to improve Flink
> > CDC.
> > > >
> > > > Thanks everyone for your support and happy exploring Flink CDC!
> > > >
> > > > Best,
> > > > Leonard
> > > > [1]
> > https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
> > > >
> > > >
> > >
> >
>


Re: Can the database username/password for the FlinkSQL JDBC connector be encrypted/hidden?

2024-03-10 Feng Jin
1. The JDBC connector itself does not currently support encryption. As I understand it, you can encrypt/decrypt the SQL text when it is submitted, or hide the password through variable substitution.

2. Alternatively, you can create the JDBC catalog ahead of time so that user DDL never has to expose the password (a sketch follows below).
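
A minimal sketch of a pre-created JDBC catalog (the catalog name and the ${DB_PASSWORD}
placeholder are illustrative; the substitution mechanism is assumed to be handled by the
platform before the statement is submitted, and option names can vary by Flink version):

CREATE CATALOG my_jdbc_catalog WITH (
  'type' = 'jdbc',
  'base-url' = 'jdbc:mysql://localhost:3306',
  'default-database' = 'flink',
  'username' = 'root',
  'password' = '${DB_PASSWORD}'
);

-- User jobs can then reference tables through the catalog without writing credentials:
-- INSERT INTO my_jdbc_catalog.flink.wordcount_sink SELECT ...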


Best,
Feng

On Sun, Mar 10, 2024 at 9:50 PM 杨东树  wrote:

> Hi all,
> Considering database username/password security, when using the FlinkSQL connector
> jdbc, how can the database username and password be encrypted or hidden? For example, the password in the following commonly used SQL:
> CREATE TABLE wordcount_sink (
>  word String,
>  cnt BIGINT,
>  primary key (word) not enforced
> ) WITH (
>  'connector' = 'jdbc',
>  'url' = 'jdbc:mysql://localhost:3306/flink',
>  'username' = 'root',
>  'password' = '123456',
>  'table-name' = 'wordcount_sink'
> );


Re: MySQL CDC: DataStream API and SQL API show different output behavior

2024-03-01 Feng Jin
These two print calls are implemented differently.

dataStream().print() adds a PrintSinkFunction; that operator prints each record as soon as it receives it, and the output appears on the TaskManager.

table.execute().print(), on the other hand, collects the final result through a collect sink and ships it back to the client, where it is printed on the client's stdout. The results are only shipped back when a checkpoint is taken, so their visibility is bounded by the checkpoint interval. (A small sketch follows.)


Best,
Feng Jin

On Fri, Mar 1, 2024 at 4:45 PM ha.fen...@aisino.com 
wrote:

> The sink only prints.
>
> DataStream API, checkpointing set to exactly-once:
> env.fromSource(mySqlSource, WatermarkStrategy.noWatermarks(), "MySQL
> Source").print();
> After the data changes in the database, it is printed on the console immediately.
>
> SQL API, checkpointing set to exactly-once:
> Table custab = tEnv.sqlQuery("select * from orders ");
> custab.execute().print();
> Database changes are not printed immediately; the changed data only appears after a checkpoint. Also, right after the program starts, the historical data is likewise only printed after a checkpoint.
> 16:39:17,109 INFO
> com.ververica.cdc.connectors.mysql.source.reader.MySqlSourceReader [] -
> Binlog offset on checkpoint 1: {ts_sec=0, file=mysql-bin.46, pos=11653,
> kind=SPECIFIC, gtids=, row=0, event=0}
> 16:39:17,231 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Completed
> checkpoint 1 for job 5bf08275f1992d1f7997fc8f7c32b6b1 (4268 bytes,
> checkpointDuration=218 ms, finalizationTime=6 ms).
> 16:39:17,241 INFO
> org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Marking
> checkpoint 1 as completed for source Source: orders[1].
>
> ++-+-++---+-+
> | op |  id | addtime |cusname
> | price |  status |
>
> ++-+-++---+-+
> | +I | 616 | 2024-02-22 16:23:11 |   name
> |  3.23 |   7 |
> | +I | 617 | 2024-03-01 11:42:03 |   name
> |  1.11 |   9 |
> | +I | 612 | 2024-01-31 13:53:49 |   name
> |  1.29 |   1 |
>
> What is the reason for this?
>


Re: How can a Flink DataStream job obtain its job lineage?

2024-02-26 Feng Jin
From the JobGraph you can obtain the transformation information and locate the concrete Source or Doris
Sink; after that, you can use reflection to read the properties inside them and extract what you need.

You can refer to the OpenLineage implementation [1]. A rough sketch is included after the link.


1.
https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/shared/src/main/java/io/openlineage/flink/visitor/wrapper/FlinkKafkaConsumerWrapper.java
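
A minimal reflection sketch in the spirit of the OpenLineage wrappers (the field name
"transformations" on StreamExecutionEnvironment is a protected internal detail, and the
"properties" field mentioned in the comments is illustrative; real connectors differ by version):

import java.lang.reflect.Field;
import java.util.List;
import org.apache.flink.api.dag.Transformation;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LineageSketch {

    @SuppressWarnings("unchecked")
    public static void inspect(StreamExecutionEnvironment env) throws Exception {
        // Read the environment's registered transformations via reflection.
        Field field = StreamExecutionEnvironment.class.getDeclaredField("transformations");
        field.setAccessible(true);
        List<Transformation<?>> transformations = (List<Transformation<?>>) field.get(env);

        for (Transformation<?> t : transformations) {
            // The transformation name and class usually reveal the source/sink type.
            System.out.println(t.getName() + " -> " + t.getClass().getSimpleName());
            // From here, drill into the operator factory / sink function with further
            // reflection, e.g. read a "properties" field to get bootstrap.servers,
            // database name, table name, etc.
        }
    }
}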


Best,
Feng


On Mon, Feb 26, 2024 at 6:20 PM casel.chen  wrote:

> A Flink DataStream job consumes from MySQL CDC, processes the data, and writes it to Apache
> Doris. Is there a way (from the JobGraph/StreamGraph) to obtain the source/sink
> connector information, including the connection string, database name, table name, etc.?


Re: Question about the Flink Prometheus Connector

2024-02-23 Feng Jin
As I understand it, you can follow the design in the FLIP and implement an initial
SinkFunction based on the Prometheus Remote-Write API v1.0 to write to Prometheus.
A rough skeleton is sketched below.
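
A rough sketch, not the official connector: MetricSample is a made-up POJO, and
encodeWriteRequest() is a placeholder for the protobuf WriteRequest construction and
snappy compression that the remote-write protocol actually requires.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class PrometheusRemoteWriteSink
        extends RichSinkFunction<PrometheusRemoteWriteSink.MetricSample> {

    /** Hypothetical metric record; a real remote-write payload is a protobuf TimeSeries. */
    public static class MetricSample {
        public String name;
        public double value;
        public long timestampMs;
    }

    private final String endpoint; // e.g. http://prometheus:9090/api/v1/write
    private transient HttpClient httpClient;

    public PrometheusRemoteWriteSink(String endpoint) {
        this.endpoint = endpoint;
    }

    @Override
    public void open(Configuration parameters) {
        httpClient = HttpClient.newHttpClient();
    }

    @Override
    public void invoke(MetricSample sample, Context context) throws Exception {
        byte[] body = encodeWriteRequest(sample); // protobuf + snappy in a real implementation
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Encoding", "snappy")
                .header("Content-Type", "application/x-protobuf")
                .header("X-Prometheus-Remote-Write-Version", "0.1.0")
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        httpClient.send(request, HttpResponse.BodyHandlers.discarding());
    }

    private byte[] encodeWriteRequest(MetricSample sample) {
        throw new UnsupportedOperationException("placeholder for protobuf/snappy encoding");
    }
}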


Best,
Feng

On Fri, Feb 23, 2024 at 7:36 PM 17610775726 <17610775...@163.com> wrote:

> Hi
> See the official documentation:
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/metric_reporters/#prometheuspushgateway
>
>
> Best
> JasonLee
>
>
>  Original message 
> | From | casel.chen |
> | Date | 2024-02-23 17:35 |
> | To | user-zh@flink.apache.org |
> | Subject | Question about the Flink Prometheus Connector |
> Scenario: use Flink to generate metrics in real time and write them to Prometheus for monitoring and alerting.
> I found the https://github.com/apache/flink-connector-prometheus project online, but it is empty.
> I also found FLIP-312, which is about a flink prometheus connector:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-312%3A+Prometheus+Sink+Connector
> Has the Flink community released an official flink prometheus connector?
> If I need to write to Prometheus in real time today, what is the recommended approach? Thanks!


Re: How can a custom sink connector in Flink SQL obtain the event time and watermark defined on the source table?

2024-02-20 Feng Jin
As I understand it, you should not try to get them from rowData; you can obtain the watermark and event time from the Context. See the sketch below.
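
A minimal sketch, assuming the sink is a RichSinkFunction<RowData> as in the question:
Context.timestamp() returns the record's event time (if one was assigned) and
Context.currentWatermark() the current watermark of this subtask.

import java.io.IOException;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.table.data.RowData;

public class XxxSinkFunction extends RichSinkFunction<RowData> {

    @Override
    public void invoke(RowData rowData, Context context) throws IOException {
        Long eventTime = context.timestamp();        // may be null if no timestamp is attached
        long watermark = context.currentWatermark(); // current watermark seen by this subtask
        // ... write rowData together with eventTime/watermark ...
    }
}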

Best,
Feng

On Tue, Feb 20, 2024 at 4:35 PM casel.chen  wrote:

> How can a custom sink connector in Flink SQL obtain the event time and watermark defined on the source table?
>
>
> public class XxxSinkFunction extends RichSinkFunction implements
> CheckpointedFunction, CheckpointListener {
>
>
> @Override
> public synchronized void invoke(RowData rowData, Context context)
> throws IOException {
> // Here I want to get the event time and watermark values from rowData; how can this be done?
> }
> }
>
>
> For example, the source table is defined as follows:
>
>
> CREATE TEMPORARY TABLE source_table(
>   username varchar,
>   click_url varchar,
>   eventtime varchar,
>
>   ts AS TO_TIMESTAMP(eventtime),
>   WATERMARK FOR ts AS ts - INTERVAL '2' SECOND -- defines the watermark on the rowtime column.
> ) with (
>   'connector'='kafka',
>   ...
>
> );
>
>
> CREATE TEMPORARY TABLE sink_table(
>   username varchar,
>   click_url varchar,
>   eventtime varchar
> ) with (
>   'connector'='xxx',
>   ...
> );
> insert into sink_table select username,click_url,eventtime from
> source_table;


Re: Auditing Flink job lineage/connection information

2024-02-03 Feng Jin
As I understand it, the platform can configure this centrally in flink-conf.yaml, so individual users do not need to set the relevant parameters themselves. A sketch is below.
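
A minimal sketch of a platform-wide setting in flink-conf.yaml (the listener class name
is hypothetical and must be on the client classpath; the key registers JobListeners for
every submitted job):

execution.job-listeners: com.example.lineage.LineageJobListener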


Best,
Feng

On Sun, Feb 4, 2024 at 11:19 AM 阿华田  wrote:

> I took a look; this way every job has to configure the listener itself, so it cannot be enforced at the system level, and it is hard to push every downstream user to configure a listener.
>
>
> 阿华田
> a15733178...@163.com
>
>
> On 2024-02-02 19:38, Feng Jin wrote:
> hi,
>
> You can refer to the OpenLineage implementation [1]: configure a JobListener in Flink to obtain the Transformation information, then parse
> the Sources and Sinks to get the lineage.
>
> [1]
>
> https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/src/main/java/io/openlineage/flink/OpenLineageFlinkJobListener.java
>
> Best,
> Feng
>
>
> On Fri, Feb 2, 2024 at 6:36 PM 阿华田  wrote:
>
>
>
>
> We plan to build job profiling for Flink, mainly for user DataStream jobs. Once they run on our real-time platform, we want to audit which Kafka topics they use and which systems they write to (MySQL, HBase,
> ClickHouse, etc.). Is there a good way to do this? For Flink SQL we can currently get this by parsing the SQL, but for DataStream jobs submitted as user-built jars we cannot.
> 阿华田
> a15733178...@163.com
>
>
>


Re: Auditing Flink job lineage/connection information

2024-02-02 Feng Jin
hi,

You can refer to the OpenLineage implementation [1]: configure a JobListener in Flink to obtain the Transformation information, then parse
the Sources and Sinks to get the lineage. A minimal listener sketch follows the link below.

[1]
https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/src/main/java/io/openlineage/flink/OpenLineageFlinkJobListener.java
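
A minimal JobListener sketch for lineage auditing (not OpenLineage's actual implementation;
the reporting calls are placeholders). It can be registered with
env.registerJobListener(new LineageJobListener()) or platform-wide via configuration:

import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.JobListener;

public class LineageJobListener implements JobListener {

    @Override
    public void onJobSubmitted(JobClient jobClient, Throwable throwable) {
        if (jobClient != null) {
            // Report the job id (and any lineage extracted from the environment's
            // transformations) to the audit service here.
            System.out.println("Submitted job: " + jobClient.getJobID());
        }
    }

    @Override
    public void onJobExecuted(JobExecutionResult result, Throwable throwable) {
        // Mark the job as finished (or failed) in the audit system.
    }
}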

Best,
Feng


On Fri, Feb 2, 2024 at 6:36 PM 阿华田  wrote:

>
>
> We plan to build job profiling for Flink, mainly for user DataStream jobs. Once they run on our real-time platform, we want to audit which Kafka topics they use and which systems they write to (MySQL, HBase,
> ClickHouse, etc.). Is there a good way to do this? For Flink SQL we can currently get this by parsing the SQL, but for DataStream jobs submitted as user-built jars we cannot.
> 阿华田
> a15733178...@163.com
>
>


Re: Flink UI operator metrics keep loading...

2024-01-23 Feng Jin
You can try the following to identify the cause:

1. Open the browser developer tools and check whether a request to some endpoint is stuck.

2. Check the JobManager's GC status to see whether it is doing frequent full GCs.

3. Check the JobManager logs for missing resource files or disk problems that could make the web UI inaccessible.


Best,
Feng


On Tue, Jan 23, 2024 at 6:16 PM 阿华田  wrote:

>
>
> As shown in the screenshot, the job processes data normally and its status is normal, but the Flink UI stays stuck in a loading state. Only a few jobs behave like this; the others are fine. Could this be caused by a class conflict in a metrics jar?
> 阿华田
> a15733178...@163.com
>
>
>


Re: Flink 1.15

2023-11-23 Feng Jin
It looks similar, but this error should be the one thrown after the job has already failed; check whether there are any other exception logs.


Best,
Feng

On Sat, Nov 4, 2023 at 3:26 PM Ray  wrote:

> Experts: I am currently running into the following problem. 1. Scenario: submitting a Flink job on YARN. 2. Version: Flink 1.15.0. 3. Logs: the YARN logs show what looks like a version problem: 2023-11-04
> 15:04:42,313 ERROR org.apache.flink.util.FatalExitExceptionHandler
> [] - FATAL: Thread 'flink-akka.actor.internal-dispatcher-3' produced an
> uncaught exception. Stopping the process...java.lang.NoClassDefFoundError:
> akka/actor/dungeon/FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1
> at
> akka.actor.dungeon.FaultHandling.handleNonFatalOrInterruptedException(FaultHandling.scala:334)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at
> akka.actor.dungeon.FaultHandling.handleNonFatalOrInterruptedException$(FaultHandling.scala:334)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at
> akka.actor.ActorCell.handleNonFatalOrInterruptedException(ActorCell.scala:411)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at akka.actor.ActorCell.invoke(ActorCell.scala:551)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at akka.dispatch.Mailbox.run(Mailbox.scala:231)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
> [flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> [?:1.8.0_181]
> at
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> [?:1.8.0_181]
> at
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> [?:1.8.0_181]
> at
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> [?:1.8.0_181]
> Caused by: java.lang.ClassNotFoundException:
> akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> ~[?:1.8.0_181]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> ~[?:1.8.0_181]
> at
> org.apache.flink.core.classloading.ComponentClassLoader.loadClassFromComponentOnly(ComponentClassLoader.java:149)
> ~[flink-dist-1.15.0.jar:1.15.0]
> at
> org.apache.flink.core.classloading.ComponentClassLoader.loadClass(ComponentClassLoader.java:112)
> ~[flink-dist-1.15.0.jar:1.15.0]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ~[?:1.8.0_181]
> ... 11 more
> 2023-11-04 15:04:42,324 ERROR
> org.apache.flink.runtime.util.ClusterUncaughtExceptionHandler [] - WARNING:
> Thread 'flink-shutdown-hook-1' produced an uncaught exception. If you want
> to fail on uncaught exceptions, then configure
> cluster.uncaught-exception-handling accordingly
> java.lang.NoClassDefFoundError:
> scala/collection/convert/Wrappers$MutableSetWrapper
> at
> scala.collection.convert.AsScalaConverters.asScalaSet(AsScalaConverters.scala:126)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at
> scala.collection.convert.AsScalaConverters.asScalaSet$(AsScalaConverters.scala:124)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at
> akka.util.ccompat.package$JavaConverters$.asScalaSet(package.scala:86)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at
> scala.collection.convert.DecorateAsScala.$anonfun$asScalaSetConverter$1(DecorateAsScala.scala:59)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at
> scala.collection.convert.Decorators$AsScala.asScala(Decorators.scala:25)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at
> akka.actor.CoordinatedShutdown$tasks$.totalDuration(CoordinatedShutdown.scala:481)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at
> akka.actor.CoordinatedShutdown.totalTimeout(CoordinatedShutdown.scala:784)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at
> akka.actor.CoordinatedShutdown$.$anonfun$initJvmHook$1(CoordinatedShutdown.scala:271)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> at
> akka.actor.CoordinatedShutdown$$anon$3.run(CoordinatedShutdown.scala:814)
> ~[flink-rpc-akka_abb25197-9eee-499b-8c1f-f52d0afc4374.jar:1.15.0]
> Caused by: java.lang.ClassNotFoundException:
> scala.collection.convert.Wrappers$MutableSetWrapper
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> ~[?:1.8.0_181]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> ~[?:1.8.0_181]
> at
> org.apache.flink.core.classloading.ComponentClassLoader.loadClassFromComponentOnly(ComponentClassLoader.java:149)
> ~[flink-dist-1.15.0.jar:1.15.0]
> at
> org.apache.flink.core.classloading.Compo

Re: Garbled characters with the canal-json format lead to unexpected results

2023-11-13 Feng Jin
hi

This does not look like it is caused by the garbled characters.

You can try adding deduplication to restore a correct CDC stream and then check the result again.

The steps are as follows (see the sketch after the list):
1. Define a primary key on the source table.
2. In the table config, set table.exec.source.cdc-events-duplicate to true,
or run: set 'table.exec.source.cdc-events-duplicate'='true'
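
A minimal sketch (column types and connection values are illustrative, not taken from
the original job):

SET 'table.exec.source.cdc-events-duplicate' = 'true';

CREATE TABLE tbl (
  id BIGINT,
  encrypted_xx BYTES,
  PRIMARY KEY (id) NOT ENFORCED  -- required so Flink can deduplicate changelog events by key
) WITH (
  'connector' = 'kafka',
  'topic' = '...',
  'properties.bootstrap.servers' = '...',
  'format' = 'canal-json'
);

SELECT COUNT(*) FROM tbl;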

Best,
Feng



On Mon, Nov 13, 2023 at 4:09 PM yawning  wrote:

> The column in MySQL:
>
> `encrypted_xx` blob
>
> Canal-json
> "encrypted_xx":"\u0003üUãcA\u0018\u001A}àh\u0013\u001F æÉ"
>
>
>
> The garbled data can contain special characters such as }]
>
>
> Plain query:
> select * from tbl
>
> The result is as expected:
>
> -U[273426307, xxx, uÂ&°àÈ;óX«V, üUãcA}àh æÉ, 1699359473865, 2]
> +U[273426307, xxx,
> uÂ&°àÈ;óX«V, üUãcA}àh æÉ, 1699359473865, 2]
> -U[399385197, yyy, nì¡… ^³Ø )½ÍU¸, ÉîA6³gŽŽMõýNêfŠ], 1697648682595, 0]
> +U[399385197, yyy, nì¡… ^³Ø )½ÍU¸, ÉîA6³gŽŽMõýNêfŠ], 1699359694026, 0]
> -U[399385197, yyy, nì¡… ^³Ø )½ÍU¸, ÉîA6³gŽŽMõýNêfŠ], 1699359694026, 0]
> +U[399385197, yyy, nì¡… ^³Ø )½ÍU¸, ÉîA6³gŽŽMõýNêfŠ], 1699359694242, 0]
>
> Aggregate query:
> select count(*) from tbl
>
>
> Wrong result:
> +I[1]
> -D[1]
> +I[1]
> -D[1]
> +I[1]


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Feng Jin
Thanks for the great work! Congratulations


Best,
Feng Jin

On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu  wrote:

> Congratulations, Well done!
>
> Best,
> Leonard
>
> On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee 
> wrote:
>
> > Thanks for the great work! Congrats all!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Jing Ge wrote on Fri, Oct 27, 2023 at 00:16:
> >
> > > The Apache Flink community is very happy to announce the release of
> > Apache
> > > Flink 1.18.0, which is the first release for the Apache Flink 1.18
> > series.
> > >
> > > Apache Flink® is an open-source unified stream and batch data
> processing
> > > framework for distributed, high-performing, always-available, and
> > accurate
> > > data applications.
> > >
> > > The release is available for download at:
> > > https://flink.apache.org/downloads.html
> > >
> > > Please check out the release blog post for an overview of the
> > improvements
> > > for this release:
> > >
> > >
> >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > >
> > > The full release notes are available in Jira:
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > >
> > > We would like to thank all contributors of the Apache Flink community
> who
> > > made this release possible!
> > >
> > > Best regards,
> > > Konstantin, Qingsheng, Sergey, and Jing
> > >
> >
>


Re: Too many connections when collecting Flink job logs with the Kafka appender

2023-10-19 Feng Jin
You could deploy a log agent on each YARN machine that collects the local logs and forwards them to Kafka:
yarn container -> per-host log agent -> kafka.



On Mon, Oct 16, 2023 at 3:58 PM 阿华田  wrote:

>
> Our Flink cluster uses a Kafka appender to collect the logs produced by Flink, but we now have more than 3,000 jobs running in real time, with 200,000+ YARN containers. This results in too many connections to the Kafka cluster that stores the logs, which puts it under considerable pressure. Is there a better way to collect Flink logs?
>
>
> 阿华田
> a15733178...@163.com
>
>


Re: Does Flink SQL not support SHOW CREATE CATALOG?

2023-10-19 Feng Jin
hi casel


Starting from 1.18, the CatalogStore was introduced to persist catalog configurations, so SHOW CREATE CATALOG can indeed be supported.


Best,
Feng

On Fri, Oct 20, 2023 at 11:55 AM casel.chen  wrote:

> I created a catalog in Flink SQL before, and now I want to look at the original CREATE CATALOG statement so I can copy it, modify it, and save it as a new catalog, but I found that Flink
> SQL does not support SHOW CREATE CATALOG.
> As far as I know, Doris does support the SHOW CREATE CATALOG statement. Could Flink SQL support it as well?


Re: Flink two-phase commit

2023-10-07 Feng Jin
hi,

You can refer to this blog post, which describes it very clearly (a skeleton sketch follows the link):
https://flink.apache.org/2018/02/28/an-overview-of-end-to-end-exactly-once-processing-in-apache-flink-with-apache-kafka-too/
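
A skeleton sketch of a two-phase-commit sink (the transaction handling is hypothetical;
it only illustrates when each hook fires). preCommit() is called from snapshotState(),
i.e. when the checkpoint barrier reaches the sink, and commit() is called once the
checkpoint has completed (notifyCheckpointComplete).

import org.apache.flink.api.common.typeutils.base.StringSerializer;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

public class TwoPhaseCommitSketch extends TwoPhaseCommitSinkFunction<String, String, Void> {

    public TwoPhaseCommitSketch() {
        super(StringSerializer.INSTANCE, VoidSerializer.INSTANCE);
    }

    @Override
    protected String beginTransaction() {
        // Start a new transaction, e.g. open a temporary file or a Kafka transaction.
        return "txn-" + System.currentTimeMillis();
    }

    @Override
    protected void invoke(String txn, String value, Context context) {
        // Write the record into the open transaction (buffered, not yet visible downstream).
    }

    @Override
    protected void preCommit(String txn) {
        // Pre-commit: invoked during snapshotState(), i.e. while the checkpoint is being
        // taken. Flush all pending writes so the transaction can later be committed
        // without losing data.
    }

    @Override
    protected void commit(String txn) {
        // Invoked after the checkpoint completes: atomically make the data visible.
    }

    @Override
    protected void abort(String txn) {
        // Invoked on failure or rollback: discard the transaction.
    }
}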


Best,
Feng

On Sun, Sep 24, 2023 at 9:54 PM 海风 <18751805...@163.com> wrote:

> 请教一下,flink的两阶段提交对于sink算子,预提交是在做检查点的哪个阶段触发的?预提交时具体是做了什么工作?
>
>
>


Re: Consuming an Apache Paimon table with Flink CDC

2023-10-07 Feng Jin
hi casel

When Flink consumes Paimon in streaming mode, the default behavior is already full snapshot + incremental.

For details, see the scan.mode option in: https://paimon.apache.org/docs/master/maintenance/configurations/
A small sketch is below.
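
A minimal sketch (catalog and table names are illustrative): in streaming mode the
default scan.mode first reads the full snapshot and then continues with the incremental
changes; the option can also be set per query via a dynamic hint, assuming dynamic table
options are enabled in your Flink version.

SET 'execution.runtime-mode' = 'streaming';
SELECT * FROM paimon_catalog.db.orders /*+ OPTIONS('scan.mode' = 'default') */;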


best,
Feng

On Fri, Sep 29, 2023 at 5:50 PM casel.chen  wrote:

> Currently, to consume an Apache Paimon table with Flink in a full + incremental fashion, I need to start two separate jobs, one batch and one incremental, which is cumbersome and cannot be stitched together seamlessly. Is it possible to consume an Apache Paimon table the same way Flink
> CDC consumes a MySQL table?


Re: How to get the application id in flink-metrics

2023-08-30 Feng Jin
hi,

You can try reading the _APP_ID JVM environment variable:
System.getenv(YarnConfigKeys.ENV_APP_ID);

https://github.com/apache/flink/blob/6c9bb3716a3a92f3b5326558c6238432c669556d/flink-yarn/src/main/java/org/apache/flink/yarn/YarnConfigKeys.java#L28


Best,
Feng

On Wed, Aug 30, 2023 at 7:14 PM allanqinjy  wrote:

> hi,
> A question: when reporting metrics to Prometheus, the job name gets a randomly generated suffix; looking at the source code it is also new AbstractID(). Is there a way to obtain the application id of the job being reported here?


Re: Mapping a Flink SQL statement to its underlying processing functions

2023-08-28 Feng Jin
With the log level set to debug, you can see the concrete generated (codegen) code.

On Mon, Aug 28, 2023 at 1:25 PM 海风 <18751805...@163.com> wrote:

> Right, the execution plan does show some information; I just want to know whether there is a better way to see exactly which low-level functions and state are involved, to make it easier to analyze performance-related problems.
>
>
>
>  Original message 
> | From | Shammon FY |
> | Date | 2023-08-28 12:05 |
> | To | user-zh@flink.apache.org |
> | Cc | |
> | Subject | Re: Mapping a Flink SQL statement to its underlying processing functions |
> If you want to see which concrete execution steps a SQL statement is translated into, you can use the EXPLAIN syntax [1] to view the execution plan.
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/explain/
>
> On Sun, Aug 27, 2023 at 5:23 PM 海风 <18751805...@163.com> wrote:
>
> > 请教下,是否可以去查询一个flink
> > sql提交运行后,flink给它转成的底层处理函数到底是什么样的,假如涉及状态计算,flink给这个sql定义的状态变量是哪些呢?
> >
> >
> >
>


Re: Suggestion: expose Flink ROWKIND as metadata

2023-07-18 Feng Jin
Hi casel

There was a similar discussion before, but exposing ROWKIND may make the SQL semantics ambiguous. You could start a discussion on the dev mailing list to see what others think.

https://issues.apache.org/jira/browse/FLINK-24547

Best,
Feng

On Wed, Jul 19, 2023 at 12:06 AM casel.chen  wrote:

> Is there no response from the community?
>
>
> At 2023-07-15 12:19:46, "casel.chen" wrote:
> >Could the Flink community consider exposing ROWKIND as metadata, similar to offset and partition in the kafka
> connector, so that users can work with this ROWKIND
> metadata, for example filtering on specific ROWKIND values or adding a ROWKIND column to the output data?
>


Re: Re: Flink SQL job metric names so long they blow up Prometheus memory

2023-06-14 Feng Jin
org___AS_huifuSecOrg__CASE__huifu_thd_org_IS_NULL_OR__TRIM_FLAG_BOTHUTF_16LE_huifu_thd_org_UTF_16LE__:VARCHAR_2147483647__CHARACTER_SET__UTF_16LE___OR__TRIM_FLAG_BOTHUTF_16LE_huifu_thd_org_UTF_16LE_null_:VARCHAR_2147483647__CHARACTER_SET__UTF_16LE__UTF_16LE_defalut_:VARCHAR_2147483647__CHARACTER_SET__UTF_16LE___TRIM_FLAG_BOTHUTF_16LE_huifu_thd_org___AS_huifuThdOrg__CASE__huifu_for_org_IS_NULL_OR__TRIM_FLAG_BOTHUTF_16LE_huifu_for_org_UTF_16LE__:VARCHAR_2147483647__CHARACTER_SET__UTF_16LE___OR__TRIM_FLAG_BOTHUTF_16LE_huifu_for_org_UTF_16LE_null_:VARCHAR_2147483647__CHARACTER_SET__UTF_16LE__UTF_16LE_defalut_:VARCHAR_2147483647__CHARACTER_SET__UTF_16LE___TRIM_FLAG_BOTHUTF_16LE_huifu_for_org___AS_huifuForOrg__CASE__huifu_sales_sub_IS_NULL_OR__TRIM_FLAG_BOTHUTF_16LE_huifu_sales_sub_UTF_16LE__:VARCHAR_2147483647__CHARACTER_SET__UTF_16LE___OR__TRIM_FLAG_BOTHUTF_16LE_huifu_sales_sub_UTF_16LE_null_:VARCHAR_2147483647__CHARACTER_SET__UTF_16LE__UTF_16LE_defalut_:VARCHAR_2147483647__CHARACTER_SET__UTF_16LE___TRIM_FLAG_BOTHUTF_16LE_huifu_sales_sub___AS_huifuSales__DATE_FORMAT_CAST_LOCALTIMESTAMP__UTF_16LE_MMddHHmmssSSS___AS_synModifyTime__CAST_CURRENT_TIMESTAMPAS_synTtlDate__NotNullEnforcer_fields__serviceId__Sink:_Sink_table__hive_default_mongodb_active_channel_sink___fields__transDate__serviceId__huifuFstOrg__huifuSecOrg__huifuThdOrg__huifuForOrg__huifuSales__synModifyTime__synTtlDate__",task_attempt_num="1",job_name="tb_top_top_trans_order_binlog2mongo",tm_id="tb_top_top_trans_order_binlog2mongo_taskmanager_1_112",subtask_index="35",}
> 73144.0
>
>
>
>
>
> At 2023-06-13 10:13:17, "Feng Jin" wrote:
> >hi casel
> >
> >1. Consider using Flink 1.15, which can use simplified operator names:
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/config/#table-exec-simplify-operator-name-enabled
> >
> >2. Flink also provides a REST API to fetch instantaneous metrics directly, if you do not need historical metrics:
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobmanager-metrics
> >
> >
> >Best,
> >Feng
> >
> >On Tue, Jun 13, 2023 at 8:51 AM casel.chen  wrote:
> >
> >> We run 200+ Flink
> >> SQL jobs in production. After hooking up Prometheus metrics (Prometheus scrapes job metrics periodically), Prometheus ran out of memory (64 GB) shortly after starting; investigation showed the cause was overly long metric names.
> >> The metric name of a Flink
> >> SQL job is generally composed of the job name plus the operator name, and the operator name is built from the SQL text, so a SELECT with many fields or a complex SQL easily produces very long names.
> >> Is there a good way to solve this problem?
>


Re: Flink SQL job metric names so long they blow up Prometheus memory

2023-06-12 Feng Jin
hi casel

1. Consider using Flink 1.15, which can use simplified operator names (a small usage sketch follows the second link below):

https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/config/#table-exec-simplify-operator-name-enabled

2. Flink also provides a REST API to fetch instantaneous metrics directly, if you do not need historical metrics:

https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobmanager-metrics
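
A minimal usage sketch (the option is a table config key; in newer versions it may
already default to true):

SET 'table.exec.simplify-operator-name-enabled' = 'true';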


Best,
Feng

On Tue, Jun 13, 2023 at 8:51 AM casel.chen  wrote:

> 线上跑了200多个flink
> sql作业,接了prometheus指标(prometheus定期来获取作业指标)监控后没跑一会儿就将prometheus内存打爆(开了64GB内存),查了一下是因为指标名称过长导致的。
> flink
> sql作业的指标名称一般是作业名称+算子名称组成的,而算子名称是由sql内容拼出来的,在select字段比较多或sql较复杂的情况下容易生成过长的名称,
> 请问这个问题有什么好的办法解决吗?


Re: Flink RocketMQ Connector

2023-05-26 Feng Jin
hi casel

The Flink RocketMQ connector is maintained by the RocketMQ community; the project is at:
https://github.com/apache/rocketmq-flink. By default this version uses the DELIMIT message format (the message is
treated as a String and split by a delimiter), so only the column delimiter of the message can be specified.


best,
feng


On Fri, May 26, 2023 at 7:44 PM casel.chen  wrote:

> Is there an official Flink RocketMQ connector? Do I need to develop one myself? And what is the URL of the Flink ecosystem site (where users upload connectors, formats, etc. they developed themselves)?


Re: How to debug code written with the Flink DataStream API in IDEA

2023-04-22 Feng Jin
If you want to use your local IDEA to debug a job running remotely, you need to enable debugging in the TaskManager's JVM options.

When submitting the job, add the parameter:
env.java.opts.taskmanager="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"



Then create a Remote Debug run configuration in IDEA and connect it to the IP of the machine where the remote TaskManager runs. After that you can set breakpoints in IDEA
or capture execution stacks
*(provided your local machine and the remote machine can reach each other over the network)*

Reference:
https://www.jetbrains.com/help/idea/tutorial-remote-debug.html#174f812f

---
Best,
Feng Jin

On Sat, Apr 22, 2023 at 10:04 PM m18751805115_1 <18751805...@163.com> wrote:

> Sorry, I may not have described the problem clearly. I want to debug the code locally and observe the variable values, call stack, and so on after each stream record is input.
>
>
>
>  Original message 
> | From | Feng Jin |
> | Date | 2023-04-22 21:53 |
> | To | user-zh@flink.apache.org |
> | Cc | |
> | Subject | Re: How to debug code written with the Flink DataStream API in IDEA |
> Yes, it is supported: just run the main function in IDEA. Before running, it is best to check *Include dependencies
> with "Provided" scope* in the IDEA run configuration, otherwise you may get class-not-found errors.
>
>
> 
> Best,
> Feng Jin
>
> On Sat, Apr 22, 2023 at 9:28 PM m18751805115_1 <18751805...@163.com>
> wrote:
>
> > A question: for code written with the Flink DataStream
> > API in IDEA, where the source input is a socket stream of records arriving one by one, how can I debug it locally in IDEA and observe how each input record is processed? Does IDEA support this kind of debugging?
> api写的代码,source输入是一条一条socket流数据,那如何在本地idea中进行调试,观察每条输入数据的运行情况,idea是否支持这种调试?
> >
> >
> >
>


Re: How to debug code written with the Flink DataStream API in IDEA

2023-04-22 Feng Jin
Yes, it is supported: just run the main function in IDEA. Before running, it is best to check *Include dependencies
with "Provided" scope* in the IDEA run configuration, otherwise you may get class-not-found errors.


----
Best,
Feng Jin

On Sat, Apr 22, 2023 at 9:28 PM m18751805115_1 <18751805...@163.com> wrote:

> A question: for code written with the Flink DataStream
> API in IDEA, where the source input is a socket stream of records arriving one by one, how can I debug it locally in IDEA and observe how each input record is processed? Does IDEA support this kind of debugging?
>
>
>