Re: Re: [ANNOUNCE] Apache Flink CDC 3.1.0 released
Amazing, congrats!

Best,
Jingsong

On Fri, May 17, 2024 at 23:10, gongzhongqiang <gongzhongqi...@apache.org> wrote:
> Congratulations! Thanks to all contributors.
>
> Best,
> Zhongqiang Gong
>
> Qingsheng Ren wrote on Fri, May 17, 2024 at 17:33:
> > The Apache Flink community is very happy to announce the release of
> > Apache Flink CDC 3.1.0.
> >
> > Apache Flink CDC is a distributed data integration tool for real-time
> > and batch data, bringing the simplicity and elegance of data
> > integration via YAML to describe the data movement and transformation
> > in a data pipeline.
> >
> > Please check out the release blog post for an overview of the release:
> > https://flink.apache.org/2024/05/17/apache-flink-cdc-3.1.0-release-announcement/
> >
> > The release is available for download at:
> > https://flink.apache.org/downloads.html
> >
> > Maven artifacts for Flink CDC can be found at:
> > https://search.maven.org/search?q=g:org.apache.flink%20cdc
> >
> > The full release notes are available in Jira:
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354387
> >
> > We would like to thank all contributors of the Apache Flink community
> > who made this release possible!
> >
> > Regards,
> > Qingsheng Ren
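For readers unfamiliar with Flink CDC, the "integration via YAML" the announcement mentions means a whole pipeline is declared in a single YAML file. The following is a minimal sketch of a MySQL-to-Doris pipeline; the hostnames, credentials, and table names are placeholders, and the exact keys should be verified against the 3.1 documentation:

```yaml
# Hypothetical Flink CDC pipeline definition (all values are placeholders).
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: flink_user
  password: flink_pw
  tables: app_db.\.*        # sync every table in the app_db database

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

pipeline:
  name: Sync app_db to Doris
  parallelism: 2
```

Such a file would be submitted with the Flink CDC CLI, which translates it into a running Flink job.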
Re: Re: [ANNOUNCE] Apache Flink CDC 3.1.0 released
CC to the Paimon community.

Best,
Jingsong

On Mon, May 20, 2024 at 9:55 AM Jingsong Li wrote:
> Amazing, congrats!
>
> Best,
> Jingsong
>
> (quoting the [ANNOUNCE] Apache Flink CDC 3.1.0 thread above)
Re: Flink SQL: group-by fields reduced after optimization
It looks like "dt = cast(CURRENT_DATE as string)" causes dt to be inferred as a constant, which is then optimized away.

Folding CURRENT_DATE into a constant should only happen in batch mode, though. Is this SQL running in batch mode?

℡小新的蜡笔不见嘞、 <1515827...@qq.com.invalid> wrote on Sun, May 19, 2024 at 01:01:
>
> create view tmp_view as
> SELECT
>     dt,    -- 2
>     uid,   -- 0
>     uname, -- 1
>     uage   -- 3
> from
>     kafkaTable
> where dt = cast(CURRENT_DATE as string);
>
> insert into printSinkTable
> select
>     dt, uid, uname, sum(uage)
> from tmp_view
> group by
>     dt,
>     uid,
>     uname;
>
> The SQL is fairly simple: first filter on dt = current_date, then aggregate (sum) over the three fields dt, uid, uname.
> However, after optimization the generated physical plan looks like this:
>
> == Optimized Execution Plan ==
> Sink(table=[default_catalog.default_database.printSinkTable], fields=[dt, uid, uname, EXPR$3])
> +- Calc(select=[CAST(CAST(CURRENT_DATE())) AS dt, uid, uname, EXPR$3])
>    +- GroupAggregate(groupBy=[uid, uname], select=[uid, uname, SUM(uage) AS EXPR$3])
>       +- Exchange(distribution=[hash[uid, uname]])
>          +- Calc(select=[uid, uname, uage], where=[(dt = CAST(CURRENT_DATE()))])
>             +- TableSourceScan(table=[[default_catalog, default_database, kafkaTable]], fields=[uid, uname, dt, uage])
>
> How can I get the aggregation to group by all three fields dt, uid, uname here? Thanks.

--
Best,
Benchao Li
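For reference, the runtime mode Benchao asks about can be inspected or set explicitly in the Flink SQL client via the standard `execution.runtime-mode` option:

```sql
-- In the Flink SQL client: choose streaming or batch planning for
-- subsequent statements. Constant folding of time functions differs
-- between the two modes, as discussed above.
SET 'execution.runtime-mode' = 'streaming';  -- or 'batch'
```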
Re: Flink SQL: group-by fields reduced after optimization
The Calcite issue you referenced [1] was fixed in calcite-1.22.0, and Flink has been using that Calcite version since Flink 1.11.

So which Flink version are you on? This feels like it may be a different problem. If you can reproduce it on the current latest version, 1.19, please open an issue to report it as a bug.

PS: I double-checked the behavior I described earlier: the streaming/batch distinction is only made at the code-generation stage, so the optimizer cannot see it, and even in batch mode the optimized plan should still contain the dt field.

[1] https://issues.apache.org/jira/browse/CALCITE-3531

℡小新的蜡笔不见嘞、 <1515827...@qq.com.invalid> wrote on Mon, May 20, 2024 at 11:06:
>
> Hi, this is a streaming job. I stepped through the code: cast(CURRENT_DATE as string) is being recognized as a constant. This problem was fixed in Calcite
> (https://github.com/apache/calcite/pull/1602/files), but the Calcite version Flink depends on does not include that fix. I worked around it with a custom UDF.
>
> (earlier quoted messages trimmed)

--
Best,
Benchao Li
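For context on the workaround mentioned in the quoted message: replacing the built-in CURRENT_DATE with a custom scalar UDF avoids the problem because the planner will not constant-fold a function that declares itself non-deterministic (in Flink's Java UDF API, by overriding isDeterministic() to return false). A rough sketch of the rewritten query, where NOW_DATE_STR is a hypothetical name for such a UDF:

```sql
-- NOW_DATE_STR() stands for a user-defined scalar function (hypothetical
-- name) that returns the current date as a string and is declared
-- non-deterministic, so the planner keeps dt as a real group-by key
-- instead of folding it into a constant.
create view tmp_view as
select dt, uid, uname, uage
from kafkaTable
where dt = NOW_DATE_STR();

insert into printSinkTable
select dt, uid, uname, sum(uage)
from tmp_view
group by dt, uid, uname;
```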