Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Jark Wu
Congratulations and welcome! Best, Jark On Thu, 21 Mar 2024 at 10:35, Rui Fan <1996fan...@gmail.com> wrote: > Congratulations! > > Best, > Rui > > On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan wrote: > > > Congratulations! > > > > Best, > > Hang > > > > Lincoln Lee wrote on Thu, Mar 21, 2024 at 09:54: > > >

Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-18 Thread Jark Wu
Congrats! Thanks Lincoln, Jing, Yun and Martijn for driving this release. Thanks to all who were involved in this release! Best, Jark On Mon, 18 Mar 2024 at 16:31, Rui Fan <1996fan...@gmail.com> wrote: > Congratulations, thanks for the great work! > > Best, > Rui > > On Mon, Mar 18, 2024 at 4:26 PM Lincoln

Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Jark Wu
Congratulations, and thanks to the release managers and everyone who has contributed! Best, Jark On Fri, 27 Oct 2023 at 12:25, Hang Ruan wrote: > Congratulations! > > Best, > Hang > > Samrat Deb wrote on Fri, Oct 27, 2023 at 11:50: > > > Congratulations on the great release > > > > Bests, > > Samrat > > > > On

Re: [DISCUSS][FLINK-31788][FLINK-33015] Add back Support emitUpdateWithRetract for TableAggregateFunction

2023-09-07 Thread Jark Wu
+1 to fix it first. I also agree to deprecate it if there are few people using it, but this should be another discussion thread within the dev+user ML. In the future, we are planning to introduce user-defined operators based on the TVF functionality, which I think can fully subsume the UDTAF, cc @Timo

Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-03 Thread Jark Wu
Congrats everyone! Best, Jark > On Jul 3, 2023, at 22:37, Yuval Itzchakov wrote: > > Congrats team! > > On Mon, Jul 3, 2023, 17:28 Jing Ge via user > wrote: >> Congratulations! >> >> Best regards, >> Jing >> >> >> On Mon, Jul 3, 2023 at 3:21 PM yuxia >

Re: [DISCUSS] Hive dialect shouldn't fall back to Flink's default dialect

2023-06-01 Thread Jark Wu
+1, I think this can make the grammar more clear. Please remember to add a release note once the issue is finished. Best, Jark On Thu, 1 Jun 2023 at 11:28, yuxia wrote: > Hi, Jingsong. It's hard to provide an option regarding to the fact that we > also want to decouple Hive with flink planner.

Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-13 Thread Jark Wu
>>>>>> metrics are quite "trivial" to be noticed as public APIs. As mentioned by >>>>>> Martijn I couldn't find a place noting that metrics are public APIs and >>>>>> should be treated carefully while contributing and reviewing. >>>>>>

Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-10 Thread Jark Wu
Thanks for discovering this problem, Qingsheng! I'm also +1 for reverting the breaking changes. IIUC, currently, the behavior of "numXXXOut" metrics of the new and old sink is inconsistent. We have to break one of them to have consistent behavior. Sink V2 is an evolving API which is just

Re: [ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-02 Thread Jark Wu
Thanks Xintong for making this possible! Cheers, Jark Wu On Thu, 2 Jun 2022 at 15:31, Xintong Song wrote: > Hi everyone, > > I'm very happy to announce that the Apache Flink community has created a > dedicated Slack workspace [1]. Welcome to join us on Slack. > > ## Join t

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-12 Thread Jark Wu
>> While I share the same concerns as those mentioned in the previous >>

Re: Fwd: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-10 Thread Jark Wu
ket for that. >> >> It would be great to get some insights from currently Flink and Hive >> users which versions are being used. >> @Jark I would indeed deprecate the old Hive versions in Flink 1.15 and >> then drop them in Flink 1.16. That would also remove some tech d

Re: Fwd: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-09 Thread Jark Wu
Thanks Martijn for the reply and summary. I totally agree with your plan and thank Yuxia for volunteering the Hive tech debt issue. I think we can create an umbrella issue for this and target version 1.16. We can discuss details and create subtasks there. Regarding dropping old Hive versions,

Re: Re: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-08 Thread Jark Wu
Hi Martijn, Thanks for starting this discussion. I think it's great for the community to reach a consensus on the roadmap of Hive query syntax. I agree that the Hive project is not actively developed nowadays. However, Hive still occupies the majority of the batch market and the Hive

Re: flinkCDC2.1.1

2022-01-06 Thread Jark Wu
Flink CDC issues can be reported at https://github.com/ververica/flink-cdc-connectors/issues On Thu, 30 Dec 2021 at 14:08, Liu Join wrote: > Reading MySQL data with Flink CDC 2.1.1 fails with an error after running for a while > Image hosting link: error screenshot > > > > Sent from Mail for Windows > > >

Re: Mailing list archive is inaccessible

2022-01-06 Thread Jark Wu
The Nabble service is down; use this address instead: https://lists.apache.org/list.html?d...@flink.apache.org On Fri, 31 Dec 2021 at 18:29, Ada Wong wrote: > I wanted to read the discussion from back then, but this link is inaccessible. > > >

Re: flink mysql cdc sync reports an unrecognized field

2022-01-06 Thread Jark Wu
This error log should be unrelated; it comes from the REST client, not from the normal data-processing path. mysql-cdc contains no Jackson JSON parsing code. On Wed, 5 Jan 2022 at 17:09, Fei Han wrote: > > @all: > Flink mysql cdc sync reports an unrecognized field. What causes this? Could it be an unrecognized keyword? The error log is as follows: > > httpResponseStatus=200 OK} >

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Jark Wu
> each connector as Arvid suggested. IMO we shouldn't block this effort on > the stability of the APIs. > > Cheers, > > Konstantin > > > > On Wed, Oct 20, 2021 at 8:56 AM Jark Wu wrote: > >> Hi, >> >> I think Thomas raised very good questions and would like to

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Jark Wu
Hi, I think Thomas raised very good questions and would like to know your opinions if we want to move connectors out of flink in this version. (1) is the connector API already stable? > Separate releases would only make sense if the core Flink surface is > fairly stable though. As evident from

Re: [ANNOUNCE] Flink mailing lists archive service has migrated to Apache Archive service

2021-09-06 Thread Jark Wu
Thanks Leonard, I have seen many users complaining that the Flink mailing list doesn't work (they were using Nabble). I think this information would be very helpful. Best, Jark On Mon, 6 Sept 2021 at 16:39, Leonard Xu wrote: > Hi, all > > The mailing list archive service Nabble Archive was

Re: [DISCUSS] Better user experience in the WindowAggregate upon Changelog (contains update message)

2021-07-01 Thread Jark Wu
cleaned accurately by watermark. We don't need to expose `table.exec.emit.late-fire.enabled` on docs and can remove it in the next version. Best, Jark On Thu, 1 Jul 2021 at 21:20, Jark Wu wrote: > Thanks Jing for bringing up this topic, > > The emit strategy configs are annotated as Exp

Re: [DISCUSS] Better user experience in the WindowAggregate upon Changelog (contains update message)

2021-07-01 Thread Jark Wu
Thanks Jing for bringing up this topic, The emit strategy configs are annotated as Experimental and are not public in the documentation. However, I see this is a very useful feature which many users are looking for. I have posted these configs for many questions like "how to handle late events in SQL".

Re: [Flink SQL] Lookup join hbase problem

2021-06-28 Thread Jark Wu
Yes. Currently, the HBase lookup source only supports lookup on rowkey. If there is more than one join on condition, it may fail. We should support lookup HBase on multiple fields (by Get#setFilter). Feel free to open issues. Best, Jark On Mon, 28 Jun 2021 at 12:48, 纳兰清风 wrote: > Hi, > >

Re: Questions about UPDATE_BEFORE row kind in ElasticSearch sink

2021-06-28 Thread Jark Wu
UPDATE_BEFORE is required in cases such as Aggregation with Filter. For example: SELECT * FROM ( SELECT word, count(*) as cnt FROM T GROUP BY word ) WHERE cnt < 3; There is more discussion in this issue: https://issues.apache.org/jira/browse/FLINK-9528 Best, Jark On Mon, 28 Jun 2021 at

Re: Chinese tutorials not being updated promptly

2021-06-23 Thread Jark Wu
Hi Kevin, Welcome to the Apache Flink open source community! As Yun said, the community welcomes and values every contribution. However, maintaining the Chinese documentation is a huge effort that touches every module, so it needs collaboration from the committers of many modules, and updates inevitably lag at times. If you find an untranslated page with no related JIRA issue, feel free to create an issue and submit a PR. If an issue and PR already exist, you can help review them; the community is even shorter on high-quality reviewers, which would speed up many translations. Best, Jark On Wed, 23 Jun 2021 at 11:04, Yun

Re: Can hbase async lookup guarantee ordered output?

2021-06-17 Thread Jark Wu
You can look at the source code of AsyncWaitOperator. Best, Jark On Tue, 15 Jun 2021 at 18:53, zilong xiao wrote: > I'd like to understand how this guarantees 100% ordering; it feels like async lookups cannot guarantee the order of results, e.g. due to network latency. > > Jingsong Li wrote on Tue, Jun 15, 2021 at 5:07 PM: > > > It is ordered. > > > > The unordered mode is not supported yet, as it could affect the correctness of streaming computation > > > > Best, > > Jingsong > > > > On Tue, Jun 15, 2021 at

Re: Mailing list unsubscribe

2021-06-17 Thread Jark Wu
To unsubscribe, please send to user-zh-unsubscr...@flink.apache.org instead of user-zh@flink.apache.org Best, Jark On Thu, 17 Jun 2021 at 09:29, wangweigu...@stevegame.cn < wangweigu...@stevegame.cn> wrote: > > Email address changed, unsubscribe! > > > >

Re: Unsubscribe

2021-06-17 Thread Jark Wu
To unsubscribe, please send to user-zh-unsubscr...@flink.apache.org instead of user-zh@flink.apache.org Best, Jark On Tue, 15 Jun 2021 at 23:56, frank.liu wrote: > Unsubscribe > > > | | > frank.liu > | > | > frank...@163.com > | > Signature customized by NetEase Mail Master

Re: Re: After computing with flink sql cdc, the elasticsearch connector with multiple parallel sinks occasionally has problems! Data volume is large and it is hard to reproduce for now. Has anyone run into a similar issue?

2021-06-17 Thread Jark Wu
The community recently redesigned the mysql-cdc implementation to support parallel reads and checkpoints during the snapshot phase, removing the global-lock dependency. You can follow the GitHub repository https://github.com/ververica/flink-cdc-connectors. The design and implementation will also be presented at the July meetup; stay tuned. Best, Jark On Thu, 17 Jun 2021 at 09:34, casel.chen wrote: > When will Flink CDC support changing parallelism for fine-grained resource control? Currently I also run into flink sql >

Re: Add control mode for flink

2021-06-07 Thread Jark Wu
Thanks Xintong for the summary, I'm big +1 for this feature. Xintong's summary for Table/SQL's needs is correct. The "custom (broadcast) event" feature is important to us and even blocks further awesome features and optimizations in Table/SQL. I also discussed offline with @Yun Gao several

Re: How to recovery from last count when using CUMULATE window after restart flink-sql job?

2021-05-09 Thread Jark Wu
Hi, When restarting a Flink job, Flink will start the job with an empty state, because this is a new job. This is not special to the CUMULATE window; it applies to all Flink jobs. If you want to restore a Flink job from a state/savepoint, you have to specify the savepoint path, see [1]. Best, Jark

Re: Protobuf support with Flink SQL and Kafka Connector

2021-05-05 Thread Jark Wu
Hi Shipeng, Matthias is correct. FLINK-18202 should address this topic. There is already a pull request there which is in good shape. You can also download the PR and build the format jar yourself, and then it should work with Flink 1.12. Best, Jark On Mon, 3 May 2021 at 21:41, Matthias Pohl

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-17 Thread Jark Wu
, Dylan Forciea wrote: > Jark, > > > > Thanks for the heads up! I didn’t see this behavior when running in batch > mode with parallelism turned on. Is it safe to do this kind of join in > batch mode right now, or am I just getting lucky? > > > > Dylan > > >

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-16 Thread Jark Wu
Hi Dylan, I think this has the same reason as https://issues.apache.org/jira/browse/FLINK-20374. The root cause is that changelogs are shuffled by `attr` at the second join, and thus records with the same `id` will be shuffled to different join tasks (also different sink tasks). So the data arrived

Re: flink sql count distinct optimization

2021-03-26 Thread Jark Wu
> For non-window agg, Flink will automatically split after enabling the option, right? Yes. > As for window agg, which cannot be split automatically, is that covered anywhere in the docs? The docs do not mention it; the document [1] only covers optimizations for unbounded aggregations. Best, Jark [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tuning/streaming_aggregation_optimization.html#split-distinct-aggregation On Fri,
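For reference, the split-distinct optimization discussed in this thread is enabled per job via configuration; a minimal sketch, assuming Flink 1.12 option names (split distinct requires mini-batch to be enabled first):

```sql
-- Enable mini-batch processing, a prerequisite for the split optimization.
SET table.exec.mini-batch.enabled = true;
SET table.exec.mini-batch.allow-latency = '5 s';
SET table.exec.mini-batch.size = 5000;
-- Automatically rewrite a skewed COUNT(DISTINCT ...) into two levels of
-- aggregation (partial on a bucketed key, then final).
SET table.optimizer.distinct-agg.split.enabled = true;
```

As noted above, this only takes effect for unbounded (non-window) aggregations in these versions.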

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-26 Thread Jark Wu
IIUC, pipeline.auto-watermark-interval = 0 just disables **periodic** watermark emission; it doesn't mean the watermark will never be emitted. In Table API/SQL, it has the same meaning. If the watermark interval = 0, we disable periodic watermark emission and emit a watermark once it advances. So I

Re: flink sql count distinct optimization

2021-03-25 Thread Jark Wu
I see your job uses a window agg; window agg does not support automatic splitting yet. In 1.13, the window agg based on window TVFs supports this option, so stay tuned. Best, Jark On Wed, 24 Mar 2021 at 19:29, Robin Zhang wrote: > Hi guomuhua, > With local aggregation enabled, you don't need to split the keys yourself for two-level aggregation; see the official documentation. > > Best, > Robin > > > guomuhua wrote > > In SQL, if the local-global option is enabled: set > >

Re: LocalWatermarkAssigner causes predicate pushdown to be skipped

2021-03-07 Thread Jark Wu
Hi Yuval, That's correct: you will always get a LogicalWatermarkAssigner if you assigned a watermark. If you implement SupportsWatermarkPushdown, the LogicalWatermarkAssigner will be pushed into the TableSource, and then you can push the Filter into the source if the source implements SupportsFilterPushdown.

Re: Dynamically overriding parameters in SQL

2021-03-04 Thread Jark Wu
This looks like a bug in node reuse during segmented optimization; please open an issue on JIRA. Best, Jark On Thu, 4 Mar 2021 at 19:37, 酷酷的浑蛋 wrote: > StatementSet statementSet = tableEnvironment.createStatementSet(); > String sql1 = "insert into test select a,b,c from test_a_12342 /*+ > OPTIONS('table-name'='test_a_1')*/"; > String sql2 = "insert

Re: Question about the ChangelogNormalize stage generated by the Flink SQL upsert-kafka connector

2021-03-04 Thread Jark Wu
1. For upsert-kafka, ChangelogNormalize is added by default. 2. ChangelogNormalize uses the env parallelism, so it can be considered to break through the parallelism limit you mentioned. kafka + canal-json also works, but you must set table.exec.source.cdc-events-duplicate = true [1] to enable it. Note, however, that ChangelogNormalize is a stateful node with its own performance overhead, so overall performance may end up worse than a simple forward. Best, Jark [1]:
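A minimal sketch of the canal-json setup described above; the table, topic, and server address are hypothetical, and the option names follow Flink 1.12:

```sql
-- Deduplicate CDC events so ChangelogNormalize can be applied to canal-json.
SET table.exec.source.cdc-events-duplicate = true;

CREATE TABLE orders_cdc (
  id BIGINT,
  amount DECIMAL(10, 2),
  -- a primary key is required so duplicate changelog events
  -- can be normalized per key
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders_binlog',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'canal-json'
);
```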

Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-03-04 Thread Jark Wu
big +1 from my side. Best, Jark On Thu, 4 Mar 2021 at 20:59, Leonard Xu wrote: > +1 for the roadmap. > > Thanks Timo for driving this. > > Best, > Leonard > > > 在 2021年3月4日,20:40,Timo Walther 写道: > > > > Last call for feedback on this topic. > > > > It seems everyone agrees to finally

Re: Union fields with time attributes have different types

2021-02-28 Thread Jark Wu
Hi Sebastián, `endts` in your case is a time attribute, which is slightly different from a regular TIMESTAMP type. You can manually `cast(endts AS TIMESTAMP(3))` to make this query work, which removes the time-attribute meta. SELECT `evt`, `value`, `startts`, CAST(endts AS TIMESTAMP(3)) FROM aggs_1m

Re: Best way to handle BIGING to TIMESTAMP conversions

2021-02-28 Thread Jark Wu
Hi Sebastián, You can use `TO_TIMESTAMP(FROM_UNIXTIME(e))` to get a timestamp value. The BIGINT should be in seconds. Please note that you need to declare the computed column in the DDL schema and declare a watermark strategy on this computed field to make it a rowtime attribute. Because streaming
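A minimal DDL sketch of the computed-column approach described above; the table name, columns, and connector options are hypothetical:

```sql
CREATE TABLE events (
  e BIGINT,          -- epoch time in seconds
  payload STRING,
  -- derive a TIMESTAMP(3) rowtime attribute from the BIGINT column
  ts AS TO_TIMESTAMP(FROM_UNIXTIME(e)),
  -- the watermark declaration is what turns ts into a rowtime attribute
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'events',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);
```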

Re: LEAD/LAG functions

2021-02-01 Thread Jark Wu
Yes. RANK/ROW_NUMBER is not allowed with ROW/RANGE over window, i.e. the "ROWS BETWEEN 1 PRECEDING AND CURRENT ROW" clause. Best, Jark On Mon, 1 Feb 2021 at 22:06, Timo Walther wrote: > Hi Patrick, > > I could imagine that LEAD/LAG are translated into RANK/ROW_NUMBER > operations that are not

Re: Conflicts between the JDBC and postgresql-cdc SQL connectors

2021-01-28 Thread Jark Wu
ctorformat-resources On Thu, 28 Jan 2021 at 17:28, Sebastián Magrí wrote: > Hi Jark! > > Please find the full pom file attached. > > Best Regards, > > On Thu, 28 Jan 2021 at 03:21, Jark Wu wrote: > >> Hi Sebastián, >> >> I think Dawid is right. >> &

Re: Converting non-mini-batch to mini-batch from checkpoint or savepoint

2021-01-27 Thread Jark Wu
Hi Rex, Currently, it is not state compatible, because we will add a new node called MiniBatchAssigner after the source, which changes the JobGraph, and thus the uid is different. Best, Jark On Tue, 26 Jan 2021 at 18:33, Dawid Wysakowicz wrote: > I am pulling in Jark and Godfrey who are more familiar

Re: Conflicts between the JDBC and postgresql-cdc SQL connectors

2021-01-27 Thread Jark Wu
Hi Sebastián, I think Dawid is right. Could you share the pom file? I also tried to package flink-connector-postgres-cdc with ServicesResourceTransformer, and the Factory file contains com.alibaba.ververica.cdc.connectors.postgres.table.PostgreSQLTableFactory Best, Jark On Tue, 26 Jan 2021

Re: A few questions about minibatch

2021-01-27 Thread Jark Wu
Hi Rex, Could you share your query here? It would be helpful to identify the root cause if we have the query. 1) watermark The framework automatically adds a node (the MiniBatchAssigner) to generate watermark events as the mini-batch id to broadcast and trigger mini-batch in the pipeline. 2)

Re: [DISCUSS] Correct time-related function behavior in Flink SQL

2021-01-20 Thread Jark Wu
Great examples to understand the problem and the proposed changes, @Kurt! Thanks Leonard for investigating this problem. The time-zone problems around time functions and windows have bothered a lot of users. It's time to fix them! The return value changes sound reasonable to me, and keeping the

Re: Flink SQL - IntervalJoin doesn't support consuming update and delete - trying to deduplicate rows

2021-01-13 Thread Jark Wu
Hi Dan, Sorry for the late reply. I guess you applied a "deduplication with keeping last row" before the interval join? That will produce an updating stream and interval join only supports append-only input. You can try to apply "deduplication with keeping *first* row" before the interval join.
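A sketch of the "deduplication keeping first row" pattern suggested above, which produces an append-only stream suitable as interval-join input; table and column names are hypothetical:

```sql
SELECT *
FROM (
  SELECT *,
         -- keep the FIRST row per key (ASC order): once emitted, a row is
         -- never updated or retracted, so the output stays append-only
         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY event_time ASC) AS rn
  FROM orders
)
WHERE rn = 1;
```

By contrast, ordering DESC ("keep last row") emits retractions whenever a newer row arrives, producing the updating stream the interval join rejects.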

Re: Statement Sets

2021-01-13 Thread Jark Wu
No. The Kafka reader will be shared, which means the Kafka data is only read once. On Tue, 12 Jan 2021 at 03:04, Aeden Jameson wrote: > When using statement sets, if two select queries use the same table > (e.g. Kafka Topic), does each query get its own copy of data? > > Thank you, > Aeden >

Re: Can flink-sql clear all state in memory at a specified time?

2021-01-13 Thread Jark Wu
Why not use a day-level tumble window? It clears the state for you automatically. On Wed, 6 Jan 2021 at 13:53, 徐州州 <25977...@qq.com> wrote: > I'm working on a Flink real-time data warehouse project with a daily-report scenario, using flink-on-yarn mode. If the job doesn't stop, the second day's results accumulate on top of the first day's. My read rule (data TimeStamp within current_date) treats data as today's, but if the cluster isn't stopped, the second day's results accumulate onto the first day's; the code only sets env.setStateBackend(new >

Re: Does Flink have no cache operation on operators like Spark's?

2021-01-13 Thread Jark Wu
The community is already working on this; see FLIP-36: https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink On Fri, 8 Jan 2021 at 15:42, 张锴 wrote: > Intermediate results can be stored in state > > 李继 wrote on Thu, Jan 7, 2021 at 5:42 PM: > > > Hi, when an operator is used multiple times, how can it be cached, similar to Spark's cache operation? > > > > val env = getBatchEnv > >

Re: Custom flink sqlsubmit program reports an error

2021-01-13 Thread Jark Wu
From the error message it looks like a timeout; check whether the network between the client and the JM is healthy. On Sun, 10 Jan 2021 at 16:23, Fei Han wrote: > Hi all! > > Following 云邪's sqlsubmit example for submitting SQL files, I modified and submitted it; the SQL file is recognized and tables can be created. But when submitting the insert job, it fails in local mode. > The Flink version is 1.12.0. My submit command is: $FLINK_HOME/bin/flink run -mip:8081 -d -p 3 -c > sql.submit.SqlSubmit $SQL_JAR -f $sql_file >

Re: Row function cannot have column reference through table alias

2021-01-13 Thread Jark Wu
This is a known issue and will be fixed in a later version. As a workaround, construct it directly as (b.app_id, b.message) without the ROW keyword. On Mon, 11 Jan 2021 at 11:17, 刘海 wrote: > Using table_alias.field_name inside ROW causes a parse error; I had the same problem with hbase before, and I usually wrap the query in a subquery > > > | | > 刘海 > | > | > liuha...@163.com > | > Signature customized by NetEase Mail Master > On 1/11/2021 11:04, 马阳阳 wrote: > We have a sql that compose a

Re: Reading Kafka metadata in flink sql

2021-01-13 Thread Jark Wu
Reading Kafka key fields: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/kafka.html#key-fields On Wed, 13 Jan 2021 at 15:18, JasonLee <17610775...@163.com> wrote: > hi > > Did you set headers when writing the data? If not, of course they are empty > > > > - > Best Wishes > JasonLee > -- > Sent from:
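A minimal sketch of reading the Kafka message key as columns, following the linked docs; the table, topic, and field names are hypothetical (options per Flink 1.12):

```sql
CREATE TABLE user_events (
  user_id BIGINT,    -- populated from the message key
  item_id BIGINT,    -- populated from the message value
  behavior STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_events',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'key.fields' = 'user_id',            -- map key bytes to the user_id column
  'value.format' = 'json',
  'value.fields-include' = 'EXCEPT_KEY' -- value carries the remaining columns
);
```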

Re: Re: SQL Client parallelism settings

2020-12-31 Thread Jark Wu
controlled by > parallelism.default in flink-conf.yaml, which is a global setting and cannot configure the parallelism of a single job > 2. Setting parallelism in sql-client-defaults.yaml has no effect; setting > parallelism.default > and parallelism in the SQL Client has no effect either > > So how can the parallelism of a single job be controlled effectively? > > On 2020-12-31 15:21:36, "Jark Wu" wrote: > >In Batch mode:

Re: SQL Client parallelism settings

2020-12-30 Thread Jark Wu
In batch mode: 1. The Hive source infers its parallelism from the number of files. You can disable the inference with table.exec.hive.infer-source-parallelism=false, in which case the job parallelism is used, or cap the inferred parallelism with table.exec.hive.infer-source-parallelism.max. [1] 2. Same as above. 3. This should be unrelated to max-parallelism; it is probably because you haven't configured max slots, so the source requests too much parallelism and YARN can't provide that many resources in time, causing the timeout. Configure

Re: Does Flink SQL 1.11 support writing data to Hive?

2020-12-11 Thread Jark Wu
Docs for 1.11: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/hive/hive_read_write.html Docs for 1.12: https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/connectors/hive/ On Fri, 11 Dec 2020 at 15:45, yinghua...@163.com wrote: > According to the website, it is supported: > >

Re: Window aggregation fails after converting a computed column to timestamp

2020-12-11 Thread Jark Wu
Please share the complete code; the current information is not enough to analyze the problem. On Fri, 11 Dec 2020 at 11:53, jun su wrote: > hi Danny, > I tried that and got the same error. Debugging shows that LogicalWindowAggregateRuleBase doesn't carry the Expr information down when building the window, only the alias, causing later optimization rules to fail and exit > > Danny Chan wrote on Fri, Dec 11, 2020 at 11:47 AM: > > > Have you tried adding the watermark syntax > > > > jun su wrote on Fri, Dec 11, 2020 at 10:47 AM:

Re: Re: Re: Using UDAFs with retract streams

2020-12-09 Thread Jark Wu
For Code X, the accumulate method executes acc.lastAmount = > Amount every time to update the acc state, but judging from the results, acc.lastAmount is 0 every time the method is entered for the same Code X? > Is that also because the table keeps only one row for Code X? > > > Then in an upsert kafka table (only the latest row for Code X is kept), if I want to accumulate Code > X's Amount and the expected output is 0, 100, 300..., how should this be implemented? > Hoping someone can clarify ><

Re: Re: Using UDAFs with retract streams

2020-12-09 Thread Jark Wu
> public Long bs = 0L; > > public Long ms = 0L; > > public Long ss = 0L; > > public Long ebb = 0L; > > public Long bb = 0L; > > public Long mb = 0L; > > public Long sb = 0L; > } > > > Debugging shows acc's lastAmount value is always 0. > > > I just tried the sum() function: executing select Code,sum(Amount) from market_stock GROUP BY > Code does not accumulate the Amount field; each output is just the latest Amount value. > Am I using it wrong = = > > On 2020-12-10 11:30:31, "Jark Wu" wrote: > >Could you share the code? > > > >On Thu, 10 Dec 2020 at 11:19, bulterman <15618338...@163.com> wrote: > > > >> The upstream is a table created by the upsert-kafka connector, > >> when using a UDAF the variables in the accumulator are never updated? With the kafka connector they update normally > >> (for easy testing, the table only contains data for the same PK) >

Re: Primary keys go missing after using TableFunction + leftOuterJoinLateral

2020-12-09 Thread Jark Wu
Could you use 4 scalar functions instead of the UDTF and map function? For example: select *, hasOrange(fruits), hasBanana(fruits), hasApple(fruits), hasWatermelon(fruits) from T; I think this can preserve the primary key. Best, Jark On Thu, 3 Dec 2020 at 15:28, Rex Fenley wrote: > It appears

Re: Using UDAFs with retract streams

2020-12-09 Thread Jark Wu
Could you share the code? On Thu, 10 Dec 2020 at 11:19, bulterman <15618338...@163.com> wrote: > The upstream is a table created by the upsert-kafka connector, > when using a UDAF the variables in the accumulator are never updated? With the kafka connector they update normally > (for easy testing, the table only contains data for the same PK)

Re: How does the flink jdbc connector ensure all buffered data is flushed to the database at checkpoint time

2020-12-09 Thread Jark Wu
Hi Jie, this does look like a problem. The JdbcBatchingOutputFormat returned by the sink is not wrapped in GenericJdbcSinkFunction. Could you help create an issue? Best, Jark On Thu, 10 Dec 2020 at 02:05, hailongwang <18868816...@163.com> wrote: > Hi, > Yes, I think you are right. > `JdbcOutputFormat` is wrapped in `OutputFormatSinkFunction`, while >

Re: When reading postgres-cdc with the flink sql cli: Could not find any factory for identifier 'postgres-cdc' that implements 'org.apache.flink.table.factories.DynamicTableSinkFactory' in the classpath.

2020-12-09 Thread Jark Wu
postgres-cdc tables only support reading, not writing. On Wed, 9 Dec 2020 at 22:49, zhisheng wrote: > the sql client also needs a restart > > 王敏超 wrote on Wed, Dec 9, 2020 at 4:49 PM: > > > Using standalone mode, after starting the sql > > cli I get the error below. But my lib directory includes flink-sql-connector-postgres-cdc-1.1.0.jar, > > and I have restarted the cluster. The same approach works with mysql cdc. > > > > Caused by:

Re: A group window expects a time attribute for grouping in a stream environment, thanks

2020-12-09 Thread Jark Wu
Table orders = tEnv.from("Orders"); I don't see you registering an "Orders" table anywhere; this line probably fails. Best, Jark On Thu, 10 Dec 2020 at 11:09, Jark Wu wrote: > 1) A time attribute is not a field of timestamp type but a special concept. See the docs on how to declare time attributes: > https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/tim

Re: A group window expects a time attribute for grouping in a stream environment, thanks

2020-12-09 Thread Jark Wu
1) A time attribute is not a field of timestamp type but a special concept. See the docs on how to declare time attributes: https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/time_attributes.html 2) Your code also looks wrong. L45: Table orders = tEnv.from("Orders"); I don't see you registering an "Orders" table anywhere; this line probably fails. Best, Jark On Wed, 9 Dec 2020 at 15:44,

Re: How to define the field type for JsonObject data in FlinkSQL

2020-12-09 Thread Jark Wu
Yes, 1.12.0 will be released in the next couple of days. On Wed, 9 Dec 2020 at 14:45, xiao cai wrote: > Hi Jark > sorry, it's 1.12.0, I mistyped > > > Original Message > Sender: Jark Wu > Recipient: user-zh > Date: Wednesday, Dec 9, 2020 14:40 > Subject: Re: How to define the field type for JsonObject data in FlinkSQL > > > Hi 赵一旦, this part j

Re: Choosing a state backend for N-way stream joins with flink cdc: FsStateBackend vs RocksDBStateBackend

2020-12-09 Thread Jark Wu
On RocksDB performance tuning, @Yun Tang knows better. On Thu, 10 Dec 2020 at 11:04, Jark Wu wrote: > For large state I still recommend rocksdb; it is much more stable in production. The gap you describe seems quite large, probably because rocksdb hasn't been tuned yet. > > You can refer to these articles to try tuning rocksdb: > > https://mp.weixin.qq.com/s/YpDi3BV8Me3Ay4hzc0nPQA

Re: Choosing a state backend for N-way stream joins with flink cdc: FsStateBackend vs RocksDBStateBackend

2020-12-09 Thread Jark Wu
For large state I still recommend rocksdb; it is much more stable in production. The gap you describe seems quite large, probably because rocksdb hasn't been tuned yet. You can refer to these articles to try tuning rocksdb: https://mp.weixin.qq.com/s/YpDi3BV8Me3Ay4hzc0nPQA https://mp.weixin.qq.com/s/mjWGWJVQ_zSVOgZqtjjoLw https://mp.weixin.qq.com/s/ylqK9_SuPKBKoaKmcdMYPA https://mp.weixin.qq.com/s/r0iPPGWceWkT1OeBJjvJGg Best,

Re: flink sql 1.11 kafka cdc with holo sink

2020-12-09 Thread Jark Wu
1. Not supported at the moment; there is an issue tracking it: https://issues.apache.org/jira/browse/FLINK-20385 2. Set canal-json.table.include = 't1' to filter tables. Regex filtering is not supported yet. 3. Yes. Best, Jark On Wed, 9 Dec 2020 at 11:33, 于洋 <1045860...@qq.com> wrote: > flink sql 1.11 creates a kafka source table; the kafka data is mysql info collected by canal, 'format' = > 'canal-json', the question is

Re: How can flink11 SQL support double-quoted strings

2020-12-09 Thread Jark Wu
It is unrelated to that issue. This sounds more like a Hive query compatibility requirement? You can follow FLIP-152: Hive Query Syntax Compatibility https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility Best, Jark On Wed, 9 Dec 2020 at 11:13, zhisheng wrote: > Is it related to this Issue

Re: Timestamp problem when writing data to postgres with flink sql

2020-12-09 Thread Jark Wu
From the error message, you didn't insert into postgres but only ran a select query, which displays the result in the sql client UI instead of inserting it into postgres. Your query contains timestamp with local time zone, and displaying query results of this type in the sql client is not supported yet. To track a fix, follow this issue: https://issues.apache.org/jira/browse/FLINK-17948 Best, Jark On Tue, 8 Dec 2020 at 19:32, 李轲

Re: Help integrating flink 1.11.2 on yarn with CDH's hbase 2.0

2020-12-09 Thread Jark Wu
1. For the "hbase package not found" message, what is the exact exception stack? 2. Your steps also don't add the flink hbase connector jar to lib, which causes the hbase table factory not to be found. 3. Flink 1.11 did not yet ship an hbase 2.x connector jar. 4. Flink 1.12 supports hbase 2.x and should in theory also be compatible with a flink 1.11 cluster. So you can try to download

Re: [flink-1.10.2] How to register the DataStream resulting from async IO as a table?

2020-12-09 Thread Jark Wu
> tabEnv.createTemporaryView("test_table", result, — it looks like you did register it, didn't you? Is any error reported? Remember to call StreamExecutionEnvironment.execute() at the end to submit the job. Best, Jark On Tue, 8 Dec 2020 at 14:54, Tianwang Li wrote: > Flink version: 1.10.2 > > With a RichAsyncFunction async IO operation, the resulting DataStream cannot be registered as a table. > > Local tests keep outputting duplicate data. >

Re: How to define the field type for JsonObject data in FlinkSQL

2020-12-08 Thread Jark Wu
Hi 赵一旦, the jackson component already handles this logic automatically. Hi xiaocai, which issue do you need 1.12.1 for? 1.12.0 is about to be released in the next couple of days. Best, Jark On Wed, 9 Dec 2020 at 14:34, xiao cai wrote: > OK, I plan to upgrade and test next week. Also: when is 1.12.1 planned for release? > > > Original Message > Sender: Jark Wu > Recipient: user-zh > Date: Tuesday, Dec 8, 202

Re: Error while connecting with MSSQL server

2020-12-07 Thread Jark Wu
Hi, Currently, flink-connector-jdbc doesn't support the MS SQL Server dialect. Only MySQL and Postgres are supported. Best, Jark On Tue, 8 Dec 2020 at 01:20, aj wrote: > Hello , > > I am trying to create a table with microsoft sql server using flink sql > > CREATE TABLE sampleSQLSink ( > id

Re: Reply: Can Flink achieve end-to-end exactly-once with a relational database's default transactions, or does it require two-phase commit

2020-12-07 Thread Jark Wu
Two-phase commit for databases, guaranteeing exactly-once semantics, is being worked on by the community; if interested, join the discussion under https://issues.apache.org/jira/browse/FLINK-15578. Best, Jark On Tue, 8 Dec 2020 at 09:14, hdxg1101300...@163.com wrote: > > > > > hdxg1101300...@163.com > > From: hdxg1101300...@163.com > Sent: 2020-12-07 18:40 > To: user-zh > Subject: Reply: Re: flink

Re: How to define the field type for JsonObject data in FlinkSQL

2020-12-07 Thread Jark Wu
Defining it as STRING, as hailong said, is supported in 1.12: https://issues.apache.org/jira/browse/FLINK-18002 1.12 will be released in the next couple of days; if you can upgrade, give it a try. Best, Jark On Tue, 8 Dec 2020 at 11:56, wxpcc wrote: > You can use a string, or a custom String-typed format, and then handle the internal structure with a udf > > > > -- > Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Cannot write Kafka data to Redis with RedisSink

2020-12-06 Thread Jark Wu
This is probably related to networking and deployment; I suggest consulting Huawei Cloud technical support. On Sun, 6 Dec 2020 at 20:40, 赵一旦 wrote: > It cannot connect; have you confirmed your Huawei Cloud instance can reach the redis server? > > 追梦的废柴 wrote on Sun, Dec 6, 2020 at 8:35 PM: > > > Hi all: > > Good evening! > > My project team is evaluating the Flink framework; one metric needs to read data from Kafka and then store the final result in Redis. > > > > >

Re: Changelog format of dynamic tables

2020-12-04 Thread Jark Wu
They are complete records. That is exactly how upsert kafka is implemented: only the latest image is stored. But some queries do produce delete messages, so sometimes deletes still need to be stored; in upsert kafka they are stored as kafka tombstone messages. Best, Jark On Fri, 4 Dec 2020 at 17:00, jie mei wrote: > Hi, Community > > The changelog of a Flink dynamic table has four kinds of messages (INSERT, UPDATE_BEFORE, UPDATE_AFTER, > DELETE). > Among them

Re: Why is Calcite's implicit type conversion disabled

2020-12-04 Thread Jark Wu
The community has started the design discussion for Hive query syntax compatibility; you can follow it here: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-152-Hive-Query-Syntax-Compatibility-td46928.html Best, Jark On Fri, 4 Dec 2020 at 15:37, stgztsw wrote: > I think since the community plans to be compatible with hive, implicit conversion and other hive syntax compatibility are still necessary. Hive running in real production environments >

Re: Problems migrating production hive sql to the flink 11 engine

2020-12-04 Thread Jark Wu
Hi, Flink SQL 1.11 is not yet compatible with Hive SQL syntax. The design of this feature was only recently discussed in the community and support is expected in 1.13. You can follow the design discussion here: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-152-Hive-Query-Syntax-Compatibility-td46928.html Best, Jark On Fri, 4 Dec 2020 at 11:45, 莫失莫忘 wrote: > Recently I tried to migrate a production hive sql

Re: flink-cdc cannot read the binlog and the program reports no error

2020-12-04 Thread Jark Wu
That doesn't sound right; some error should be reported before the job fails. Or is there any exception in the TaskManager logs? On Fri, 4 Dec 2020 at 09:23, chenjb wrote: > > Thanks for following up. I had mapped int to bigint, assuming bigint's larger range would be compatible, and didn't read the docs carefully. mysql had error logs but the program reported nothing, so my attention went to mysql. These days I tried flink-cdc-sql and found that some error scenarios report nothing, e.g. failing to connect to mysql (wrong password) also just makes the program exit > 0 with no error or hint. Do I need to configure something else? I'm a java beginner and not very familiar with this > > > > --

Re: Partitioned tables in SQL client configuration.

2020-12-03 Thread Jark Wu
Only legacy connectors (`connector.type=kafka` instead of `connector=kafka`) are supported in the YAML at the moment. You can use regular DDL instead. There is a similar discussion in https://issues.apache.org/jira/browse/FLINK-20260 these days. Best, Jark On Thu, 3 Dec 2020 at 00:52, Till

Re: How to compute percentiles in real time with flink sql

2020-12-03 Thread Jark Wu
You can look at the UDAF documentation: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/functions/udfs.html#aggregate-functions On Thu, 3 Dec 2020 at 12:06, 爱成绕指柔 <1194803...@qq.com> wrote: > Hello: > Does flink sql currently have no percentile function for real-time computation? If not, how can this functionality be implemented? > Looking forward to your reply, thanks!
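A sketch of the UDAF route suggested above; `my_percentile` and its implementing class are hypothetical — you would implement the AggregateFunction yourself and register it:

```sql
-- Register a user-defined aggregate function implemented in Java/Scala
-- (the class name is hypothetical).
CREATE FUNCTION my_percentile AS 'com.example.udf.MyPercentile';

-- Use it like a built-in aggregate: e.g. 95th percentile of latency
-- per service (table and columns are hypothetical).
SELECT service, my_percentile(latency_ms, 0.95) AS p95
FROM metrics
GROUP BY service;
```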

Re: Recommendations for a visual client for flink?

2020-12-03 Thread Jark Wu
You can try Zeppelin; its integration with flink sql is quite good. Best, Jark On Thu, 3 Dec 2020 at 21:55, yinghua...@163.com wrote: > I didn't make this clear: I mean a flink sql client. We want a client for other departments in the company whose colleagues only know some sql and lack programming skills > > > On Dec 3, 2020, at 21:52, Shawn Huang wrote: > > > > What do you mean by client? Flink provides a Web UI on port 8081 by default, which can submit and cancel jobs and view logs and some basic metrics. > > > >

Re: flink-cdc cannot read the binlog and the program reports no error

2020-12-03 Thread Jark Wu
Could unsigned int be the culprit... On Thu, 3 Dec 2020 at 15:15, chenjb wrote: > Case closed: the field types were not mapped per the website's requirements; after mapping them correctly it works > > > > -- > Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Flink SQL shared source question

2020-12-03 Thread Jark Wu
1. Whether the source is shared can be seen from the topology graph in the web ui. 2. When catching up on data, or when downstream consumption speeds differ, uneven consumption across partitions is quite normal. 3. You can increase the sink parallelism and the buffer size to mitigate this. Best, Jark On Wed, 2 Dec 2020 at 19:22, zz wrote: > hi all: > I currently have a job whose source table reads from one topic, but there are 6 sinks using multiple insert > statements to write to the same mysql table. As I understand it, these insert statements > should all share this source

Re: I defined a Kafka dynamic table in SQL-Client, but the kafka topic had some elements in the wrong format, so an exception was thrown in SQL-Client. Can we define the Kafka dynamic table with some

2020-12-03 Thread Jark Wu
I think this should be a bug; I have created an issue: https://issues.apache.org/jira/browse/FLINK-20470 On Wed, 2 Dec 2020 at 18:02, mr.meng...@ouglook.com wrote: > [screenshot: http://apache-flink.147419.n8.nabble.com/file/t1146/QQ%E6%88%AA%E5%9B%BE111.jpg]

Re: Why is Calcite's implicit type conversion disabled

2020-12-03 Thread Jark Wu
Implicit type conversion is a very important public API and needs careful discussion in the community, e.g. which types may be converted between each other. The community has not planned this feature yet; if you need it, you can open an issue in the community. Best, Jark On Wed, 2 Dec 2020 at 18:33, stgztsw wrote: > Currently flink sql, flink hive >

Re: flink sql 1.11.1 seems to have a bug

2020-12-03 Thread Jark Wu
It looks like the job submission timed out. Please confirm: 1. the flink cluster is up; 2. the sql client environment can reach the flink cluster; 3. the correct gateway-address is configured in sql-client-defaults.yaml (not needed for a local cluster). Best, Jark On Wed, 2 Dec 2020 at 14:12, zzy wrote: > The problem is as follows, flink version 1.11.1, using flink sql in the sql client > > > The sql statement: > CREATE TABLE

Re: Flink TableAPI Issue: cannot assign instance of org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.util.LRUMap to field

2020-12-03 Thread Jark Wu
Check whether the flink version used to submit the job matches the flink version deployed on the yarn cluster. Or your cluster may contain two different versions of the flink-shaded-jackson package. On Wed, 2 Dec 2020 at 11:55, Zed wrote: > When I submitted a flink-table-sql job to yarn, the following exception > came > out. Wondering how to solve it. Anyone can help me with that? Appreciate > it >
