Unsubscribe
Unsubscribe
Re: How to set rebalance in Flink SQL like the Streaming API?
Hi hjw, To rescale data for a dim join, I think you can use `PARTITION BY` in SQL before the dim join, which will redistribute the data by a specific column. In addition, you can add a cache for the dim table to improve performance too. Best, Shammon FY On Tue, Apr 4, 2023 at 10:28 AM Hang Ruan wrote: > Hi hjw, > > IMO, parallelism 1 is enough for your job if we do not consider the sink. I do not know why you need to set the lookup join operator's > parallelism to 6. > The SQL planner decides the type of the edge, and we cannot > change it. > Maybe you could share the execution graph to provide more information. > > Best, > Hang > > hjw wrote on Tue, Apr 4, 2023 at 00:37: > >> For example: I create a Kafka source to subscribe to a topic that has >> one partition and set the default parallelism of the job to 6. The next >> operator after the Kafka source is a lookup join against a MySQL table. However, the >> relationship between the Kafka source and the lookup join operator is >> Forward, so only one subtask of the lookup join operator can receive data. I >> want the relationship between the Kafka source and the lookup join >> operator to be Rebalance so that all subtasks of the lookup join operator can >> receive data. >> >> Env: >> Flink version: 1.15.1 >> >> >> -- >> Best, >> Hjw >> >
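Shammon's suggestion above — redistributing rows by a column before the dim join — can be illustrated with a minimal sketch. This is plain Python, not Flink: it only models what hash-partitioning by a join key does, i.e. rows with the same key always land on the same parallel subtask, while distinct keys spread across all of them. The column name `user_id` and parallelism 6 are illustrative assumptions, not values from the thread.

```python
def partition_by_key(rows, key, parallelism):
    """Assign each row to a downstream subtask by hashing the join key."""
    subtasks = [[] for _ in range(parallelism)]
    for row in rows:
        # Same key -> same subtask, so the dim-table cache on each
        # subtask only ever sees its own slice of the key space.
        subtasks[hash(row[key]) % parallelism].append(row)
    return subtasks

rows = [{"user_id": i, "amount": i * 10} for i in range(12)]
subtasks = partition_by_key(rows, "user_id", 6)
busy = sum(1 for s in subtasks if s)  # how many subtasks received data
```

With 12 distinct keys and parallelism 6, every subtask receives work, which is the rescaling effect the `PARTITION BY` advice is after; it also makes a per-subtask dim-table cache more effective, since each subtask caches a disjoint set of keys.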
Re: How to set rebalance in Flink SQL like the Streaming API?
Hi hjw, IMO, parallelism 1 is enough for your job if we do not consider the sink. I do not know why you need to set the lookup join operator's parallelism to 6. The SQL planner decides the type of the edge, and we cannot change it. Maybe you could share the execution graph to provide more information. Best, Hang hjw wrote on Tue, Apr 4, 2023 at 00:37: > For example: I create a Kafka source to subscribe to a topic that has > one partition and set the default parallelism of the job to 6. The next > operator after the Kafka source is a lookup join against a MySQL table. However, the > relationship between the Kafka source and the lookup join operator is > Forward, so only one subtask of the lookup join operator can receive data. I > want the relationship between the Kafka source and the lookup join > operator to be Rebalance so that all subtasks of the lookup join operator can > receive data. > > Env: > Flink version: 1.15.1 > > > -- > Best, > Hjw >
Re: [DISCUSS] Status of Statefun Project
Thanks for bringing this up. I'm currently using Statefun, and I've made a few small code contributions over time. All of my PRs have been merged into master and most have been released, but a few haven't been part of a release yet. Most recently, I helped upgrade Statefun to be compatible with Flink 1.15.2; that work was merged last October but hasn't been released. (And, of course, there have been more Flink releases since then.) IMO, the main thing driving the need for ongoing Statefun releases -- even in the absence of any new feature development -- is that there is typically a bit of work to do to make Statefun compatible with each new Flink release. This usually involves updating dependency versions and sometimes some simple code changes, a common example being adapting to Flink config parameters that have changed from, say, delimited strings to arrays. I'd be happy to continue making the changes needed to keep Statefun compatible with each new Flink release, but I don't have the committer rights that would allow me to release the code. On Mon, Apr 3, 2023 at 5:02 AM Martijn Visser wrote: > Hi everyone, > > I want to open a discussion on the status of the Statefun Project [1] in > Apache Flink. As you might have noticed, there hasn't been much development > in the Statefun repository [2] over the past months. There is currently a > lack of active contributors and committers who are able to help with the > maintenance of the project. > > In order to improve the situation, we need to solve the lack of committers > and the lack of contributors. > > On the lack of committers: > > 1. Ideally, some of the current Flink committers have the > bandwidth and can help with reviewing PRs and merging them. > 2. If that's not an option, current committers could merge PRs that have > been reviewed and approved by those who are willing > to contribute to Statefun, provided the CI passes. > > On the lack of contributors: > > 3. 
Next to having this discussion on the Dev and User mailing lists, we can > also create a blog post with a call for new contributors on the Flink project > website, send out some tweets from the Flink / Statefun Twitter accounts, > post messages on Slack, etc. In that message, we would explain how those who > are interested in contributing can get started and where they can reach out for > more information. > > There's also option 4, where a group of interested people would split > Statefun from the Flink project and make it a separate top-level project > under the Apache Flink umbrella (similar to what recently happened with > Flink Table Store, which has become Apache Paimon). > > If we see no improvements in the coming period, we should consider > sunsetting Statefun and communicate that clearly to the users. > > I'm looking forward to your thoughts. > > Best regards, > > Martijn > > [1] https://nightlies.apache.org/flink/flink-statefun-docs-master/ > [2] https://github.com/apache/flink-statefun >
How to set rebalance in Flink SQL like the Streaming API?
For example: I create a Kafka source to subscribe to a topic that has one partition and set the default parallelism of the job to 6. The next operator after the Kafka source is a lookup join against a MySQL table. However, the relationship between the Kafka source and the lookup join operator is Forward, so only one subtask of the lookup join operator can receive data. I want the relationship between the Kafka source and the lookup join operator to be Rebalance so that all subtasks of the lookup join operator can receive data. Env: Flink version: 1.15.1 -- Best, Hjw
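The effect described in the question can be sketched in a few lines. This is plain Python, not Flink -- it only models the two edge types: a Forward edge pins everything from the single Kafka source subtask onto one downstream subtask, while a rebalance edge hands records out round-robin across all 6 lookup-join subtasks. The record names and the parallelism of 6 mirror the scenario in the question.

```python
from itertools import cycle

PARALLELISM = 6
records = [f"event-{i}" for i in range(12)]

def forward(records, parallelism):
    """Forward edge: the single upstream subtask feeds only subtask 0."""
    out = [[] for _ in range(parallelism)]
    out[0].extend(records)  # one source partition -> one consumer subtask
    return out

def rebalance(records, parallelism):
    """Rebalance edge: round-robin over every downstream subtask."""
    out = [[] for _ in range(parallelism)]
    targets = cycle(range(parallelism))
    for rec in records:
        out[next(targets)].append(rec)
    return out

fwd = forward(records, PARALLELISM)  # all 12 records on subtask 0
reb = rebalance(records, PARALLELISM)  # 2 records on each of 6 subtasks
```

Under Forward, five of the six lookup-join subtasks sit idle, which is exactly the symptom reported; rebalance spreads the load evenly regardless of how many upstream partitions exist.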
[ANNOUNCE] Starting with Flink 1.18 Release Sync
Hi everyone, As a fresh start of the Flink release 1.18, I'm happy to share with you that the first release sync meeting of 1.18 will happen tomorrow on Tuesday, April 4th at 10am (UTC+2) / 4pm (UTC+8). Welcome and feel free to join us and share your ideas about the new release cycle! Details of joining the release sync can be found in the 1.18 release wiki page [1]. All contributors are invited to update the same wiki page [1] and include features targeting the 1.18 release. Looking forward to seeing you all in the meeting! [1] https://cwiki.apache.org/confluence/display/FLINK/1.18+Release Best regards, Jing, Konstantin, Sergey and Qingsheng
[DISCUSS] Status of Statefun Project
Hi everyone, I want to open a discussion on the status of the Statefun Project [1] in Apache Flink. As you might have noticed, there hasn't been much development in the Statefun repository [2] over the past months. There is currently a lack of active contributors and committers who are able to help with the maintenance of the project. In order to improve the situation, we need to solve the lack of committers and the lack of contributors. On the lack of committers: 1. Ideally, some of the current Flink committers have the bandwidth and can help with reviewing PRs and merging them. 2. If that's not an option, current committers could merge PRs that have been reviewed and approved by those who are willing to contribute to Statefun, provided the CI passes. On the lack of contributors: 3. Next to having this discussion on the Dev and User mailing lists, we can also create a blog post with a call for new contributors on the Flink project website, send out some tweets from the Flink / Statefun Twitter accounts, post messages on Slack, etc. In that message, we would explain how those who are interested in contributing can get started and where they can reach out for more information. There's also option 4, where a group of interested people would split Statefun from the Flink project and make it a separate top-level project under the Apache Flink umbrella (similar to what recently happened with Flink Table Store, which has become Apache Paimon). If we see no improvements in the coming period, we should consider sunsetting Statefun and communicate that clearly to the users. I'm looking forward to your thoughts. Best regards, Martijn [1] https://nightlies.apache.org/flink/flink-statefun-docs-master/ [2] https://github.com/apache/flink-statefun
[ANNOUNCE] Apache flink-connector-aws v4.1.0 released
The Apache Flink community is very happy to announce the release of Apache flink-connector-aws v4.1.0 Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is available for download at: https://flink.apache.org/downloads.html The full release notes are available in Jira: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352646 We would like to thank all contributors of the Apache Flink community who made this release possible! Regards, Danny
[ANNOUNCE] Apache flink-connector-mongodb v1.0.0 released
The Apache Flink community is very happy to announce the release of Apache flink-connector-mongodb v1.0.0 Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is available for download at: https://flink.apache.org/downloads.html The full release notes are available in Jira: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352386 We would like to thank all contributors of the Apache Flink community who made this release possible! Regards, Danny
Re: Access ExecutionConfig from new Source and Sink API
Hi Christopher, I think the ExecutionConfig for the new Sink API is already covered in FLIP-287 [1]. What we actually need is a read-only ExecutionConfig for both the Source API and the Sink API. Maybe we could continue to discuss this topic under FLIP-287. Best, Hang [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853 Yufan Sheng wrote on Mon, Apr 3, 2023 at 14:06: > I agree with you. It's quite useful to access the ExecutionConfig in the > Source API. When I was developing flink-connector-pulsar, the only > configuration that I couldn't access was the checkpoint configuration, which > is defined in ExecutionConfig; with it, I could have switched the behavior > automatically based on the checkpoint settings. Instead, I had to add more > custom configurations to the Pulsar Source. > > On Mon, Apr 3, 2023 at 1:47 PM Christopher Lee > wrote: > > > > Hello, > > > > I'm trying to develop Flink connectors for NATS using the new FLIP-27 and > FLIP-143 APIs. The scaffolding is more complicated than the old > SourceFunction and SinkFunction, but not terrible. However, I can't figure > out how to access the ExecutionConfig under these new APIs. This was > possible in the old APIs by way of the RuntimeContext of the > AbstractRichFunction (which is extended by RichSourceFunction and > RichSinkFunction). > > > > The reason I would like this is that some interactions with external > systems may be invalid under certain Flink job execution parameters. > Consider a system like NATS, which allows for acknowledgements of messages > received. I would ideally acknowledge all received messages in the source > connector during checkpointing. If I fail to acknowledge the delivered > messages, after a pre-configured amount of time, NATS would resend the > message (which is good in my case for fault tolerance). 
> > > > However, if a Flink job using these connectors has disabled > checkpointing or made the interval too large, the connector will never > acknowledge delivered messages, and the NATS system may send the messages > again and cause duplicate data. I would be able to avoid this if I could > access the ExecutionConfig to check these parameters and throw early. > > > > I know that the SourceReaderContext gives me access to the > Configuration, but that doesn't handle the case where the execution > environment is set programmatically in a job definition rather > than through configuration. Any ideas? > > > > Thanks, > > Chris >
Re: Access ExecutionConfig from new Source and Sink API
I agree with you. It's quite useful to access the ExecutionConfig in the Source API. When I was developing flink-connector-pulsar, the only configuration that I couldn't access was the checkpoint configuration, which is defined in ExecutionConfig; with it, I could have switched the behavior automatically based on the checkpoint settings. Instead, I had to add more custom configurations to the Pulsar Source. On Mon, Apr 3, 2023 at 1:47 PM Christopher Lee wrote: > > Hello, > > I'm trying to develop Flink connectors for NATS using the new FLIP-27 and > FLIP-143 APIs. The scaffolding is more complicated than the old > SourceFunction and SinkFunction, but not terrible. However, I can't figure out > how to access the ExecutionConfig under these new APIs. This was possible in > the old APIs by way of the RuntimeContext of the AbstractRichFunction (which > is extended by RichSourceFunction and RichSinkFunction). > > The reason I would like this is that some interactions with external systems may > be invalid under certain Flink job execution parameters. Consider a system > like NATS, which allows for acknowledgements of messages received. I would > ideally acknowledge all received messages in the source connector during > checkpointing. If I fail to acknowledge the delivered messages, after a > pre-configured amount of time, NATS would resend the message (which is good > in my case for fault tolerance). > > However, if a Flink job using these connectors has disabled checkpointing or > made the interval too large, the connector will never acknowledge delivered > messages, and the NATS system may send the messages again and cause duplicate > data. I would be able to avoid this if I could access the ExecutionConfig to > check these parameters and throw early. > > I know that the SourceReaderContext gives me access to the Configuration, but > that doesn't handle the case where the execution environment is set > programmatically in a job definition rather than through configuration. Any > ideas? > > Thanks, > Chris
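The duplicate-data scenario in this thread can be sketched concretely. This is plain Python, not Flink or NATS: it only models a broker that redelivers any message that was never acknowledged, paired with a source that acknowledges solely at checkpoint time. The class and parameter names are hypothetical; the point is that with checkpointing disabled, acks never happen, so every delivery round replays the same messages.

```python
class AckingBroker:
    """Toy at-least-once broker: redelivers anything not yet acknowledged."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.acked = set()

    def deliver(self):
        # Every unacked message is eligible for redelivery.
        return [m for m in self.messages if m not in self.acked]

    def ack(self, message):
        self.acked.add(message)

def consume(broker, checkpointing_enabled, rounds=2):
    """Run several delivery rounds; acks only happen at 'checkpoints'."""
    seen = []
    for _ in range(rounds):
        for msg in broker.deliver():
            seen.append(msg)
            if checkpointing_enabled:
                broker.ack(msg)  # ack at checkpoint time, as the source would
    return seen

with_cp = consume(AckingBroker(["a", "b", "c"]), checkpointing_enabled=True)
without_cp = consume(AckingBroker(["a", "b", "c"]), checkpointing_enabled=False)
```

With checkpointing on, each message is seen exactly once; with it off, the second round redelivers everything, producing the duplicates Christopher describes. This is why a connector that acks on checkpoints would want to inspect the effective checkpoint settings up front and fail fast if they make its delivery contract impossible.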