Unsubscribe

2023-04-03 Thread 柳懿珊
Unsubscribe

Re: How to set rebalance in Flink SQL like the Streaming API?

2023-04-03 Thread Shammon FY
Hi hjw

To rescale data for the dim join, I think you can use `partition by` in SQL
before the dim join, which will redistribute the data by a specific column. In
addition, you can add a cache for the dim table to improve performance too.

Best,
Shammon FY


On Tue, Apr 4, 2023 at 10:28 AM Hang Ruan  wrote:

> Hi, hjw,
>
> IMO, a parallelism of 1 is enough for your job if we do not consider
> the sink. I do not know why you need to set the lookup join operator's
> parallelism to 6.
> The SQL planner decides the type of the edge for us and we cannot
> change it.
> Maybe you could share the execution graph to provide more information.
>
> Best,
> Hang
>
hjw wrote on Tue, Apr 4, 2023 at 00:37:
>
>> For example, I create a Kafka source subscribing to a topic that has
>> one partition, and set the default parallelism of the job to 6. The next
>> operator after the Kafka source is a lookup join with a MySQL table.
>> However, the relationship between the Kafka source and the lookup join
>> operator is Forward, so only one subtask of the lookup join operator can
>> receive data. I want to set the relationship between the Kafka source and
>> the lookup join operator to Rebalance so that all subtasks of the lookup
>> join operator can receive data.
>>
>> Env:
>> Flink version:1.15.1
>>
>>
>> --
>> Best,
>> Hjw
>>
>


Re: How to set rebalance in Flink SQL like the Streaming API?

2023-04-03 Thread Hang Ruan
Hi, hjw,

IMO, a parallelism of 1 is enough for your job if we do not consider
the sink. I do not know why you need to set the lookup join operator's
parallelism to 6.
The SQL planner decides the type of the edge for us and we cannot
change it.
Maybe you could share the execution graph to provide more information.

Best,
Hang

hjw wrote on Tue, Apr 4, 2023 at 00:37:

> For example, I create a Kafka source subscribing to a topic that has
> one partition, and set the default parallelism of the job to 6. The next
> operator after the Kafka source is a lookup join with a MySQL table.
> However, the relationship between the Kafka source and the lookup join
> operator is Forward, so only one subtask of the lookup join operator can
> receive data. I want to set the relationship between the Kafka source and
> the lookup join operator to Rebalance so that all subtasks of the lookup
> join operator can receive data.
>
> Env:
> Flink version:1.15.1
>
>
> --
> Best,
> Hjw
>


Re: [DISCUSS] Status of Statefun Project

2023-04-03 Thread Galen Warren via user
Thanks for bringing this up.

I'm currently using Statefun, and I've made a few small code contributions
over time. All of my PRs have been merged into master and most have been
released, but a few haven't been part of a release yet. Most recently, I
helped upgrade Statefun to be compatible with Flink 1.15.2, which was
merged last October but hasn't been released. (And, of course, there have
been more Flink releases since then.)

IMO, the main thing driving the need for ongoing Statefun releases -- even
in the absence of any new feature development -- is that there is typically
a bit of work to do to make Statefun compatible with each new Flink
release. This usually involves updating dependency versions and sometimes
some simple code changes, a common example being Flink config parameters
that have changed from, say, delimited strings to arrays.

I'd be happy to continue to make the necessary changes to Statefun to be
compatible with each new Flink release, but I don't have the committer
rights that would allow me to release the code.





On Mon, Apr 3, 2023 at 5:02 AM Martijn Visser 
wrote:

> Hi everyone,
>
> I want to open a discussion on the status of the Statefun Project [1] in
> Apache Flink. As you might have noticed, there hasn't been much development
> over the past months in the Statefun repository [2]. There is currently a
> lack of active contributors and committers who are able to help with the
> maintenance of the project.
>
> In order to improve the situation, we need to solve the lack of committers
> and the lack of contributors.
>
> On the lack of committers:
>
> 1. Ideally, there are some of the current Flink committers who have the
> bandwidth and can help with reviewing PRs and merging them.
> 2. If that's not an option, a consideration could be that current
> committers only review and merge PRs that are approved by those who are
> willing to contribute to Statefun, provided the CI passes.
>
> On the lack of contributors:
>
> 3. Next to having this discussion on the Dev and User mailing list, we can
> also create a blog with a call for new contributors on the Flink project
> website, send out some tweets on the Flink / Statefun twitter accounts,
> post messages on Slack etc. In that message, we would inform how those that
> are interested in contributing can start and where they could reach out for
> more information.
>
> There's also option 4, where a group of interested people would split
> Statefun from the Flink project and make it a separate top-level project
> under the Apache Flink umbrella (similar to what recently happened with
> Flink Table Store, which has become Apache Paimon).
>
> If we see no improvements in the coming period, we should consider
> sunsetting Statefun and communicate that clearly to the users.
>
> I'm looking forward to your thoughts.
>
> Best regards,
>
> Martijn
>
> [1] https://nightlies.apache.org/flink/flink-statefun-docs-master/
> [2] https://github.com/apache/flink-statefun
>


How to set rebalance in Flink SQL like the Streaming API?

2023-04-03 Thread hjw
For example, I create a Kafka source subscribing to a topic that has one
partition, and set the default parallelism of the job to 6. The next operator
after the Kafka source is a lookup join with a MySQL table. However, the
relationship between the Kafka source and the lookup join operator is Forward,
so only one subtask of the lookup join operator can receive data. I want to set
the relationship between the Kafka source and the lookup join operator to
Rebalance so that all subtasks of the lookup join operator can receive data.


Env:
Flink version:1.15.1




--

Best,
Hjw

[ANNOUNCE] Starting with Flink 1.18 Release Sync

2023-04-03 Thread Qingsheng Ren
Hi everyone,

As a fresh start of the Flink release 1.18, I'm happy to share with you
that the first release sync meeting of 1.18 will happen tomorrow on
Tuesday, April 4th at 10am (UTC+2) / 4pm (UTC+8). Welcome and feel free to
join us and share your ideas about the new release cycle!

Details of joining the release sync can be found in the 1.18 release wiki
page [1].

All contributors are invited to update the same wiki page [1] and include
features targeting the 1.18 release.

Looking forward to seeing you all in the meeting!

[1] https://cwiki.apache.org/confluence/display/FLINK/1.18+Release

Best regards,
Jing, Konstantin, Sergey and Qingsheng



[DISCUSS] Status of Statefun Project

2023-04-03 Thread Martijn Visser
Hi everyone,

I want to open a discussion on the status of the Statefun Project [1] in
Apache Flink. As you might have noticed, there hasn't been much development
over the past months in the Statefun repository [2]. There is currently a
lack of active contributors and committers who are able to help with the
maintenance of the project.

In order to improve the situation, we need to solve the lack of committers
and the lack of contributors.

On the lack of committers:

1. Ideally, there are some of the current Flink committers who have the
bandwidth and can help with reviewing PRs and merging them.
2. If that's not an option, a consideration could be that current
committers only review and merge PRs that are approved by those who are
willing to contribute to Statefun, provided the CI passes.

On the lack of contributors:

3. Next to having this discussion on the Dev and User mailing list, we can
also create a blog with a call for new contributors on the Flink project
website, send out some tweets on the Flink / Statefun twitter accounts,
post messages on Slack etc. In that message, we would inform how those that
are interested in contributing can start and where they could reach out for
more information.

There's also option 4, where a group of interested people would split
Statefun from the Flink project and make it a separate top-level project
under the Apache Flink umbrella (similar to what recently happened with
Flink Table Store, which has become Apache Paimon).

If we see no improvements in the coming period, we should consider
sunsetting Statefun and communicate that clearly to the users.

I'm looking forward to your thoughts.

Best regards,

Martijn

[1] https://nightlies.apache.org/flink/flink-statefun-docs-master/
[2] https://github.com/apache/flink-statefun


[ANNOUNCE] Apache flink-connector-aws v4.1.0 released

2023-04-03 Thread Danny Cranmer
The Apache Flink community is very happy to announce the release of Apache
flink-connector-aws v4.1.0

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352646

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Regards,
Danny


[ANNOUNCE] Apache flink-connector-mongodb v1.0.0 released

2023-04-03 Thread Danny Cranmer
The Apache Flink community is very happy to announce the release of Apache
flink-connector-mongodb v1.0.0

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352386

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Regards,
Danny


Re: Access ExecutionConfig from new Source and Sink API

2023-04-03 Thread Hang Ruan
Hi, Christopher,

I think there is already a discussion about the ExecutionConfig for the new
Sink API in FLIP-287 [1]. What we actually need is a read-only ExecutionConfig
for the Source API and the Sink API.
Maybe we could continue to discuss this topic under FLIP-287.

Best,
Hang

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853

Yufan Sheng wrote on Mon, Apr 3, 2023 at 14:06:

> I agree with you. It's quite useful to access the ExecutionConfig in
> the Source API. When I developed flink-connector-pulsar, the only
> configuration that I couldn't access was the checkpoint configuration,
> which is defined in ExecutionConfig and which I would use to switch the
> behavior automatically based on whether checkpointing is enabled. So I
> had to add more custom configurations for the Pulsar Source.
>
> On Mon, Apr 3, 2023 at 1:47 PM Christopher Lee 
> wrote:
> >
> > Hello,
> >
> > I'm trying to develop Flink connectors to NATS using the new FLIP-27 and
> FLIP-143 APIs. The scaffolding is more complicated than the old
> SourceFunction and SinkFunction, but not terrible. However, I can't figure
> out how to access the ExecutionConfig under these new APIs. This was
> possible in the old APIs by way of the RuntimeContext of the
> AbstractRichFunction (which are extended by RichSourceFunction and
> RichSinkFunction).
> >
> > The reason I would like this is:  some interactions with external
> systems may be invalid under certain Flink job execution parameters.
> Consider a system like NATS which allows for acknowledgements of messages
> received. I would ideally acknowledge all received messages by the source
> connector during checkpointing. If I fail to acknowledge the delivered
> messages, after a pre-configured amount of time, NATS would resend the
> message (which is good in my case for fault tolerance).
> >
> > However, if a Flink job using these connectors has disabled
> checkpointing or made the interval too large, the connector will never
> acknowledge delivered messages and the NATS system may send the message
> again and cause duplicate data. I would be able to avoid this if I could
> access the ExecutionConfig to check these parameters and throw early.
> >
> > I know that the SourceReaderContext gives me access to the
> Configuration, but that doesn't handle the case where the
> execution-environment is set programmatically in a job definition rather
> than through configuration. Any ideas?
> >
> > Thanks,
> > Chris
>


Re: Access ExecutionConfig from new Source and Sink API

2023-04-03 Thread Yufan Sheng
I agree with you. It's quite useful to access the ExecutionConfig in
the Source API. When I developed flink-connector-pulsar, the only
configuration that I couldn't access was the checkpoint configuration,
which is defined in ExecutionConfig and which I would use to switch the
behavior automatically based on whether checkpointing is enabled. So I
had to add more custom configurations for the Pulsar Source.

On Mon, Apr 3, 2023 at 1:47 PM Christopher Lee  wrote:
>
> Hello,
>
> I'm trying to develop Flink connectors to NATS using the new FLIP-27 and 
> FLIP-143 APIs. The scaffolding is more complicated than the old 
> SourceFunction and SinkFunction, but not terrible. However, I can't figure out 
> how to access the ExecutionConfig under these new APIs. This was possible in 
> the old APIs by way of the RuntimeContext of the AbstractRichFunction (which 
> are extended by RichSourceFunction and RichSinkFunction).
>
> The reason I would like this is:  some interactions with external systems may 
> be invalid under certain Flink job execution parameters. Consider a system 
> like NATS which allows for acknowledgements of messages received. I would 
> ideally acknowledge all received messages by the source connector during 
> checkpointing. If I fail to acknowledge the delivered messages, after a 
> pre-configured amount of time, NATS would resend the message (which is good 
> in my case for fault tolerance).
>
> However, if a Flink job using these connectors has disabled checkpointing or 
> made the interval too large, the connector will never acknowledge delivered 
> messages and the NATS system may send the message again and cause duplicate 
> data. I would be able to avoid this if I could access the ExecutionConfig to 
> check these parameters and throw early.
>
> I know that the SourceReaderContext gives me access to the Configuration, but 
> that doesn't handle the case where the execution-environment is set 
> programatically in a job definition rather than through configuration. Any 
> ideas?
>
> Thanks,
> Chris