[ANNOUNCE] New Apache Flink Committer - Hangxiang Yu

2023-08-06 Thread Yuan Mei
On behalf of the PMC, I'm happy to announce Hangxiang Yu as a new Flink
Committer.

Hangxiang has been active in the Flink community for more than 1.5 years
and has played an important role in developing and maintaining State and
Checkpoint related features/components, including Generic Incremental
Checkpoints (he put great effort into making the feature production-ready).
Hangxiang is also the main driver of FLIP-263: Resolving schema compatibility.

Hangxiang is passionate about the Flink community. Besides the technical
contributions above, he also actively promotes Flink, giving talks about Generic
Incremental Checkpoints at Flink Forward and meetups. Hangxiang has also spent
a good amount of time supporting users, participating in Jira/mailing list
discussions, and reviewing code.

Please join me in congratulating Hangxiang for becoming a Flink Committer!

Thanks,
Yuan Mei (on behalf of the Flink PMC)


[ANNOUNCE] New Apache Flink Committer - Yanfei Lei

2023-08-06 Thread Yuan Mei
On behalf of the PMC, I'm happy to announce Yanfei Lei as a new Flink
Committer.

Yanfei has been active in the Flink community for almost two years and has
played an important role in developing and maintaining State and Checkpoint
related features/components, including RocksDB Rescaling Performance
Improvement and Generic Incremental Checkpoints.

Yanfei also helps improve community infrastructure in many ways, including
migrating the Flink Daily performance benchmark to the Apache Flink Slack
channel. She is the maintainer of the benchmark and has improved its
detection stability significantly. She is also one of the major maintainers
of the FRocksDB repo and released FRocksDB 6.20.3 (part of the Flink 1.17
release). Yanfei is a very active community member, supporting users and
participating in tons of discussions on the mailing lists.

Please join me in congratulating Yanfei for becoming a Flink Committer!

Thanks,
Yuan Mei (on behalf of the Flink PMC)


Re: FLINK-20767 - Support for nested fields filter push down

2023-08-06 Thread Venkatakrishnan Sowrirajan
(Sorry, I pressed send too early)

Thanks for the help @zhengyunhon...@gmail.com.

Agreed on not changing the API as much as possible, as well as on eventually
simplifying projection pushdown with nested fields.

In terms of the code itself, currently I am trying to leverage the
FieldReferenceExpression to also handle nested fields for filter push down.
But where I am currently struggling to make progress is that, once the filters
are pushed to the table source itself, in
PushFilterIntoSourceScanRuleBase#resolveFiltersAndCreateTableSourceTable
there is a conversion from List itself.

If you have some pointers for that, please let me know. Thanks.
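
For concreteness, here is a tiny sketch of the kind of nested reference I have in
mind; it is purely illustrative, and the class name, fields, and index semantics
are my own assumptions rather than an existing Flink API:

```
import java.util.Arrays;

// Hypothetical holder for a nested field reference such as `user.address.city`
// in ROW<user ROW<address ROW<city STRING>>>. Not an actual Flink class.
public final class NestedFieldReferenceSketch {
    private final String[] namePath;  // e.g. {"user", "address", "city"}
    private final int[] indexPath;    // field position at each nesting level, e.g. {0, 1, 2}

    public NestedFieldReferenceSketch(String[] namePath, int[] indexPath) {
        this.namePath = namePath;
        this.indexPath = indexPath;
    }

    @Override
    public String toString() {
        return String.join(".", namePath) + " -> " + Arrays.toString(indexPath);
    }

    public static void main(String[] args) {
        // `user` is top-level field 0, `address` is nested field 1 of user,
        // and `city` is nested field 2 of address.
        System.out.println(new NestedFieldReferenceSketch(
                new String[] {"user", "address", "city"}, new int[] {0, 1, 2}));
    }
}
```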

Regards
Venkata krishnan


On Sun, Aug 6, 2023 at 10:23 PM Venkatakrishnan Sowrirajan 
wrote:

> Thanks @zhengyunhon...@gmail.com
> Regards
> Venkata krishnan
>
>
> On Sun, Aug 6, 2023 at 6:16 PM yh z  wrote:
>
>> Hi, Venkatakrishnan,
>> I think this is a very useful feature. I have been focusing on the
>> development of the flink-table-planner module recently, so if you need
>> some
>> help, I can assist you in completing the development of some sub-tasks or
>> code review.
>>
>> Returning to the design itself, I think it's necessary to modify
>> FieldReferenceExpression or re-implement a NestedFieldReferenceExpression.
>> As for modifying the interface of SupportsProjectionPushDown, I think we
>> need to make some trade-offs. As a connector developer, the stability of
>> the interface is very important. If there are no unresolved bugs, I
>> personally do not recommend modifying the interface. However, when I first
>> read the code of SupportsProjectionPushDown, the design of int[][] was
>> very
>> confusing for me, and it took me a long time to understand it by running
>> specific UT tests. Therefore, in terms of the design of this interface and
>> the consistency between different interfaces, there is indeed room for
>> improvement.
>>
>> Thanks,
>> Yunhong Zheng (Swuferhong)
>>
>>
>>
>>
>> Becket Qin  wrote on Thu, Aug 3, 2023 at 07:44:
>>
>> > Hi Jark,
>> >
>> > If the FieldReferenceExpression contains an int[] to support a nested
>> field
>> > reference, List<FieldReferenceExpression> (or
>> FieldReferenceExpression[])
>> > and int[][] are actually equivalent. If we are designing this from
>> scratch,
>> > personally I prefer using List<FieldReferenceExpression> for
>> consistency,
>> > i.e. always resolving everything to expressions for users. Projection
>> is a
>> > simpler case, but should not be a special case. This avoids doing the
>> same
>> > thing in different ways which is also a confusion to the users. To me,
>> the
>> > int[][] format would become kind of a technical debt after we extend the
>> > FieldReferenceExpression. Although we don't have to address it right
>> away
>> > in the same FLIP, this kind of debt accumulates over time and makes the
>> > project harder to learn and maintain. So, personally I prefer to address
>> > these technical debts as soon as possible.
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Wed, Aug 2, 2023 at 8:19 PM Jark Wu  wrote:
>> >
>> > > Hi,
>> > >
>> > > I agree with Becket that we may need to extend
>> FieldReferenceExpression
>> > to
>> > > support nested field access (or maybe a new
>> > > NestedFieldReferenceExpression).
>> > > But I have some concerns about evolving the
>> > > SupportsProjectionPushDown.applyProjection.
>> > > A projection is much simpler than Filter Expression which only needs
>> to
>> > > represent the field indexes.
>> > > If we evolve `applyProjection` to accept
>> `List<FieldReferenceExpression>
>> > > projectedFields`,
>> > > users have to convert the `List<FieldReferenceExpression>` back to
>> > int[][]
>> > > which is an overhead for users.
>> > > Field indexes (int[][]) is required to project schemas with the
>> > > utility org.apache.flink.table.connector.Projection.
>> > >
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > >
>> > >
>> > > On Wed, 2 Aug 2023 at 07:40, Venkatakrishnan Sowrirajan <
>> > vsowr...@asu.edu>
>> > > wrote:
>> > >
>> > > > Thanks Becket for the suggestion. That makes sense. Let me try it
>> out
>> > and
>> > > > get back to you.
>> > > >
>> > > > Regards
>> > > > Venkata krishnan
>> > > >
>> > > >
>> > > > On Tue, Aug 1, 2023 at 9:04 AM Becket Qin 
>> > wrote:
>> > > >
>> > > > > This is a very useful feature in practice.
>> > > > >
>> > > > > It looks to me that the key issue here is that Flink
>> > ResolvedExpression
>> > > > > does not have necessary abstraction for nested field access. So
>> the
>> > > > Calcite
>> > > > > RexFieldAccess does not have a counterpart in the
>> ResolvedExpression.
>> > > The
>> > > > > FieldReferenceExpression only supports direct access to the
>> fields,
>> > not
>> > > > > nested access.
>> > > > >
>> > > > > Theoretically speaking, this nested field reference is also
>> required
>> > by
>> > > > > projection pushdown. However, we addressed that by using an
>> int[][]
>> > in
>> > > > the
>> > > > > SupportsProjectionPushDown interface. Maybe we can do the
>> following:
>> > > > >
>> > > > > 1. Extend the FieldReferenceE

Re: FLINK-20767 - Support for nested fields filter push down

2023-08-06 Thread Venkatakrishnan Sowrirajan
Thanks @zhengyunhon...@gmail.com
Regards
Venkata krishnan


On Sun, Aug 6, 2023 at 6:16 PM yh z  wrote:

> Hi, Venkatakrishnan,
> I think this is a very useful feature. I have been focusing on the
> development of the flink-table-planner module recently, so if you need some
> help, I can assist you in completing the development of some sub-tasks or
> code review.
>
> Returning to the design itself, I think it's necessary to modify
> FieldReferenceExpression or re-implement a NestedFieldReferenceExpression.
> As for modifying the interface of SupportsProjectionPushDown, I think we
> need to make some trade-offs. As a connector developer, the stability of
> the interface is very important. If there are no unresolved bugs, I
> personally do not recommend modifying the interface. However, when I first
> read the code of SupportsProjectionPushDown, the design of int[][] was very
> confusing for me, and it took me a long time to understand it by running
> specific UT tests. Therefore, in terms of the design of this interface and
> the consistency between different interfaces, there is indeed room for
> improvement.
>
> Thanks,
> Yunhong Zheng (Swuferhong)
>
>
>
>
> Becket Qin  wrote on Thu, Aug 3, 2023 at 07:44:
>
> > Hi Jark,
> >
> > If the FieldReferenceExpression contains an int[] to support a nested
> field
> > reference, List<FieldReferenceExpression> (or FieldReferenceExpression[])
> > and int[][] are actually equivalent. If we are designing this from
> scratch,
> > personally I prefer using List<FieldReferenceExpression> for consistency,
> > i.e. always resolving everything to expressions for users. Projection is
> a
> > simpler case, but should not be a special case. This avoids doing the
> same
> > thing in different ways which is also a confusion to the users. To me,
> the
> > int[][] format would become kind of a technical debt after we extend the
> > FieldReferenceExpression. Although we don't have to address it right away
> > in the same FLIP, this kind of debt accumulates over time and makes the
> > project harder to learn and maintain. So, personally I prefer to address
> > these technical debts as soon as possible.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Wed, Aug 2, 2023 at 8:19 PM Jark Wu  wrote:
> >
> > > Hi,
> > >
> > > I agree with Becket that we may need to extend FieldReferenceExpression
> > to
> > > support nested field access (or maybe a new
> > > NestedFieldReferenceExpression).
> > > But I have some concerns about evolving the
> > > SupportsProjectionPushDown.applyProjection.
> > > A projection is much simpler than Filter Expression which only needs to
> > > represent the field indexes.
> > > If we evolve `applyProjection` to accept
> `List<FieldReferenceExpression>
> > > projectedFields`,
> > > users have to convert the `List<FieldReferenceExpression>` back to
> > int[][]
> > > which is an overhead for users.
> > > Field indexes (int[][]) is required to project schemas with the
> > > utility org.apache.flink.table.connector.Projection.
> > >
> > >
> > > Best,
> > > Jark
> > >
> > >
> > >
> > > On Wed, 2 Aug 2023 at 07:40, Venkatakrishnan Sowrirajan <
> > vsowr...@asu.edu>
> > > wrote:
> > >
> > > > Thanks Becket for the suggestion. That makes sense. Let me try it out
> > and
> > > > get back to you.
> > > >
> > > > Regards
> > > > Venkata krishnan
> > > >
> > > >
> > > > On Tue, Aug 1, 2023 at 9:04 AM Becket Qin 
> > wrote:
> > > >
> > > > > This is a very useful feature in practice.
> > > > >
> > > > > It looks to me that the key issue here is that Flink
> > ResolvedExpression
> > > > > does not have necessary abstraction for nested field access. So the
> > > > Calcite
> > > > > RexFieldAccess does not have a counterpart in the
> ResolvedExpression.
> > > The
> > > > > FieldReferenceExpression only supports direct access to the fields,
> > not
> > > > > nested access.
> > > > >
> > > > > Theoretically speaking, this nested field reference is also
> required
> > by
> > > > > projection pushdown. However, we addressed that by using an int[][]
> > in
> > > > the
> > > > > SupportsProjectionPushDown interface. Maybe we can do the
> following:
> > > > >
> > > > > 1. Extend the FieldReferenceExpression to include an int[] for
> nested
> > > > field
> > > > > access,
> > > > > 2. By doing (1),
> > > > > SupportsFilterPushDown#applyFilters(List<ResolvedExpression>) can
> > > support
> > > > > nested field access.
> > > > > 3. Evolve the SupportsProjectionPushDown.applyProjection(int[][]
> > > > > projectedFields, DataType producedDataType) to
> > > > > applyProjection(List<FieldReferenceExpression> projectedFields,
> > > DataType
> > > > > producedDataType)
> > > > >
> > > > > This will need a FLIP.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Tue, Aug 1, 2023 at 11:42 PM Venkatakrishnan Sowrirajan <
> > > > > vsowr...@asu.edu>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the response. Looking forward to your pointers. In the
> > > > > > meanwhile, let me figure out how we can implement it. Will keep
> you
> > > > > posted.
> > > > > >
> > > > > > On Mon, Jul 31, 2023, 11:43 PM liu 

[jira] [Created] (FLINK-32766) Support kafka catalog

2023-08-06 Thread melin (Jira)
melin created FLINK-32766:
-

 Summary: Support kafka catalog
 Key: FLINK-32766
 URL: https://issues.apache.org/jira/browse/FLINK-32766
 Project: Flink
  Issue Type: New Feature
Reporter: melin


Support kafka catalog
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32765) create view should reuse calcite tree

2023-08-06 Thread xiaogang zhou (Jira)
xiaogang zhou created FLINK-32765:
-

 Summary: create view should reuse calcite tree
 Key: FLINK-32765
 URL: https://issues.apache.org/jira/browse/FLINK-32765
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.16.1
Reporter: xiaogang zhou






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes

2023-08-06 Thread Rui Fan
Hi Ron:

Thanks for the feedback! The goal is indeed to turn the autoscaler into
a general tool that can support other resource management frameworks.


Hi Max, Gyula:

My proposed `AutoScalerStateStore` is similar to a Map, and it can indeed be
improved.

> public interface AutoScalerStateStore {
> Map getState(KEY jobKey)
> void updateState(KEY jobKey, Map state);
> }

From the method parameters, the StateStore is shared by all jobs, right?
If yes, the `KEY jobKey` isn't enough, because the CR is needed when
creating the ConfigMap. The `jobKey` is the ResourceID, while the extraJobInfo is the CR.

So, this parameter may need to be changed from `KEY jobKey` to
`JobAutoScalerContext`. Does that make sense?
If yes, I can update the interface in the FLIP doc.
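
For illustration, here is a minimal sketch of what that change could look like;
the generics and the Map value types below are my assumptions for the sake of
the example, not the final FLIP-334 interfaces:

```
import java.util.Map;

// Sketch only: the state store takes the full job context instead of just the
// job key, so implementations that need the CR (e.g. a ConfigMap-based store)
// still have access to it.
public interface AutoScalerStateStoreSketch<KEY> {

    Map<String, String> getState(JobAutoScalerContextSketch<KEY> jobContext);

    void updateState(JobAutoScalerContextSketch<KEY> jobContext, Map<String, String> state);

    // Stand-in for the JobAutoScalerContext mentioned above; its real
    // definition lives in the FLIP.
    interface JobAutoScalerContextSketch<KEY> {
        KEY getJobKey();
    }
}
```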

Best,
Rui

On Mon, Aug 7, 2023 at 10:18 AM liu ron  wrote:

> Hi, Rui
>
> Thanks for driving the FLIP.
>
> The tuning of streaming jobs by autoscaler is very important. Although the
> mainstream trend now is cloud-native, many companies still run their Flink
> jobs on Yarn for historical reasons. If we can decouple autoscaler from K8S
> and turn it into a common tool that can support other resource management
> frameworks such as Yarn, I think it will be very meaningful.
> +1 for this proposal.
>
> Best,
> Ron
>
>
> Gyula Fóra  wrote on Sat, Aug 5, 2023 at 15:03:
>
> > Hi Rui!
> >
> > Thanks for the proposal.
> >
> > I agree with Max that the state store abstractions could be improved
> and
> > be more specific as we know what goes into the state. It could simply be
> >
> > public interface AutoScalerStateStore {
> > Map getState(KEY jobKey)
> > void updateState(KEY jobKey, Map state);
> > }
> >
> >
> > We could also remove the entire recommended parallelism logic from the
> > interface and make it internal to the implementation somehow because it's
> > not very nice in the current form.
> >
> > Cheers,
> > Gyula
> >
> > On Fri, Aug 4, 2023 at 7:05 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Hi Max,
> > >
> > > After careful consideration, I prefer to keep the AutoScalerStateStore
> > > instead of AutoScalerEventHandler taking over the work of
> > > AutoScalerStateStore. And the following are some reasons:
> > >
> > > 1. Keeping the AutoScalerStateStore to make StateStore easy to plug in.
> > >
> > > Currently, the kubernetes-operator-autoscaler uses the ConfigMap as the
> > > state store. However, users may use a different state store for
> > > yarn-autoscaler or generic autoscaler. Such as: MySQL StateStore,
> > > Heaped StateStore or PostgreSQL StateStore, etc.
> > >
> > > Of course, kubernetes autoscaler can also use the MySQL StateStore.
> > > If the AutoScalerEventHandler is responsible for recording events,
> > > scaling jobs, and accessing state, then whenever users or the community want to
> > > create a new state store, they must also implement a new
> > > AutoScalerEventHandler, which includes recording events and scaling jobs.
> > >
> > > If we decouple AutoScalerEventHandler and AutoScalerStateStore,
> > > it's easy to implement a new state store.
> > >
> > > 2. AutoScalerEventHandler isn't suitable for accessing state.
> > >
> > > Sometimes the generic autoscaler needs to query state.
> > > AutoScalerEventHandler can update the state while handling events;
> > > however, it's weird for it to provide a series of interfaces for querying state.
> > >
> > > What do you think?
> > >
> > > And looking forward to more thoughts from the community, thanks!
> > >
> > > Best,
> > > Rui Fan
> > >
> > > On Tue, Aug 1, 2023 at 11:47 PM Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > > > Hi Max,
> > > >
> > > > Thanks for your quick response!
> > > >
> > > > > 1. Handle state in the AutoScalerEventHandler which will receive
> > > > > all related scaling and metric collection events, and can keep
> > > > > track of any state.
> > > >
> > > > If I understand correctly, you mean that updating state is just part
> of
> > > > handling events, right?
> > > >
> > > > If yes, that makes sense. However, I have some concerns:
> > > >
> > > > - Currently, we have 3 key-values that need to be stored. And the
> > > > autoscaler needs to get them first, then update them, and sometimes
> > > > remove them. If we use AutoScalerEventHandler, we need to provide
> > > > 9 methods, right? Every key has 3 methods.
> > > > - Do we add the persistState interface for AutoScalerEventHandler to
> > > > persist in-memory state to kubernetes?
> > > >
> > > > > 2. In the long run, the autoscaling logic can move to a
> > > > > separate repository, although this will complicate the release
> > > > > process, so I would defer this unless there is strong interest.
> > > >
> > > > I also agree to leave it in flink-k8s-operator for now. Unless moving
> > it
> > > > out of flink-k8s-operator is necessary in the future.
> > > >
> > > > > 3. The proposal mentions some removal of tests.
> > > >
> > > > Sorry, I didn't express clearly in FLIP. POC just check whether these
> > > > interfaces work well. It will take more time if I deve

[jira] [Created] (FLINK-32764) SlotManager supports pulling up all TaskManagers at initialization

2023-08-06 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-32764:


 Summary: SlotManager supports pulling up all TaskManagers at 
initialization 
 Key: FLINK-32764
 URL: https://issues.apache.org/jira/browse/FLINK-32764
 Project: Flink
  Issue Type: Sub-task
Reporter: xiangyu feng


For OLAP session clusters, it is better to pull up all TaskManagers when the cluster starts,
rather than waiting for jobs to arrive and assigning them later.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes

2023-08-06 Thread liu ron
Hi, Rui

Thanks for driving the FLIP.

The tuning of streaming jobs by autoscaler is very important. Although the
mainstream trend now is cloud-native, many companies still run their Flink
jobs on Yarn for historical reasons. If we can decouple autoscaler from K8S
and turn it into a common tool that can support other resource management
frameworks such as Yarn, I think it will be very meaningful.
+1 for this proposal.

Best,
Ron


Gyula Fóra  wrote on Sat, Aug 5, 2023 at 15:03:

> Hi Rui!
>
> Thanks for the proposal.
>
> I agree with Max that the state store abstractions could be improved and
> be more specific as we know what goes into the state. It could simply be
>
> public interface AutoScalerStateStore {
> Map getState(KEY jobKey)
> void updateState(KEY jobKey, Map state);
> }
>
>
> We could also remove the entire recommended parallelism logic from the
> interface and make it internal to the implementation somehow because it's
> not very nice in the current form.
>
> Cheers,
> Gyula
>
> On Fri, Aug 4, 2023 at 7:05 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> > Hi Max,
> >
> > After careful consideration, I prefer to keep the AutoScalerStateStore
> > instead of AutoScalerEventHandler taking over the work of
> > AutoScalerStateStore. And the following are some reasons:
> >
> > 1. Keeping the AutoScalerStateStore to make StateStore easy to plug in.
> >
> > Currently, the kubernetes-operator-autoscaler uses the ConfigMap as the
> > state store. However, users may use a different state store for
> > yarn-autoscaler or generic autoscaler. Such as: MySQL StateStore,
> > Heaped StateStore or PostgreSQL StateStore, etc.
> >
> > Of course, kubernetes autoscaler can also use the MySQL StateStore.
> > If the AutoScalerEventHandler is responsible for recording events,
> > scaling jobs, and accessing state, then whenever users or the community want to
> > create a new state store, they must also implement a new
> > AutoScalerEventHandler, which includes recording events and scaling jobs.
> >
> > If we decouple AutoScalerEventHandler and AutoScalerStateStore,
> > it's easy to implement a new state store.
> >
> > 2. AutoScalerEventHandler isn't suitable for accessing state.
> >
> > Sometimes the generic autoscaler needs to query state.
> > AutoScalerEventHandler can update the state while handling events;
> > however, it's weird for it to provide a series of interfaces for querying state.
> >
> > What do you think?
> >
> > And looking forward to more thoughts from the community, thanks!
> >
> > Best,
> > Rui Fan
> >
> > On Tue, Aug 1, 2023 at 11:47 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Hi Max,
> > >
> > > Thanks for your quick response!
> > >
> > > > 1. Handle state in the AutoScalerEventHandler which will receive
> > > > all related scaling and metric collection events, and can keep
> > > > track of any state.
> > >
> > > If I understand correctly, you mean that updating state is just part of
> > > handling events, right?
> > >
> > > If yes, that makes sense. However, I have some concerns:
> > >
> > > - Currently, we have 3 key-values that need to be stored. And the
> > > autoscaler needs to get them first, then update them, and sometimes
> > > remove them. If we use AutoScalerEventHandler, we need to provide
> > > 9 methods, right? Every key has 3 methods.
> > > - Do we add the persistState interface for AutoScalerEventHandler to
> > > persist in-memory state to kubernetes?
> > >
> > > > 2. In the long run, the autoscaling logic can move to a
> > > > separate repository, although this will complicate the release
> > > > process, so I would defer this unless there is strong interest.
> > >
> > > I also agree to leave it in flink-k8s-operator for now. Unless moving
> it
> > > out of flink-k8s-operator is necessary in the future.
> > >
> > > > 3. The proposal mentions some removal of tests.
> > >
> > > Sorry, I didn't express this clearly in the FLIP. The POC just checks whether these
> > > interfaces work well. It would take more time to develop all the tests
> > > during the POC, so I removed these tests from my POC.
> > >
> > > These tests will be completed in the final PR, and they are very useful
> > > for reducing bugs.
> > >
> > > Best,
> > > Rui Fan
> > >
> > > On Tue, Aug 1, 2023 at 10:10 PM Maximilian Michels 
> > wrote:
> > >
> > >> Hi Rui,
> > >>
> > >> Thanks for the proposal. I think it makes a lot of sense to decouple
> > >> the autoscaler from Kubernetes-related dependencies. A couple of notes
> > >> when I read the proposal:
> > >>
> > >> 1. You propose AutoScalerEventHandler, AutoScalerStateStore,
> > >> AutoScalerStateStoreFactory, and AutoScalerEventHandler.
> > >> AutoscalerStateStore is a generic key/value database (methods:
> > >> "get"/"put"/"delete"). I would propose to refine this interface and
> > >> make it less general purpose, e.g. add a method for persisting scaling
> > >> decisions as well as any metrics gathered for the current metric
> > >> window. For simplicity, I'd even go so far to remove the state store
> > >

Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu

2023-08-06 Thread Yuepeng Pan



Congratulations, Weihua!

Best,
Yuepeng Pan





At 2023-08-07 09:17:41, "yh z"  wrote:
>Congratulations, Weihua!
>
>Best,
>Yunhong Zheng (Swuferhong)
>
>Runkang He  wrote on Sat, Aug 5, 2023 at 21:34:
>
>> Congratulations, Weihua!
>>
>> Best,
>> Runkang He
>>
>> Kelu Tao  wrote on Fri, Aug 4, 2023 at 18:21:
>>
>> > Congratulations!
>> >
>> > On 2023/08/04 08:35:49 Danny Cranmer wrote:
>> > > Congrats and welcome to the team, Weihua!
>> > >
>> > > Thanks,
>> > > Danny
>> > >
>> > > On Fri, Aug 4, 2023 at 9:30 AM Feng Jin  wrote:
>> > >
>> > > > Congratulations Weihua!
>> > > >
>> > > > Best regards,
>> > > >
>> > > > Feng
>> > > >
>> > > > On Fri, Aug 4, 2023 at 4:28 PM weijie guo > >
>> > > > wrote:
>> > > >
>> > > > > Congratulations Weihua!
>> > > > >
>> > > > > Best regards,
>> > > > >
>> > > > > Weijie
>> > > > >
>> > > > >
>> > > > > Lijie Wang  wrote on Fri, Aug 4, 2023 at 15:28:
>> > > > >
>> > > > > > Congratulations, Weihua!
>> > > > > >
>> > > > > > Best,
>> > > > > > Lijie
>> > > > > >
>> > > > > > yuxia  wrote on Fri, Aug 4, 2023 at 15:14:
>> > > > > >
>> > > > > > > Congratulations, Weihua!
>> > > > > > >
>> > > > > > > Best regards,
>> > > > > > > Yuxia
>> > > > > > >
>> > > > > > > - Original Message -
>> > > > > > > From: "Yun Tang" 
>> > > > > > > To: "dev" 
>> > > > > > > Sent: Friday, August 4, 2023 3:05:30 PM
>> > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
>> > > > > > >
>> > > > > > > Congratulations, Weihua!
>> > > > > > >
>> > > > > > >
>> > > > > > > Best
>> > > > > > > Yun Tang
>> > > > > > > 
>> > > > > > > From: Jark Wu 
>> > > > > > > Sent: Friday, August 4, 2023 15:00
>> > > > > > > To: dev@flink.apache.org 
>> > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
>> > > > > > >
>> > > > > > > Congratulations, Weihua!
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Jark
>> > > > > > >
>> > > > > > > On Fri, 4 Aug 2023 at 14:48, Yuxin Tan > >
>> > > > wrote:
>> > > > > > >
>> > > > > > > > Congratulations Weihua!
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Yuxin
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > Junrui Lee  wrote on Fri, Aug 4, 2023 at 14:28:
>> > > > > > > >
>> > > > > > > > > Congrats, Weihua!
>> > > > > > > > > Best,
>> > > > > > > > > Junrui
>> > > > > > > > >
>> > > > > > > > > > Geng Biao  wrote on Fri, Aug 4, 2023 at 14:25:
>> > > > > > > > >
>> > > > > > > > > > Congrats, Weihua!
>> > > > > > > > > > Best,
>> > > > > > > > > > Biao Geng
>> > > > > > > > > >
>> > > > > > > > > > Sent from Outlook for iOS
>> > > > > > > > > > 
>> > > > > > > > > > From: 周仁祥 
>> > > > > > > > > > Sent: Friday, August 4, 2023 2:23:42 PM
>> > > > > > > > > > To: dev@flink.apache.org 
>> > > > > > > > > > Cc: Weihua Hu 
>> > > > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
>> > > > > > > > > >
>> > > > > > > > > > Congratulations, Weihua~
>> > > > > > > > > >
>> > > > > > > > > > > On Aug 4, 2023, at 14:21, Sergey Nuyanzin 
>> > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > Congratulations, Weihua!
>> > > > > > > > > > >
>> > > > > > > > > > > On Fri, Aug 4, 2023 at 8:03 AM Chen Zhanghao <
>> > > > > > > > > zhanghao.c...@outlook.com>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >> Congratulations, Weihua!
>> > > > > > > > > > >>
>> > > > > > > > > > >> Best,
>> > > > > > > > > > >> Zhanghao Chen
>> > > > > > > > > > >> 
>> > > > > > > > > > >> 发件人: Xintong Song 
>> > > > > > > > > > >> 发送时间: 2023年8月4日 11:18
>> > > > > > > > > > >> 收件人: dev 
>> > > > > > > > > > >> 抄送: Weihua Hu 
>> > > > > > > > > > >> 主题: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
>> > > > > > > > > > >>
>> > > > > > > > > > >> Hi everyone,
>> > > > > > > > > > >>
>> > > > > > > > > > >> On behalf of the PMC, I'm very happy to announce
>> Weihua
>> > Hu
>> > > > as
>> > > > > a
>> > > > > > > new
>> > > > > > > > > > Flink
>> > > > > > > > > > >> Committer!
>> > > > > > > > > > >>
>> > > > > > > > > > >> Weihua has been consistently contributing to the
>> project
>> > > > since
>> > > > > > May
>> > > > > > > > > > 2022. He
>> > > > > > > > > > >> mainly works in Flink's distributed coordination
>> areas.
>> > He
>> > > > is
>> > > > > > the
>> > > > > > > > main
>> > > > > > > > > > >> contributor of FLIP-298 and many other improvements in
>> > > > > > large-scale
>> > > > > > > > job
>> > > > > > > > > > >> scheduling and improvements. He is also quite active
>> in
>> > > > > mailing
>> > > > > > > > lists,
>> > > > > > > > > > >> participating discussions and answering user
>> questions.
>> > > > > > > > > > >>
>> > > > > > > > > > >> Please join me in congratulating Weihua!
>> > > > > > > > > > >>
>> > > > > > > > > > >> Best,
>> > > > > > > > > > >>
>> > > > > > > > > > >> Xintong (on behalf of the Apache Flink PMC)
>> > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > --
>> > > > > > > > > > > Best regards,
>> > > > > > > > > > > S

Re: Projection pushdown for Avro files seems to be buggy

2023-08-06 Thread Xingcan Cui
After rechecking it, I realized that some of my changes broke the expected
schema passed to GenericDatumReader#getResolver. The logic in the Flink
codebase is okay, and we should only read a portion of the Avro record.
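
For anyone who wants to double-check the Avro side in isolation, here is a small
standalone sketch of how a projected reader schema is resolved against the writer
schema stored in the file; the file path and field name are made up:

```
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class AvroProjectionSketch {
    public static void main(String[] args) throws IOException {
        // Reader schema containing only the projected field; Avro's schema
        // resolution then skips the writer's other fields while decoding.
        Schema readerSchema = SchemaBuilder.record("Projected").fields()
                .optionalLong("f3")
                .endRecord();

        // The writer schema is taken from the file header by DataFileReader.
        GenericDatumReader<GenericRecord> datumReader =
                new GenericDatumReader<>(null, readerSchema);
        try (DataFileReader<GenericRecord> fileReader =
                new DataFileReader<>(new File("/tmp/sample.avro"), datumReader)) {
            for (GenericRecord record : fileReader) {
                System.out.println(record.get("f3"));
            }
        }
    }
}
```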

Thanks, Xingcan

On Sun, Aug 6, 2023 at 2:31 PM liu ron  wrote:

> Hi, Xingcan
>
> After a deep dive into the source code, I also think it is a bug.
>
> Best,
> Ron
>
> Xingcan Cui  wrote on Sat, Aug 5, 2023 at 23:27:
>
> > Hi all,
> >
> > We tried to read some Avro files with Flink SQL (1.16.1) and noticed
> > that the projection pushdown seems to be buggy.
> >
> > The Avro schema we used has 4 fields, namely f1, f2, f3 and f4. When using
> > "SELECT *" or selecting the first n fields (e.g., SELECT f1 or SELECT f1, f2)
> > to read the table, it works fine. However, if we query an arbitrary field
> > other than f1, a data & converter mismatch exception is thrown.
> >
> > After digging into the code, I figured out that `AvroFileFormatFactory`
> > generates a `producedDataType` with projection push down. When generating
> > AvroToRowDataConverters, it only considers the selected fields and
> > generates converters for them. However, the records read by the
> > DataFileReader contain all fields.
> >
> > Specifically, for the following code snippet from
> AvroToRowDataConverters,
> > the fieldConverters contains only the converters for the selected fields
> > but the record contains all fields which leads to a converters & data
> > fields mismatch problem. That also explains why selecting the first n
> > fields works (It's because the converters & data fields happen to match).
> >
> > ```
> > return avroObject -> {
> >   IndexedRecord record = (IndexedRecord) avroObject;
> >   GenericRowData row = new GenericRowData(arity);
> >   for (int i = 0; i < arity; ++i) {
> >     // avro always deserialize successfully even though the type isn't matched
> >     // so no need to throw exception about which field can't be deserialized
> >     row.setField(i, fieldConverters[i].convert(record.get(i)));
> >   }
> >   return row;
> > };
> > ```
> >
> > Not sure if any of you hit this before. If it's confirmed to be a bug,
> I'll
> > file a ticket and try to fix it.
> >
> > Best,
> > Xingcan
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Matthias Pohl

2023-08-06 Thread Andriy Redko
Congrats Matthias, well deserved!!

DC> Congrats Matthias!

DC> Very well deserved, thank you for your continuous, consistent contributions.
DC> Welcome.

DC> Thanks,
DC> Danny

DC> On Fri, Aug 4, 2023 at 9:30 AM Feng Jin  wrote:

>> Congratulations, Matthias!
>>
>> Best regards
>>
>> Feng
>>
>> On Fri, Aug 4, 2023 at 4:29 PM weijie guo 
>> wrote:
>>
>> > Congratulations, Matthias!
>> >
>> > Best regards,
>> >
>> > Weijie
>> >
>> >
>> > Wencong Liu  wrote on Fri, Aug 4, 2023 at 15:50:
>> >
>> > > Congratulations, Matthias!
>> > >
>> > > Best,
>> > > Wencong Liu
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > At 2023-08-04 11:18:00, "Xintong Song"  wrote:
>> > > >Hi everyone,
>> > > >
>> > > >On behalf of the PMC, I'm very happy to announce that Matthias Pohl
>> has
>> > > >joined the Flink PMC!
>> > > >
>> > > >Matthias has been consistently contributing to the project since Sep
>> > 2020,
>> > > >and became a committer in Dec 2021. He mainly works in Flink's
>> > distributed
>> > > >coordination and high availability areas. He has worked on many FLIPs
>> > > >including FLIP195/270/285. He helped a lot with the release
>> management,
>> > > >being one of the Flink 1.17 release managers and also very active in
>> > Flink
>> > > >1.18 / 2.0 efforts. He also contributed a lot to improving the build
>> > > >stability.
>> > > >
>> > > >Please join me in congratulating Matthias!
>> > > >
>> > > >Best,
>> > > >
>> > > >Xintong (on behalf of the Apache Flink PMC)
>> > >
>> >
>>



Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu

2023-08-06 Thread yh z
Congratulations, Weihua!

Best,
Yunhong Zheng (Swuferhong)

Runkang He  wrote on Sat, Aug 5, 2023 at 21:34:

> Congratulations, Weihua!
>
> Best,
> Runkang He
>
> Kelu Tao  wrote on Fri, Aug 4, 2023 at 18:21:
>
> > Congratulations!
> >
> > On 2023/08/04 08:35:49 Danny Cranmer wrote:
> > > Congrats and welcome to the team, Weihua!
> > >
> > > Thanks,
> > > Danny
> > >
> > > On Fri, Aug 4, 2023 at 9:30 AM Feng Jin  wrote:
> > >
> > > > Congratulations Weihua!
> > > >
> > > > Best regards,
> > > >
> > > > Feng
> > > >
> > > > On Fri, Aug 4, 2023 at 4:28 PM weijie guo  >
> > > > wrote:
> > > >
> > > > > Congratulations Weihua!
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Weijie
> > > > >
> > > > >
> > > > > Lijie Wang  wrote on Fri, Aug 4, 2023 at 15:28:
> > > > >
> > > > > > Congratulations, Weihua!
> > > > > >
> > > > > > Best,
> > > > > > Lijie
> > > > > >
> > > > > > yuxia  wrote on Fri, Aug 4, 2023 at 15:14:
> > > > > >
> > > > > > > Congratulations, Weihua!
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Yuxia
> > > > > > >
> > > > > > > - Original Message -
> > > > > > > From: "Yun Tang" 
> > > > > > > To: "dev" 
> > > > > > > Sent: Friday, August 4, 2023 3:05:30 PM
> > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
> > > > > > >
> > > > > > > Congratulations, Weihua!
> > > > > > >
> > > > > > >
> > > > > > > Best
> > > > > > > Yun Tang
> > > > > > > 
> > > > > > > From: Jark Wu 
> > > > > > > Sent: Friday, August 4, 2023 15:00
> > > > > > > To: dev@flink.apache.org 
> > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
> > > > > > >
> > > > > > > Congratulations, Weihua!
> > > > > > >
> > > > > > > Best,
> > > > > > > Jark
> > > > > > >
> > > > > > > On Fri, 4 Aug 2023 at 14:48, Yuxin Tan  >
> > > > wrote:
> > > > > > >
> > > > > > > > Congratulations Weihua!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yuxin
> > > > > > > >
> > > > > > > >
> > > > > > > > > Junrui Lee  wrote on Fri, Aug 4, 2023 at 14:28:
> > > > > > > >
> > > > > > > > > Congrats, Weihua!
> > > > > > > > > Best,
> > > > > > > > > Junrui
> > > > > > > > >
> > > > > > > > > > Geng Biao  wrote on Fri, Aug 4, 2023 at 14:25:
> > > > > > > > >
> > > > > > > > > > Congrats, Weihua!
> > > > > > > > > > Best,
> > > > > > > > > > Biao Geng
> > > > > > > > > >
> > > > > > > > > > Sent from Outlook for iOS
> > > > > > > > > > 
> > > > > > > > > > From: 周仁祥 
> > > > > > > > > > Sent: Friday, August 4, 2023 2:23:42 PM
> > > > > > > > > > To: dev@flink.apache.org 
> > > > > > > > > > Cc: Weihua Hu 
> > > > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
> > > > > > > > > >
> > > > > > > > > > Congratulations, Weihua~
> > > > > > > > > >
> > > > > > > > > > > On Aug 4, 2023, at 14:21, Sergey Nuyanzin 
> > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Congratulations, Weihua!
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Aug 4, 2023 at 8:03 AM Chen Zhanghao <
> > > > > > > > > zhanghao.c...@outlook.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Congratulations, Weihua!
> > > > > > > > > > >>
> > > > > > > > > > >> Best,
> > > > > > > > > > >> Zhanghao Chen
> > > > > > > > > > >> 
> > > > > > > > > > >> From: Xintong Song 
> > > > > > > > > > >> Sent: August 4, 2023 11:18
> > > > > > > > > > >> To: dev 
> > > > > > > > > > >> Cc: Weihua Hu 
> > > > > > > > > > >> Subject: [ANNOUNCE] New Apache Flink Committer - Weihua Hu
> > > > > > > > > > >>
> > > > > > > > > > >> Hi everyone,
> > > > > > > > > > >>
> > > > > > > > > > >> On behalf of the PMC, I'm very happy to announce
> Weihua
> > Hu
> > > > as
> > > > > a
> > > > > > > new
> > > > > > > > > > Flink
> > > > > > > > > > >> Committer!
> > > > > > > > > > >>
> > > > > > > > > > >> Weihua has been consistently contributing to the
> project
> > > > since
> > > > > > May
> > > > > > > > > > 2022. He
> > > > > > > > > > >> mainly works in Flink's distributed coordination
> areas.
> > He
> > > > is
> > > > > > the
> > > > > > > > main
> > > > > > > > > > >> contributor of FLIP-298 and many other improvements in
> > > > > > large-scale
> > > > > > > > job
> > > > > > > > > > >> scheduling and improvements. He is also quite active
> in
> > > > > mailing
> > > > > > > > lists,
> > > > > > > > > > >> participating discussions and answering user
> questions.
> > > > > > > > > > >>
> > > > > > > > > > >> Please join me in congratulating Weihua!
> > > > > > > > > > >>
> > > > > > > > > > >> Best,
> > > > > > > > > > >>
> > > > > > > > > > >> Xintong (on behalf of the Apache Flink PMC)
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Sergey
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: FLINK-20767 - Support for nested fields filter push down

2023-08-06 Thread yh z
Hi, Venkatakrishnan,
I think this is a very useful feature. I have been focusing on the
development of the flink-table-planner module recently, so if you need some
help, I can assist you in completing the development of some sub-tasks or
code review.

Returning to the design itself, I think it's necessary to modify
FieldReferenceExpression or re-implement a NestedFieldReferenceExpression.
As for modifying the interface of SupportsProjectionPushDown, I think we
need to make some trade-offs. As a connector developer, the stability of
the interface is very important. If there are no unresolved bugs, I
personally do not recommend modifying the interface. However, when I first
read the code of SupportsProjectionPushDown, the design of int[][] was very
confusing for me, and it took me a long time to understand it by running
specific UT tests. Therefore, in terms of the design of this interface and
the consistency between different interfaces, there is indeed room for
improvement.
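
For readers who ran into the same confusion, here is a small self-contained
example of how those index paths are interpreted; the schema in the comment is
made up for illustration:

```
import java.util.Arrays;

public class ProjectionIndexPathsExample {
    public static void main(String[] args) {
        // Source schema: ROW<id BIGINT, user ROW<name STRING, age INT>, ts TIMESTAMP>
        // Each inner array of the int[][] passed to
        // SupportsProjectionPushDown#applyProjection is an index path into that row.
        int[][] projectedFields = {
            {2},    // top-level field 2            -> ts
            {1, 1}  // field 1, then nested field 1 -> user.age
        };
        System.out.println(Arrays.deepToString(projectedFields)); // [[2], [1, 1]]
    }
}
```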

Thanks,
Yunhong Zheng (Swuferhong)




Becket Qin  wrote on Thu, Aug 3, 2023 at 07:44:

> Hi Jark,
>
> If the FieldReferenceExpression contains an int[] to support a nested field
> reference, List<FieldReferenceExpression> (or FieldReferenceExpression[])
> and int[][] are actually equivalent. If we are designing this from scratch,
> personally I prefer using List<FieldReferenceExpression> for consistency,
> i.e. always resolving everything to expressions for users. Projection is a
> simpler case, but should not be a special case. This avoids doing the same
> thing in different ways which is also a confusion to the users. To me, the
> int[][] format would become kind of a technical debt after we extend the
> FieldReferenceExpression. Although we don't have to address it right away
> in the same FLIP, this kind of debt accumulates over time and makes the
> project harder to learn and maintain. So, personally I prefer to address
> these technical debts as soon as possible.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Aug 2, 2023 at 8:19 PM Jark Wu  wrote:
>
> > Hi,
> >
> > I agree with Becket that we may need to extend FieldReferenceExpression
> to
> > support nested field access (or maybe a new
> > NestedFieldReferenceExpression).
> > But I have some concerns about evolving the
> > SupportsProjectionPushDown.applyProjection.
> > A projection is much simpler than Filter Expression which only needs to
> > represent the field indexes.
> > If we evolve `applyProjection` to accept `List<FieldReferenceExpression>
> > projectedFields`,
> > users have to convert the `List<FieldReferenceExpression>` back to
> int[][]
> > which is an overhead for users.
> > Field indexes (int[][]) is required to project schemas with the
> > utility org.apache.flink.table.connector.Projection.
> >
> >
> > Best,
> > Jark
> >
> >
> >
> > On Wed, 2 Aug 2023 at 07:40, Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu>
> > wrote:
> >
> > > Thanks Becket for the suggestion. That makes sense. Let me try it out
> and
> > > get back to you.
> > >
> > > Regards
> > > Venkata krishnan
> > >
> > >
> > > On Tue, Aug 1, 2023 at 9:04 AM Becket Qin 
> wrote:
> > >
> > > > This is a very useful feature in practice.
> > > >
> > > > It looks to me that the key issue here is that Flink
> ResolvedExpression
> > > > does not have necessary abstraction for nested field access. So the
> > > Calcite
> > > > RexFieldAccess does not have a counterpart in the ResolvedExpression.
> > The
> > > > FieldReferenceExpression only supports direct access to the fields,
> not
> > > > nested access.
> > > >
> > > > Theoretically speaking, this nested field reference is also required
> by
> > > > projection pushdown. However, we addressed that by using an int[][]
> in
> > > the
> > > > SupportsProjectionPushDown interface. Maybe we can do the following:
> > > >
> > > > 1. Extend the FieldReferenceExpression to include an int[] for nested
> > > field
> > > > access,
> > > > 2. By doing (1),
> > > > SupportsFilterPushDown#applyFilters(List<ResolvedExpression>) can
> > support
> > > > nested field access.
> > > > 3. Evolve the SupportsProjectionPushDown.applyProjection(int[][]
> > > > projectedFields, DataType producedDataType) to
> > > > applyProjection(List<FieldReferenceExpression> projectedFields,
> > DataType
> > > > producedDataType)
> > > >
> > > > This will need a FLIP.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Aug 1, 2023 at 11:42 PM Venkatakrishnan Sowrirajan <
> > > > vsowr...@asu.edu>
> > > > wrote:
> > > >
> > > > > Thanks for the response. Looking forward to your pointers. In the
> > > > > meanwhile, let me figure out how we can implement it. Will keep you
> > > > posted.
> > > > >
> > > > > On Mon, Jul 31, 2023, 11:43 PM liu ron  wrote:
> > > > >
> > > > > > Hi, Venkata
> > > > > >
> > > > > > Thanks for reporting this issue. Currently, Flink doesn't support
> > > > nested
> > > > > > filter pushdown. I also think that this optimization would be
> > useful,
> > > > > > especially for jobs, which may need to read a lot of data from
> the
> > > > > parquet
> > > > > > or orc file. We didn't move forward wi

Re: [VOTE] Release flink-connector-mongodb v1.0.2, release candidate 1

2023-08-06 Thread Ahmed Hamdy
Thanks Danny
+ 1 (non-binding)

* Signature and checksums are matching.
* Source Code builds locally.
* Web PR looks good.

Best Regards
Ahmed Hamdy


On Fri, 4 Aug 2023 at 17:44, Jiabao Sun 
wrote:

> Thanks Danny,
>
> +1 (non-binding)
>
> - Build and compile the source code locally
> - Tag exists in Github
> - Checked release notes
> - Source archive signature/checksum looks good
> - Binary (from Maven) signature/checksum looks good
> - Checked the contents contains jar and pom files in apache repo
>
> Non blocking findings:
> - The year in the NOTICE files is 2022 and needs to be updated to 2023
>
> Best,
> Jiabao
>
>
> > On Aug 4, 2023, at 10:10 PM, Danny Cranmer  wrote:
> >
> > Hi everyone,
> > Please review and vote on the release candidate 1 for the version v1.0.2
> of
> > flink-connector-mongodb, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> [2],
> > which are signed with the key with fingerprint
> > 0F79F2AFB2351BC29678544591F9C1EC125FD8DB [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag v1.0.2-rc1 [5],
> > * website pull request listing the new release [6].
> > * successful CI run on this tag [7].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Danny
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353146
> > [2]
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-mongodb-1.0.2-rc1
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1647/
> > [5]
> >
> https://github.com/apache/flink-connector-mongodb/releases/tag/v1.0.2-rc1
> > [6] https://github.com/apache/flink-web/pull/669
> > [7]
> >
> https://github.com/apache/flink-connector-mongodb/actions/runs/5762820757
>
>