Re: [Re-DISCUSS] FLIP-202: Introduce ClickHouse Connector

2023-08-24 Thread Sergey Nuyanzin
Hi Conrad,

Thanks for driving it.

In general I'm +1
however I have a couple of questions

1. From FLIP's description it is not clear whether this connector will
support Async sink [1] or not, could you please clarify it?
2. Also, maybe a bit more detail on the FLIP page would be helpful on why a
new connector is required instead of the existing ClickHouse connector [2]

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink
[2] https://github.com/itinycheng/flink-connector-clickhouse

On Thu, Aug 24, 2023 at 8:53 AM Jing Ge  wrote:

> Hi Conrad,
>
> Thanks for driving it! +1 for starting a new round of discussion based on
> the input you wrote in the comment. Look forward to the new design.
>
> Best regards,
> Jing
>
> On Thu, Aug 24, 2023 at 8:00 AM ConradJam  wrote:
>
> > Hi Community
> >
> > I want to re-initiate the discussion related to FLIP-202. This is a
> > discussion related to the Clickhouse connector. Some previous discussions
> > are summarized in [1]
> >
> > ● We need an external code repository to store this part of the code; as
> > @Martijn Visser said, a flink-clickhouse-connector repository should be
> > created
> > ● Discuss the common user problems that the current Flink sink to
> > ClickHouse needs to solve
> >
> >
> > Now I want to redesign this part of the functionality. The relevant FLIP
> > will be updated after collecting opinions. Welcome to join the discussion
> >
> > [1] FLINK-26999: Introduce ClickHouse Connector
> >
> >
> > --
> > Best
> >
> > ConradJam
> >
>


-- 
Best regards,
Sergey


Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-08-24 Thread Jark Wu
Hi Becket,

I think it is the second case, that a FieldReferenceExpression is
constructed
by the framework and passed to the connector (interfaces listed by
Venkata[1]
and Catalog#listPartitionsByFilter). Besides, understanding the nested
field
is optional for users/connectors (just treat it as an unknown expression if
the
connector doesn't want to support it).

If we extend FieldReferenceExpression, then in the case of "where col.nested >
10", connectors that already support filter/delete pushdown may wrongly
push down "col > 10" instead of "nested > 10", because they still treat
FieldReferenceExpression as a top-level column. This problem can be resolved
by introducing an additional "supportedNestedPushdown" flag for each
interface, but such a method is not elegant and is hard to remove in the
future, and this could be avoided if we have a separate
NestedFieldReferenceExpression.

If we want to extend FieldReferenceExpression, we have to add protections
for every related API in one shot. Besides, FieldReferenceExpression is a
fundamental class in the planner, so we would have to go through all the code
that uses it to make sure nested fields are handled properly, which is a big
effort for the community.

If we were designing this API on day 1, I would fully support merging them
into a single FieldReferenceExpression. But in this case, I'm thinking about
how to provide users with a smooth migration path, allow the community to
gradually put effort into evolving the API, and not block the "Nested Fields
Filter Pushdown" requirement.

How about having a separate NestedFieldReferenceExpression, and
abstracting a common base class "ReferenceExpression" for
NestedFieldReferenceExpression and FieldReferenceExpression? This makes
unifying expressions in
"SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> ...)"
possible.
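The hierarchy proposed above can be sketched roughly as follows. This is a minimal illustration with assumed shapes, not the actual Flink classes; the constructor arguments and `asSummaryString` are simplifications for the example:

```java
import java.util.List;

// Common base class proposed in the discussion.
abstract class ReferenceExpression {
    abstract String asSummaryString();
}

// Existing top-level column reference (simplified).
class FieldReferenceExpression extends ReferenceExpression {
    final String name;
    final int fieldIndex;

    FieldReferenceExpression(String name, int fieldIndex) {
        this.name = name;
        this.fieldIndex = fieldIndex;
    }

    @Override
    String asSummaryString() {
        return name;
    }
}

// New nested reference carrying the full access path.
class NestedFieldReferenceExpression extends ReferenceExpression {
    final String[] fieldNames; // access path, e.g. ["col", "nested"]

    NestedFieldReferenceExpression(String... fieldNames) {
        this.fieldNames = fieldNames;
    }

    @Override
    String asSummaryString() {
        return String.join(".", fieldNames);
    }
}

interface SupportsProjectionPushDown {
    // One method accepts both kinds of references; a connector that does
    // not understand nested references can simply leave them unapplied.
    void applyProjections(List<ReferenceExpression> projections);
}
```

With this shape, `applyProjections` needs no nested-specific overload, and connectors that predate nested support can treat a `NestedFieldReferenceExpression` as an unknown expression.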

Best,
Jark

On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan 
wrote:

> Becket and Jark,
>
>  Deprecate all the other
> > methods except tryApplyFilters() and tryApplyProjections().
>
> For *SupportsProjectionPushDown*, we still need a
> *supportsNestedProjections* API on the table source, as some of the table
> sources might not be able to handle nested fields, and therefore the Flink
> planner should not push down the nested projections; or else the
> *applyProjection *API has to be appropriately changed to return
> *unconvertibleProjections *similar
> to *SupportsFilterPushDown*.
>
> Or we have to introduce two different applyProjections()
> > methods for FieldReferenceExpression / NestedFieldReferenceExpression
> > respectively.
>
> Agree this is not preferred. Given that *supportNestedProjections *cannot
> be deprecated/removed based on the current API form, extending
> *FieldReferenceExpression* to support nested fields should be okay.
>
> Another alternative could be to change *applyProjections *to take
> List<ReferenceExpression> and on the connector side they choose to handle
> *FieldReferenceExpression* and *NestedFieldReferenceExpression *as
> applicable and return the remainingProjections. In the case where nested
> field projections are not supported, it should return them back while only
> projecting the top-level fields. IMO, this is also *not preferred*.
>
> *SupportsAggregatePushDown*
>
> *AggregateExpression *currently takes in a list of
> *FieldReferenceExpression* as args for the aggregate function. If, in the
> future, *SupportsAggregatePushDown* adds support for aggregate pushdown on
> nested fields, then the AggregateExpression API also has to change if a new
> NestedFieldReferenceExpression is introduced for nested fields.
>
> If we add a
> > flag for each new filter,
> > the interface will be filled with lots of flags (e.g., supportsBetween,
> > supportsIN)
>
> In an ideal situation, I completely agree with you. But in the current
> state, *supportsNestedFilters* can act as a bridge to reach the eventual
> desired state which is to have a clean and consistent set of APIs
> throughout all Supports*PushDown.
>
> Also shared some thoughts on the end state API
> <
> https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing
> >
> with extension to the *FieldReferenceExpression* to support nested fields.
> Please take a look.
>
> Regards
> Venkata krishnan
>
> On Tue, Aug 22, 2023 at 5:02 PM Becket Qin  wrote:
>
> > Hi Jark,
> >
> > Regarding the migration path, it would be useful to scrutinize the use
> case
> > of FiledReferenceExpression and ResolvedExpressions. There are two kinds
> of
> > use cases:
> >
> > 1. A ResolvedExpression is constructed by the user or connector / plugin
> > developers.
> > 2. A ResolvedExpression is constructed by the framework and passed to
> user
> > or connector / plugin developers.
> >
> > For the first case, both of the approaches provide the same migration
> > experience.
> >
> > For the second case, generally speaking, introducing
> > NestedFieldReferenceExpression and extending FieldReferenceExpression
> would
> > have the same impact fo

[jira] [Created] (FLINK-32951) VertexFlameGraphFactoryTest.testLambdaClassNamesCleanUp failed on AZP

2023-08-24 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-32951:
---

 Summary: VertexFlameGraphFactoryTest.testLambdaClassNamesCleanUp 
failed on AZP
 Key: FLINK-32951
 URL: https://issues.apache.org/jira/browse/FLINK-32951
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Reporter: Sergey Nuyanzin


This build fails 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52564&view=logs&j=f0ac5c25-1168-55a5-07ff-0e88223afed9&t=50bf7a25-bdc4-5e56-5478-c7b4511dde53&l=8747]
{noformat}
Aug 24 01:35:46 01:35:46.537 [ERROR] 
org.apache.flink.runtime.webmonitor.threadinfo.VertexFlameGraphFactoryTest.testLambdaClassNamesCleanUp
  Time elapsed: 0.024 s  <<< FAILURE!
Aug 24 01:35:46 java.lang.AssertionError: 
Aug 24 01:35:46 
Aug 24 01:35:46 Expecting actual:
Aug 24 01:35:46   
"org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$0/0x0"
Aug 24 01:35:46 to end with:
Aug 24 01:35:46   "$Lambda$0/0"
Aug 24 01:35:46 
Aug 24 01:35:46 at 
org.apache.flink.runtime.webmonitor.threadinfo.VertexFlameGraphFactoryTest.verifyRecursively(VertexFlameGraphFactoryTest.java:70)
Aug 24 01:35:46 at 
java.base/java.util.stream.ReferencePipeline$4$1.accept(ReferencePipeline.java:212)
Aug 24 01:35:46 at 
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
Aug 24 01:35:46 at 
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
Aug 24 01:35:46 at 
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
Aug 24 01:35:46 at 
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
Aug 24 01:35:46 at 
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
Aug 24 01:35:46 at 
java.base/java.util.stream.IntPipeline.reduce(IntPipeline.java:491)
Aug 24 01:35:46 at 
java.base/java.util.stream.IntPipeline.sum(IntPipeline.java:449)
Aug 24 01:35:46 at 
org.apache.flink.runtime.webmonitor.threadinfo.VertexFlameGraphFactoryTest.verifyRecursively(VertexFlameGraphFactoryTest.java:72)
Aug 24 01:35:46 at 
java.base/java.util.stream.ReferencePipeline$4$1.accept(ReferencePipeline.java:212)

{noformat}
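The assertion suggests the cleanup logic does not normalize the newer JDK hex-style lambda names (`$$Lambda/0x...` without a `$<n>` counter). A hedged sketch of a normalization covering both naming schemes; this is an illustration with an assumed regex, not the actual VertexFlameGraphFactory code:

```java
import java.util.regex.Pattern;

public class LambdaNameCleaner {
    // Older JDKs name lambdas like "Outer$$Lambda$123/1234567890";
    // newer JDKs use "Outer$$Lambda/0x00007f..." (hex id, no "$<n>").
    private static final Pattern LAMBDA_SUFFIX =
            Pattern.compile("\\$\\$Lambda(\\$\\d+)?/(0x)?[0-9a-fA-F]+");

    // Replace the JVM-generated numeric suffix with a stable placeholder
    // so flame graph frames from different runs can be aggregated.
    public static String clean(String className) {
        return LAMBDA_SUFFIX.matcher(className).replaceAll("\\$\\$Lambda\\$0/0");
    }
}
```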



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32952) Source reuse with readable metadata and watermark push down will get wrong watermark

2023-08-24 Thread Yunhong Zheng (Jira)
Yunhong Zheng created FLINK-32952:
-

 Summary: Source reuse with readable metadata and watermark push 
down will get wrong watermark 
 Key: FLINK-32952
 URL: https://issues.apache.org/jira/browse/FLINK-32952
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.18.0
Reporter: Yunhong Zheng
 Fix For: 1.18.0


Source reuse with readable metadata and watermark push down will get a wrong 
result. In class SourceReuser, we re-build the watermark spec after projection 
push down. However, we get a wrong index while trying to find the field index 
in the new source type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32953) [State TTL]resolve data correctness problem after ttl was changed

2023-08-24 Thread Jinzhong Li (Jira)
Jinzhong Li created FLINK-32953:
---

 Summary: [State TTL]resolve data correctness problem after ttl was 
changed 
 Key: FLINK-32953
 URL: https://issues.apache.org/jira/browse/FLINK-32953
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Reporter: Jinzhong Li


Because expired data is cleaned up in the background on a best-effort basis 
(hashmap uses the INCREMENTAL_CLEANUP strategy, rocksdb uses the 
ROCKSDB_COMPACTION_FILTER strategy), some expired state is often persisted into 
snapshots.

 

In some scenarios, the user changes the state TTL of the job and then restores 
the job from the old state. If the user adjusts the state TTL from a short 
value to a long value (e.g., from 12 hours to 24 hours), some expired data that 
was not cleaned up will be alive after restore. Obviously this is unreasonable, 
and may break data regulatory requirements.

 

Particularly, the rocksdb stateBackend may cause data correctness problems due 
to level compaction in this case. (E.g., one key has two versions at level-1 
and level-2, both of which are TTL-expired. Then the level-1 version is cleaned 
up by compaction, and the level-2 version isn't. If we adjust the state TTL and 
restart the job, the incorrect data of level-2 will become valid after restore.)


To solve this problem, I think we can
1) persist the old state TTL into the snapshot meta info (e.g. 
RegisteredKeyValueStateBackendMetaInfo or others);
2) during state restore, compare the current TTL with the old TTL;
3) if the current TTL is longer than the old TTL, iterate over all data, 
filter out data expired under the old TTL, and write the valid data into the 
stateBackend.
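Step 3 can be sketched as follows. This is a hedged illustration with hypothetical types (state modeled as a key-to-timestamp map), not Flink's actual state backend internals:

```java
import java.util.HashMap;
import java.util.Map;

class TtlRestoreFilter {
    /**
     * Drop entries that had already expired under the OLD ttl, so that
     * lengthening the TTL cannot resurrect already-expired state.
     *
     * @param entries      state key -> last-access timestamp (millis)
     * @param oldTtlMillis TTL recorded in the snapshot meta info
     * @param now          current time at restore (millis)
     */
    static Map<String, Long> filterExpired(
            Map<String, Long> entries, long oldTtlMillis, long now) {
        Map<String, Long> alive = new HashMap<>();
        for (Map.Entry<String, Long> e : entries.entrySet()) {
            // Keep only entries still valid under the old TTL.
            if (e.getValue() + oldTtlMillis > now) {
                alive.put(e.getKey(), e.getValue());
            }
        }
        return alive;
    }
}
```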



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Propose a release of flink-connector-opensearch

2023-08-24 Thread Ilyevsky, Leonid
Hi everybody,

I would like to propose a release of 
https://github.com/apache/flink-connector-opensearch
 which includes the latest enhancements and fixes.

I believe the latest commit 
https://github.com/apache/flink-connector-opensearch/commit/d853e3d6be3e0f15e25c1220800b7d5fcf152c43
 adds a very important feature from which many users will benefit. It provides 
the ability to optionally set a custom FailureHandler in the client code to 
handle the errors passed from the OpenSearch server. See 
https://issues.apache.org/jira/browse/FLINK-30998
 .
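As a rough illustration of what such a handler might look like: the interface and its shape here mirror the FLINK-30998 description but are assumptions defined locally so the snippet is self-contained; check the connector's actual API for the real interface and how it is wired into the sink builder:

```java
// Hypothetical handler interface, modeled after the feature description.
interface FailureHandler {
    void onFailure(Throwable failure);
}

// Example client-side handler: record the error instead of failing the job.
class CountingFailureHandler implements FailureHandler {
    int failures;

    @Override
    public void onFailure(Throwable failure) {
        failures++;
        System.err.println("OpenSearch request failed: " + failure.getMessage());
    }
}
```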

I suggest nominating 
https://issues.apache.org/jira/secure/ViewProfile.jspa?name=martijnvisser
 as a release manager.


Kind regards,
Leonid Ilyevsky
Technology Consultant

O
+1-646-660-8087

Liquidnet








Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-08-24 Thread Venkatakrishnan Sowrirajan
Jark,

How about having a separate NestedFieldReferenceExpression, and
> abstracting a common base class "ReferenceExpression" for
> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
> unifying expressions in
> "SupportsProjectionPushdown#applyProjections(List
> ...)"
> possible.

This should be fine for *SupportsProjectionPushDown* and
*SupportsFilterPushDown*. One concern in the case of
*SupportsAggregatePushDown* with nested fields support (to be added in the
future): with this proposal, the API will become backwards incompatible, as
the *args *for the aggregate function is *List<FieldReferenceExpression>*,
which needs to change to *List<ReferenceExpression>*.

Regards
Venkata krishnan


On Thu, Aug 24, 2023 at 1:18 AM Jark Wu  wrote:

> Hi Becket,
>
> I think it is the second case, that a FieldReferenceExpression is
> constructed
> by the framework and passed to the connector (interfaces listed by
> Venkata[1]
> and Catalog#listPartitionsByFilter). Besides, understanding the nested
> field
> is optional for users/connectors (just treat it as an unknown expression if
> the
> connector doesn't want to support it).
>
> If we extend FieldReferenceExpression, in the case of "where col.nested >
> 10",
> for the connectors already supported filter/delete pushdown, they may
> wrongly
> pushdown "col > 10" instead of "nested > 10" because they still treat
> FieldReferenceExpression as a top-level column. This problem can be
> resolved
> by introducing an additional "supportedNestedPushdown" for each interface,
> but that method is not elegant and is hard to remove in the future, and
> this could
> be avoided if we have a separate NestedFieldReferenceExpression.
>
> If we want to extend FieldReferenceExpression, we have to add protections
> for every related API in one shot. Besides, FieldReferenceExpression is a
> fundamental class in the planner, we have to go through all the code that
> is using it to make sure it properly handling it if it is a nested field
> which
> is a big effort for the community.
>
> If we were designing this API on day 1, I fully support merging them in a
> FieldReferenceExpression. But in this case, I'm thinking about how to
> provide
> users with a smooth migration path, and allow the community to gradually
> put efforts into evolving the API, and not block the "Nested Fields Filter
> Pushdown"
> requirement.
>
> How about having a separate NestedFieldReferenceExpression, and
> abstracting a common base class "ReferenceExpression" for
> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
> unifying expressions in
> "SupportsProjectionPushdown#applyProjections(List
> ...)"
> possible.
>
> Best,
> Jark
>
> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan  >
> wrote:
>
> > Becket and Jark,
> >
> >  Deprecate all the other
> > > methods except tryApplyFilters() and tryApplyProjections().
> >
> > For *SupportsProjectionPushDown*, we still need a
> > *supportsNestedProjections* API on the table source as some of the table
> > sources might not be able to handle nested fields and therefore the Flink
> > planner should not push down the nested projections or else the
> > *applyProjection
> > *API has to be appropriately changed to return
> > *unconvertibleProjections *similar
> > to *SupportsFilterPushDown*.
> >
> > Or we have to introduce two different applyProjections()
> > > methods for FieldReferenceExpression / NestedFieldReferenceExpression
> > > respectively.
> >
> > Agree this is not preferred. Given that *supportNestedProjections *cannot
> > be deprecated/removed based on the current API form, extending
> > *FieldReferenceExpression* to support nested fields should be okay.
> >
> > Another alternative could be to change *applyProjections *to take
> > List and on the connector side they choose to handle
> > *FieldReferenceExpression* and *NestedFieldReferenceExpression *as
> > applicable and return the remainingProjections. In the case of nested
> field
> > projections not supported, it should return them back but only projecting
> > the top level fields. IMO, this is also *not preferred*.
> >
> > *SupportsAggregatePushDown*
> >
> > *AggregateExpression *currently takes in a list of
> > *FieldReferenceExpression* as args for the aggregate function, if in
> future
> > *SupportsAggregatePushDown* adds support for aggregate pushdown on nested
> > fields then the AggregateExpression API also has to change if a new
> > NestedFieldReferenceExpression is introduced for nested fields.
> >
> > If we add a
> > > flag for each new filter,
> > > the interface will be filled with lots of flags (e.g., supportsBetween,
> > > supportsIN)
> >
> > In an ideal situation, I completely agree with you. But in the current
> > state, *supportsNestedFilters* can act as a bridge to reach the eventual
> > desired state which is to have a clean and consistent set of APIs
> > throughout all Supports*PushDown.
> >
> > Also shared some thoughts on the end state API
> > <
> >
> https://urldefense.com/v3/__https://d

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-08-24 Thread Venkatakrishnan Sowrirajan
To keep it backwards compatible, introduce another API *applyAggregates *with
*List<ReferenceExpression> *when nested field support is added and
deprecate the current API. The new method will throw an exception by default.
In the Flink planner, call *applyAggregates *with nested fields and, if it
throws an exception, fall back to *applyAggregates* without nested fields.
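The try-then-fall-back migration can be sketched as below. All names are assumptions mirroring the discussion, not the actual Flink interface; note that Java's erasure forbids two `applyAggregates(List<...>)` overloads in one interface, so the sketch gives the legacy method a distinct name:

```java
import java.util.List;

interface ReferenceExpression {}
class FieldReferenceExpression implements ReferenceExpression {}
class NestedFieldReferenceExpression implements ReferenceExpression {}

interface SupportsAggregatePushDown {
    /** Existing method, to be deprecated once nested support lands. */
    boolean applyAggregatesLegacy(List<FieldReferenceExpression> args);

    /** New method; the default implementation signals "not supported". */
    default boolean applyAggregates(List<ReferenceExpression> args) {
        throw new UnsupportedOperationException(
                "nested aggregate pushdown not supported");
    }
}

class Planner {
    /** Try the new API first; fall back to the legacy one on exception. */
    static boolean pushAggregates(
            SupportsAggregatePushDown source,
            List<ReferenceExpression> args,
            List<FieldReferenceExpression> topLevelOnly) {
        try {
            return source.applyAggregates(args);
        } catch (UnsupportedOperationException e) {
            return source.applyAggregatesLegacy(topLevelOnly);
        }
    }
}
```

A source that only implements the legacy method keeps working unchanged, while upgraded sources override the default and receive nested references directly.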

Regards
Venkata krishnan


On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan Sowrirajan <
vsowr...@asu.edu> wrote:

> Jark,
>
> How about having a separate NestedFieldReferenceExpression, and
>> abstracting a common base class "ReferenceExpression" for
>> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
>> unifying expressions in
>> "SupportsProjectionPushdown#applyProjections(List
>> ...)"
>> possible.
>
> This should be fine for *SupportsProjectionPushDown* and
> *SupportsFilterPushDown*. One concern in the case of
> *SupportsAggregatePushDown* with nested fields support (to be added in
> the future), with this proposal, the API will become backwards incompatible
> as the *args *for the aggregate function is *List
> *that needs to change to *List*.
>
> Regards
> Venkata krishnan
>
>
> On Thu, Aug 24, 2023 at 1:18 AM Jark Wu  wrote:
>
>> Hi Becket,
>>
>> I think it is the second case, that a FieldReferenceExpression is
>> constructed
>> by the framework and passed to the connector (interfaces listed by
>> Venkata[1]
>> and Catalog#listPartitionsByFilter). Besides, understanding the nested
>> field
>> is optional for users/connectors (just treat it as an unknown expression
>> if
>> the
>> connector doesn't want to support it).
>>
>> If we extend FieldReferenceExpression, in the case of "where col.nested >
>> 10",
>> for the connectors already supported filter/delete pushdown, they may
>> wrongly
>> pushdown "col > 10" instead of "nested > 10" because they still treat
>> FieldReferenceExpression as a top-level column. This problem can be
>> resolved
>> by introducing an additional "supportedNestedPushdown" for each interface,
>> but that method is not elegant and is hard to remove in the future, and
>> this could
>> be avoided if we have a separate NestedFieldReferenceExpression.
>>
>> If we want to extend FieldReferenceExpression, we have to add protections
>> for every related API in one shot. Besides, FieldReferenceExpression is a
>> fundamental class in the planner, we have to go through all the code that
>> is using it to make sure it properly handling it if it is a nested field
>> which
>> is a big effort for the community.
>>
>> If we were designing this API on day 1, I fully support merging them in a
>> FieldReferenceExpression. But in this case, I'm thinking about how to
>> provide
>> users with a smooth migration path, and allow the community to gradually
>> put efforts into evolving the API, and not block the "Nested Fields Filter
>> Pushdown"
>> requirement.
>>
>> How about having a separate NestedFieldReferenceExpression, and
>> abstracting a common base class "ReferenceExpression" for
>> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
>> unifying expressions in
>> "SupportsProjectionPushdown#applyProjections(List
>> ...)"
>> possible.
>>
>> Best,
>> Jark
>>
>> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan <
>> vsowr...@asu.edu>
>> wrote:
>>
>> > Becket and Jark,
>> >
>> >  Deprecate all the other
>> > > methods except tryApplyFilters() and tryApplyProjections().
>> >
>> > For *SupportsProjectionPushDown*, we still need a
>> > *supportsNestedProjections* API on the table source as some of the table
>> > sources might not be able to handle nested fields and therefore the
>> Flink
>> > planner should not push down the nested projections or else the
>> > *applyProjection
>> > *API has to be appropriately changed to return
>> > *unconvertibleProjections *similar
>> > to *SupportsFilterPushDown*.
>> >
>> > Or we have to introduce two different applyProjections()
>> > > methods for FieldReferenceExpression / NestedFieldReferenceExpression
>> > > respectively.
>> >
>> > Agree this is not preferred. Given that *supportNestedProjections
>> *cannot
>> > be deprecated/removed based on the current API form, extending
>> > *FieldReferenceExpression* to support nested fields should be okay.
>> >
>> > Another alternative could be to change *applyProjections *to take
>> > List and on the connector side they choose to handle
>> > *FieldReferenceExpression* and *NestedFieldReferenceExpression *as
>> > applicable and return the remainingProjections. In the case of nested
>> field
>> > projections not supported, it should return them back but only
>> projecting
>> > the top level fields. IMO, this is also *not preferred*.
>> >
>> > *SupportsAggregatePushDown*
>> >
>> > *AggregateExpression *currently takes in a list of
>> > *FieldReferenceExpression* as args for the aggregate function, if in
>> future
>> > *SupportsAggregatePushDown* adds support for aggregate pushdown on
>> nested
>> > fields then the AggregateExpression API also has to chang

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-08-24 Thread yh z
Hi, Venkat,

Thanks for the FLIP, it sounds good to support nested fields filter
pushdown. Based on the design of flip and the above options, I would like
to make a few suggestions:

1. At present, introducing NestedFieldReferenceExpression looks like the
better solution, which can fully meet our requirements while reducing
modifications to the base class FieldReferenceExpression. In the long run, I
tend to abstract a base class for NestedFieldReferenceExpression and
FieldReferenceExpression as you suggested.

2. Personally, I don't recommend introducing *supportsNestedFilters() *in
SupportsFilterPushDown. We just need to better document the return value of
the method *applyFilters*.

3. Finally, I think we need to look at the costs and benefits of unifying
SupportsFilterPushDown and SupportsProjectionPushDown (or others) from
the perspective of interface implementers. A stable API can reduce user
development and change costs; if the current API can fully meet the
functional requirements at the framework level, I personally suggest reducing
the impact on connector developers.

Regards,
Yunhong Zheng (Swuferhong)


Venkatakrishnan Sowrirajan  于2023年8月25日周五 01:25写道:

> To keep it backwards compatible, introduce another API *applyAggregates
> *with
> *List *when nested field support is added and
> deprecate the current API. This will by default throw an exception. In
> flink planner, *applyAggregates *with nested fields and if it throws
> exception then *applyAggregates* without nested fields.
>
> Regards
> Venkata krishnan
>
>
> On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu> wrote:
>
> > Jark,
> >
> > How about having a separate NestedFieldReferenceExpression, and
> >> abstracting a common base class "ReferenceExpression" for
> >> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
> >> unifying expressions in
> >> "SupportsProjectionPushdown#applyProjections(List
> >> ...)"
> >> possible.
> >
> > This should be fine for *SupportsProjectionPushDown* and
> > *SupportsFilterPushDown*. One concern in the case of
> > *SupportsAggregatePushDown* with nested fields support (to be added in
> > the future), with this proposal, the API will become backwards
> incompatible
> > as the *args *for the aggregate function is
> *List
> > *that needs to change to *List*.
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Thu, Aug 24, 2023 at 1:18 AM Jark Wu  wrote:
> >
> >> Hi Becket,
> >>
> >> I think it is the second case, that a FieldReferenceExpression is
> >> constructed
> >> by the framework and passed to the connector (interfaces listed by
> >> Venkata[1]
> >> and Catalog#listPartitionsByFilter). Besides, understanding the nested
> >> field
> >> is optional for users/connectors (just treat it as an unknown expression
> >> if
> >> the
> >> connector doesn't want to support it).
> >>
> >> If we extend FieldReferenceExpression, in the case of "where col.nested
> >
> >> 10",
> >> for the connectors already supported filter/delete pushdown, they may
> >> wrongly
> >> pushdown "col > 10" instead of "nested > 10" because they still treat
> >> FieldReferenceExpression as a top-level column. This problem can be
> >> resolved
> >> by introducing an additional "supportedNestedPushdown" for each
> interface,
> >> but that method is not elegant and is hard to remove in the future, and
> >> this could
> >> be avoided if we have a separate NestedFieldReferenceExpression.
> >>
> >> If we want to extend FieldReferenceExpression, we have to add
> protections
> >> for every related API in one shot. Besides, FieldReferenceExpression is
> a
> >> fundamental class in the planner, we have to go through all the code
> that
> >> is using it to make sure it properly handling it if it is a nested field
> >> which
> >> is a big effort for the community.
> >>
> >> If we were designing this API on day 1, I fully support merging them in
> a
> >> FieldReferenceExpression. But in this case, I'm thinking about how to
> >> provide
> >> users with a smooth migration path, and allow the community to gradually
> >> put efforts into evolving the API, and not block the "Nested Fields
> Filter
> >> Pushdown"
> >> requirement.
> >>
> >> How about having a separate NestedFieldReferenceExpression, and
> >> abstracting a common base class "ReferenceExpression" for
> >> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
> >> unifying expressions in
> >> "SupportsProjectionPushdown#applyProjections(List
> >> ...)"
> >> possible.
> >>
> >> Best,
> >> Jark
> >>
> >> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan <
> >> vsowr...@asu.edu>
> >> wrote:
> >>
> >> > Becket and Jark,
> >> >
> >> >  Deprecate all the other
> >> > > methods except tryApplyFilters() and tryApplyProjections().
> >> >
> >> > For *SupportsProjectionPushDown*, we still need a
> >> > *supportsNestedProjections* API on the table source as some of the
> table
> >> > sources might not be able to handle nested fields and t

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-08-24 Thread Becket Qin
Hi Jark,

How about having a separate NestedFieldReferenceExpression, and
> abstracting a common base class "ReferenceExpression" for
> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
> unifying expressions in
> "SupportsProjectionPushdown#applyProjections(List
> ...)"
> possible.


I'd be fine with this. It at least provides a consistent API style /
formality.

 Re: Yunhong,

3. Finally, I think we need to look at the costs and benefits of unifying
> the SupportsFilterPushDown and SupportsProjectionPushDown (or others) from
> the perspective of interface implementers. A stable API can reduce user
> development and change costs, if the current API can fully meet the
> functional requirements at the framework level, I personal suggest reducing
> the impact on connector developers.
>

I agree that the cost and benefit should be measured. And the measurement
should be in the long term instead of short term. That is why we always
need to align on the ideal end state first.
Meeting functionality requirements is the bare minimum bar for an API.
Simplicity, intuitiveness, robustness and evolvability are also important.
In addition, for projects with many APIs, such as Flink, a consistent API
style is also critical for the user adoption as well as bug avoidance. It
is very helpful for the community to agree on some API design conventions /
principles.
For example, in this particular case, via our discussion, hopefully we sort
of established the following API design conventions / principles for all
the Supports*PushDown interfaces.

1. By default, expressions should be used if applicable instead of other
representations.
2. In general, a pushdown method should not assume all of the pushdowns will
succeed. So the applyX() method should return a boolean or a List of the
remaining expressions, to handle the cases where some of the pushdowns cannot
be fulfilled by the implementation.
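Convention 2 can be sketched with a toy source (hypothetical names, filters modeled as plain strings for brevity, not a real connector interface): applyFilters consumes what it can and hands the rest back to the planner.

```java
import java.util.ArrayList;
import java.util.List;

class ExampleFilterableSource {
    private final List<String> pushed = new ArrayList<>();

    /** Returns the filters this source could NOT consume. */
    List<String> applyFilters(List<String> filters) {
        List<String> remaining = new ArrayList<>();
        for (String f : filters) {
            // Assume this toy source only understands top-level equality.
            if (f.contains("=") && !f.contains(".")) {
                pushed.add(f);    // consumed: evaluated inside the source
            } else {
                remaining.add(f); // planner keeps a post-scan filter
            }
        }
        return remaining;
    }

    List<String> pushedFilters() {
        return pushed;
    }
}
```

Because the planner always re-applies whatever comes back, a source is never forced to understand every expression kind, which is exactly what makes new expression types (like nested references) non-breaking.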

Establishing such conventions and principles demands careful thinking about
the aspects I mentioned earlier, in addition to the API functionality.
This helps lower the bar to understanding, reduces the chance of leaving
loose ends in the API, and will benefit all the participants in the project
over time. I think this is the right way to achieve real API stability.
Otherwise, we may end up chasing our tails looking for ways to avoid changing
the existing non-ideal APIs.
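For evolvability specifically, the pattern raised later in this thread for applyAggregates (a new method that throws by default, with the planner falling back to the legacy one) can be sketched roughly like this. The names are hypothetical, not the actual Flink API; note also that in Java a true overload on List<A> vs. List<B> is impossible due to erasure, so a new method name would be needed in practice:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative names only -- not the actual Flink API.
class EvolvablePushDownSketch {

    interface SupportsAggPushDown {
        // Existing method that implementers already provide.
        boolean applyAggregates(List<String> fieldRefs);

        // New method for nested field references. The default throws, so
        // existing implementations stay source- and binary-compatible.
        default boolean applyNestedAggregates(List<List<String>> nestedRefs) {
            throw new UnsupportedOperationException("nested refs not supported");
        }
    }

    // "Planner" side: try the new method first, fall back to the legacy one.
    static boolean pushDown(SupportsAggPushDown sink, List<List<String>> nested) {
        try {
            return sink.applyNestedAggregates(nested);
        } catch (UnsupportedOperationException e) {
            // Fall back: keep only the top-level part of each reference.
            return sink.applyAggregates(
                    nested.stream().map(p -> p.get(0)).collect(Collectors.toList()));
        }
    }

    public static void main(String[] args) {
        // A legacy implementation that only knows the old method.
        SupportsAggPushDown legacy = refs -> true;
        System.out.println(pushDown(legacy, List.of(List.of("a", "b")))); // prints: true
    }
}
```

The design choice here is that old implementers compile unchanged, while new implementers simply override the default.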

Thanks,

Jiangjie (Becket) Qin

On Fri, Aug 25, 2023 at 9:33 AM yh z  wrote:

> Hi, Venkat,
>
> Thanks for the FLIP; it sounds good to support nested field filter
> pushdown. Based on the design of the FLIP and the options above, I would like
> to make a few suggestions:
>
> 1. At present, introducing NestedFieldReferenceExpression looks like the
> better solution; it can fully meet our requirements while reducing
> modifications to the existing FieldReferenceExpression. In the long run, I
> would tend to abstract a base class for NestedFieldReferenceExpression and
> FieldReferenceExpression, as you suggested.
>
> 2. Personally, I don't recommend introducing *supportsNestedFilters()* in
> *SupportsFilterPushDown*. We just need to better declare the return value of
> the *applyFilters* method.
>
> 3. Finally, I think we need to look at the costs and benefits of unifying
> the SupportsFilterPushDown and SupportsProjectionPushDown (or others) from
> the perspective of interface implementers. A stable API can reduce user
> development and change costs; if the current API can fully meet the
> functional requirements at the framework level, I personally suggest reducing
> the impact on connector developers.
>
> Regards,
> Yunhong Zheng (Swuferhong)
>
>
> On Fri, Aug 25, 2023 at 01:25, Venkatakrishnan Sowrirajan wrote:
>
> > To keep it backwards compatible, introduce another *applyAggregates* API
> > with *List<ReferenceExpression>* when nested field support is added and
> > deprecate the current API. The new method will throw an exception by
> > default. In the Flink planner, call *applyAggregates* with nested fields
> > first; if it throws an exception, fall back to *applyAggregates* without
> > nested fields.
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan Sowrirajan <
> > vsowr...@asu.edu> wrote:
> >
> > > Jark,
> > >
> > > How about having a separate NestedFieldReferenceExpression, and
> > >> abstracting a common base class "ReferenceExpression" for
> > >> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
> > >> unifying expressions in
> > >> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> ...)"
> > >> possible.
> > >
> > > This should be fine for *SupportsProjectionPushDown* and
> > > *SupportsFilterPushDown*. One concern in the case of
> > > *SupportsAggregatePushDown* with nested fields support (to be added in
> > > the future). With this proposal, the API will become backwards
> > > incompatible, as the *args* for the aggregate function is
> > > *List<FieldReferenceExpression>*, which needs to change to
> > > *List<ReferenceExpression>*.
> > >
> > > Regards
> > > Venkata krishnan
> > >
> > >
On Thu, Aug 24, 2023 at 1:18 AM Jark Wu wrote:

[jira] [Created] (FLINK-32954) Metrics expose number of heap timer

2023-08-24 Thread Rui Xia (Jira)
Rui Xia created FLINK-32954:
---

 Summary: Metrics expose number of heap timer
 Key: FLINK-32954
 URL: https://issues.apache.org/jira/browse/FLINK-32954
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics, Runtime / State Backends
Reporter: Rui Xia






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32955) Support state compatibility between enabling TTL and disabling TTL

2023-08-24 Thread Jinzhong Li (Jira)
Jinzhong Li created FLINK-32955:
---

 Summary: Support state compatibility between enabling TTL and 
disabling TTL
 Key: FLINK-32955
 URL: https://issues.apache.org/jira/browse/FLINK-32955
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Reporter: Jinzhong Li


Currently, trying to restore state that was previously configured without
TTL using a TTL-enabled descriptor, or vice versa, leads to a compatibility
failure and a StateMigrationException.

In some scenarios, a user may want to enable state TTL and restore from old
state that was previously configured without TTL, or vice versa.

It would be useful for users if we supported state compatibility between
enabling TTL and disabling TTL.
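The failure mode can be illustrated with a small self-contained model (plain Java, not the actual Flink state backend code). TTL-enabled state serializes a last-access timestamp alongside each value, so the on-disk layout differs; the model below rejects a restore whenever the snapshot layout and the new descriptor disagree about TTL, just as a serializer change would be rejected:

```java
// Toy model of the restore check -- NOT the actual Flink state backend code.
class TtlCompatSketch {

    static class StateMigrationException extends RuntimeException {
        StateMigrationException(String msg) { super(msg); }
    }

    // Model: restore is rejected when the snapshot layout and the new
    // descriptor disagree about whether values carry a TTL timestamp.
    static void restore(boolean snapshotHadTtl, boolean descriptorHasTtl) {
        if (snapshotHadTtl != descriptorHasTtl) {
            throw new StateMigrationException(
                    "descriptor has TTL " + (descriptorHasTtl ? "enabled" : "disabled")
                            + " but snapshot was written with TTL "
                            + (snapshotHadTtl ? "enabled" : "disabled"));
        }
    }

    public static void main(String[] args) {
        restore(true, true); // matching layouts restore fine
        try {
            restore(false, true); // enabling TTL over old non-TTL state
        } catch (StateMigrationException e) {
            System.out.println("restore failed: " + e.getMessage());
        }
    }
}
```

Supporting the migration would mean adding (or stripping) the timestamp for each value during restore instead of throwing.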





Re: [Discuss] FLIP-355: Add parent dir of files to classpath using yarn.provided.lib.dirs

2023-08-24 Thread Venkatakrishnan Sowrirajan
Thanks Yang Wang.

In that case, whenever you get a chance could you please review the PR?
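For context, the enhancement quoted below amounts to something like the following when building the classpath (a sketch with hypothetical helper names, not the actual Flink/YARN utilities):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the proposed behavior -- illustrative names, not Flink's code.
class ProvidedLibClasspathSketch {

    // Expand provided entries into classpath entries: directories are added
    // as-is; for plain files the parent directory is added too, so that
    // non-jar resources shipped next to the jars become loadable.
    static Set<String> classpathEntries(List<Path> provided) {
        Set<String> out = new TreeSet<>();
        for (Path p : provided) {
            out.add(p.toString());
            if (!Files.isDirectory(p) && p.getParent() != null) {
                out.add(p.getParent().toString());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("provided");
        Path file = Files.createFile(dir.resolve("log4j.properties"));
        Set<String> cp = classpathEntries(List.of(file));
        System.out.println(cp.contains(dir.toString())); // prints: true
    }
}
```

Adding the parent directory (rather than only the file itself) is what lets resources such as property files be found via classpath lookup by name.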


On Wed, Aug 23, 2023, 8:15 PM Yang Wang  wrote:

> +1 for this FLIP
>
> Maybe a FLIP is overkill for this enhancement.
>
>
> Best,
> Yang
>
> On Wed, Aug 23, 2023 at 01:44, Venkatakrishnan Sowrirajan wrote:
>
> > Thanks for the FLIP, Archit.
> >
> > This is definitely quite a useful addition to the
> *yarn.provided.lib.dirs*
> > . +1.
> >
> > IMO, except for the fact that *yarn.provided.lib.dirs* (platform-specific
> > jars can be cached) takes only directories whereas *yarn.ship-files* (user
> > files) takes both files and dirs, the overall logic for constructing the
> > classpath in both cases should be roughly the same.
> >
> > Referencing here as well the PR
> > (https://github.com/apache/flink/pull/23164) with the initial
> > implementation you created.
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Tue, Aug 22, 2023 at 10:09 AM Archit Goyal
>  > >
> > wrote:
> >
> > > Hi all,
> > >
> > > Gentle ping if I can get a review on the FLIP.
> > >
> > > Thanks,
> > > Archit Goyal
> > >
> > > From: Archit Goyal 
> > > Date: Thursday, August 17, 2023 at 5:51 PM
> > > To: dev@flink.apache.org 
> > > Subject: [Discuss] FLIP-355: Add parent dir of files to classpath using
> > > yarn.provided.lib.dirs
> > > Hi All,
> > >
> > > I am opening this thread to discuss the proposal to add parent
> > directories
> > > of files to classpath when using yarn.provided.lib.dirs. This is
> > documented
> > > in FLIP-355
> > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP+355%3A+Add+parent+dir+of+files+to+classpath+using+yarn.provided.lib.dirs>.
> > >
> > > This FLIP proposes enhancing YARN's classpath configuration to
> > > include parent directories of files in yarn.provided.lib.dirs.
> > >
> > > Please feel free to reply to this email thread and share your opinions.
> > >
> > > Thanks,
> > > Archit Goyal
> > >
> >
>