date:20230829

[jira] [Created] (FLINK-32994) LeaderElectionDriver.toString() is not implemented

2023-08-29 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-32994:
-

 Summary: LeaderElectionDriver.toString() is not implemented
 Key: FLINK-32994
 URL: https://issues.apache.org/jira/browse/FLINK-32994
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.0, 1.19.0
Reporter: Matthias Pohl


We noticed in FLINK-32678 that the {{toString()}} method of 
{{LeaderElectionDriver}} wasn't implemented with the FLINK-26522 changes. The 
legacy implementations actually provided a proper implementation. The 
{{MultipleComponentLeaderElectionDriver}}  implementations (which we reused in 
FLINK-26522) didn't provide such a method.

I'm marking this as a critical because it's a regression. But I'm not marking 
it as a blocker because it's only affecting the log output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

退订

2023-08-29 Thread 喻凯

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-08-29 Thread Venkatakrishnan Sowrirajan

Sure, will reference this discussion to resume where we started as part of
the flip to refactor SupportsProjectionPushDown.

On Tue, Aug 29, 2023, 7:22 PM Jark Wu  wrote:

> I'm fine with this. `ReferenceExpression` and `SupportsProjectionPushDown`
> can be another FLIP. However, could you summarize the design of this part
> in the future part of the FLIP? This can be easier to get started with in
> the future.
>
>
> Best,
> Jark
>
>
> On Wed, 30 Aug 2023 at 02:45, Venkatakrishnan Sowrirajan  >
> wrote:
>
> > Thanks Jark. Sounds good.
> >
> > One more thing, earlier in my summary I mentioned,
> >
> > Introduce a new *ReferenceExpression* (or *BaseReferenceExpression*)
> > > abstract class which will be extended by both
> *FieldReferenceExpression*
> > >  and *NestedFieldReferenceExpression* (to be introduced as part of this
> > > FLIP)
> >
> > This can be punted for now and can be handled as part of refactoring
> > SupportsProjectionPushDown.
> >
> > Also I made minor changes to the *NestedFieldReferenceExpression,
> *instead
> > of *fieldIndexArray* we can just do away with *fieldNames *array that
> > includes fieldName at every level for the nested field.
> >
> > Updated the FLIP-357
> > <
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!YAk6kV4CYvUSPfpoUDQRs6VlbmJXVX8KOKqFxKbNDkUWKzShvwpkLRGkAV1tgV3EqClNrjGS-Ij86Q$
> > >
> > wiki as well.
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Tue, Aug 29, 2023 at 5:21 AM Jark Wu  wrote:
> >
> > > Hi Venkata,
> > >
> > > Your summary looks good to me. +1 to start a vote.
> > >
> > > I think we don't need "inputIndex" in NestedFieldReferenceExpression.
> > > Actually, I think it is also not needed in FieldReferenceExpression,
> > > and we should try to remove it (another topic). The RexInputRef in
> > Calcite
> > > also doesn't require an inputIndex because the field index should
> > represent
> > > index of the field in the underlying row type. Field references
> shouldn't
> > > be
> > >  aware of the number of inputs.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > On Tue, 29 Aug 2023 at 02:24, Venkatakrishnan Sowrirajan <
> > vsowr...@asu.edu
> > > >
> > > wrote:
> > >
> > > > Hi Jinsong,
> > > >
> > > > Thanks for your comments.
> > > >
> > > > What is inputIndex in NestedFieldReferenceExpression?
> > > >
> > > >
> > > > I haven't looked at it before. Do you mean, given that it is now only
> > > used
> > > > to push filters it won't be subsequently used in further
> > > > planning/optimization and therefore it is not required at this time?
> > > >
> > > > So if NestedFieldReferenceExpression doesn't need inputIndex, is
> there
> > > > > a need to introduce a base class `ReferenceExpression`?
> > > >
> > > > For SupportsFilterPushDown itself, *ReferenceExpression* base class
> is
> > > not
> > > > needed. But there were discussions around cleaning up and
> standardizing
> > > the
> > > > API for Supports*PushDown. SupportsProjectionPushDown currently
> pushes
> > > the
> > > > projects as a 2-d array, instead it would be better to use the
> standard
> > > API
> > > > which seems to be the *ResolvedExpression*. For
> > > SupportsProjectionPushDown
> > > > either FieldReferenceExpression (top level fields) or
> > > > NestedFieldReferenceExpression (nested fields) is enough, in order to
> > > > provide a single API that handles both top level and nested fields,
> > > > ReferenceExpression will be introduced as a base class.
> > > >
> > > > Eventually, *SupportsProjectionPushDown#applyProjections* would
> evolve
> > as
> > > > applyProjection(List projectedFields) and nested
> > > > fields would be pushed only if *supportsNestedProjections* returns
> > true.
> > > >
> > > > Regards
> > > > Venkata krishnan
> > > >
> > > >
> > > > On Sun, Aug 27, 2023 at 11:12 PM Jingsong Li  >
> > > > wrote:
> > > >
> > > > > So if NestedFieldReferenceExpression doesn't need inputIndex, is
> > there
> > > > > a need to introduce a base class `ReferenceExpression`?
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Mon, Aug 28, 2023 at 2:09 PM Jingsong Li <
> jingsongl...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi thanks all for your discussion.
> > > > > >
> > > > > > What is inputIndex in NestedFieldReferenceExpression?
> > > > > >
> > > > > > I know inputIndex has special usage in FieldReferenceExpression,
> > but
> > > > > > it is only for Join operators, and it is only for SQL
> optimization.
> > > It
> > > > > > looks like there is no requirement for Nested.
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Mon, Aug 28, 2023 at 1:13 PM Venkatakrishnan Sowrirajan
> > > > > >  wrote:
> > > > > > >
> > > > > > > Thanks for all the feedback and discussion everyone. Looks like
> > we
> > > > have
> > > > > > > reached a consensus here.
> > > > > > >
> > > > > > > Just to summarize:
> > > > > > >
> > > > > > > 1

[jira] [Created] (FLINK-32993) Datagen connector produce unexpected Char type data

2023-08-29 Thread Yubin Li (Jira)

Yubin Li created FLINK-32993:


 Summary: Datagen connector produce unexpected Char type data
 Key: FLINK-32993
 URL: https://issues.apache.org/jira/browse/FLINK-32993
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Common
Affects Versions: 1.19.0
Reporter: Yubin Li
 Attachments: image-2023-08-30-11-47-05-887.png, 
image-2023-08-30-11-47-43-719.png

create table as follows:

!image-2023-08-30-11-47-43-719.png!

results:

!image-2023-08-30-11-47-05-887.png!

I have found that Char type data length is 100, which same as String type, it 
is caused by the two types use the same data generation logic, and we could 
correct it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-08-29 Thread Jark Wu

I'm fine with this. `ReferenceExpression` and `SupportsProjectionPushDown`
can be another FLIP. However, could you summarize the design of this part
in the future part of the FLIP? This can be easier to get started with in
the future.


Best,
Jark


On Wed, 30 Aug 2023 at 02:45, Venkatakrishnan Sowrirajan 
wrote:

> Thanks Jark. Sounds good.
>
> One more thing, earlier in my summary I mentioned,
>
> Introduce a new *ReferenceExpression* (or *BaseReferenceExpression*)
> > abstract class which will be extended by both *FieldReferenceExpression*
> >  and *NestedFieldReferenceExpression* (to be introduced as part of this
> > FLIP)
>
> This can be punted for now and can be handled as part of refactoring
> SupportsProjectionPushDown.
>
> Also I made minor changes to the *NestedFieldReferenceExpression, *instead
> of *fieldIndexArray* we can just do away with *fieldNames *array that
> includes fieldName at every level for the nested field.
>
> Updated the FLIP-357
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-356%3A+Support+Nested+Fields+Filter+Pushdown
> >
> wiki as well.
>
> Regards
> Venkata krishnan
>
>
> On Tue, Aug 29, 2023 at 5:21 AM Jark Wu  wrote:
>
> > Hi Venkata,
> >
> > Your summary looks good to me. +1 to start a vote.
> >
> > I think we don't need "inputIndex" in NestedFieldReferenceExpression.
> > Actually, I think it is also not needed in FieldReferenceExpression,
> > and we should try to remove it (another topic). The RexInputRef in
> Calcite
> > also doesn't require an inputIndex because the field index should
> represent
> > index of the field in the underlying row type. Field references shouldn't
> > be
> >  aware of the number of inputs.
> >
> > Best,
> > Jark
> >
> >
> > On Tue, 29 Aug 2023 at 02:24, Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu
> > >
> > wrote:
> >
> > > Hi Jinsong,
> > >
> > > Thanks for your comments.
> > >
> > > What is inputIndex in NestedFieldReferenceExpression?
> > >
> > >
> > > I haven't looked at it before. Do you mean, given that it is now only
> > used
> > > to push filters it won't be subsequently used in further
> > > planning/optimization and therefore it is not required at this time?
> > >
> > > So if NestedFieldReferenceExpression doesn't need inputIndex, is there
> > > > a need to introduce a base class `ReferenceExpression`?
> > >
> > > For SupportsFilterPushDown itself, *ReferenceExpression* base class is
> > not
> > > needed. But there were discussions around cleaning up and standardizing
> > the
> > > API for Supports*PushDown. SupportsProjectionPushDown currently pushes
> > the
> > > projects as a 2-d array, instead it would be better to use the standard
> > API
> > > which seems to be the *ResolvedExpression*. For
> > SupportsProjectionPushDown
> > > either FieldReferenceExpression (top level fields) or
> > > NestedFieldReferenceExpression (nested fields) is enough, in order to
> > > provide a single API that handles both top level and nested fields,
> > > ReferenceExpression will be introduced as a base class.
> > >
> > > Eventually, *SupportsProjectionPushDown#applyProjections* would evolve
> as
> > > applyProjection(List projectedFields) and nested
> > > fields would be pushed only if *supportsNestedProjections* returns
> true.
> > >
> > > Regards
> > > Venkata krishnan
> > >
> > >
> > > On Sun, Aug 27, 2023 at 11:12 PM Jingsong Li 
> > > wrote:
> > >
> > > > So if NestedFieldReferenceExpression doesn't need inputIndex, is
> there
> > > > a need to introduce a base class `ReferenceExpression`?
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, Aug 28, 2023 at 2:09 PM Jingsong Li 
> > > > wrote:
> > > > >
> > > > > Hi thanks all for your discussion.
> > > > >
> > > > > What is inputIndex in NestedFieldReferenceExpression?
> > > > >
> > > > > I know inputIndex has special usage in FieldReferenceExpression,
> but
> > > > > it is only for Join operators, and it is only for SQL optimization.
> > It
> > > > > looks like there is no requirement for Nested.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Mon, Aug 28, 2023 at 1:13 PM Venkatakrishnan Sowrirajan
> > > > >  wrote:
> > > > > >
> > > > > > Thanks for all the feedback and discussion everyone. Looks like
> we
> > > have
> > > > > > reached a consensus here.
> > > > > >
> > > > > > Just to summarize:
> > > > > >
> > > > > > 1. Introduce a new *ReferenceExpression* (or
> > > *BaseReferenceExpression*)
> > > > > > abstract class which will be extended by both
> > > > *FieldReferenceExpression*
> > > > > > and *NestedFieldReferenceExpression* (to be introduced as part of
> > > this
> > > > FLIP)
> > > > > > 2. No need of *supportsNestedFilters *check as the current
> > > > > > *SupportsFilterPushDown* should already ignore unknown
> expressions
> > (
> > > > > > *NestedFieldReferenceExpression* for example) and return them as
> > > > > > *remainingFilters.
> > > > > > *Maybe this should be clarified explicitly in the Javadoc of
> > > > > > *SupportsFilt

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-29 Thread Xuannan Su

Hi Jing,

Thank you for the suggestion.

The definition of watermark lag is the same as the watermarkLag metric in 
FLIP-33[1]. More specifically, the watermark lag calculation is computed at the 
time when a watermark is emitted downstream in the following way: watermarkLag 
= CurrentTime - Watermark. I have added this description to the FLIP.

I hope this addresses your concern.

Best, 
Xuannan

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics


> On Aug 28, 2023, at 01:04, Jing Ge  wrote:
> 
> Hi Xuannan,
> 
> Thanks for the proposal. +1 for me.
> 
> There is one tiny thing that I am not sure if I understand it correctly.
> Since there will be many different WatermarkStrategies and different
> WatermarkGenerators. Could you please update the FLIP and add the
> description of how the watermark lag is calculated exactly? E.g. Watermark
> lag = A - B with A is the timestamp of the watermark emitted to the
> downstream and B is(this is the part I am not really sure after reading
> the FLIP).
> 
> Best regards,
> Jing
> 
> 
> On Mon, Aug 21, 2023 at 9:03 AM Xuannan Su  wrote:
> 
>> Hi Jark,
>> 
>> Thanks for the comments.
>> 
>> I agree that the current solution cannot support jobs that cannot define
>> watermarks. However, after considering the pending-record-based solution, I
>> believe the current solution is superior for the target use case as it is
>> more intuitive for users. The backlog status gives users the ability to
>> balance between throughput and latency. Making this trade-off decision
>> based on the watermark lag is more intuitive from the user's perspective.
>> For instance, a user can decide that if the job lags behind the current
>> time by more than 1 hour, the result is not usable. In that case, we can
>> optimize for throughput when the data lags behind by more than an hour.
>> With the pending-record-based solution, it's challenging for users to
>> determine when to optimize for throughput and when to prioritize latency.
>> 
>> Regarding the limitations of the watermark-based solution:
>> 
>> 1. The current solution can support jobs with sources that have event
>> time. Users can always define a watermark at the source operator, even if
>> it's not used by downstream operators, such as streaming join and unbounded
>> aggregate.
>> 
>> 2.I don't believe it's accurate to say that the watermark lag will keep
>> increasing if no data is generated in Kafka. The watermark lag and backlog
>> status are determined at the moment when the watermark is emitted to the
>> downstream operator. If no data is emitted from the source, the watermark
>> lag and backlog status will not be updated. If the WatermarkStrategy with
>> idleness is used, the source becomes non-backlog when it becomes idle.
>> 
>> 3. I think watermark lag is more intuitive to determine if a job is
>> processing backlog data. Even when using pending records, it faces a
>> similar issue. For example, if the source has 1K pending records, those
>> records can span from 1 day  to 1 hour to 1 second. If the records span 1
>> day, it's probably best to optimize for throughput. If they span 1 hour, it
>> depends on the business logic. If they span 1 second, optimizing for
>> latency is likely the better choice.
>> 
>> In summary, I believe the watermark-based solution is a superior choice
>> for the target use case where watermark/event time can be defined.
>> Additionally, I haven't come across a scenario that requires low-latency
>> processing and reads from a source that cannot define watermarks. If we
>> encounter such a use case, we can create another FLIP to address those
>> needs in the future. What do you think?
>> 
>> 
>> Best,
>> Xuannan
>> 
>> 
>> 
>>> On Aug 20, 2023, at 23:27, Jark Wu > imj...@gmail.com>> wrote:
>>> 
>>> Hi Xuannan,
>>> 
>>> Thanks for opening this discussion.
>>> 
>>> This current proposal may work in the mentioned watermark cases.
>>> However, it seems this is not a general solution for sources to determine
>>> "isProcessingBacklog".
>>> From my point of view, there are 3 limitations of the current proposal:
>>> 1. It doesn't cover jobs that don't have watermark/event-time defined,
>>> for example streaming join and unbounded aggregate. We may still need to
>>> figure out solutions for them.
>>> 2. Watermark lag can not be trusted, because it increases unlimited if no
>>> data is generated in the Kafka.
>>> But in this case, there is no backlog at all.
>>> 3. Watermark lag is hard to reflect the amount of backlog. If the
>> watermark
>>> lag is 1day or 1 hour or 1second,
>>> there is possibly only 1 pending record there, which means no backlog at
>>> all.
>>> 
>>> Therefore, IMO, watermark maybe not the ideal metric used to determine
>>> "isProcessingBacklog".
>>> What we need is something that reflects the number of records unprocessed
>>> by the job.
>>> Actually, that is the "pendingRecords" metric proposed in FLIP-33 and has
>>> been implemented by Ka

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-08-29 Thread Venkatakrishnan Sowrirajan

Thanks Jark. Sounds good.

One more thing, earlier in my summary I mentioned,

Introduce a new *ReferenceExpression* (or *BaseReferenceExpression*)
> abstract class which will be extended by both *FieldReferenceExpression*
>  and *NestedFieldReferenceExpression* (to be introduced as part of this
> FLIP)

This can be punted for now and can be handled as part of refactoring
SupportsProjectionPushDown.

Also I made minor changes to the *NestedFieldReferenceExpression, *instead
of *fieldIndexArray* we can just do away with *fieldNames *array that
includes fieldName at every level for the nested field.

Updated the FLIP-357

wiki as well.

Regards
Venkata krishnan


On Tue, Aug 29, 2023 at 5:21 AM Jark Wu  wrote:

> Hi Venkata,
>
> Your summary looks good to me. +1 to start a vote.
>
> I think we don't need "inputIndex" in NestedFieldReferenceExpression.
> Actually, I think it is also not needed in FieldReferenceExpression,
> and we should try to remove it (another topic). The RexInputRef in Calcite
> also doesn't require an inputIndex because the field index should represent
> index of the field in the underlying row type. Field references shouldn't
> be
>  aware of the number of inputs.
>
> Best,
> Jark
>
>
> On Tue, 29 Aug 2023 at 02:24, Venkatakrishnan Sowrirajan  >
> wrote:
>
> > Hi Jinsong,
> >
> > Thanks for your comments.
> >
> > What is inputIndex in NestedFieldReferenceExpression?
> >
> >
> > I haven't looked at it before. Do you mean, given that it is now only
> used
> > to push filters it won't be subsequently used in further
> > planning/optimization and therefore it is not required at this time?
> >
> > So if NestedFieldReferenceExpression doesn't need inputIndex, is there
> > > a need to introduce a base class `ReferenceExpression`?
> >
> > For SupportsFilterPushDown itself, *ReferenceExpression* base class is
> not
> > needed. But there were discussions around cleaning up and standardizing
> the
> > API for Supports*PushDown. SupportsProjectionPushDown currently pushes
> the
> > projects as a 2-d array, instead it would be better to use the standard
> API
> > which seems to be the *ResolvedExpression*. For
> SupportsProjectionPushDown
> > either FieldReferenceExpression (top level fields) or
> > NestedFieldReferenceExpression (nested fields) is enough, in order to
> > provide a single API that handles both top level and nested fields,
> > ReferenceExpression will be introduced as a base class.
> >
> > Eventually, *SupportsProjectionPushDown#applyProjections* would evolve as
> > applyProjection(List projectedFields) and nested
> > fields would be pushed only if *supportsNestedProjections* returns true.
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Sun, Aug 27, 2023 at 11:12 PM Jingsong Li 
> > wrote:
> >
> > > So if NestedFieldReferenceExpression doesn't need inputIndex, is there
> > > a need to introduce a base class `ReferenceExpression`?
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, Aug 28, 2023 at 2:09 PM Jingsong Li 
> > > wrote:
> > > >
> > > > Hi thanks all for your discussion.
> > > >
> > > > What is inputIndex in NestedFieldReferenceExpression?
> > > >
> > > > I know inputIndex has special usage in FieldReferenceExpression, but
> > > > it is only for Join operators, and it is only for SQL optimization.
> It
> > > > looks like there is no requirement for Nested.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, Aug 28, 2023 at 1:13 PM Venkatakrishnan Sowrirajan
> > > >  wrote:
> > > > >
> > > > > Thanks for all the feedback and discussion everyone. Looks like we
> > have
> > > > > reached a consensus here.
> > > > >
> > > > > Just to summarize:
> > > > >
> > > > > 1. Introduce a new *ReferenceExpression* (or
> > *BaseReferenceExpression*)
> > > > > abstract class which will be extended by both
> > > *FieldReferenceExpression*
> > > > > and *NestedFieldReferenceExpression* (to be introduced as part of
> > this
> > > FLIP)
> > > > > 2. No need of *supportsNestedFilters *check as the current
> > > > > *SupportsFilterPushDown* should already ignore unknown expressions
> (
> > > > > *NestedFieldReferenceExpression* for example) and return them as
> > > > > *remainingFilters.
> > > > > *Maybe this should be clarified explicitly in the Javadoc of
> > > > > *SupportsFilterPushDown.
> > > > > *I will file a separate JIRA to fix the documentation.
> > > > > 3. Refactor *SupportsProjectionPushDown* to use
> *ReferenceExpression
> > > *instead
> > > > > of existing 2-d arrays to consolidate and be consistent with other
> > > > > Supports*PushDown APIs - *outside the scope of this FLIP*
> > > > > 4. Similarly *SupportsAggregatePushDown* should also be evolved
> > > whenever
> > > > > nested fields support is added to use the *ReferenceExpression -
> > > **outside
> > > > > the scope of this FLIP*
> > > > >
> > > > > Does this sound good? Please let me know if I have missed any

[VOTE] FLIP-356: Support Nested Fields Filter Pushdown

2023-08-29 Thread Venkatakrishnan Sowrirajan

Hi everyone,

Thank you all for your feedback on FLIP-356. I'd like to start a vote.

Discussion thread:
https://lists.apache.org/thread/686bhgwrrb4xmbfzlk60szwxos4z64t7
FLIP:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-356%3A+Support+Nested+Fields+Filter+Pushdown

Regards
Venkata krishnan

[jira] [Created] (FLINK-32991) Some metrics from autoscaler never get registered

2023-08-29 Thread Maximilian Michels (Jira)

Maximilian Michels created FLINK-32991:
--

 Summary: Some metrics from autoscaler never get registered
 Key: FLINK-32991
 URL: https://issues.apache.org/jira/browse/FLINK-32991
 Project: Flink
  Issue Type: Bug
  Components: Autoscaler, Deployment / Kubernetes
Affects Versions: kubernetes-operator-1.6.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: kubernetes-operator-1.7.0


Not all metrics appear in the latest 1.6 release. This is because we report 
metrics as soon as they are available and the registration code assumes that 
they will all be available at once. In practice, some are only available after 
sampling data multiple times. For example, TARGET_DATA_RATE is only available 
after the source metrics have been aggregated and the lag has been computed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-32992) Recommended parallelism metric is a duplicate of Parallelism metric

2023-08-29 Thread Maximilian Michels (Jira)

Maximilian Michels created FLINK-32992:
--

 Summary: Recommended parallelism metric is a duplicate of 
Parallelism metric
 Key: FLINK-32992
 URL: https://issues.apache.org/jira/browse/FLINK-32992
 Project: Flink
  Issue Type: Bug
  Components: Autoscaler, Deployment / Kubernetes
Affects Versions: kubernetes-operator-1.6.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: kubernetes-operator-1.7.0


The two metrics are the same. Recommended parallelism seems to have been added 
as a way to report real-time parallelism updates before we changed all metrics 
to be reported in real time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-08-29 Thread Jark Wu

Hi Venkata,

Your summary looks good to me. +1 to start a vote.

I think we don't need "inputIndex" in NestedFieldReferenceExpression.
Actually, I think it is also not needed in FieldReferenceExpression,
and we should try to remove it (another topic). The RexInputRef in Calcite
also doesn't require an inputIndex because the field index should represent
index of the field in the underlying row type. Field references shouldn't be
 aware of the number of inputs.

Best,
Jark


On Tue, 29 Aug 2023 at 02:24, Venkatakrishnan Sowrirajan 
wrote:

> Hi Jinsong,
>
> Thanks for your comments.
>
> What is inputIndex in NestedFieldReferenceExpression?
>
>
> I haven't looked at it before. Do you mean, given that it is now only used
> to push filters it won't be subsequently used in further
> planning/optimization and therefore it is not required at this time?
>
> So if NestedFieldReferenceExpression doesn't need inputIndex, is there
> > a need to introduce a base class `ReferenceExpression`?
>
> For SupportsFilterPushDown itself, *ReferenceExpression* base class is not
> needed. But there were discussions around cleaning up and standardizing the
> API for Supports*PushDown. SupportsProjectionPushDown currently pushes the
> projects as a 2-d array, instead it would be better to use the standard API
> which seems to be the *ResolvedExpression*. For SupportsProjectionPushDown
> either FieldReferenceExpression (top level fields) or
> NestedFieldReferenceExpression (nested fields) is enough, in order to
> provide a single API that handles both top level and nested fields,
> ReferenceExpression will be introduced as a base class.
>
> Eventually, *SupportsProjectionPushDown#applyProjections* would evolve as
> applyProjection(List projectedFields) and nested
> fields would be pushed only if *supportsNestedProjections* returns true.
>
> Regards
> Venkata krishnan
>
>
> On Sun, Aug 27, 2023 at 11:12 PM Jingsong Li 
> wrote:
>
> > So if NestedFieldReferenceExpression doesn't need inputIndex, is there
> > a need to introduce a base class `ReferenceExpression`?
> >
> > Best,
> > Jingsong
> >
> > On Mon, Aug 28, 2023 at 2:09 PM Jingsong Li 
> > wrote:
> > >
> > > Hi thanks all for your discussion.
> > >
> > > What is inputIndex in NestedFieldReferenceExpression?
> > >
> > > I know inputIndex has special usage in FieldReferenceExpression, but
> > > it is only for Join operators, and it is only for SQL optimization. It
> > > looks like there is no requirement for Nested.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, Aug 28, 2023 at 1:13 PM Venkatakrishnan Sowrirajan
> > >  wrote:
> > > >
> > > > Thanks for all the feedback and discussion everyone. Looks like we
> have
> > > > reached a consensus here.
> > > >
> > > > Just to summarize:
> > > >
> > > > 1. Introduce a new *ReferenceExpression* (or
> *BaseReferenceExpression*)
> > > > abstract class which will be extended by both
> > *FieldReferenceExpression*
> > > > and *NestedFieldReferenceExpression* (to be introduced as part of
> this
> > FLIP)
> > > > 2. No need of *supportsNestedFilters *check as the current
> > > > *SupportsFilterPushDown* should already ignore unknown expressions (
> > > > *NestedFieldReferenceExpression* for example) and return them as
> > > > *remainingFilters.
> > > > *Maybe this should be clarified explicitly in the Javadoc of
> > > > *SupportsFilterPushDown.
> > > > *I will file a separate JIRA to fix the documentation.
> > > > 3. Refactor *SupportsProjectionPushDown* to use *ReferenceExpression
> > *instead
> > > > of existing 2-d arrays to consolidate and be consistent with other
> > > > Supports*PushDown APIs - *outside the scope of this FLIP*
> > > > 4. Similarly *SupportsAggregatePushDown* should also be evolved
> > whenever
> > > > nested fields support is added to use the *ReferenceExpression -
> > **outside
> > > > the scope of this FLIP*
> > > >
> > > > Does this sound good? Please let me know if I have missed anything
> > here. If
> > > > there are no concerns, I will start a vote tomorrow. I will also get
> > the
> > > > FLIP-356 wiki updated. Thanks everyone once again!
> > > >
> > > > Regards
> > > > Venkata krishnan
> > > >
> > > >
> > > > On Thu, Aug 24, 2023 at 8:19 PM Becket Qin 
> > wrote:
> > > >
> > > > > Hi Jark,
> > > > >
> > > > > How about having a separate NestedFieldReferenceExpression, and
> > > > > > abstracting a common base class "ReferenceExpression" for
> > > > > > NestedFieldReferenceExpression and FieldReferenceExpression? This
> > makes
> > > > > > unifying expressions in
> > > > > >
> > "SupportsProjectionPushdown#applyProjections(List
> > > > > > ...)"
> > > > > > possible.
> > > > >
> > > > >
> > > > > I'd be fine with this. It at least provides a consistent API style
> /
> > > > > formality.
> > > > >
> > > > >  Re: Yunhong,
> > > > >
> > > > > 3. Finally, I think we need to look at the costs and benefits of
> > unifying
> > > > > > the SupportsFilterPushDown and SupportsProjectionPushDown (or
> > others)
> > > >

[jira] [Created] (FLINK-32990) CREATE TABLE AS statement can't as Plan.translate function parameter

2023-08-29 Thread Licho Sun (Jira)

Licho Sun created FLINK-32990:
-

 Summary: CREATE TABLE AS statement can't as Plan.translate 
function parameter
 Key: FLINK-32990
 URL: https://issues.apache.org/jira/browse/FLINK-32990
 Project: Flink
  Issue Type: Bug
Reporter: Licho Sun


The `translate` function comment description `ModifyOperation` could be a 
parameter, but in the implementation function, there isn't a process for the 
`CreateTableASOperation` type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API

2023-08-29 Thread Timo Walther

Thanks everyone for the positive feedback.

I updated the FLIP with the proposed minimal solution:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Make+expanding+behavior+of+virtual+metadata+columns+configurable

I also changed the title to not cause any confusion with the concept of
system columns.

Now the FLIP only introduces a new ConfigOption and does not specify
constraints on the naming of metadata columns anymore.

If there are no objections, I would start a voting thread by tomorrow.

Thanks,
Timo

On 29.08.23 00:21, Alexey Leonov-Vendrovskiy wrote:

SGTM

For future reference, responding here:

5) Any operator can introduce system (psedo) columns.

This is clearly out of scope for this FLIP. The implementation effort
would be huge and could introduce a lot of bugs.

I didn't imply any specific implementation or feature coverage in the
currently proposed FLIP, but rather as a way to describe the semantics of
system columns that those could be (but don't have to) introduced by any
operator.

Thanks,
Alexey

On Tue, Aug 22, 2023 at 2:18 PM Jing Ge wrote:

Hi Timo,

Your last suggestion sounds good.

Best regards,
Jing

On Mon, Aug 21, 2023 at 4:21 AM Benchao Li wrote:

It sounds good to me too, that we avoid introducing the concept of

"system

columns" for now.

Timo Walther 于2023年8月18日周五 22:38写道：

Great, I also like my last suggestion as it is even more elegant. I

will

update the FLIP until Monday.

Regards,
Timo

On 17.08.23 13:55, Jark Wu wrote:

Hi Timo,

I'm fine with your latest suggestion that introducing a flag to

control

expanding behavior of metadata virtual columns, but not introducing
any concept of system/pseudo columns for now.

Best,
Jark

On Tue, 15 Aug 2023 at 23:25, Timo Walther

wrote:

Hi everyone,

I would like to bump this thread up again.

Esp. I would like to hear opinions on my latest suggestions to

simply

use METADATA VIRTUAL as system columns and only introduce a config
option for the SELECT * behavior. Implementation-wise this means

minimal

effort and less new concepts.

Looking forward to any kind of feedback.

Thanks,
Timo

On 07.08.23 12:07, Timo Walther wrote:

Hi everyone,

thanks for the various feedback and lively discussion. Sorry, for

the

late reply as I was on vacation. Let me answer to some of the

topics:

1) Systems columns should not be shown with DESCRIBE statements

This sounds fine to me. I will update the FLIP in the next

iteration.

2) Do you know why most SQL systems do not need any prefix with

their

pseudo column?

Because most systems do not have external catalogs or connectors.

And

also the number of system columns is limited to a handful of

columns.

Flink is more generic and thus more complex. And we have already

the

concepts of metadata columns. We need to be careful with not

overloading

our language.

3) Implementation details

> how to you plan to implement the "system columns", do we need

add

it to `RelNode` level? Or we just need to do it in the
parsing/validating phase?
> I'm not sure that Calcite's "system column" feature is fully

ready

My plan would be to only modify the parsing/validating phase. I

would

like to avoid additional complexity in planner rules and
connector/catalog interfaces. Metadata columns already support
projection push down and are passed through the stack (via Schema,
ResolvedSchema, SupportsReadableMetadata). Calcite's "system

column"

feature is not fully ready yet and it would be a large effort
potentially introducing bugs in supporting it. Thus, I'm proposing

leverage what we already have. The only part that needs to be

modified

is the "expand star" method in SqlValidator and Table API.

Queries such as `SELECT * FROM (SELECT $rowtime, * FROM t);` would

show

$rowtime as the expand star has only a special case when in the

scope

for `FROM t`. All further subqueries treat it as a regular column.

4) Built-in defined pseudo-column "$rowtime"

> Did you consider making it as a built-in defined pseudo-column
"$rowtime" which returns the time attribute value (if exists) or

null

(if non-exists) for every table/query, and pseudo-column

"$proctime"

always returns PROCTIME() value for each table/query

Built-in pseudo-columns mean that connector or catalog providers

need

consensus in Flink which pseudo-columns should be built-in. We

should

keep the concept generic and let platform providers decide which

pseudo

columns to expose. $rowtime might be obvious but others such as
$partition or $offset are tricky to get consensus as every external
connector works differently. Also a connector might want to expose
different time semantics (such as ingestion time).

5) Any operator can introduce system (psedo) columns.

This is clearly out of scope for this FLIP. The implementation

effort

would be huge and could introduce a lot of bugs.

6) "Metadata Key Prefix Constraint" which is still a little complex

Another option could be to dro

[jira] [Created] (FLINK-32989) PyFlink wheel package build failed

2023-08-29 Thread Dian Fu (Jira)

Dian Fu created FLINK-32989:
---

 Summary: PyFlink wheel package build failed
 Key: FLINK-32989
 URL: https://issues.apache.org/jira/browse/FLINK-32989
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.19.0
Reporter: Dian Fu


{code}
Compiling pyflink/fn_execution/coder_impl_fast.pyx because it changed. 
Compiling pyflink/fn_execution/table/aggregate_fast.pyx because it changed. 
Compiling pyflink/fn_execution/table/window_aggregate_fast.pyx because it 
changed. 
Compiling pyflink/fn_execution/stream_fast.pyx because it changed. 
Compiling pyflink/fn_execution/beam/beam_stream_fast.pyx because it changed. 
Compiling pyflink/fn_execution/beam/beam_coder_impl_fast.pyx because it 
changed. 
Compiling pyflink/fn_execution/beam/beam_operations_fast.pyx because it 
changed. 
[1/7] Cythonizing pyflink/fn_execution/beam/beam_coder_impl_fast.pyx 
[2/7] Cythonizing pyflink/fn_execution/beam/beam_operations_fast.pyx 
[3/7] Cythonizing pyflink/fn_execution/beam/beam_stream_fast.pyx 
[4/7] Cythonizing pyflink/fn_execution/coder_impl_fast.pyx 
[5/7] Cythonizing pyflink/fn_execution/stream_fast.pyx 
[6/7] Cythonizing pyflink/fn_execution/table/aggregate_fast.pyx 
[7/7] Cythonizing pyflink/fn_execution/table/window_aggregate_fast.pyx 
/home/vsts/work/1/s/flink-python/dev/.conda/envs/3.7/lib/python3.7/site-packages/Cython/Compiler/Main.py:369:
 FutureWarning: Cython directive 'language_level' not set, using 2 for now 
(Py2). This will change in a later release! File: 
/home/vsts/work/1/s/flink-python/pyflink/fn_execution/table/window_aggregate_fast.pxd
 
 tree = Parsing.p_module(s, pxd, full_module_name) 
Exactly one Flink home directory must exist, but found: []
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52740&view=logs&j=d15e2b2e-10cd-5f59-7734-42d57dc5564d&t=4a86776f-e6e1-598a-f75a-c43d8b819662



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-32988) HiveITCase failed due to TestContainer not coming up

2023-08-29 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-32988:
-

 Summary: HiveITCase failed due to TestContainer not coming up
 Key: FLINK-32988
 URL: https://issues.apache.org/jira/browse/FLINK-32988
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.17.1, 1.16.2, 1.18.0, 1.19.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52740&view=logs&j=87489130-75dc-54e4-1f45-80c30aa367a3&t=efbee0b1-38ac-597d-6466-1ea8fc908c50&l=15866

{code}
Aug 29 02:47:56 org.testcontainers.containers.ContainerLaunchException: 
Container startup failed for image prestodb/hive3.1-hive:10
Aug 29 02:47:56 at 
org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:349)
Aug 29 02:47:56 at 
org.apache.flink.tests.hive.containers.HiveContainer.doStart(HiveContainer.java:81)
Aug 29 02:47:56 at 
org.testcontainers.containers.GenericContainer.start(GenericContainer.java:322)
Aug 29 02:47:56 at 
org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1131)
Aug 29 02:47:56 at 
org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:28)
Aug 29 02:47:56 at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
Aug 29 02:47:56 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Aug 29 02:47:56 at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Aug 29 02:47:56 at 
org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Aug 29 02:47:56 at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
Aug 29 02:47:56 at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
Aug 29 02:47:56 at 
org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
Aug 29 02:47:56 at 
org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
Aug 29 02:47:56 at 
org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
Aug 29 02:47:56 at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:147)
Aug 29 02:47:56 at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:127)
Aug 29 02:47:56 at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:90)
Aug 29 02:47:56 at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:55)
Aug 29 02:47:56 at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:102)
Aug 29 02:47:56 at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:54)
Aug 29 02:47:56 at 
org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)
Aug 29 02:47:56 at 
org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:86)
Aug 29 02:47:56 at 
org.junit.platform.launcher.core.DefaultLauncherSession$DelegatingLauncher.execute(DefaultLauncherSession.java:86)
Aug 29 02:47:56 at 
org.junit.platform.launcher.core.SessionPerRequestLauncher.execute(SessionPerRequestLauncher.java:53)
Aug 29 02:47:56 at 
org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.execute(JUnitPlatformProvider.java:188)
Aug 29 02:47:56 at 
org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:154)
Aug 29 02:47:56 at 
org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:128)

{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-32987) BlobClientSslTest>BlobClientTest.testSocketTimeout expected SocketTimeoutException but identified SSLException

2023-08-29 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-32987:
-

 Summary: BlobClientSslTest>BlobClientTest.testSocketTimeout 
expected SocketTimeoutException but identified SSLException
 Key: FLINK-32987
 URL: https://issues.apache.org/jira/browse/FLINK-32987
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52740&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef&l=8692

{code}
Aug 29 03:28:11 03:28:11.280 [ERROR]   
BlobClientSslTest>BlobClientTest.testSocketTimeout:512 
Aug 29 03:28:11 Expecting a throwable with cause being an instance of:
Aug 29 03:28:11   java.net.SocketTimeoutException
Aug 29 03:28:11 but was an instance of:
Aug 29 03:28:11   javax.net.ssl.SSLException
Aug 29 03:28:11 Throwable that failed the check:
Aug 29 03:28:11 
Aug 29 03:28:11 java.io.IOException: GET operation failed: Read timed out
Aug 29 03:28:11 at 
org.apache.flink.runtime.blob.BlobClient.getInternal(BlobClient.java:231)
Aug 29 03:28:11 at 
org.apache.flink.runtime.blob.BlobClientTest.lambda$testSocketTimeout$2(BlobClientTest.java:510)
Aug 29 03:28:11 at 
org.assertj.core.api.ThrowableAssert.catchThrowable(ThrowableAssert.java:63)
Aug 29 03:28:11 at 
org.assertj.core.api.AssertionsForClassTypes.catchThrowable(AssertionsForClassTypes.java:892)
Aug 29 03:28:11 at 
org.assertj.core.api.Assertions.catchThrowable(Assertions.java:1366)
Aug 29 03:28:11 at 
org.assertj.core.api.Assertions.assertThatThrownBy(Assertions.java:1210)
Aug 29 03:28:11 at 
org.apache.flink.runtime.blob.BlobClientTest.testSocketTimeout(BlobClientTest.java:508)
Aug 29 03:28:11 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Aug 29 03:28:11 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Aug 29 03:28:11 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Aug 29 03:28:11 at java.lang.reflect.Method.invoke(Method.java:498)
[...]
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-32994) LeaderElectionDriver.toString() is not implemented

退订

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

[jira] [Created] (FLINK-32993) Datagen connector produce unexpected Char type data

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

[VOTE] FLIP-356: Support Nested Fields Filter Pushdown

[jira] [Created] (FLINK-32991) Some metrics from autoscaler never get registered

[jira] [Created] (FLINK-32992) Recommended parallelism metric is a duplicate of Parallelism metric

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

[jira] [Created] (FLINK-32990) CREATE TABLE AS statement can't as Plan.translate function parameter

Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API

[jira] [Created] (FLINK-32989) PyFlink wheel package build failed

[jira] [Created] (FLINK-32988) HiveITCase failed due to TestContainer not coming up

[jira] [Created] (FLINK-32987) BlobClientSslTest>BlobClientTest.testSocketTimeout expected SocketTimeoutException but identified SSLException

16 matches

Site Navigation

Mail list logo

Footer information