[jira] [Created] (FLINK-29791) Print sink result mess up with GC log in E2eTestBase

2022-10-28 Thread Jane Chan (Jira)
Jane Chan created FLINK-29791:
-

 Summary: Print sink result mess up with GC log in E2eTestBase 
 Key: FLINK-29791
 URL: https://issues.apache.org/jira/browse/FLINK-29791
 Project: Flink
  Issue Type: Bug
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Jane Chan
 Fix For: table-store-0.3.0


https://github.com/apache/flink-table-store/actions/runs/3343373246/jobs/5536523910



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-263: Improve resolving schema compatibility

2022-10-28 Thread Dawid Wysakowicz
I see the majority of the community prefers not immediately breaking 
the API. Even though I have a different preference, I am more than fine 
committing to the others' choice. Just please make sure it is prominently 
documented that one or the other method MUST be implemented in any new 
serializer.


Best,

Dawid

On 27/10/2022 08:33, Yun Gao wrote:

Hi,
I have discussed offline with Hangxiang and Yuan, and the current
proposal also looks good to me.
Thanks Hangxiang for driving this.
Best,
Yun Gao
--
From:Yuan Mei 
Send Time:2022 Oct. 26 (Wed.) 12:20
To:dev 
Subject:Re: [DISCUSS] FLIP-263: Improve resolving schema compatibility
Hey Hangxiang,
The section of `Rejected Alternatives` may also need an update.
Current plan sounds like a reasonable one. I am fine with it.
Thanks for driving this.
Best
Yuan
On Tue, Oct 25, 2022 at 5:11 PM Hangxiang Yu  wrote:

(Resend the mail to fix the format issue)
Hi, everyone.

Thanks for your suggestions!

Let me summarize the remaining questions in the thread and share my ideas
based on your suggestions:


1. Should we put the new opposite interface in TypeSerializer or
TypeSerializerSnapshot?

Just as I replied to Dawid, I'd like to put it in
TypeSerializerSnapshot so that we could still follow the contract between
the two classes and make later code migration easier based on current tools.

Thanks Dawid for the initial suggestion and Godfrey for the additional
supplement.



2. Should we just make breaking changes that leave user code incompatible,
or ensure compatibility with a more suitable migration plan?

I agree with Yuan that we should make sure that user jobs still work
without modifying any code before the deprecated method is removed.

Thanks Yuan for the migration plan. Let me try to add something to Yuan's
suggestion:


a. In Step 1, I prefer to make the new interface like:


default TypeSerializerSchemaCompatibility resolveSchemaCompatibility(
        // Use 'oldSnapshot' not 'oldSerializer'
        TypeSerializerSnapshot oldSnapshot) {
    return INCOMPATIBLE;
}

I think using 'oldSnapshot' as the parameter makes the
logic clear --- TypeSerializerSnapshot takes all responsibility for
compatibility checks.
        BTW, it's also easy to migrate the original check logic to this
interface (a sketch follows at the end of this message).

b. In Step 1, in addition to introducing default implementations
for both interfaces, we also need to implement the new interface in all
inner TypeSerializerSnapshots.
        Users may implement their own serializers based on inner
serializers, so we should make sure that the new interface of inner
TypeSerializerSnapshots is usable.

Then I think it could work for both old and new custom serializers.
No matter which interface the user implements, it will always work.
Of course, we will deprecate the old interface and encourage users to
use the new one.


3. Do we need to squash this with
https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w ?
    Since we will not break compatibility per point 2, it's not necessary
to squash them together.


Do you have any other suggestions? Looking forward to your reply!
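
For illustration, a minimal sketch of what a custom snapshot could look
like under the signature proposed in 2.a (the class and its trivial
compatibility logic are hypothetical; the old-direction method is shown
only because it stays required until the migration completes):

import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.api.common.typeutils.TypeSerializerSchemaCompatibility;
import org.apache.flink.api.common.typeutils.TypeSerializerSnapshot;
import org.apache.flink.api.common.typeutils.base.StringSerializer;
import org.apache.flink.core.memory.DataInputView;
import org.apache.flink.core.memory.DataOutputView;

public final class MyStringSerializerSnapshot
        implements TypeSerializerSnapshot<String> {

    @Override
    public int getCurrentVersion() {
        return 1;
    }

    @Override
    public void writeSnapshot(DataOutputView out) {
        // Stateless serializer: nothing to persist.
    }

    @Override
    public void readSnapshot(int readVersion, DataInputView in, ClassLoader loader) {
        // Nothing was written, so nothing to read back.
    }

    @Override
    public TypeSerializer<String> restoreSerializer() {
        return StringSerializer.INSTANCE;
    }

    // Old direction (to be deprecated): the OLD snapshot inspects the NEW serializer.
    @Override
    public TypeSerializerSchemaCompatibility<String> resolveSchemaCompatibility(
            TypeSerializer<String> newSerializer) {
        return newSerializer.snapshotConfiguration() instanceof MyStringSerializerSnapshot
                ? TypeSerializerSchemaCompatibility.compatibleAsIs()
                : TypeSerializerSchemaCompatibility.incompatible();
    }

    // Proposed direction: the NEW serializer's snapshot inspects the OLD snapshot.
    public TypeSerializerSchemaCompatibility<String> resolveSchemaCompatibility(
            TypeSerializerSnapshot<String> oldSnapshot) {
        return oldSnapshot instanceof MyStringSerializerSnapshot
                ? TypeSerializerSchemaCompatibility.compatibleAsIs()
                : TypeSerializerSchemaCompatibility.incompatible();
    }
}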


Re: [VOTE] FLIP-263: Improve resolving schema compatibility

2022-10-28 Thread Dawid Wysakowicz

+1,

Best,

Dawid

On 28/10/2022 08:08, godfrey he wrote:

+1 (binding)

Thanks for driving this!

Best,
Godfrey

Yun Gao wrote on Fri, Oct 28, 2022, at 13:50:

+1 (binding)

Thanks Hangxiang for driving the FLIP.

Best,
Yun Gao




  --Original Mail --
Sender:Zakelly Lan 
Send Date:Fri Oct 28 12:27:01 2022
Recipients:Flink Dev 
Subject:Re: [VOTE] FLIP-263: Improve resolving schema compatibility
Hi Hangxiang,

The current plan looks good to me, +1 (non-binding). Thanks for driving this.

Best,
Zakelly

On Fri, Oct 28, 2022 at 11:18 AM Yuan Mei  wrote:

+1 (binding)

Thanks for driving this.

Best
Yuan

On Fri, Oct 28, 2022 at 11:17 AM yanfei lei  wrote:


+1(non-binding) and thanks for Hangxiang's driving.



Hangxiang Yu wrote on Fri, Oct 28, 2022, at 09:24:


Hi everyone,

I'd like to start the vote for FLIP-263 [1].

Thanks for your feedback and the discussion in [2][3].

The vote will be open for at least 72 hours.

Best regards,
Hangxiang.

[1]



https://cwiki.apache.org/confluence/display/FLINK/FLIP-263%3A+Improve+resolving+schema+compatibility

[2] https://lists.apache.org/thread/4w36oof8dh28b9f593sgtk21o8qh8qx4

[3] https://lists.apache.org/thread/t0bdkx1161rlbnsf06x0kswb05mch164



--
Best,
Yanfei





Re: [ANNOUNCE] Apache Flink 1.16.0 released

2022-10-28 Thread Lijie Wang
Congratulations!

Thanks everyone involved!

Best,
LIjie

Xingbo Huang wrote on Fri, Oct 28, 2022, at 14:46:

> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.16.0, which is the first release for the Apache Flink 1.16 series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the
> improvements for this release:
> https://flink.apache.org/news/2022/10/28/1.16-announcement.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351275
>
> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
>
> Regards,
> Chesnay, Martijn, Godfrey & Xingbo
>


Re: [ANNOUNCE] Performance Daily Monitoring Moved from Ververica to Apache Flink Slack Channel

2022-10-28 Thread Congxian Qiu
Thanks for driving this and making the performance monitoring public; this
helps us notice and resolve performance problems quickly.

Looking forward to the workflow and detailed descriptions of
flink-dev-benchmarks.

Best,
Congxian


Yun Tang wrote on Thu, Oct 27, 2022, at 12:41:

> Thanks, Yanfei for driving this to monitor the performance in the Apache
> Flink Slack Channel.
>
> Look forward to the workflow and detailed descriptions of
> flink-dev-benchmarks.
>
> Best
> Yun Tang
> 
> From: Hangxiang Yu 
> Sent: Thursday, October 27, 2022 10:59
> To: dev@flink.apache.org 
> Subject: Re: [ANNOUNCE] Performance Daily Monitoring Moved from Ververica
> to Apache Flink Slack Channel
>
> Hi, Yanfei.
> Thanks for driving this.
> It could help us to detect and resolve the regression problem quickly and
> officially.
> I'd like to join as a maintainer.
> Looking forward to the workflow.
>
> On Wed, Oct 26, 2022 at 5:18 PM Yuan Mei  wrote:
>
> > Thanks, Yanfei, to drive this and make the performance monitoring
> publicly
> > available.
> >
> > Looking forward to seeing the workflow, and more details as Martijn
> > mentioned.
> >
> > Best
> > Yuan
> >
> > On Wed, Oct 26, 2022 at 2:59 PM Martijn Visser  >
> > wrote:
> >
> > > Hi Yanfei Lei,
> > >
> > > Thanks for setting this up! It would be interesting to also know which
> > > aspects of Flink are monitored for "performance". I'm assuming there
> are
> > > specific pieces of functionality that are performance tested, but it
> > would
> > > be great if this would be written down somewhere (next to a procedure
> how
> > > to detect a regression and what should be next steps).
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Wed, Oct 26, 2022 at 8:21 AM Zakelly Lan 
> > wrote:
> > >
> > > > Hi yanfei,
> > > >
> > > > Thanks for driving this! It's a great help.
> > > >
> > > > I would like to join as a maintainer.
> > > >
> > > > Best,
> > > > Zakelly
> > > >
> > > > On Wed, Oct 26, 2022 at 11:32 AM yanfei lei 
> > wrote:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > As discussed earlier, we plan to create a benchmark channel in
> Apache
> > > > Flink
> > > > > slack[1], but the plan was shelved for a while[2]. So I went on
> with
> > > this
> > > > > work, and created the #flink-dev-benchmarks channel for performance
> > > > > regression notifications.
> > > > >
> > > > > We have a regression report script[3] that runs daily, and a
> > > notification
> > > > > would be sent to the slack channel when the last few benchmark
> > results
> > > > are
> > > > > significantly worse than the baseline.
> > > > > Note, regressions are detected by a simple script which may have
> > false
> > > > > positives and false negatives. And all benchmarks are executed on
> one
> > > > > physical machine[4] which is provided by Ververica(Alibaba)[5], it
> > > might
> > > > > happen that hardware issues affect performance, like "[FLINK-18614
> > > > > ] Performance
> > > > regression
> > > > > 2020.07.13"[6].
> > > > >
> > > > > After the migration, we need a procedure to watch over the entire
> > > > > performance of Flink code together. For example, if a regression
> > > > > occurs, investigating the cause and resolving the problem are
> needed.
> > > In
> > > > > the past, this procedure is maintained internally within Ververica,
> > but
> > > > we
> > > > > think making the procedure public would benefit all. I volunteer to
> > > serve
> > > > > as one of the initial maintainers, and would be glad if more
> > > contributors
> > > > > can join me. I'd also prepare some guidelines to help others get
> > > familiar
> > > > > with the workflow. I will start a new thread to discuss the
> workflow
> > > > soon.
> > > > >
> > > > >
> > > > > [1]
> https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
> > > > > [2] https://issues.apache.org/jira/browse/FLINK-28468
> > > > > [3]
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink-benchmarks/blob/master/regression_report.py
> > > > > [4] http://codespeed.dak8s.net:8080
> > > > > [5]
> https://lists.apache.org/thread/jzljp4233799vwwqnr0vc9wgqs0xj1ro
> > > > >
> > > > > [6] https://issues.apache.org/jira/browse/FLINK-18614
> > > >
> > >
> >
>
>
> --
> Best,
> Hangxiang.
>
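
As a rough sketch of the kind of check described above (in Java purely for
illustration; the real logic lives in regression_report.py, and the 10%
threshold below is invented, not the script's actual rule):

import java.util.List;

final class RegressionCheck {

    /**
     * Flags a regression when the mean of the last few benchmark scores is
     * significantly worse than the baseline (assuming higher = better).
     */
    static boolean isRegression(List<Double> recentScores, double baseline) {
        double recentMean = recentScores.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(baseline);
        // Treat a drop of more than 10% below the baseline as significant.
        return recentMean < baseline * 0.90;
    }

    public static void main(String[] args) {
        // Recent runs ~15% below baseline would trigger a notification.
        System.out.println(isRegression(List.of(84.0, 86.0, 85.0), 100.0)); // true
    }
}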


Re: [DISCUSS] FLINK-20578 Cannot create empty array using ARRAY[]

2022-10-28 Thread Timo Walther
Actually, the new type inference stack for UDFs is smart enough to solve 
this issue. It could derive a data type for the array from the 
surrounding call (expected data type).


So this can be supported with the right type inference logic: 
cast(ARRAY() as int)


Unfortunately, ARRAY is fully managed by Calcite and maybe deeply 
integrated into the parser as well (at least this is the case for ROW).
TBH, if I were to design a FLIP for the collection functions, I would 
actually propose introducing `ARRAY_OF()`, `ROW_OF()` to have full 
control over the type inference in our stack. In our stack, this also 
means that NULL is unknown; Calcite distinguishes between NULL and unknown.


So if we wanna go the easy path (without introducing ARRAY_OF), ARRAY() 
should result in ARRAY if the type cannot be derived from the 
surrounding call.


Regards,
Timo

On 28.10.22 03:46, yuxia wrote:

For an empty array, it seems different engines use different data types:
Hive: string
Spark: string ?
Trino:  Unknown
BigQuery: Integer

I have tried with Hive and Spark, but haven't tried with Trino and BigQuery.

I'm a little doubtful about Spark's behavior. From my side, it seems
Spark actually uses the string type, which differs from your investigation.
I try with the following sql in spark-cli:
`
select array() + 1
`

The exception is
`
Error in query: cannot resolve '(array() + 1)' due to data type mismatch: differing 
types in '(array() + 1)' (array and int).; line 1 pos 7;
'Project [unresolvedalias((array() + 1), None)]
+- OneRowRelation
`


It seems hard to decide which data type Flink should use. I'm interested in 
the reason why you would like to use the Integer type.
I haven't checked whether the SQL standard specifies it, but from my side, I 
prefer to follow Hive/Spark.

BTW: the query `SELECT COALESCE(1, cast(ARRAY() as int))` will fail in Hive and 
Spark.


Best regards,
Yuxia

- Original Message -
From: "eric xiao" 
To: "dev" 
Sent: Thursday, October 27, 2022, 9:13:51 PM
Subject: [DISCUSS] FLINK-20578 Cannot create empty array using ARRAY[]

Hi,

I would like to propose a solution to this JIRA issue. I looked at the
comments and there was some guidance around where in the code we should
update to allow for this behaviour. But I believe there are still two
questions that remain open:

1. Is this expected behaviour (i.e. should users be able to create an
empty array)?
2. If so, what should the data type of the empty array be?

I did some digging into other query engines / databases in hopes of
answering the following two questions - That can be found at the end of
this thread.

*Q: *Is this expected behaviour (i.e. should users be able to create an
empty array)?
*A: *Yes, I would say users should be able to, and this is something we should
add to the Flink SQL API.

*Q: *What should the data type of the empty array be?
*A: *This question is a bit harder to answer, and I think it would require
two steps.

*Step 1: Pick a default data type to initialize the empty array.*

We can use an "empty data type" such as NULL, VOID.

*Step 2: Create or reuse type coercion to make using empty arrays easier.*

The above should unblock users from creating empty arrays, but problems
remain if one uses an empty array in a COALESCE operation,

i.e. SELECT COALESCE(int_column, ARRAY[])

I believe they will get a query error because the type of int_column (INTEGER)
and the type of the empty array (NULL, VOID) do not match. Thus a user will
need to cast the empty array,

i.e. SELECT COALESCE(int_column, CAST(ARRAY[] AS INT))

for the COALESCE query to execute successfully.
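
To make the two steps concrete, a minimal sketch against Flink's Table API
(assuming the Step 1/Step 2 behaviour proposed above; both statements fail
today per FLINK-20578, and CAST(NULL AS ARRAY<INT>) merely stands in for a
nullable INT ARRAY column):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class EmptyArrayExample {
    public static void main(String[] args) {
        TableEnvironment env =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Step 1 goal: this should yield an empty array of some default
        // type instead of failing to validate, as it does today.
        env.executeSql("SELECT ARRAY[]").print();

        // Step 2 goal: with type coercion (or an explicit cast), an empty
        // array should be usable inside calls like COALESCE.
        env.executeSql("SELECT COALESCE(CAST(NULL AS ARRAY<INT>), ARRAY[])").print();
    }
}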

-
*Trino*

EXPLAIN SELECT ARRAY[]

Fragment 0 [SINGLE]
 Output layout: [expr]
 Output partitioning: SINGLE []
 Output[columnNames = [_col0]]
 │   Layout: [expr:array(unknown)]
 │   Estimates: {rows: 1 (55B), cpu: 0, memory: 0B, network: 0B}
 │   _col0 := expr
 └─ Values[]
Layout: [expr:array(unknown)]
Estimates: {rows: 1 (55B), cpu: 0, memory: 0B, network: 0B}

  ("$literal$"(from_base64('AwAAAFJMRQAKQllURV9BUlJBWQEBgAA=')))

Expected behaviour? *Yes.*
Array data type? *Unknown.*

*Spark*

sc.sql.sql("SELECT ARRAY[]").explain()

  DataFrame[array(): array]

Expected behaviour? *Yes.*
Array data type? *Void.*

*BigQuery*

SELECT ARRAY[]

Field name Type  Mode
f0_  INTEGER  REPEATED

Expected behaviour? *Yes.*
Array data type? *Integer.*

Best,

Eric





Re: [ANNOUNCE] Performance Daily Monitoring Moved from Ververica to Apache Flink Slack Channel

2022-10-28 Thread weijie guo
Thanks Yanfei for driving this.

It allows us to easily spot performance regressions. Especially since I have
recently made some improvements to the scheduling-related parts, your work is
very important for ensuring that these changes do not cause unexpected
problems.

Best regards,

Weijie


Congxian Qiu wrote on Fri, Oct 28, 2022, at 16:03:

> Thanks for driving this and making the performance monitoring public,  this
> can make us know and resolve the performance problem quickly.
>
> Looking forward to the workflow and detailed descriptions fo
> flink-dev-benchmarks.
>
> Best,
> Congxian
>
>
> Yun Tang wrote on Thu, Oct 27, 2022, at 12:41:
>
> > Thanks, Yanfei for driving this to monitor the performance in the Apache
> > Flink Slack Channel.
> >
> > Look forward to the workflow and detailed descriptions of
> > flink-dev-benchmarks.
> >
> > Best
> > Yun Tang
> > 
> > From: Hangxiang Yu 
> > Sent: Thursday, October 27, 2022 10:59
> > To: dev@flink.apache.org 
> > Subject: Re: [ANNOUNCE] Performance Daily Monitoring Moved from Ververica
> > to Apache Flink Slack Channel
> >
> > Hi, Yanfei.
> > Thanks for driving this.
> > It could help us to detect and resolve the regression problem quickly and
> > officially.
> > I'd like to join as a maintainer.
> > Looking forward to the workflow.
> >
> > On Wed, Oct 26, 2022 at 5:18 PM Yuan Mei  wrote:
> >
> > > Thanks, Yanfei, to drive this and make the performance monitoring
> > publicly
> > > available.
> > >
> > > Looking forward to seeing the workflow, and more details as Martijn
> > > mentioned.
> > >
> > > Best
> > > Yuan
> > >
> > > On Wed, Oct 26, 2022 at 2:59 PM Martijn Visser <
> martijnvis...@apache.org
> > >
> > > wrote:
> > >
> > > > Hi Yanfei Lei,
> > > >
> > > > Thanks for setting this up! It would be interesting to also know
> which
> > > > aspects of Flink are monitored for "performance". I'm assuming there
> > are
> > > > specific pieces of functionality that are performance tested, but it
> > > would
> > > > be great if this would be written down somewhere (next to a procedure
> > how
> > > > to detect a regression and what should be next steps).
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Wed, Oct 26, 2022 at 8:21 AM Zakelly Lan 
> > > wrote:
> > > >
> > > > > Hi yanfei,
> > > > >
> > > > > Thanks for driving this! It's a great help.
> > > > >
> > > > > I would like to join as a maintainer.
> > > > >
> > > > > Best,
> > > > > Zakelly
> > > > >
> > > > > On Wed, Oct 26, 2022 at 11:32 AM yanfei lei 
> > > wrote:
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > As discussed earlier, we plan to create a benchmark channel in
> > Apache
> > > > > Flink
> > > > > > slack[1], but the plan was shelved for a while[2]. So I went on
> > with
> > > > this
> > > > > > work, and created the #flink-dev-benchmarks channel for
> performance
> > > > > > regression notifications.
> > > > > >
> > > > > > We have a regression report script[3] that runs daily, and a
> > > > notification
> > > > > > would be sent to the slack channel when the last few benchmark
> > > results
> > > > > are
> > > > > > significantly worse than the baseline.
> > > > > > Note, regressions are detected by a simple script which may have
> > > false
> > > > > > positives and false negatives. And all benchmarks are executed on
> > one
> > > > > > physical machine[4] which is provided by Ververica(Alibaba)[5],
> it
> > > > might
> > > > > > happen that hardware issues affect performance, like
> "[FLINK-18614
> > > > > > ] Performance
> > > > > regression
> > > > > > 2020.07.13"[6].
> > > > > >
> > > > > > After the migration, we need a procedure to watch over the entire
> > > > > > performance of Flink code together. For example, if a regression
> > > > > > occurs, investigating the cause and resolving the problem are
> > needed.
> > > > In
> > > > > > the past, this procedure is maintained internally within
> Ververica,
> > > but
> > > > > we
> > > > > > think making the procedure public would benefit all. I volunteer
> to
> > > > serve
> > > > > > as one of the initial maintainers, and would be glad if more
> > > > contributors
> > > > > > can join me. I'd also prepare some guidelines to help others get
> > > > familiar
> > > > > > with the workflow. I will start a new thread to discuss the
> > workflow
> > > > > soon.
> > > > > >
> > > > > >
> > > > > > [1]
> > https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
> > > > > > [2] https://issues.apache.org/jira/browse/FLINK-28468
> > > > > > [3]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink-benchmarks/blob/master/regression_report.py
> > > > > > [4] http://codespeed.dak8s.net:8080
> > > > > > [5]
> > https://lists.apache.org/thread/jzljp4233799vwwqnr0vc9wgqs0xj1ro
> > > > > >
> > > > > > [6] https://issues.apache.org/jira/browse/FLINK-18614
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best,
> > Hangxian

Re: [ANNOUNCE] Apache Flink 1.16.0 released

2022-10-28 Thread Biao Liu
Congrats! Glad to hear that.

BTW, I just noticed that the documentation link for 1.16 on
https://flink.apache.org/ is not correct.

[image: 截屏2022-10-28 17.01.28.png]

Thanks,
Biao /'bɪ.aʊ/



On Fri, 28 Oct 2022 at 14:46, Xingbo Huang  wrote:

> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.16.0, which is the first release for the Apache Flink 1.16 series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the
> improvements for this release:
> https://flink.apache.org/news/2022/10/28/1.16-announcement.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351275
>
> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
>
> Regards,
> Chesnay, Martijn, Godfrey & Xingbo
>


[jira] [Created] (FLINK-29792) FileStoreCommitTest is unstable and may stuck

2022-10-28 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-29792:
---

 Summary: FileStoreCommitTest is unstable and may stuck
 Key: FLINK-29792
 URL: https://issues.apache.org/jira/browse/FLINK-29792
 Project: Flink
  Issue Type: Bug
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Caizhi Weng


{{FileStoreCommitTest}} may get stuck because the {{FileStoreCommit}} in 
{{TestCommitThread}} does not commit an APPEND snapshot when no new files are 
produced. In this case, if the following COMPACT snapshot conflicts with the 
current merge tree, the test will get stuck.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] Apache Flink 1.16.0 released

2022-10-28 Thread Márton Balassi
Awesome, thanks team! Thanks Xingbo for managing the release.

On Fri, Oct 28, 2022 at 11:04 AM Biao Liu  wrote:

> Congrats! Glad to hear that.
>
> BTW, I just found the document link of 1.16 from https://flink.apache.org/
> is not correct.
>
> [image: 截屏2022-10-28 17.01.28.png]
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Fri, 28 Oct 2022 at 14:46, Xingbo Huang  wrote:
>
>> The Apache Flink community is very happy to announce the release of Apache
>> Flink 1.16.0, which is the first release for the Apache Flink 1.16 series.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data
>> streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this release:
>> https://flink.apache.org/news/2022/10/28/1.16-announcement.html
>>
>> The full release notes are available in Jira:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351275
>>
>> We would like to thank all contributors of the Apache Flink community
>> who made this release possible!
>>
>> Regards,
>> Chesnay, Martijn, Godfrey & Xingbo
>>
>


Re: [ANNOUNCE] Apache Flink 1.16.0 released

2022-10-28 Thread yuxia
Congratulations!

Best regards,
Yuxia

- Original Message -
From: "Márton Balassi" 
To: "dev" 
Sent: Friday, October 28, 2022, 5:59:42 PM
Subject: Re: [ANNOUNCE] Apache Flink 1.16.0 released

Awesome, thanks team! Thanks Xingbo for managing the release.

On Fri, Oct 28, 2022 at 11:04 AM Biao Liu  wrote:

> Congrats! Glad to hear that.
>
> BTW, I just found the document link of 1.16 from https://flink.apache.org/
> is not correct.
>
> [image: 截屏2022-10-28 17.01.28.png]
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Fri, 28 Oct 2022 at 14:46, Xingbo Huang  wrote:
>
>> The Apache Flink community is very happy to announce the release of Apache
>> Flink 1.16.0, which is the first release for the Apache Flink 1.16 series.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data
>> streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this release:
>> https://flink.apache.org/news/2022/10/28/1.16-announcement.html
>>
>> The full release notes are available in Jira:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351275
>>
>> We would like to thank all contributors of the Apache Flink community
>> who made this release possible!
>>
>> Regards,
>> Chesnay, Martijn, Godfrey & Xingbo
>>
>


[jira] [Created] (FLINK-29793) Scala Code 2 Java Code

2022-10-28 Thread linweijiang (Jira)
linweijiang created FLINK-29793:
---

 Summary: Scala Code 2 Java Code
 Key: FLINK-29793
 URL: https://issues.apache.org/jira/browse/FLINK-29793
 Project: Flink
  Issue Type: Technical Debt
Reporter: linweijiang


Is there any plan to replace all Scala code with Java? Thanks~



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] Apache Flink 1.16.0 released

2022-10-28 Thread Wei Zhong
Congratulations!

Thank you very much! Thanks Xingbo for managing the release.

Best,
Wei


> On Oct 28, 2022, at 6:10 PM, yuxia  wrote:
> 
> Congratulations!
> 
> Best regards,
> Yuxia
> 
> - Original Message -
> From: "Márton Balassi" 
> To: "dev" 
> Sent: Friday, October 28, 2022, 5:59:42 PM
> Subject: Re: [ANNOUNCE] Apache Flink 1.16.0 released
> 
> Awesome, thanks team! Thanks Xingbo for managing the release.
> 
> On Fri, Oct 28, 2022 at 11:04 AM Biao Liu  wrote:
> 
>> Congrats! Glad to hear that.
>> 
>> BTW, I just found the document link of 1.16 from https://flink.apache.org/
>> is not correct.
>> 
>> [image: 截屏2022-10-28 17.01.28.png]
>> 
>> Thanks,
>> Biao /'bɪ.aʊ/
>> 
>> 
>> 
>> On Fri, 28 Oct 2022 at 14:46, Xingbo Huang  wrote:
>> 
>>> The Apache Flink community is very happy to announce the release of Apache
>>> Flink 1.16.0, which is the first release for the Apache Flink 1.16 series.
>>> 
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data
>>> streaming
>>> applications.
>>> 
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>> 
>>> Please check out the release blog post for an overview of the
>>> improvements for this release:
>>> https://flink.apache.org/news/2022/10/28/1.16-announcement.html
>>> 
>>> The full release notes are available in Jira:
>>> 
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351275
>>> 
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>> 
>>> Regards,
>>> Chesnay, Martijn, Godfrey & Xingbo
>>> 
>> 



Re: [ANNOUNCE] Apache Flink 1.16.0 released

2022-10-28 Thread Geng Biao
Congrats! Many thanks to Xingbo, Chesnay, Martijn and Godfrey!

Best,
Biao Geng

Get Outlook for iOS

From: Wei Zhong 
Sent: Friday, October 28, 2022 6:58:21 PM
To: dev@flink.apache.org 
Subject: Re: [ANNOUNCE] Apache Flink 1.16.0 released

Congratulations!

Thank you very much! Thanks Xingbo for managing the release.

Best,
Wei


> On Oct 28, 2022, at 6:10 PM, yuxia  wrote:
>
> Congratulations!
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Márton Balassi" 
> To: "dev" 
> Sent: Friday, October 28, 2022, 5:59:42 PM
> Subject: Re: [ANNOUNCE] Apache Flink 1.16.0 released
>
> Awesome, thanks team! Thanks Xingbo for managing the release.
>
> On Fri, Oct 28, 2022 at 11:04 AM Biao Liu  wrote:
>
>> Congrats! Glad to hear that.
>>
>> BTW, I just found the document link of 1.16 from https://flink.apache.org/
>> is not correct.
>>
>> [image: 截屏2022-10-28 17.01.28.png]
>>
>> Thanks,
>> Biao /'bɪ.aʊ/
>>
>>
>>
>> On Fri, 28 Oct 2022 at 14:46, Xingbo Huang  wrote:
>>
>>> The Apache Flink community is very happy to announce the release of Apache
>>> Flink 1.16.0, which is the first release for the Apache Flink 1.16 series.
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data
>>> streaming
>>> applications.
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements for this release:
>>> https://flink.apache.org/news/2022/10/28/1.16-announcement.html
>>>
>>> The full release notes are available in Jira:
>>>
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351275
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>>
>>> Regards,
>>> Chesnay, Martijn, Godfrey & Xingbo
>>>
>>



Re: [ANNOUNCE] Apache Flink 1.16.0 released

2022-10-28 Thread 任庆盛
Congratulations and a big thanks to Chesnay, Martijn, Godfrey and Xingbo for 
the awesome work for 1.16! 

Best regards,
Qingsheng Ren

> On Oct 28, 2022, at 14:46, Xingbo Huang  wrote:
> 
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.16.0, which is the first release for the Apache Flink 1.16 series.
> 
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
> 
> The release is available for download at:
> https://flink.apache.org/downloads.html
> 
> Please check out the release blog post for an overview of the
> improvements for this release:
> https://flink.apache.org/news/2022/10/28/1.16-announcement.html
> 
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351275
> 
> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
> 
> Regards,
> Chesnay, Martijn, Godfrey & Xingbo


Re: [ANNOUNCE] Apache Flink 1.16.0 released

2022-10-28 Thread Jing Ge
Congrats!

On Fri, Oct 28, 2022 at 1:22 PM 任庆盛  wrote:

> Congratulations and a big thanks to Chesnay, Martijn, Godfrey and Xingbo
> for the awesome work for 1.16!
>
> Best regards,
> Qingsheng Ren
>
> > On Oct 28, 2022, at 14:46, Xingbo Huang  wrote:
> >
> > The Apache Flink community is very happy to announce the release of
> Apache
> > Flink 1.16.0, which is the first release for the Apache Flink 1.16
> series.
> >
> > Apache Flink® is an open-source stream processing framework for
> > distributed, high-performing, always-available, and accurate data
> streaming
> > applications.
> >
> > The release is available for download at:
> > https://flink.apache.org/downloads.html
> >
> > Please check out the release blog post for an overview of the
> > improvements for this release:
> > https://flink.apache.org/news/2022/10/28/1.16-announcement.html
> >
> > The full release notes are available in Jira:
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351275
> >
> > We would like to thank all contributors of the Apache Flink community
> > who made this release possible!
> >
> > Regards,
> > Chesnay, Martijn, Godfrey & Xingbo
>


Re: [DISCUSS] Drop TypeSerializerConfigSnapshot and savepoint support from Flink versions < 1.8.0

2022-10-28 Thread Piotr Nowojski
Hi,

Thanks for the support. It looks like nobody objects to this proposal, so I
will start a formal vote.

Best,
Piotrek

pt., 21 paź 2022 o 08:24 Timo Walther  napisał(a):

> Makes sense to me. The serializer stack is pretty complex right now, the
> more legacy we remove the better.
>
> Regards,
> Timo
>
>
> On 20.10.22 12:49, Chesnay Schepler wrote:
> > +1
> >
> > Sounds like a good reason to drop these long-deprecated APIs.
> >
> > On 19/10/2022 15:13, Piotr Nowojski wrote:
> >> Hi devs,
> >>
> >> I would like to open a discussion to remove the long deprecated
> >> (@PublicEvolving) TypeSerializerConfigSnapshot class [1] and the related
> >> code.
> >>
> >> The motivation behind this move is two fold. One reason is that it
> >> complicates our code base unnecessarily and creates confusion on how to
> >> actually implement custom serializers. The immediate reason is that I
> >> wanted to clean up Flink's configuration stack a bit and refactor the
> >> ExecutionConfig class [2]. This refactor would keep the API
> compatibility
> >> of the ExecutionConfig, but it would break savepoint compatibility with
> >> snapshots written with some of the old serializers, which had
> >> ExecutionConfig as a field and were serialized in the snapshot. This
> >> issue
> >> has been resolved by the introduction of TypeSerializerSnapshot in Flink
> >> 1.7 [3], where serializers are no longer part of the snapshot.
> >>
> >> TypeSerializerConfigSnapshot has been deprecated and no longer used by
> >> built-in serializers since Flink 1.8 [4] and [5]. Users were
> >> encouraged to
> >> migrate to TypeSerializerSnapshot since then with their own custom
> >> serializers. That has been plenty of time for the migration.
> >>
> >> This proposal would have the following impact for the users:
> >> 1. we would drop support for recovery from savepoints taken with Flink <
> >> 1.7.0 for all built in types serializers
> >> 2. we would drop support for recovery from savepoints taken with Flink <
> >> 1.8.0 for built in kryo serializers
> >> 3. we would drop support for recovery from savepoints taken with Flink <
> >> 1.17 for custom serializers using deprecated
> TypeSerializerConfigSnapshot
> >>
> >> 1. and 2. would have a simple migration path. Users migrating from those
> >> old savepoints would have to first start his job using a Flink version
> >> from
> >> the [1.8, 1.16] range, and take a new savepoint that would be compatible
> >> with Flink 1.17.
> >> 3. This is a bit more problematic, because users would have to first
> >> migrate their own custom serializers to use TypeSerializerSnapshot
> >> (using a
> >> Flink version from the [1.8, 1.16]), take a savepoint, and only then
> >> migrate to Flink 1.17. However users had already 4 years to migrate,
> >> which
> >> in my opinion has been plenty of time to do so.
> >>
> >> As a side effect, we could also drop support for some of the legacy
> >> metadata serializers from LegacyStateMetaInfoReaders and potentially
> >> other
> >> places that we are keeping for the sake of compatibility with old
> >> savepoints.
> >>
> >> What do you think?
> >>
> >> Best,
> >> Piotrek
> >>
> >> [1]
> >>
> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/common/typeutils/TypeSerializerConfigSnapshot.html
> >> [2] https://issues.apache.org/jira/browse/FLINK-29379
> >> [3] https://issues.apache.org/jira/browse/FLINK-9377
> >> [4] https://issues.apache.org/jira/browse/FLINK-9376
> >> [5] https://issues.apache.org/jira/browse/FLINK-11323
> >>
> >
>
>


Re: [DISCUSS] Issue tracking workflow

2022-10-28 Thread Piotr Nowojski
Hi,

I'm afraid of the migration cost of moving to GitHub issues only, and of the
lack of many features that we are currently using. That would be very
disruptive and annoying. For me, GitHub issues are far worse than Jira.

I would strongly prefer Option 1 over the others. Option 4 I would like
the least. I would be fine with Option 3, and with Option 2 assuming that
Jira stays the source of truth.
For option 2, maybe we could have a bot that backports/copies user-created
issues from GitHub to Jira (and links them together)? Discussions could
still happen on GitHub, but we could track all of the issues as we are
doing right now. The bot could also sync the other way around (e.g. marking
tickets closed, setting affected/fixed versions, etc.).
Best,
Piotrek

czw., 27 paź 2022 o 07:48 Martijn Visser 
napisał(a):

> Hi,
>
> We have to keep in mind that if a users asks for a new Jira account, that
> person will need to provide its email address which is the Flink PMC
> processing personal identifiable information. There needs to be a careful
> process for that and to be honest, I don't think the ASF should do this
> from a privacy perspective.
>
> As an example, the Calcite community decided to create a dedicated, private
> list where users can ask for an account to avoid making the email address
> public.
>
> Best regards,
>
> Martijn
>
> Op wo 26 okt. 2022 om 22:31 schreef Danny Cranmer  >
>
> > Hello,
> >
> > I agree with Gyula. My preference is also option 1, and as a fallback
> > option 3. Handling new user account requests will be manageable,
> especially
> > via slack. We could setup a dedicated channel for people to ask for
> > Jira/wiki access.
> >
> > Thanks,
> > Danny
> >
> > On Wed, 26 Oct 2022, 12:16 Gyula Fóra,  wrote:
> >
> > > Hi!
> > >
> > > I would also personally prefer staying with JIRA given the feature set
> > and
> > > the past positive experience with it.
> > > I think the structured nature of JIRA with flexible components, issue
> > > types, epics, release handling etc have been a great benefit to the
> > > project, it would be a shame to give some of these up.
> > >
> > > If for some reason Option 1 is not possible, I would still prefer
> Option
> > 3
> > > (requiring new contributors to ask for JIRA access) compared to the
> > > alternatives.
> > >
> > > Cheers,
> > > Gyula
> > >
> > >
> > > On Tue, Oct 25, 2022 at 3:48 PM Robert Metzger 
> > > wrote:
> > >
> > > > Thank you for starting this discussion Xintong!
> > > >
> > > > I would also prefer option 1.
> > > >
> > > > The ASF Jira is probably one of the largest, public Jira instances on
> > the
> > > > internet. Most other Jiras are internal within companies, so
> Atlassian
> > is
> > > > probably not putting a lot of effort into automatically detecting and
> > > > preventing spam and malicious account creation.
> > > > If we want to convince Infra to keep the current sign up process, we
> > > > probably need to help them find a solution for the problem.
> > > > Maybe we can configure the ASF Jira to rely on GitHub as an identity
> > > > provider? I've just proposed that in the discussion on
> > > > us...@infra.apache.org, let's see ;)
> > > >
> > > > Best,
> > > > Robert
> > > >
> > > >
> > > > On Tue, Oct 25, 2022 at 2:08 PM Konstantin Knauf 
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > while I see some benefits in moving to Github Issues completely, we
> > > need
> > > > to
> > > > > be aware that Github Issues lacks many features that Jira has. From
> > the
> > > > top
> > > > > of my head:
> > > > > * there are no issue types
> > > > > * no priorities
> > > > > * issues can only be assigned to one milestone
> > > > > So, you need to work a lot with labels and conventions and
> basically
> > > need
> > > > > bots or actions to manage those. Agreeing on those processes,
> setting
> > > > them
> > > > > up and getting used to them will be a lot of work for the
> community.
> > > > >
> > > > > So, I am also in favor of 1) for now, because I don't really see a
> > good
> > > > > alternative option.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Konstantin
> > > > >
> > > > >
> > > > >
> > > > > Am Mo., 24. Okt. 2022 um 22:27 Uhr schrieb Matthias Pohl
> > > > > :
> > > > >
> > > > > > I agree that leaving everything as is would be the best option. I
> > > also
> > > > > tend
> > > > > > to lean towards option 4 as a fallback for the reasons already
> > > > mentioned.
> > > > > > I'm still not a big fan of the Github issues. But that's probably
> > > only
> > > > > > because I'm used to the look-and-feel and the workflows of Jira.
> I
> > > see
> > > > > > certain benefits of moving to Github, though. We're still having
> > the
> > > > idea
> > > > > > of migrating from AzureCI to GitHub Actions. Moving the issues to
> > > > GitHub
> > > > > as
> > > > > > well might improve the user experience even more. Reducing the
> > number
> > > > of
> > > > > > services a new contributor should be aware of to reach the
> 

[VOTE] Drop TypeSerializerConfigSnapshot and savepoint support from Flink versions < 1.8.0

2022-10-28 Thread Piotr Nowojski
Hi,

As discussed on the dev mailing list [0] I would like to start a vote to
drop support of older savepoint formats (for Flink versions older than
1.8). You can find the original explanation from the aforementioned dev
mailing list thread at the bottom of this message.

Draft PR containing the proposed change you can find here:
https://github.com/apache/flink/pull/21056

Vote will be open at least until Wednesday, November 2nd 18:00 CET.

Best,
Piotrek

[0] https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w

I would like to open a discussion to remove the long deprecated
(@PublicEvolving) TypeSerializerConfigSnapshot class [1] and the related
code.

The motivation behind this move is twofold. One reason is that it
complicates our code base unnecessarily and creates confusion about how to
actually implement custom serializers. The immediate reason is that I
wanted to clean up Flink's configuration stack a bit and refactor the
ExecutionConfig class [2]. This refactor would keep the API compatibility
of the ExecutionConfig, but it would break savepoint compatibility with
snapshots written with some of the old serializers, which had
ExecutionConfig as a field and were serialized in the snapshot. This issue
has been resolved by the introduction of TypeSerializerSnapshot in Flink
1.7 [3], where serializers are no longer part of the snapshot.

TypeSerializerConfigSnapshot has been deprecated and no longer used by
built-in serializers since Flink 1.8 [4] and [5]. Users were encouraged to
migrate to TypeSerializerSnapshot since then with their own custom
serializers. That has been plenty of time for the migration.

This proposal would have the following impact for the users:
1. we would drop support for recovery from savepoints taken with Flink <
1.7.0 for all built in types serializers
2. we would drop support for recovery from savepoints taken with Flink <
1.8.0 for built in kryo serializers
3. we would drop support for recovery from savepoints taken with Flink <
1.17 for custom serializers using deprecated TypeSerializerConfigSnapshot

1. and 2. would have a simple migration path. Users migrating from those
old savepoints would first have to start their job using a Flink version from
the [1.8, 1.16] range, and take a new savepoint that would be compatible
with Flink 1.17.
3. This is a bit more problematic, because users would have to first
migrate their own custom serializers to use TypeSerializerSnapshot (using a
Flink version from the [1.8, 1.16] range), take a savepoint, and only then
migrate to Flink 1.17. However, users have already had 4 years to migrate,
which in my opinion has been plenty of time to do so.
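
To make the point-3 migration concrete for the common case of a stateless
custom serializer, a sketch of the move off TypeSerializerConfigSnapshot
(MyType and MySerializer are placeholders, not real classes; serializers
that carry configuration need a full TypeSerializerSnapshot implementation
instead):

import org.apache.flink.api.common.typeutils.SimpleTypeSerializerSnapshot;
import org.apache.flink.api.common.typeutils.TypeSerializerSnapshot;

// New snapshot class, replacing the old TypeSerializerConfigSnapshot
// subclass. MyType and MySerializer are hypothetical user classes.
public final class MySerializerSnapshot
        extends SimpleTypeSerializerSnapshot<MyType> {

    public MySerializerSnapshot() {
        super(MySerializer::new); // supplier of the current serializer
    }
}

// And in MySerializer, point snapshotConfiguration() at the new class:
//
//     @Override
//     public TypeSerializerSnapshot<MyType> snapshotConfiguration() {
//         return new MySerializerSnapshot();
//     }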

As a side effect, we could also drop support for some of the legacy
metadata serializers from LegacyStateMetaInfoReaders and potentially other
places that we are keeping for the sake of compatibility with old
savepoints.

[1]
https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/common/typeutils/TypeSerializerConfigSnapshot.html
[2] https://issues.apache.org/jira/browse/FLINK-29379
[3] https://issues.apache.org/jira/browse/FLINK-9377
[4] https://issues.apache.org/jira/browse/FLINK-9376
[5] https://issues.apache.org/jira/browse/FLINK-11323


Re: [VOTE] Dedicated AWS externalized connector repo

2022-10-28 Thread Ahmed Hamdy
+1 (non-binding)
Regards,
Ahmed

On Thu, 27 Oct 2022 at 08:38, Teoh, Hong 
wrote:

> +1 (non-binding)
>
> Thanks for driving this, Danny!
>
> Hong
>
> On 26/10/2022, 08:14, "Martijn Visser"  wrote:
>
>
>
>
> +1 binding
>
> Thanks Danny!
>
> On Wed, Oct 26, 2022 at 8:48 AM Danny Cranmer  >
> wrote:
>
> > Hello all,
> >
> > As discussed in the discussion thread [1], I propose to create a
> dedicated
> > repository for AWS connectors called flink-connector-aws. This will
> house
> > 3x connectors: Amazon Kinesis Data Streams, Amazon Kinesis Data
> Firehose
> > and Amazon DynamoDB and any future AWS connectors. We will also
> externalize
> > the AWS base module from the main Flink repository [2] and create a
> parent
> > pom for version management.
> >
> > All modules within this repository will share the same version, and
> be
> > released/evolved together. We will adhere to the common Flink rules
> [3] for
> > connector development.
> >
> > Motivation: grouping AWS connectors together will reduce the number
> of
> > connector releases, simplify development, dependency management and
> > versioning for users.
> >
> > Voting schema:
> > Consensus, committers have binding votes, open for at least 72 hours.
> >
> > [1] https://lists.apache.org/thread/swp4bs8407gtsgn2gh0k3wx1m4o3kqqp
> > [2]
> >
> >
> https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-aws-base
> > [3]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
> >
>
>


Re: [DISCUSS] FLINK-20578 Cannot create empty array using ARRAY[]

2022-10-28 Thread eric xiao
I seem to still get the datatype array:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("SELECT ARRAY() + 1")

AnalysisException: cannot resolve '(array() + 1)' due to data type
mismatch: differing types in '(array() + 1)' (array and int).;
line 1 pos 7;
'Project [unresolvedalias((array() + 1), None)]
+- OneRowRelation

Perhaps it has to do with me using PySpark vs Java or Scala Spark?

$ pip3 freeze | grep pyspark
pyspark==3.3.1


Regarding the coalesce query, I should have been a bit more explicit with
my example:
SELECT COALESCE(ARRAY(1), CAST(ARRAY() as ARRAY))

When I was referring to int_column I meant an int array column 😅. It makes
sense why you cannot COALESCE a non-array data type with an array data type -
this is also why the SELECT ARRAY() + 1 query fails.

---

> Unfortunately, ARRAY is fully managed by Calcite and maybe deeply
> integrated also into the parser (at least this is the case for ROW).
>

I think that is the problem indeed and people have suggested overriding
some of the functions inherited from the Calcite SqlArrayValueConstructor
class to allow for the creation of empty arrays. Those suggestions worked
as shown in my exploratory PR: https://github.com/apache/flink/pull/21156.

I think this is one change we will need to make in Flink.

> So if we wanna go the easy path (without introducing ARRAY_OF), ARRAY()
> should result in ARRAY if the type cannot be derived from the
> surrounding call.
>
Are the SQL functions ARRAY_OF and ROW_OF supported in any other SQL
dialect? I did a quick search and came up blank. I am still fairly new to
the community, so I don't know what the community's stance is on staying
close to ANSI SQL - having helper functions is always nice, though.

I am inclined to say we should let the parser figure out the typing of the
array based on the surrounding call and, if it cannot, use a default type
such as ARRAY.

On Fri, Oct 28, 2022 at 4:07 AM Timo Walther  wrote:

> Actually, the new type inference stack for UDFs is smart enough to solve
> this issue. It could derive a data type for the array from the
> surrounding call (expected data type).
>
> So this can be supported with the right type inference logic:
> cast(ARRAY() as int)
>
> Unfortunately, ARRAY is fully managed by Calcite and maybe deeply
> integrated also into the parser (at least this is the case for ROW).
> TBH if I were to design a FLIP for the collection functions, I would
> actually propose to introduce `ARRAY_OF()`, `ROW_OF()` to have full
> control over the type inference in our stack. In our stack, this also
> means that NULL is unknown. Calcite distinguished between NULL and unknown.
>
> So if we wanna go the easy path (without introducing ARRAY_OF), ARRAY()
> should result in ARRAY if the type can not be derived by the
> surrounding call.
>
> Regards,
> Timo
>
> On 28.10.22 03:46, yuxia wrote:
> > For an empty array, it seems different engines use different data types:
> > Hive: string
> > Spark: string ?
> > Trino:  Unknown
> > BigQuery: Integer
> >
> > I have tried with Hive and Spark, but haven't tried with Trino and
> BigQuery.
> >
> > I'm a little doubtful about Spark's behavior. From my side, it
> seems Spark actually uses the string type, which differs from your
> investigation.
> > I try with the following sql in spark-cli:
> > `
> > select array() + 1
> > `
> >
> > The exception is
> > `
> > Error in query: cannot resolve '(array() + 1)' due to data type
> mismatch: differing types in '(array() + 1)' (array and int).; line
> 1 pos 7;
> > 'Project [unresolvedalias((array() + 1), None)]
> > +- OneRowRelation
> > `
> >
> >
> > It seems hard to decide which data type Flink should use. I'm
> interested in the reason why you would like to use the Integer type.
> > I haven't checked whether the SQL standard specifies it, but from my
> side, I prefer to follow Hive/Spark.
> >
> > BTW: the query `SELECT COALESCE(1, cast(ARRAY() as int))` will fail in
> Hive and Spark.
> >
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "eric xiao" 
> > To: "dev" 
> > Sent: Thursday, October 27, 2022, 9:13:51 PM
> > Subject: [DISCUSS] FLINK-20578 Cannot create empty array using ARRAY[]
> >
> > Hi,
> >
> > I would like to propose a solution to this JIRA issue. I looked at the
> > comments and there was some guidance around where in the code we should
> > update to allow for this behaviour. But I believe there are still two
> > questions that remain open:
> >
> > 1. Is this expected behaviour (i.e. users should not be able to
> create
> > an empty array)?
> > 2. If this is indeed expected behaviour, what should the data type
> be of
> > the empty array?
> >
> > I did some digging into other query engines / databases in hopes of
> > answering the following two questions - That can be found at the end of
> > this thread.
> >
> > *Q: *Is this expected behaviour (i.e. users should not be able to create
> an
> > empty array)?
> > *A: *Yes I would say th

Re: [VOTE] Dedicated AWS externalized connector repo

2022-10-28 Thread Samrat Deb
+1 (non binding)

Thanks for driving Danny

Bests
Samrat

On Fri, 28 Oct 2022 at 8:36 PM, Ahmed Hamdy  wrote:

> +1 (non-binding)
> Regards,
> Ahmed
>
> On Thu, 27 Oct 2022 at 08:38, Teoh, Hong 
> wrote:
>
> > +1 (non-binding)
> >
> > Thanks for driving this, Danny!
> >
> > Hong
> >
> > On 26/10/2022, 08:14, "Martijn Visser" 
> wrote:
> >
> >
> >
> >
> > +1 binding
> >
> > Thanks Danny!
> >
> > On Wed, Oct 26, 2022 at 8:48 AM Danny Cranmer <
> dannycran...@apache.org
> > >
> > wrote:
> >
> > > Hello all,
> > >
> > > As discussed in the discussion thread [1], I propose to create a
> > dedicated
> > > repository for AWS connectors called flink-connector-aws. This will
> > house
> > > 3x connectors: Amazon Kinesis Data Streams, Amazon Kinesis Data
> > Firehose
> > > and Amazon DynamoDB and any future AWS connectors. We will also
> > externalize
> > > the AWS base module from the main Flink repository [2] and create a
> > parent
> > > pom for version management.
> > >
> > > All modules within this repository will share the same version, and
> > be
> > > released/evolved together. We will adhere to the common Flink rules
> > [3] for
> > > connector development.
> > >
> > > Motivation: grouping AWS connectors together will reduce the number
> > of
> > > connector releases, simplify development, dependency management and
> > > versioning for users.
> > >
> > > Voting schema:
> > > Consensus, committers have binding votes, open for at least 72
> hours.
> > >
> > > [1]
> https://lists.apache.org/thread/swp4bs8407gtsgn2gh0k3wx1m4o3kqqp
> > > [2]
> > >
> > >
> >
> https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-aws-base
> > > [3]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
> > >
> >
> >
>


Re: [VOTE] Drop TypeSerializerConfigSnapshot and savepoint support from Flink versions < 1.8.0

2022-10-28 Thread Konstantin Knauf
+1 (binding)

Am Fr., 28. Okt. 2022 um 16:58 Uhr schrieb Piotr Nowojski <
pnowoj...@apache.org>:

> Hi,
>
> As discussed on the dev mailing list [0] I would like to start a vote to
> drop support of older savepoint formats (for Flink versions older than
> 1.8). You can find the original explanation from the aforementioned dev
> mailing list thread at the bottom of this message.
>
> Draft PR containing the proposed change you can find here:
> https://github.com/apache/flink/pull/21056
>
> Vote will be open at least until Wednesday, November 2nd 18:00 CET.
>
> Best,
> Piotrek
>
> [0] https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w
>
> I would like to open a discussion to remove the long deprecated
> (@PublicEvolving) TypeSerializerConfigSnapshot class [1] and the related
> code.
>
> The motivation behind this move is two fold. One reason is that it
> complicates our code base unnecessarily and creates confusion on how to
> actually implement custom serializers. The immediate reason is that I
> wanted to clean up Flink's configuration stack a bit and refactor the
> ExecutionConfig class [2]. This refactor would keep the API compatibility
> of the ExecutionConfig, but it would break savepoint compatibility with
> snapshots written with some of the old serializers, which had
> ExecutionConfig as a field and were serialized in the snapshot. This issue
> has been resolved by the introduction of TypeSerializerSnapshot in Flink
> 1.7 [3], where serializers are no longer part of the snapshot.
>
> TypeSerializerConfigSnapshot has been deprecated and no longer used by
> built-in serializers since Flink 1.8 [4] and [5]. Users have been encouraged
> to migrate their own custom serializers to TypeSerializerSnapshot since
> then. That has been plenty of time for the migration.
>
> This proposal would have the following impact for the users:
> 1. we would drop support for recovery from savepoints taken with Flink <
> 1.7.0 for all built-in type serializers
> 2. we would drop support for recovery from savepoints taken with Flink <
> 1.8.0 for the built-in Kryo serializers
> 3. we would drop support for recovery from savepoints taken with Flink <
> 1.17 for custom serializers using the deprecated TypeSerializerConfigSnapshot
>
> 1. and 2. would have a simple migration path. Users migrating from those
> old savepoints would first have to start their jobs using a Flink version
> from the [1.8, 1.16] range, and take a new savepoint that would be
> compatible with Flink 1.17.
> 3. This is a bit more problematic, because users would have to first
> migrate their own custom serializers to use TypeSerializerSnapshot (using a
> Flink version from the [1.8, 1.16] range), take a savepoint, and only then
> migrate to Flink 1.17. However, users have already had 4 years to migrate,
> which in my opinion is plenty of time to do so.
>
> As a side effect, we could also drop support for some of the legacy
> metadata serializers from LegacyStateMetaInfoReaders and potentially other
> places that we are keeping for the sake of compatibility with old
> savepoints.
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/common/typeutils/TypeSerializerConfigSnapshot.html
> [2] https://issues.apache.org/jira/browse/FLINK-29379
> [3] https://issues.apache.org/jira/browse/FLINK-9377
> [4] https://issues.apache.org/jira/browse/FLINK-9376
> [5] https://issues.apache.org/jira/browse/FLINK-11323
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk
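
As a hedged illustration of the migration target discussed above (not part
of the vote itself): for a custom serializer without configurable state, the
replacement snapshot can be as small as the sketch below. MySerializerSnapshot
is a placeholder name, and StringSerializer merely stands in for a user's own
serializer; serializers with configurable state need a full
TypeSerializerSnapshot implementation with versioned read/write methods.

import org.apache.flink.api.common.typeutils.SimpleTypeSerializerSnapshot;
import org.apache.flink.api.common.typeutils.base.StringSerializer;

// Snapshot for a serializer with no configurable state: Flink's
// SimpleTypeSerializerSnapshot base class only needs a factory that
// recreates the serializer on restore.
public class MySerializerSnapshot extends SimpleTypeSerializerSnapshot<String> {

    public MySerializerSnapshot() {
        super(() -> StringSerializer.INSTANCE);
    }
}

The serializer's snapshotConfiguration() would then return new
MySerializerSnapshot() instead of a TypeSerializerConfigSnapshot subclass.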


Re: [VOTE] Drop TypeSerializerConfigSnapshot and savepoint support from Flink versions < 1.8.0

2022-10-28 Thread Tzu-Li (Gordon) Tai
+1

On Fri, Oct 28, 2022 at 10:21 AM Konstantin Knauf  wrote:

> +1 (binding)
>
> On Fri, Oct 28, 2022 at 16:58, Piotr Nowojski <pnowoj...@apache.org> wrote:
>
> > Hi,
> >
> > As discussed on the dev mailing list [0] I would like to start a vote to
> > drop support of older savepoint formats (for Flink versions older than
> > 1.8). You can find the original explanation from the aforementioned dev
> > mailing list thread at the bottom of this message.
> >
> > Draft PR containing the proposed change you can find here:
> > https://github.com/apache/flink/pull/21056
> >
> > Vote will be open at least until Wednesday, November 2nd 18:00 CET.
> >
> > Best,
> > Piotrek
> >
> > [0] https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w
> >
> > I would like to open a discussion to remove the long deprecated
> > (@PublicEvolving) TypeSerializerConfigSnapshot class [1] and the related
> > code.
> >
> > The motivation behind this move is twofold. One reason is that it
> > complicates our code base unnecessarily and creates confusion on how to
> > actually implement custom serializers. The immediate reason is that I
> > wanted to clean up Flink's configuration stack a bit and refactor the
> > ExecutionConfig class [2]. This refactor would keep the API compatibility
> > of the ExecutionConfig, but it would break savepoint compatibility with
> > snapshots written with some of the old serializers, which had
> > ExecutionConfig as a field and were serialized in the snapshot. This
> issue
> > has been resolved by the introduction of TypeSerializerSnapshot in Flink
> > 1.7 [3], where serializers are no longer part of the snapshot.
> >
> > TypeSerializerConfigSnapshot has been deprecated and no longer used by
> > built-in serializers since Flink 1.8 [4] and [5]. Users have been
> > encouraged to migrate their own custom serializers to
> > TypeSerializerSnapshot since then. That has been plenty of time for the
> > migration.
> >
> > This proposal would have the following impact for the users:
> > 1. we would drop support for recovery from savepoints taken with Flink <
> > 1.7.0 for all built-in type serializers
> > 2. we would drop support for recovery from savepoints taken with Flink <
> > 1.8.0 for the built-in Kryo serializers
> > 3. we would drop support for recovery from savepoints taken with Flink <
> > 1.17 for custom serializers using the deprecated TypeSerializerConfigSnapshot
> >
> > 1. and 2. would have a simple migration path. Users migrating from those
> > old savepoints would first have to start their jobs using a Flink version
> > from the [1.8, 1.16] range, and take a new savepoint that would be
> > compatible with Flink 1.17.
> > 3. This is a bit more problematic, because users would have to first
> > migrate their own custom serializers to use TypeSerializerSnapshot (using
> > a Flink version from the [1.8, 1.16] range), take a savepoint, and only
> > then migrate to Flink 1.17. However, users have already had 4 years to
> > migrate, which in my opinion is plenty of time to do so.
> >
> > As a side effect, we could also drop support for some of the legacy
> > metadata serializers from LegacyStateMetaInfoReaders and potentially
> other
> > places that we are keeping for the sake of compatibility with old
> > savepoints.
> >
> > [1]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/common/typeutils/TypeSerializerConfigSnapshot.html
> > [2] https://issues.apache.org/jira/browse/FLINK-29379
> > [3] https://issues.apache.org/jira/browse/FLINK-9377
> > [4] https://issues.apache.org/jira/browse/FLINK-9376
> > [5] https://issues.apache.org/jira/browse/FLINK-11323
> >
>
>
> --
> https://twitter.com/snntrable
> https://github.com/knaufk
>


Re: [VOTE] Dedicated AWS externalized connector repo

2022-10-28 Thread Jing Ge
+1 (non-binding)

Thanks!

Best Regards,
Jing

On Fri, Oct 28, 2022 at 5:29 PM Samrat Deb  wrote:

> +1 (non-binding)
>
> Thanks for driving this, Danny
>
> Bests
> Samrat
>
> On Fri, 28 Oct 2022 at 8:36 PM, Ahmed Hamdy  wrote:
>
> > +1 (non-binding)
> > Regards,
> > Ahmed
> >
> > On Thu, 27 Oct 2022 at 08:38, Teoh, Hong 
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > Thanks for driving this, Danny!
> > >
> > > Hong
> > >
> > > On 26/10/2022, 08:14, "Martijn Visser"  wrote:
> > >
> > > +1 binding
> > >
> > > Thanks Danny!
> > >
> > > On Wed, Oct 26, 2022 at 8:48 AM Danny Cranmer <dannycran...@apache.org> wrote:
> > >
> > > > Hello all,
> > > >
> > > > As discussed in the discussion thread [1], I propose to create a
> > > dedicated
> > > > repository for AWS connectors called flink-connector-aws. This
> will
> > > house
> > > > 3x connectors: Amazon Kinesis Data Streams, Amazon Kinesis Data
> > > Firehose
> > > > and Amazon DynamoDB and any future AWS connectors. We will also
> > > externalize
> > > > the AWS base module from the main Flink repository [2] and
> create a
> > > parent
> > > > pom for version management.
> > > >
> > > > All modules within this repository will share the same version,
> and
> > > be
> > > > released/evolved together. We will adhere to the common Flink
> rules
> > > [3] for
> > > > connector development.
> > > >
> > > > Motivation: grouping AWS connectors together will reduce the
> number
> > > of
> > > > connector releases, simplify development, dependency management
> and
> > > > versioning for users.
> > > >
> > > > Voting schema:
> > > > Consensus, committers have binding votes, open for at least 72
> > hours.
> > > >
> > > > [1]
> > https://lists.apache.org/thread/swp4bs8407gtsgn2gh0k3wx1m4o3kqqp
> > > > [2]
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-aws-base
> > > > [3]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
> > > >
> > >
> > >
> >
>


Would like to fix "FLINK-27246 Code of method "processElement...grows beyond 64 KB"

2022-10-28 Thread Krzysztof Chmielewski
Hi community,
I would like to work on fixing FLINK-27246 [1].
I verified that it still happens on the current master branch.

I also did an initial investigation and I believe I've found what seems to
be the cause of this problem. I have added a comment to the ticket [2] where
I've asked a couple of questions.

I would appreciate a comment on whether my analysis is correct.

Thanks,
Krzysztof Chmielewski

[1] https://issues.apache.org/jira/browse/FLINK-27246
[2]
https://issues.apache.org/jira/browse/FLINK-27246?focusedCommentId=17625871&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17625871


[jira] [Created] (FLINK-29794) Flink Operator Need to Include Cause for Exceptions If Possible

2022-10-28 Thread Zhou Jiang (Jira)
Zhou Jiang created FLINK-29794:
--

 Summary: Flink Operator Need to Include Cause for Exceptions If 
Possible
 Key: FLINK-29794
 URL: https://issues.apache.org/jira/browse/FLINK-29794
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.3.0
Reporter: Zhou Jiang


For easier debugging and consistent stack traces, it would be appreciated if
exceptions were always thrown with their cause (if one exists), instead of
persisting the string message and silently swallowing the cause.

This issue is currently present in
apache.flink.kubernetes.operator.reconciler.diff.ReflectiveDiffBuilder and
apache.flink.kubernetes.operator.utils.EnvUtils.

 

We would like to suggest this as a pattern for future exception handling as 
well.
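
As a minimal, self-contained illustration of the suggested pattern (the class
and values below are made up for this example and are not taken from the
operator code):

// Illustrative only. The second variant preserves the original exception as
// the cause, so the full stack trace survives.
public final class CausePatternDemo {

    // Anti-pattern: only the message string survives; the cause is swallowed.
    static int parsePortLossy(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid port: " + e.getMessage());
        }
    }

    // Suggested pattern: rethrow with the original exception as the cause.
    static int parsePort(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid port: " + raw, e);
        }
    }

    public static void main(String[] args) {
        try {
            parsePort("not-a-port");
        } catch (IllegalArgumentException e) {
            e.printStackTrace(); // the trace now includes the NumberFormatException
        }
    }
}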



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLINK-20578 Cannot create empty array using ARRAY[]

2022-10-28 Thread eric xiao
Upon further investigation of the Flink docs on array types, I noticed that
the data type docs [1] give the following syntax for declaring an array *with
a data type*:

> ARRAY<t>
> t ARRAY
>


> The type can be declared using ARRAY<t> where t is the data type of the
> contained elements.
> t ARRAY is a synonym for being closer to the SQL standard. For example, INT
> ARRAY is equivalent to ARRAY<INT>.


But when I try to run:
SELECT ARRAY [1]
or
SELECT INT ARRAY [1]

In Flink SQL I get this error:
org.apache.flink.table.api.SqlParserException: SQL parse failed. Incorrect
syntax near the keyword 'INT' at line 1, column 8.
Was expecting one of:
"ABS" ...
"ALL" ...
"ARRAY" ...
"AVG" ...
"CARDINALITY" ...

Am I missing something?

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/types/#array



On Fri, Oct 28, 2022 at 11:27 AM eric xiao  wrote:

> I seem to still get the datatype array:
> from pyspark.sql import SparkSession
> spark = (
> SparkSession.builder
> .getOrCreate()
> )
> spark.sql("SELECT ARRAY() + 1")
>
> AnalysisException: cannot resolve '(array() + 1)' due to data type mismatch: 
> differing types in '(array() + 1)' (array and int).; line 1 pos 7;
> 'Project [unresolvedalias((array() + 1), None)]
> +- OneRowRelation
>
> Perhaps it has to do with me using PySpark vs Java or Scala Spark?
>
> $ pip3 freeze | grep pyspark
> pyspark==3.3.1
>
>
> Regarding the coalesce query, I should have been a bit more explicit with
> my example:
> SELECT COALESCE(ARRAY(1), CAST(ARRAY() as ARRAY<INT>))
>
> When I was referring to int_column I meant an int array column 😅. It
> makes sense why you cannot COALESCE a non-array data type with an array
> data type - this is also why the SELECT ARRAY() + 1 query fails.
>
> ---
>
>> Unfortunately, ARRAY is fully managed by Calcite and maybe deeply
>> integrated also into the parser (at least this is the case for ROW).
>>
>
> I think that is indeed the problem, and people have suggested overriding
> some of the methods inherited from Calcite's SqlArrayValueConstructor
> class to allow the creation of empty arrays. Those suggestions worked,
> as shown in my exploratory PR: https://github.com/apache/flink/pull/21156.
>
> I think this is one change we will need to make in Flink.
>
> So if we wanna go the easy path (without introducing ARRAY_OF), ARRAY()
>> should result in ARRAY<NULL> if the type cannot be derived by the
>> surrounding call.
>>
> Are the SQL functions ARRAY_OF and ROW_OF supported in any other SQL
> dialect? I did a quick search and came up blank. I am still fairly new to
> the community, so I don't know what the community's stance is on staying
> close to ANSI SQL - having helper functions is always nice though.
>
> I am inclined to think that we should let the parser figure out the typing
> of the array based on the surrounding call and, if it cannot, use a default
> type such as ARRAY<NULL>.
>
> On Fri, Oct 28, 2022 at 4:07 AM Timo Walther  wrote:
>
>> Actually, the new type inference stack for UDFs is smart enough to solve
>> this issue. It could derive a data type for the array from the
>> surrounding call (expected data type).
>>
>> So this can be supported with the right type inference logic:
>> cast(ARRAY() as int)
>>
>> Unfortunately, ARRAY is fully managed by Calcite and maybe deeply
>> integrated also into the parser (at least this is the case for ROW).
>> TBH if I were to design a FLIP for the collection functions, I would
>> actually propose to introduce `ARRAY_OF()`, `ROW_OF()` to have full
>> control over the type inference in our stack. In our stack, this also
>> means that NULL is unknown. Calcite distinguishes between NULL and
>> unknown.
>>
>> So if we wanna go the easy path (without introducing ARRAY_OF), ARRAY()
>> should result in ARRAY<NULL> if the type cannot be derived by the
>> surrounding call.
>>
>> Regards,
>> Timo
>>
>> On 28.10.22 03:46, yuxia wrote:
>> > For an empty array, it seems different engines use different data types:
>> > Hive: string
>> > Spark: string ?
>> > Trino: Unknown
>> > BigQuery: Integer
>> >
>> > I have tried with Hive and Spark, but haven't tried with Trino and
>> BigQuery.
>> >
>> > I have a little doubt about Spark's behavior. But from my side, it
>> seems Spark actually uses the string type, which is different from your
>> investigation.
>> > I tried the following SQL in the Spark CLI:
>> > `
>> > select array() + 1
>> > `
>> >
>> > The exception is
>> > `
>> > Error in query: cannot resolve '(array() + 1)' due to data type
>> mismatch: differing types in '(array() + 1)' (array<string> and int).; line
>> 1 pos 7;
>> > 'Project [unresolvedalias((array() + 1), None)]
>> > +- OneRowRelation
>> > `
>> >
>> >
>> > It seems hard to decide which data type Flink should use. I'm
>> interested in the reason why you would like to use the Integer type.
>> > I haven't checked whether the SQL standard specifies it. But from my
>> side, I'd prefer to follow Hive/Spark.
>> >
>> > BTW: the query `SELECT COALESCE(1, cast(ARRAY() as int))`
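
As a rough, assumption-laden sketch of the direction mentioned above (this is
not the contents of the exploratory PR; a real change would also have to
derive or default the element type for the empty call):

import org.apache.calcite.sql.SqlCallBinding;
import org.apache.calcite.sql.fun.SqlArrayValueConstructor;

// Calcite's SqlArrayValueConstructor rejects zero operands; a subclass could
// relax the operand check so that ARRAY[] parses and validates.
public class LenientArrayConstructor extends SqlArrayValueConstructor {

    @Override
    public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
        if (callBinding.getOperandCount() == 0) {
            return true; // permit the empty constructor call
        }
        return super.checkOperandTypes(callBinding, throwOnFailure);
    }
}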

Re: [DISCUSS] FLINK-20578 Cannot create empty array using ARRAY[]

2022-10-28 Thread yuxia
`ARRAY<INT>` / `INT ARRAY` is for declaring a data type in DDL, like `create table
t1(a INT ARRAY)`.

Best regards,
Yuxia
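
To make the distinction concrete, a small sketch against the Table API
(assuming a 1.16-era API; the table name and datagen connector are arbitrary
stand-ins):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ArraySyntaxDemo {
    public static void main(String[] args) {
        TableEnvironment env =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // DDL position: INT ARRAY (equivalently ARRAY<INT>) declares a column type.
        env.executeSql("CREATE TABLE t1 (a INT ARRAY) WITH ('connector' = 'datagen')");

        // Expression position: ARRAY[...] constructs a value; a type like
        // 'INT ARRAY' cannot appear here, which is why 'SELECT INT ARRAY [1]' fails.
        env.executeSql("SELECT ARRAY[1, 2, 3]").print();
    }
}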

- Original Message -
From: "eric xiao" 
To: "dev" 
Sent: Saturday, October 29, 2022, 7:50:22 AM
Subject: Re: [DISCUSS] FLINK-20578 Cannot create empty array using ARRAY[]

Upon further investigation of the Flink docs on array types, I noticed that
the data type docs [1] give the following syntax for declaring an array *with
a data type*:

> ARRAY<t>
> t ARRAY
>


> The type can be declared using ARRAY<t> where t is the data type of the
> contained elements.
> t ARRAY is a synonym for being closer to the SQL standard. For example, INT
> ARRAY is equivalent to ARRAY<INT>.


But when I try to run:
SELECT ARRAY [1]
or
SELECT INT ARRAY [1]

In Flink SQL I get this error:
org.apache.flink.table.api.SqlParserException: SQL parse failed. Incorrect
syntax near the keyword 'INT' at line 1, column 8.
Was expecting one of:
"ABS" ...
"ALL" ...
"ARRAY" ...
"AVG" ...
"CARDINALITY" ...

Am I missing something?

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/types/#array



On Fri, Oct 28, 2022 at 11:27 AM eric xiao  wrote:

> I seem to still get the datatype array:
> from pyspark.sql import SparkSession
> spark = (
> SparkSession.builder
> .getOrCreate()
> )
> spark.sql("SELECT ARRAY() + 1")
>
> AnalysisException: cannot resolve '(array() + 1)' due to data type mismatch: 
> differing types in '(array() + 1)' (array and int).; line 1 pos 7;
> 'Project [unresolvedalias((array() + 1), None)]
> +- OneRowRelation
>
> Perhaps it has to do with me using PySpark vs Java or Scala Spark?
>
> $ pip3 freeze | grep pyspark
> pyspark==3.3.1
>
>
> Regarding the coalesce query, I should have been a bit more explicit with
> my example:
> SELECT COALESCE(ARRAY(1), CAST(ARRAY() as ARRAY<INT>))
>
> When I was referring to int_column I meant an int array column 😅. It
> makes sense why you cannot COALESCE a non-array data type with an array
> data type - this is also why the SELECT ARRAY() + 1 query fails.
>
> ---
>
>> Unfortunately, ARRAY is fully managed by Calcite and maybe deeply
>> integrated also into the parser (at least this is the case for ROW).
>>
>
> I think that is indeed the problem, and people have suggested overriding
> some of the methods inherited from Calcite's SqlArrayValueConstructor
> class to allow the creation of empty arrays. Those suggestions worked,
> as shown in my exploratory PR: https://github.com/apache/flink/pull/21156.
>
> I think this is one change we will need to make in Flink.
>
> So if we wanna go the easy path (without introducing ARRAY_OF), ARRAY()
>> should result in ARRAY<NULL> if the type cannot be derived by the
>> surrounding call.
>>
> Are the SQL functions ARRAY_OF and ROW_OF supported in any other SQL
> dialect? I did a quick search and came up blank. I am still fairly new to
> the community, so I don't know what the community's stance is on staying
> close to ANSI SQL - having helper functions is always nice though.
>
> I am inclined to think that we should let the parser figure out the typing
> of the array based on the surrounding call and, if it cannot, use a default
> type such as ARRAY<NULL>.
>
> On Fri, Oct 28, 2022 at 4:07 AM Timo Walther  wrote:
>
>> Actually, the new type inference stack for UDFs is smart enough to solve
>> this issue. It could derive a data type for the array from the
>> surrounding call (expected data type).
>>
>> So this can be supported with the right type inference logic:
>> cast(ARRAY() as int)
>>
>> Unfortunately, ARRAY is fully managed by Calcite and maybe deeply
>> integrated also into the parser (at least this is the case for ROW).
>> TBH if I were to design a FLIP for the collection functions, I would
>> actually propose to introduce `ARRAY_OF()`, `ROW_OF()` to have full
>> control over the type inference in our stack. In our stack, this also
>> means that NULL is unknown. Calcite distinguishes between NULL and
>> unknown.
>>
>> So if we wanna go the easy path (without introducing ARRAY_OF), ARRAY()
>> should result in ARRAY<NULL> if the type cannot be derived by the
>> surrounding call.
>>
>> Regards,
>> Timo
>>
>> On 28.10.22 03:46, yuxia wrote:
>> > For an empty array, it seems different engines use different data types:
>> > Hive: string
>> > Spark: string ?
>> > Trino: Unknown
>> > BigQuery: Integer
>> >
>> > I have tried with Hive and Spark, but haven't tried with Trino and
>> BigQuery.
>> >
>> > I have a little doubt about Spark's behavior. But from my side, it
>> seems Spark actually uses the string type, which is different from your
>> investigation.
>> > I tried the following SQL in the Spark CLI:
>> > `
>> > select array() + 1
>> > `
>> >
>> > The exception is
>> > `
>> > Error in query: cannot resolve '(array() + 1)' due to data type
>> mismatch: differing types in '(array() + 1)' (array<string> and int).; line
>> 1 pos 7;
>> > 'Project [unresolvedalias((array() + 1), None)]
>> > +- OneRowRelation
>> > `
>> >
>> >
>> > Seems it's hard to decide which data type F