Re: [VOTE] FLIP-241: Completed Jobs Information Enhancement

2022-06-22 Thread Yangze Guo
+1 (binding)

Best,
Yangze Guo

On Thu, Jun 23, 2022 at 1:07 PM junhan yang  wrote:
>
> Hi everyone,
>
> Thanks for the feedback on the discussion thread [1]. I would like to start
> a vote thread here for FLIP-241: Completed Jobs Information Enhancement [2].
>
> The vote will last for at least 72 hours unless there is an objection. I
> will try to close it by *next Tuesday* if we receive sufficient votes by
> then.
>
> Thank you again for your participation in this FLIP discussion.
>
> [1] https://lists.apache.org/thread/qycqmxbh37b5qzs72y110rp8457kkxkb
> [2] https://cwiki.apache.org/confluence/x/dRD1D
>
> Best regards,
> Junhan


[jira] [Created] (FLINK-28215) Bump Maven Surefire plugin to 3.0.0-M7

2022-06-22 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-28215:
--

 Summary: Bump Maven Surefire plugin to 3.0.0-M7
 Key: FLINK-28215
 URL: https://issues.apache.org/jira/browse/FLINK-28215
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System
Reporter: Martijn Visser
Assignee: Martijn Visser








Re: [VOTE] FLIP-221: Abstraction for lookup source and metric

2022-06-22 Thread Jark Wu
+1 (binding)

Best,
Jark

On Thu, 23 Jun 2022 at 12:49, Qingsheng Ren  wrote:

> Hi devs,
>
> I’d like to start a vote thread for FLIP-221: Abstraction for lookup
> source and metric [1]. You can find the discussion thread in [2]*.
>
> The vote will be open for at least 72 hours unless there is an objection
> or not enough binding votes.
>
> Thanks to everyone participating in the discussion!
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> [2] https://lists.apache.org/thread/9c0fbgkofkbfdr5hvs62m0cxd2bkgwho
> [*] The link to the discussion thread might not include all emails. Please
> search the Apache email archive with keyword "FLIP-221" to get all
> discussion histories.
>
> Best regards,
> Qingsheng


[jira] [Created] (FLINK-28214) ArrayDataSerializer can not be reused to copy customized type of array data

2022-06-22 Thread Yi Tang (Jira)
Yi Tang created FLINK-28214:
---

 Summary: ArrayDataSerializer can not be reused to copy customized 
type of array data 
 Key: FLINK-28214
 URL: https://issues.apache.org/jira/browse/FLINK-28214
 Project: Flink
  Issue Type: Improvement
Reporter: Yi Tang


In FLINK-25238, we fixed ArrayDataSerializer to support copying customized
types of array data, in a similar way to MapDataSerializer.

MapDataSerializer#toBinaryMap always carries copy semantics implicitly, but
ArrayDataSerializer#toBinaryArray does not. So the value returned by
ArrayDataSerializer#toBinaryArray may be overwritten by newly copied data.

We should always copy the value returned by ArrayDataSerializer#toBinaryArray
in ArrayDataSerializer#copy explicitly.
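
A minimal sketch of the proposed change (hypothetical, not the exact Flink
source), assuming toBinaryArray may return a view over reused memory:

{code:java}
// Hypothetical sketch of ArrayDataSerializer#copy.
@Override
public ArrayData copy(ArrayData from) {
    if (from instanceof BinaryArrayData) {
        // Binary arrays own their memory segments; a direct copy is safe.
        return ((BinaryArrayData) from).copy();
    }
    // For customized ArrayData implementations, toBinaryArray(from) may
    // return a view backed by reused memory, so copy the returned value
    // explicitly before handing it out.
    return toBinaryArray(from).copy();
}
{code}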





[VOTE] FLIP-241: Completed Jobs Information Enhancement

2022-06-22 Thread junhan yang
Hi everyone,

Thanks for the feedback on the discussion thread [1]. I would like to start
a vote thread here for FLIP-241: Completed Jobs Information Enhancement [2].

The vote will last for at least 72 hours unless there is an objection. I
will try to close it by *next Tuesday* if we receive sufficient votes by
then.

Thank you again for your participation in this FLIP discussion.

[1] https://lists.apache.org/thread/qycqmxbh37b5qzs72y110rp8457kkxkb
[2] https://cwiki.apache.org/confluence/x/dRD1D

Best regards,
Junhan


Re: [DISCUSS] FLIP-241: Completed Jobs Information Enhancement

2022-06-22 Thread junhan yang
Hi all,

Thank you all for your feedback. As far as I can see, the discussion on
this FLIP has converged.

I will start a new vote thread now.

Best regards,
Junhan

On Fri, Jun 17, 2022 at 14:05, Yangze Guo wrote:

> Thanks for the input, Jiangang.
>
> I think it's a valid demand to distinguish completed jobs with the same
> name.
> - If they are different jobs, I think users need to give them
> different meaningful names respectively.
> - If they are exactly the same job, IIUC, what you need is to figure
> out the order. The ApplicationId in YARN might help, but in that case you
> can just sort them by the start time.
>
> Best,
> Yangze Guo
>
> On Fri, Jun 17, 2022 at 12:13 PM Jiangang Liu 
> wrote:
> >
> > Thanks for the FLIP. It is helpful to track detailed info for completed
> > jobs.
> >
> > I want to ask another question. In our environment, it is sometimes hard
> > to distinguish jobs, since the same job name may appear multiple times in
> > the completed jobs, either because a job ran multiple times or because
> > different jobs share the same name. I wonder whether we can enhance the
> > completed jobs display with more information, such as the applicationId
> > and application name in YARN. Identifying a job may work differently on
> > K8s.
> >
> > Best
> > Jiangang Liu
> >
> > On Fri, Jun 17, 2022 at 11:40, Yangze Guo wrote:
> >
> > > Thanks for the feedback, Aitozi and Jing.
> > >
> > > > Will each attempt of the TaskManager or JobManager pods (if
> > > > failures occur) be shown in the UI?
> > >
> > > The info of the prior execution attempts will be archived, you could
> > > refer to `ArchivedExecutionVertex$priorExecutions`.
> > >
> > > > It seems that most of these metrics are more interesting to batch
> > > > jobs. Does it make sense to calculate them for pure streaming jobs too?
> > >
> > > All the proposed metrics will be calculated no matter what the job
> type is.
> > >
> > > > Why "duration is less interesting", as mentioned in the FLIP?
> > >
> > > As a first step, we mainly focus on the most interesting status during
> > > the job lifecycle. The duration of final states like FINISHED and
> > > CANCELED is meaningless, while abnormal conditions like CANCELING will
> > > not be included at the moment.
> > >
> > > > Could you share your thoughts on "accumulated-busy-time"? It should
> > > > describe the time while the task is working as expected, i.e. the happy
> > > > path. When do we need it for analytics or diagnosis?
> > >
> > > A task could be busy or idle while it is working. Users may adjust the
> > > parallelism or the partition key according to the ratio between them.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Jun 17, 2022 at 5:08 AM Jing Ge  wrote:
> > > >
> > > > Hi Junhan
> > > >
> > > > These are must-to-have information for batch processing. Thanks for
> > > > bringing it up.
> > > >
> > > > I have some comments:
> > > >
> > > > 1. It seems that most of these metrics are more interesting to batch
> > > jobs.
> > > > Does it make sense to calculate them for pure streaming jobs too?
> > > > 2. Why "duration is less interesting", as mentioned in the FLIP?
> > > > 3. Could you share your thoughts on "accumulated-busy-time"? It
> should
> > > > describe the time while the task is working as expected, i.e. the
> happy
> > > > path. When do we need it for analytics or diagnosis?
> > > >
> > > > BTW, you might want to optimize the format of the FLIP. Some text is
> > > > running out of the right border of the wiki page.
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Thu, Jun 16, 2022 at 4:40 PM Aitozi  wrote:
> > > >
> > > > > Thanks Junhan for driving this. It's a great improvement for the
> > > > > batch jobs.
> > > > > I'm looking forward to this feature in our internal use case. +1
> for
> > > it.
> > > > >
> > > > > One more question:
> > > > >
> > > > > Will each attempt of the TaskManager or JobManager pods (if
> > > > > failures occur) be shown in the UI?
> > > > >
> > > > > Best,
> > > > > Aitozi.
> > > > >
> > > > > > On Thu, Jun 16, 2022 at 19:10, Yang Wang wrote:
> > > > >
> > > > > > Thanks Xintong for the explanation.
> > > > > >
> > > > > > It makes sense to leave the discussion about job result store in
> a
> > > > > > dedicated thread.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > On Thu, Jun 16, 2022 at 13:40, Xintong Song wrote:
> > > > > >
> > > > > > > My impression of JobResultStore is more about fault tolerance
> and
> > > high
> > > > > > > availability. Using it for providing information to users
> sounds
> > > worth
> > > > > > > exploring. We probably need more time to think it through.
> > > > > > >
> > > > > > > Given that it doesn't conflict with what we have proposed in
> this
> > > FLIP,
> > > > > > I'd
> > > > > > > suggest considering it as a separate thread and exclude it
> from the
> > > > > scope
> > > > > > > of this one.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Xintong
> > > > > > >
> > > > > 

[VOTE] FLIP-221: Abstraction for lookup source and metric

2022-06-22 Thread Qingsheng Ren
Hi devs,

I’d like to start a vote thread for FLIP-221: Abstraction for lookup source
and metric [1]. You can find the discussion thread in [2]*.

The vote will be open for at least 72 hours unless there is an objection or not 
enough binding votes.

Thanks to everyone participating in the discussion!

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
[2] https://lists.apache.org/thread/9c0fbgkofkbfdr5hvs62m0cxd2bkgwho
[*] The link to the discussion thread might not include all emails. Please 
search the Apache email archive with keyword "FLIP-221" to get all discussion 
histories. 

Best regards, 
Qingsheng

Re: [DISCUSS] Maintain a Calcite repository for Flink to accelerate the development for Flink SQL features

2022-06-22 Thread Jing Zhang
Hi Martijn,
This is really exciting news.
Thanks a lot for the effort to improve collaboration and communication with
the Calcite community.

> My take away from the discussion in the Flink community and the discussion
> in the Calcite community is that I believe we should do 3 things.

Agreed on these 3 points.
Regarding keeping up with the Calcite updates, I would like to take on this
issue. Is it too late to schedule it for the 1.16 release? How about
scheduling this work for version 1.17?

Best,
Jing Zhang


On Thu, Jun 23, 2022 at 02:01, Martijn Visser wrote:

> Hi everyone,
>
> I've recently reached out to the Calcite community to see if we could
> somehow get something done with regards to the PR that Jing Zhang had
> opened a long time ago. In that thread, I also mentioned that we had a
> discussion in the Flink community on potentially forking Calcite. I would
> recommend reading up on the thread [1]. Specifically the replies from other
> projects/PMCs (Apache Drill, Apache Dremio) are super interesting. These
> projects have forked Calcite in the past, regret that move, have reverted
> back to Calcite / are in the process of reverting and are elaborating on
> that. This thread also gained some traction on Twitter in case you're
> interested in more opinions [2].
>
> My take away from the discussion in the Flink community and the discussion
> in the Calcite community is that I believe we should do 3 things:
>
> 1. We should not fork Calcite. There might be short term benefits but long
> term pain. I think we already are suffering from enough long term pain in
> the Flink codebase that we shouldn't take a step that will increase that
> pain even more, scattered over multiple places.
> 2. I think we should try to help out the Calcite community more. Not only
> by opening new PRs for new features, but we can also help by reviewing
> those PRs, reviewing other PRs that could be relevant for Flink or propose
> improvements given our experience at Flink. As you can see in the Calcite
> thread, Timo has already expressed desire in doing so. Part of the OSS
> community is also about helping each other; if we improve Calcite, we will
> also improve Flink.
> 3. I think we need to prioritise keeping up with the Calcite updates. They
> are currently working on releasing version 1.31, while Flink is still at
> 1.26.0. We don't necessarily need to stay in sync with the latest available
> version, but I definitely think we should be at most 2 versions (and
> preferably 1 version) behind (so currently that would be 1.28 and 1.29
> soonish). Not only are we increasing our own tech debt by not updating, we
> are also limiting ourselves in adding new features in the Table/SQL space.
> As you can also see for the 1.26 release notes, there's a warning to only
> use 1.26 for development since it can corrupt your data [3]. There are
> already multiple upgrade tickets for Calcite [4] [5] [6].
>
> [1] https://lists.apache.org/thread/3lkfhwjpqwy9pfhnvwmfkwmwlfyqs45z
> [2]
>
> https://twitter.com/gunnarmorling/status/1539499415337111553?s=21&t=8fGk3PxScOx4FJPJWE5UeA
> [3] https://calcite.apache.org/news/2020/10/06/release-1.26.0/
> [4] https://issues.apache.org/jira/browse/FLINK-20873
> [5] https://issues.apache.org/jira/browse/FLINK-21239
> [6] https://issues.apache.org/jira/browse/FLINK-27998
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
> On Thu, May 5, 2022 at 10:34, godfrey he wrote:
>
> > Hi, Timo & Martijn,
> >
> > Sorry for the late reply, thanks for the feedback.
> >
> > I strongly agree that the best solution would be to cooperate more
> > with the Calcite community
> > and maintain all new features and bug fixes in the Calcite community,
> > without any forking.
> > It is a long-term process. I think it's difficult to change community
> > rules, because the Calcite
> > project is a neutral lib that serves multiple projects simultaneously.
> > I don't think fork calcite is the perfect solution, but rather a
> > better balance within limited resources:
> > it's possible to introduce some necessary minor features and bug fixes
> > without having to
> > upgrade to the latest version.
> >
> >
> > I investigated other projects that use Calcite [1] and found that most
> > of them do not use the latest version of Calcite. Even for the Kylin
> > community, the version based on Calcite 1.16.0 has been updated to 70 [2].
> > (Similar projects are Quark and Drill.)
> > My guess is that these projects chose a stable version
> > (or even chose to maintain a fork project) to maintain stability.
> > Once Flink no longer needs to introduce new syntax,
> > I guess it's less expensive and more manageable to maintain a forked
> > Calcite.
> >
> >
> > Even if we don't end up going the Calcite fork route,
> > I hope that we can discuss the options for subsequent Calcite upgrades
> > here.
> > Just like Timo mentioned, how to balance feature development and code
> > maintenance.
> 

[jira] [Created] (FLINK-28213) StreamExecutionEnvironment configure method support override pipeline.jars option

2022-06-22 Thread dalongliu (Jira)
dalongliu created FLINK-28213:
-

 Summary: StreamExecutionEnvironment configure method support 
override pipeline.jars option
 Key: FLINK-28213
 URL: https://issues.apache.org/jira/browse/FLINK-28213
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Configuration
Affects Versions: 1.16.0
Reporter: dalongliu
 Fix For: 1.16.0








[jira] [Created] (FLINK-28212) IndexOutOfBoundsException is thrown when project contains window which doesn't refer all fields of input when using Hive dialect

2022-06-22 Thread luoyuxia (Jira)
luoyuxia created FLINK-28212:


 Summary: IndexOutOfBoundsException is thrown when project contains 
window which doesn't refer all fields of input when using Hive dialect
 Key: FLINK-28212
 URL: https://issues.apache.org/jira/browse/FLINK-28212
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: luoyuxia
 Fix For: 1.16.0


Can be reproduced by the following SQL:
{code:java}
CREATE TABLE alltypesorc(
ctinyint TINYINT,
csmallint SMALLINT,
cint INT,
cbigint BIGINT,
cfloat FLOAT,
cdouble DOUBLE,
cstring1 STRING,
cstring2 STRING,
ctimestamp1 TIMESTAMP,
ctimestamp2 TIMESTAMP,
cboolean1 BOOLEAN,
cboolean2 BOOLEAN);

select a.ctinyint, a.cint, count(a.cdouble)
  over(partition by a.ctinyint order by a.cint desc
rows between 1 preceding and 1 following)
from alltypesorc {code}
Then it throws: Caused by: java.lang.IndexOutOfBoundsException: index (7)
must be less than size (1)





[jira] [Created] (FLINK-28211) Rename Schema to TableSchema

2022-06-22 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-28211:


 Summary: Rename Schema to TableSchema
 Key: FLINK-28211
 URL: https://issues.apache.org/jira/browse/FLINK-28211
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.2.0


Some systems use "schema" as a database-level concept, so the Schema class
can be very confusing in that case; it is better to rename it to
TableSchema.





[jira] [Created] (FLINK-28210) FlinkSessionJob fails after FlinkDeployment is updated

2022-06-22 Thread Daniel Crowe (Jira)
Daniel Crowe created FLINK-28210:


 Summary: FlinkSessionJob fails after FlinkDeployment is updated
 Key: FLINK-28210
 URL: https://issues.apache.org/jira/browse/FLINK-28210
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
 Environment: The [quick 
start|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes-operator/quick-start/]
 was followed to install minikube and the flink operator. 

 

minikube 1.24.1

kubectl 1.24.2

flink operator: 1.0.0
Reporter: Daniel Crowe


I created a flink deployment using this example:
{code}
curl 
https://raw.githubusercontent.com/apache/flink-kubernetes-operator/main/examples/basic-session-job.yaml
 -o basic-session-job.yaml 

kubectl create -f basic-session-job.yaml 
{code}

Then, I modified the memory allocated to the jobManager and applied the change
{code}
kubectl apply -f basic-session-job.yaml 
{code}

The job manager is restarted to apply the change, but the jobs are not. 

Looking at the operator logs, it appears that something is failing during job 
status observation:

{noformat}
2022-06-23 03:29:51,189 o.a.f.k.o.c.FlinkSessionJobController [INFO 
][default/basic-session-job-example2] Starting reconciliation
2022-06-23 03:29:51,190 o.a.f.k.o.o.JobStatusObserver  [INFO 
][default/basic-session-job-example2] Observing job status
2022-06-23 03:29:51,205 o.a.f.k.o.c.FlinkSessionJobController [INFO 
][default/basic-session-job-example] Starting reconciliation
2022-06-23 03:29:51,206 o.a.f.k.o.o.JobStatusObserver  [INFO 
][default/basic-session-job-example] Observing job status
2022-06-23 03:29:51,208 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][default/basic-session-cluster] Starting reconciliation
2022-06-23 03:29:51,227 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][default/basic-session-cluster] End of reconciliation
{noformat}






[DISCUSS] Releasing Flink ML 2.1.0

2022-06-22 Thread Zhipeng Zhang
Hi devs,

Yun and I would like to start a discussion for releasing Flink ML
 2.1.0.

In the past few months, we focused on improving the infra (e.g. memory
management, benchmark infra, online training, python support) of Flink ML
by implementing, benchmarking, and optimizing 9 new algorithms in Flink ML.
Our results have shown that Flink ML is able to meet or exceed the
performance of selected algorithms in alternative popular ML libraries.

Please see below for a detailed list of improvements:

- A set of representative machine learning algorithms:
  - Feature engineering:
    - MinMaxScaler (https://issues.apache.org/jira/browse/FLINK-25552)
    - StringIndexer (https://issues.apache.org/jira/browse/FLINK-25527)
    - VectorAssembler (https://issues.apache.org/jira/browse/FLINK-25616)
    - StandardScaler (https://issues.apache.org/jira/browse/FLINK-26626)
    - Bucketizer (https://issues.apache.org/jira/browse/FLINK-27072)
  - Online learning:
    - OnlineKmeans (https://issues.apache.org/jira/browse/FLINK-26313)
    - OnlineLogisticRegression (https://issues.apache.org/jira/browse/FLINK-27170)
  - Regression:
    - LinearRegression (https://issues.apache.org/jira/browse/FLINK-27093)
  - Classification:
    - LinearSVC (https://issues.apache.org/jira/browse/FLINK-27091)
  - Evaluation:
    - BinaryClassificationEvaluator (https://issues.apache.org/jira/browse/FLINK-27294)
- A benchmark framework for Flink ML
  (https://issues.apache.org/jira/browse/FLINK-26443)
- A website for Flink ML users
  (https://nightlies.apache.org/flink/flink-ml-docs-stable/)
- Python support for Flink ML algorithms
  (https://issues.apache.org/jira/browse/FLINK-26268,
  https://issues.apache.org/jira/browse/FLINK-26269)
- Several optimizations for the Flink ML infrastructure
  (https://issues.apache.org/jira/browse/FLINK-27096,
  https://issues.apache.org/jira/browse/FLINK-27877)

With the improvements and throughput benchmarks we have made, we think it
is time to release Flink ML 2.1.0, so that interested developers in the
community can try out the new Flink ML infra to develop algorithms with
high throughput and low latency.
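
For readers who want to try it out, here is a minimal sketch in the style of
the Flink ML quick start. It assumes the 2.1.0 API keeps the Estimator/Model
fit/transform shape of Flink ML 2.0, and that trainTable is a Table with a
DenseVector column named "features":

    // Minimal sketch (assumed API): train a KMeans model and apply it.
    KMeans kmeans = new KMeans()
            .setK(2)
            .setSeed(1L)
            .setFeaturesCol("features")
            .setPredictionCol("prediction");
    KMeansModel model = kmeans.fit(trainTable);
    Table predictions = model.transform(trainTable)[0];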

If there is any concern, please let us know.


Best,
Yun and Zhipeng


[jira] [Created] (FLINK-28209) KafkaSink with EXACTLY_ONCE produces duplicated data (flink kafka connector 1.14.4)

2022-06-22 Thread tanyao (Jira)
tanyao created FLINK-28209:
--

 Summary: KafkaSink with EXACTLY_ONCE produces duplicated data 
(flink kafka connector 1.14.4)
 Key: FLINK-28209
 URL: https://issues.apache.org/jira/browse/FLINK-28209
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.14.4
Reporter: tanyao
 Attachments: image-2022-06-23-10-49-01-213.png, 
image-2022-06-23-10-58-15-141.png

I'm trying to read the MySQL binlog and transport it to Kafka.

Here is what I'm using:

*Flink: 1.14.4*

*Flink CDC: 2.2*

*Kafka: CDH 6.2 (2.1)*

*Stage 1:*

The mysql-cdc connector was used to consume the MySQL binlog data. About
400,000 (40W) rows changed when I executed some SQL in MySQL, and I got
exactly those rows without any data loss or duplication, the same number as
changed in MySQL. So I don't think CDC is the problem.

*Stage 2:*

When I got the binlog data, I first deserialized it to a Tuple2<String,
String>, where tuple2.f0 has the format "db.table" (which I intend to use as
the Kafka topic for each distinct db.table) and tuple2.f1 contains only the
binlog value.

*Stage 3:*

Then I used KafkaSink (which was introduced in Flink 1.14) to write the
binlog to different Kafka topics as indicated by tuple2.f0.

Here is what the code looks like:

!image-2022-06-23-10-49-01-213.png!
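
The referenced screenshot is not included in this archive; below is only a
rough, hedged sketch of what a Flink 1.14 KafkaSink configured for
EXACTLY_ONCE typically looks like. The topic routing from tuple2.f0, the
serialization lambda, and the transactional-id prefix are illustrative
assumptions, not the reporter's exact code:

{code:java}
// Hedged sketch of a Flink 1.14 KafkaSink with EXACTLY_ONCE semantics.
KafkaSink<Tuple2<String, String>> sink =
        KafkaSink.<Tuple2<String, String>>builder()
                .setBootstrapServers(brokers)
                .setRecordSerializer(
                        KafkaRecordSerializationSchema.<Tuple2<String, String>>builder()
                                .setTopicSelector(t -> t.f0) // one topic per "db.table"
                                .setValueSerializationSchema(
                                        (SerializationSchema<Tuple2<String, String>>)
                                                t -> t.f1.getBytes(StandardCharsets.UTF_8))
                                .build())
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                // Must be unique per application for transactional producers.
                .setTransactionalIdPrefix("binlog-sink")
                .build();
{code}

Note that EXACTLY_ONCE only hides duplicates from consumers that read with
isolation.level=read_committed; with the default read_uncommitted setting,
records from uncommitted or aborted transactions are visible and can look
like duplicates.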

 

As you can see, I just want to use EXACTLY_ONCE semantics, but here is the
problem: after waiting about 10 minutes for all the binlog data to be
consumed, I checked all data in a single Kafka topic (just one topic). The
total number of rows was much higher than the number of changed binlog rows
from MySQL, because a lot of duplicated data was written to Kafka. For
example:

!image-2022-06-23-10-58-15-141.png!

*Stage 4:*

However, when I changed EXACTLY_ONCE to AT_LEAST_ONCE, everything worked
very well: no more duplicated data in Kafka.

So I'm wondering: is there a bug in KafkaSink when EXACTLY_ONCE is
configured?





[jira] [Created] (FLINK-28208) The method createBatchSink in class HiveTableSink should setParallelism for map operator

2022-06-22 Thread Liu (Jira)
Liu created FLINK-28208:
---

 Summary: The method createBatchSink in class HiveTableSink should 
setParallelism for map operator
 Key: FLINK-28208
 URL: https://issues.apache.org/jira/browse/FLINK-28208
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Affects Versions: 1.16.0
Reporter: Liu


The problem was found when using the Adaptive Batch Scheduler. There, a simple
SQL statement like "select * from * where *" generates three operators: source,
map, and sink. The map's parallelism is set to -1 by default and thus differs
from the source and sink, so the three operators cannot be chained together.

The reason is that the map operator added in the createBatchSink method does
not have setParallelism applied. The proposed change is as follows:
{code:java}
private DataStreamSink<Row> createBatchSink(
        DataStream<RowData> dataStream,
        DataStructureConverter converter,
        StorageDescriptor sd,
        HiveWriterFactory recordWriterFactory,
        OutputFileConfig fileNaming,
        final int parallelism)
        throws IOException {

    ...

    return dataStream
            .map((MapFunction<RowData, Row>) value -> (Row) converter.toExternal(value))
            // Newly added to ensure the map operator gets the same parallelism
            // as the source and sink, so that the operators can be chained.
            .setParallelism(parallelism)
            .writeUsingOutputFormat(builder.build())
            .setParallelism(parallelism);
} {code}





[DISCUSS] FLIP-243: Dedicated Opensearch connectors

2022-06-22 Thread Andriy Redko
Hi Folks, 

We would like to start a discussion thread on FLIP-243: Dedicated Opensearch 
connectors [1], [2] where we propose to provide dedicated connectors for 
Opensearch [3], on par with existing Elasticsearch ones. Looking forward to 
comments and feedback. Thank you.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-243%3A+Dedicated+Opensearch+connectors
[2] https://github.com/apache/flink/pull/18541 
[3] https://opensearch.org/

Best Regards,
Andriy Redko



[jira] [Created] (FLINK-28207) Disabling webhook should also disable mutator

2022-06-22 Thread Jira
Márton Balassi created FLINK-28207:
--

 Summary: Disabling webhook should also disable mutator
 Key: FLINK-28207
 URL: https://issues.apache.org/jira/browse/FLINK-28207
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Márton Balassi
Assignee: Márton Balassi
 Fix For: kubernetes-operator-1.1.0


The configuration for the mutating webhook suggests that it is nested inside 
the (validating) webhook:
https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/values.yaml#L73-L76

Based on this I would expect that disabling the top-level webhook also
disables the mutator; however, this is not the case:
https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/templates/webhook.yaml#L19-L79
https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/templates/webhook.yaml#L115-L148

I do not see a use case currently where we would want the mutating webhook 
without having the validating one, so I suggest following the hierarchy that 
the helm configs imply. 






[jira] [Created] (FLINK-28206) EOFException on Checkpoint Recovery

2022-06-22 Thread uharaqo (Jira)
uharaqo created FLINK-28206:
---

 Summary: EOFException on Checkpoint Recovery
 Key: FLINK-28206
 URL: https://issues.apache.org/jira/browse/FLINK-28206
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.14.4
Reporter: uharaqo


 

We have only one Job Manager in Kubernetes, and it suddenly got killed without
any logs. A new Job Manager process could not recover from a checkpoint due to
an EOFException.
Task Managers killed themselves since they could not find any Job Manager.
There were no error logs other than that on the Task Manager side.

It looks to me like the checkpoint is corrupted. Is there a way to identify the
cause? What would you recommend we do to mitigate this problem?

Here are the logs from the recovery phase. (The stack trace has been removed
here; please find it at the bottom.)
{noformat}
{"timestamp":"2022-06-22T17:21:25.870Z","level":"INFO","logger":"org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils","message":"Recovering checkpoints from KubernetesStateHandleStore{configMapName='univex-flink-record-collector-46071c6a64e47d1ce828dfe032f943a6-jobmanager-leader'}."}
{"timestamp":"2022-06-22T17:21:25.875Z","level":"INFO","logger":"org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils","message":"Found 1 checkpoints in KubernetesStateHandleStore{configMapName='univex-flink-record-collector-46071c6a64e47d1ce828dfe032f943a6-jobmanager-leader'}."}
{"timestamp":"2022-06-22T17:21:25.876Z","level":"INFO","logger":"org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils","message":"Trying to fetch 1 checkpoints from storage."}
{"timestamp":"2022-06-22T17:21:25.876Z","level":"INFO","logger":"org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils","message":"Trying to retrieve checkpoint 58130."}
{"timestamp":"2022-06-22T17:21:25.901Z","level":"ERROR","logger":"org.apache.flink.runtime.entrypoint.ClusterEntrypoint","message":"Fatal error occurred in the cluster entrypoint."}
{"level":"INFO","logger":"org.apache.flink.runtime.entrypoint.ClusterEntrypoint","message":"Shutting StandaloneSessionClusterEntrypoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally.."}
{"timestamp":"2022-06-22T17:21:25.921Z","level":"INFO","logger":"org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint","message":"Shutting down rest endpoint."}
{"timestamp":"2022-06-22T17:21:25.922Z","level":"INFO","logger":"org.apache.flink.runtime.blob.BlobServer","message":"Stopped BLOB server at 0.0.0.0:6124"}
{noformat}

The stacktrace of the ERROR:
{noformat}
org.apache.flink.util.FlinkException: JobMaster for job 46071c6a64e47d1ce828dfe032f943a6 failed.
    at org.apache.flink.runtime.dispatcher.Dispatcher.jobMasterFailed(Dispatcher.java:913)
    at org.apache.flink.runtime.dispatcher.Dispatcher.jobManagerRunnerFailed(Dispatcher.java:473)
    at org.apache.flink.runtime.dispatcher.Dispatcher.handleJobManagerRunnerResult(Dispatcher.java:450)
    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$runJob$3(Dispatcher.java:427)
    at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
    at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
    at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRunAsync$4(AkkaRpcActor.java:455)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:455)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:213)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at 

Re: [DISCUSS] Maintain a Calcite repository for Flink to accelerate the development for Flink SQL features

2022-06-22 Thread Martijn Visser
Hi everyone,

I've recently reached out to the Calcite community to see if we could
somehow get something done with regards to the PR that Jing Zhang had
opened a long time ago. In that thread, I also mentioned that we had a
discussion in the Flink community on potentially forking Calcite. I would
recommend reading up on the thread [1]. Specifically the replies from other
projects/PMCs (Apache Drill, Apache Dremio) are super interesting. These
projects have forked Calcite in the past, regret that move, have reverted
back to Calcite / are in the process of reverting and are elaborating on
that. This thread also gained some traction on Twitter in case you're
interested in more opinions [2].

My take away from the discussion in the Flink community and the discussion
in the Calcite community is that I believe we should do 3 things:

1. We should not fork Calcite. There might be short term benefits but long
term pain. I think we already are suffering from enough long term pain in
the Flink codebase that we shouldn't take a step that will increase that
pain even more, scattered over multiple places.
2. I think we should try to help out the Calcite community more. Not only
by opening new PRs for new features, but we can also help by reviewing
those PRs, reviewing other PRs that could be relevant for Flink or propose
improvements given our experience at Flink. As you can see in the Calcite
thread, Timo has already expressed desire in doing so. Part of the OSS
community is also about helping each other; if we improve Calcite, we will
also improve Flink.
3. I think we need to prioritise keeping up with the Calcite updates. They
are currently working on releasing version 1.31, while Flink is still at
1.26.0. We don't necessarily need to stay in sync with the latest available
version, but I definitely think we should be at most 2 versions (and
preferably 1 version) behind (so currently that would be 1.28 and 1.29
soonish). Not only are we increasing our own tech debt by not updating, we
are also limiting ourselves in adding new features in the Table/SQL space.
As you can also see for the 1.26 release notes, there's a warning to only
use 1.26 for development since it can corrupt your data [3]. There are
already multiple upgrade tickets for Calcite [4] [5] [6].

[1] https://lists.apache.org/thread/3lkfhwjpqwy9pfhnvwmfkwmwlfyqs45z
[2]
https://twitter.com/gunnarmorling/status/1539499415337111553?s=21&t=8fGk3PxScOx4FJPJWE5UeA
[3] https://calcite.apache.org/news/2020/10/06/release-1.26.0/
[4] https://issues.apache.org/jira/browse/FLINK-20873
[5] https://issues.apache.org/jira/browse/FLINK-21239
[6] https://issues.apache.org/jira/browse/FLINK-27998

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser

On Thu, May 5, 2022 at 10:34, godfrey he wrote:

> Hi, Timo & Martijn,
>
> Sorry for the late reply, thanks for the feedback.
>
> I strongly agree that the best solution would be to cooperate more
> with the Calcite community
> and maintain all new features and bug fixes in the Calcite community,
> without any forking.
> It is a long-term process. I think it's difficult to change community
> rules, because the Calcite
> project is a neutral lib that serves multiple projects simultaneously.
> I don't think fork calcite is the perfect solution, but rather a
> better balance within limited resources:
> it's possible to introduce some necessary minor features and bug fixes
> without having to
> upgrade to the latest version.
>
>
> I investigated other projects that use Calcite [1] and found that most of
> them do not use the latest version of Calcite. Even for the Kylin
> community, the version based on Calcite 1.16.0 has been updated to 70 [2].
> (Similar projects are Quark and Drill.)
> My guess is that these projects chose a stable version
> (or even chose to maintain a fork project) to maintain stability.
> Once Flink no longer needs to introduce new syntax,
> I guess it's less expensive and more manageable to maintain a forked Calcite.
>
>
> Even if we don't end up going the Calcite fork route,
> I hope that we can discuss the options for subsequent Calcite upgrades
> here.
> Just like Timo mentioned, how to balance feature development and code
> maintenance.
> There are a few realistic questions about the Calcite upgrade
> situation now, such as:
> 1. If we keep up with the latest version of Calcite, who is
> responsible for each upgrade?
> The current status is that no one has motivation to upgrade the version
> unless he/she wants to drive new features.
> 2. Do we have the resources/energy to upgrade for each version?
> 3. How do we ensure that each upgrade behaves as expected? It takes a lot
> of effort to verify the correctness of the upgrade results. The test set
> for uncommon SQL usage is not sufficient right now.
>
>
> > I still don't quite understand why we want to avoid Calcite upgrades.
> Not every feature in Calcite is a feature we really need. While some
> refactorings 

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-22 Thread Alexander Smirnov
Hi Qingsheng,

I like the current design, thanks for your efforts! I have no
objections at the moment.
+1 to start the vote.

Best regards,
Alexander

On Wed, Jun 22, 2022 at 15:59, Qingsheng Ren wrote:
>
> Hi Jingsong,
>
> 1. Updated and thanks for the reminder!
>
> 2. We could do so for the implementation, but as a public interface I prefer
> not to introduce another layer and expose too much, since this FLIP is
> already a huge one with a bunch of classes and interfaces.
>
> Best,
> Qingsheng
>
> > On Jun 22, 2022, at 11:16, Jingsong Li  wrote:
> >
> > Thanks Qingsheng and all.
> >
> > I like this design.
> >
> > Some comments:
> >
> > 1. LookupCache implements Serializable?
> >
> > 2. Minor: After FLIP-234 [1], there should be many connectors that
> > implement both PartialCachingLookupProvider and
> > PartialCachingAsyncLookupProvider. Can we extract a common interface
> > for `LookupCache getCache();` to ensure consistency?
> >
> > [1] 
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> >
> > Best,
> > Jingsong
> >
> > On Tue, Jun 21, 2022 at 4:09 PM Qingsheng Ren  wrote:
> >>
> >> Hi devs,
> >>
> >> I’d like to push FLIP-221 forward a little bit. Recently we had some 
> >> offline discussions and updated the FLIP. Here’s the diff compared to the 
> >> previous version:
> >>
> >> 1. (Async)LookupFunctionProvider is designed as a base interface for 
> >> constructing lookup functions.
> >> 2. From the LookupFunction we extend PartialCaching / 
> >> FullCachingLookupProvider for partial and full caching mode.
> >> 3. Introduce CacheReloadTrigger for specifying the reload strategy in full
> >> caching mode, and provide 2 default implementations (Periodic /
> >> TimedCacheReloadTrigger)
> >>
> >> Looking forward to your replies~
> >>
> >> Best,
> >> Qingsheng
> >>
> >>> On Jun 2, 2022, at 17:15, Qingsheng Ren  wrote:
> >>>
> >>> Hi Becket,
> >>>
> >>> Thanks for your feedback!
> >>>
> >>> 1. An alternative way is to let the implementation of cache to decide
> >>> whether to store a missing key in the cache instead of the framework.
> >>> This sounds more reasonable and makes the LookupProvider interface
> >>> cleaner. I can update the FLIP and clarify in the JavaDoc of
> >>> LookupCache#put that the cache should decide whether to store an empty
> >>> collection.
> >>>
> >>> 2. Initially the builder pattern is for the extensibility of
> >>> LookupProvider interfaces that we could need to add more
> >>> configurations in the future. We can remove the builder now as we have
> >>> resolved the issue in 1. As for the builder in DefaultLookupCache I
> >>> prefer to keep it because we have a lot of arguments in the
> >>> constructor.
> >>>
> >>> 3. I think this might overturn the overall design. I agree with
> >>> Becket's idea that the API design should be layered considering
> >>> extensibility and it'll be great to have one unified interface
> >>> supporting both partial, full and even mixed custom strategies, but we
> >>> have some issues to resolve. The original purpose of treating full
> >>> caching separately is that we'd like to reuse the ability of
> >>> ScanRuntimeProvider. Developers just need to hand over Source /
> >>> SourceFunction / InputFormat so that the framework could be able to
> >>> compose the underlying topology and control the reload (maybe in a
> >>> distributed way). Under your design we leave the reload operation
> >>> totally to the CacheStrategy and I think it will be hard for
> >>> developers to reuse the source in the initializeCache method.
> >>>
> >>> Best regards,
> >>>
> >>> Qingsheng
> >>>
> >>> On Thu, Jun 2, 2022 at 1:50 PM Becket Qin  wrote:
> 
>  Thanks for updating the FLIP, Qingsheng. A few more comments:
> 
>  1. I am still not sure about what is the use case for cacheMissingKey().
>  More specifically, when would users want to have getCache() return a
>  non-empty value and cacheMissingKey() returns false?
> 
>  2. The builder pattern. Usually the builder pattern is used when there 
>  are
>  a lot of variations of constructors. For example, if a class has three
>  variables and all of them are optional, so there could potentially be 
>  many
>  combinations of the variables. But in this FLIP, I don't see such case.
>  What is the reason we have builders for all the classes?
> 
>  3. Should the caching strategy be excluded from the top level provider 
>  API?
>  Technically speaking, the Flink framework should only have two interfaces
>  to deal with:
>    A) LookupFunction
>    B) AsyncLookupFunction
>  Orthogonally, we *believe* there are two different strategies people can 
>  do
>  caching. Note that the Flink framework does not care what is the caching
>  strategy here.
>    a) partial caching
>    b) full caching
> 
>  Putting them together, we end up with 3 
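
As an illustration of the updated FLIP-221 design discussed in this thread,
here is a hedged sketch of how a connector might wire up partial caching; the
class names and signatures assume the final API matches the proposal, and
myLookupFunction stands in for a connector's LookupFunction:

    // Sketch under the assumption that the FLIP-221 API lands as proposed.
    LookupCache cache =
            DefaultLookupCache.newBuilder()
                    .maximumSize(10_000)
                    .expireAfterWrite(Duration.ofMinutes(10))
                    // Per the discussion, the cache itself (not the framework)
                    // decides whether to store empty results for missing keys.
                    .cacheMissingKey(true)
                    .build();

    LookupTableSource.LookupRuntimeProvider provider =
            PartialCachingLookupProvider.of(myLookupFunction, cache);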

[DISCUSS] Support partition pruning for streaming reading

2022-06-22 Thread cao zou
Hi devs, I want to start a discussion to find a way to support partition
pruning for streaming reading.


Now, Flink already supports partition pruning; the implementation consists
of *Source Ability*, *Logical Rule*, and the interface
*SupportsPartitionPushDown*, but they all take effect only in batch
reading. When reading a table in streaming mode, the existing mechanism
causes the problems reported in FLINK-27898 [1]: records that should be
filtered out are sent downstream.

To solve this drawback, this discussion is proposed; Hive and other big data
systems that store data in partitions will benefit from it.

Currently, the existing partitions that need to be consumed are computed in
*PushPartitionIntoTableSourceScanRule* and then pushed into the TableSource.
This works well in batch mode, but it is not enough if we want to read
records from Hive in streaming mode and also consider partitions committed
in the future.

To support pruning partitions committed in the future, the pruning
function should be pushed into the TableSource and then delivered to
*ContinuousPartitionFetcher*, so that pruning of uncommitted
partitions can be invoked there.

Before proposing the changes, I think it is necessary to clarify the
existing pruning logic. The main logic of the pruning in
*PushPartitionIntoTableSourceScanRule* is as follows.

First, a pruning function called partitionPruner is generated; the function
extends RichMapFunction.


if tableSource.listPartitions() is not empty:
    partitions = dynamicTableSource.listPartitions()
    for p in partitions:
        boolean predicate = partitionPruner.map(convertPartitionToRow(p))
        add p to partitionsAfterPruning where the predicate is true

else (tableSource.listPartitions() is empty):
    if the filter can be converted to a ResolvedExpression &&
            the catalog can support the filter:
        partitionsAfterPruning = catalog.listPartitionsByFilter()
        all values of partitionsAfterPruning are needed
    else:
        partitions = catalog.listPartitions()
        for p in partitions:
            boolean predicate = partitionPruner.map(convertPartitionToRow(p))
            add p to partitionsAfterPruning where the predicate is true

I think the main logic can be split into two sides: one in the logical rule
and the other on the connector side. The catalog info should be used on the
rule side, not on the connector side; the pruning function could be used on
both of them or unified on the connector side.


Proposed changes


   - add a new method in SupportsPartitionPushDown
   - let HiveSourceTable, HiveSourceBuilder, and
   HiveContinuousPartitionFetcher hold the pruning function
   - prune after fetchPartitions is invoked

Considering the version compatibility and the optimization for the method
of listing partitions with filter in the catalog, I think we can add a new
method in *SupportsPartitionPushDown*

/**
 * Provides a list of remaining partitions. After those partitions are
 * applied, a source must not read the data of other partitions during
 * runtime.
 *
 * See the documentation of {@link SupportsPartitionPushDown} for more
 * information.
 */
void applyPartitions(List<Map<String, String>> remainingPartitions);

/**
 * Provides a pruning function for uncommitted partitions.
 */
default void applyPartitionPruningFunction(
        MapFunction<RowData, Boolean> partitionPruningFunction) {}

We can push the generated function into the TableSource, so that the
ContinuousPartitionFetcher can get it.

For batch reading, 'remainingPartitions' is treated as the set of partitions
to consume; for streaming reading, we use the 'partitionPruningFunction' to
ignore the unneeded partitions, as sketched below.
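
A hedged sketch of how a continuous fetcher might apply the pushed-down
function to newly discovered partitions; the fetcher and partition types are
illustrative placeholders, not part of the proposal:

    // Illustrative sketch only: filter newly discovered partitions with the
    // pushed-down pruning function before handing them to the reader.
    List<Partition> pruneDiscoveredPartitions(
            List<Partition> discovered,
            MapFunction<RowData, Boolean> pruningFunction) throws Exception {
        List<Partition> remaining = new ArrayList<>();
        for (Partition p : discovered) {
            // convertPartitionToRow mirrors the conversion used in
            // PushPartitionIntoTableSourceScanRule.
            if (pruningFunction.map(convertPartitionToRow(p))) {
                remaining.add(p);
            }
        }
        return remaining;
    }
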
Rejected Alternatives

Do not remove the filter logic about the partition keys in the Filter node
if the source performs streaming reading.


Looking forward to your opinions.


[1] https://issues.apache.org/jira/browse/FLINK-27898

best

zoucao


Re: Re: [ANNOUNCE] New Apache Flink Committers: Qingsheng Ren, Shengkai Fang

2022-06-22 Thread Yu Li
Congrats and welcome, Qingsheng and Shengkai!

Best Regards,
Yu


On Wed, 22 Jun 2022 at 17:43, Jiangang Liu 
wrote:

> Congratulations!
>
> Best,
> Jiangang Liu
>
> Mason Chen  于2022年6月22日周三 00:37写道:
>
> > Awesome work Qingsheng and Shengkai!
> >
> > Best,
> > Mason
> >
> > On Tue, Jun 21, 2022 at 4:53 AM Zhipeng Zhang 
> > wrote:
> >
> > > Congratulations, Qingsheng and ShengKai.
> > >
> > > On Tue, Jun 21, 2022 at 19:43, Yang Wang wrote:
> > >
> > > > Congratulations, Qingsheng and ShengKai.
> > > >
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > On Tue, Jun 21, 2022 at 19:33, Benchao Li wrote:
> > > >
> > > > > Congratulations!
> > > > >
> > > > > On Tue, Jun 21, 2022 at 13:44, weijie guo wrote:
> > > > >
> > > > > > Congratulations, Qingsheng and ShengKai!
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Weijie
> > > > > >
> > > > > >
> > > > > > On Tue, Jun 21, 2022 at 13:07, Yuan Mei wrote:
> > > > > >
> > > > > > > Congrats Qingsheng and ShengKai!
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Yuan
> > > > > > >
> > > > > > > On Tue, Jun 21, 2022 at 11:27 AM Terry Wang <
> zjuwa...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Congratulations, Qingsheng and ShengKai!
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best,
> > > > > Benchao Li
> > > > >
> > > >
> > >
> > >
> > > --
> > > best,
> > > Zhipeng
> > >
> >
>


Re: [DISCUSS] Release Kubernetes operator 1.0.1

2022-06-22 Thread Őrhidi Mátyás
+1 for the patch release. Thanks Gyula!

On Wed, Jun 22, 2022 at 5:35 PM Márton Balassi 
wrote:

> Hi team,
>
> +1 for having a 1.0.1 for the Kubernetes Operator.
>
> On Wed, Jun 22, 2022 at 4:23 PM Gyula Fóra  wrote:
>
> > Hi Devs!
> >
> > How do you feel about releasing the 1.0.1 patch release for the
> Kubernetes
> > operator?
> >
> > We have fixed a few annoying issues that many people tend to hit.
> >
> > Given that we are about halfway until the next minor release based on the
> > proposed schedule I think we could prepare a 1.0.1 RC1 in the next 1-2
> days
> > .
> >
> > I can volunteer to be the release manager.
> >
> > What do you think?
> >
> > Cheers,
> > Gyula
> >
>


Re: [DISCUSS] Release Kubernetes operator 1.0.1

2022-06-22 Thread Márton Balassi
Hi team,

+1 for having a 1.0.1 for the Kubernetes Operator.

On Wed, Jun 22, 2022 at 4:23 PM Gyula Fóra  wrote:

> Hi Devs!
>
> How do you feel about releasing the 1.0.1 patch release for the Kubernetes
> operator?
>
> We have fixed a few annoying issues that many people tend to hit.
>
> Given that we are about halfway until the next minor release based on the
> proposed schedule I think we could prepare a 1.0.1 RC1 in the next 1-2 days
> .
>
> I can volunteer to be the release manager.
>
> What do you think?
>
> Cheers,
> Gyula
>


[DISCUSS] Release Kubernetes operator 1.0.1

2022-06-22 Thread Gyula Fóra
Hi Devs!

How do you feel about releasing the 1.0.1 patch release for the Kubernetes
operator?

We have fixed a few annoying issues that many people tend to hit.

Given that we are about halfway until the next minor release based on the
proposed schedule I think we could prepare a 1.0.1 RC1 in the next 1-2 days
.

I can volunteer to be the release manager.

What do you think?

Cheers,
Gyula


[jira] [Created] (FLINK-28205) JDBC connector scheduled flush has a memory leak bug

2022-06-22 Thread michaelxiang (Jira)
michaelxiang created FLINK-28205:


 Summary: JDBC connector scheduled flush has a memory leak bug
 Key: FLINK-28205
 URL: https://issues.apache.org/jira/browse/FLINK-28205
 Project: Flink
  Issue Type: Bug
  Components: Connectors / JDBC
Affects Versions: 1.14.5, 1.13.6, 1.15.0
Reporter: michaelxiang


Class path: org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat

Bug location: the open method, in the Runnable instance of the scheduler thread.

When writing with flink-connector-jdbc, the scheduled flush thread catches the
RuntimeException when a flush fails. This means the task will not fail and exit
until new data arrives, so the scheduler thread keeps wrapping the previously
created flushException in a newly created RuntimeException. The references to
the old flushExceptions can never be released and collected by the GC, which
leads to a memory leak.
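
A condensed sketch of the reported pattern (hypothetical, not the exact Flink
source; names follow JdbcBatchingOutputFormat):

{code:java}
// Condensed sketch of the reported pattern, not the exact Flink source.
scheduledFuture = scheduler.scheduleWithFixedDelay(() -> {
    synchronized (this) {
        if (!closed) {
            try {
                flush(); // re-throws the previous flushException (see below)...
            } catch (Exception e) {
                flushException = e; // ...so e now wraps the old flushException
            }
        }
    }
}, flushIntervalMillis, flushIntervalMillis, TimeUnit.MILLISECONDS);

public synchronized void flush() {
    if (flushException != null) {
        // Each cycle creates a new wrapper around the previous exception; the
        // growing chain stays reachable and can never be garbage-collected,
        // hence the reported memory leak.
        throw new RuntimeException("Writing records to JDBC failed.", flushException);
    }
    // ... execute the buffered batch ...
}
{code}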





[jira] [Created] (FLINK-28204) Deleting a FlinkDeployment results in an error on the pod

2022-06-22 Thread Matt Casters (Jira)
Matt Casters created FLINK-28204:


 Summary: Deleting a FlinkDeployment results in an error on the pod
 Key: FLINK-28204
 URL: https://issues.apache.org/jira/browse/FLINK-28204
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
 Environment: AWS EKS

 
{code:java}
kubectl version 
   
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.8", 
GitCommit:"a12b886b1da059e0190c54d09c5eab5219dd7acf", GitTreeState:"clean", 
BuildDate:"2022-06-17T22:27:29
Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"22+", 
GitVersion:"v1.22.9-eks-a64ea69", 
GitCommit:"540410f9a2e24b7a2a870ebfacb3212744b5f878", GitTreeState:"clean", 
BuildDate:"2022-0
5-12T19:15:31Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}


 {code}
Reporter: Matt Casters


I didn't configure the memory settings of my Flink cluster correctly in the
Flink deployment YAML.

So I thought I would delete the deployment, but I'm getting this error in the
log of the flink-kubernetes-operator pod:
{code:java}
2022-06-22 13:19:13,521 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][default/apache-hop-flink] Deleting FlinkDeployment
2022-06-22 13:19:13,521 i.j.o.p.e.ReconciliationDispatcher 
[ERROR][default/apache-hop-flink] Error during event processing ExecutionScope{ 
resource id: CustomResourceID{name='apache-hop-flink', namespace='default'}, 
version: 23709} failed.
java.lang.RuntimeException: Cannot create observe config before first 
deployment, this indicates a bug.
at 
org.apache.flink.kubernetes.operator.config.FlinkConfigManager.getObserveConfig(FlinkConfigManager.java:137)
at 
org.apache.flink.kubernetes.operator.service.FlinkService.cancelJob(FlinkService.java:357)
at 
org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler.shutdown(ApplicationReconciler.java:327)
at 
org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractDeploymentReconciler.cleanup(AbstractDeploymentReconciler.java:56)
at 
org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractDeploymentReconciler.cleanup(AbstractDeploymentReconciler.java:37)
at 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.cleanup(FlinkDeploymentController.java:107)
at 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.cleanup(FlinkDeploymentController.java:59)
at 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:68)
at 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:50)
at 
io.javaoperatorsdk.operator.api.monitoring.Metrics.timeControllerExecution(Metrics.java:34)
at io.javaoperatorsdk.operator.processing.Controller.cleanup(Controller.java:49)
at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleCleanup(ReconciliationDispatcher.java:252)
at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:72)
at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:50)
at 
io.javaoperatorsdk.operator.processing.event.EventProcessor$ControllerExecution.run(EventProcessor.java:349)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source) {code}
So in essence this leaves me stuck in a state between not deployed and not
being able to delete the FlinkDeployment.

 





[jira] [Created] (FLINK-28203) Mark all bundled dependencies as optional

2022-06-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28203:


 Summary: Mark all bundled dependencies as optional
 Key: FLINK-28203
 URL: https://issues.apache.org/jira/browse/FLINK-28203
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0








[jira] [Created] (FLINK-28202) Generalize utils around shade-plugin

2022-06-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28202:


 Summary: Generalize utils around shade-plugin
 Key: FLINK-28202
 URL: https://issues.apache.org/jira/browse/FLINK-28202
 Project: Flink
  Issue Type: Sub-task
  Components: Build System, Build System / CI
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


We'll be adding another safeguard against developer mistakes which also parses 
the output of the shade plugin, like the license checker.

We should generalize this parsing such that both checks can use the same code.






[jira] [Created] (FLINK-28201) Generalize utils around dependency-plugin

2022-06-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28201:


 Summary: Generalize utils around dependency-plugin
 Key: FLINK-28201
 URL: https://issues.apache.org/jira/browse/FLINK-28201
 Project: Flink
  Issue Type: Sub-task
  Components: Build System, Build System / CI
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


We'll be adding another safeguard against developer mistakes which also parses 
the output of the dependency plugin, like the scala suffix checker.

We should generalize this parsing such that both checks can use the same code.





[VOTE] Release 1.15.1, release candidate #1

2022-06-22 Thread David Anderson
Hi everyone,

Please review and vote on release candidate #1 for version 1.15.1, as
follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:

* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint E982F098 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.15.1-rc1" [5],
* website pull request listing the new release and adding announcement blog
post [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
David

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351546
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.1-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1511/
[5] https://github.com/apache/flink/tree/release-1.15.1-rc1
[6] https://github.com/apache/flink-web/pull/554


[jira] [Created] (FLINK-28200) Add Table Store Hive reader documentation

2022-06-22 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-28200:


 Summary: Add Table Store Hive reader documentation
 Key: FLINK-28200
 URL: https://issues.apache.org/jira/browse/FLINK-28200
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.2.0








[jira] [Created] (FLINK-28199) Failures on YARNHighAvailabilityITCase.testClusterClientRetrieval and YARNHighAvailabilityITCase.testKillYarnSessionClusterEntrypoint

2022-06-22 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-28199:
--

 Summary: Failures on 
YARNHighAvailabilityITCase.testClusterClientRetrieval and 
YARNHighAvailabilityITCase.testKillYarnSessionClusterEntrypoint
 Key: FLINK-28199
 URL: https://issues.apache.org/jira/browse/FLINK-28199
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.16.0
Reporter: Martijn Visser


{code:java}
Jun 22 08:57:50 [ERROR] Errors: 
Jun 22 08:57:50 [ERROR]   YARNHighAvailabilityITCase.testClusterClientRetrieval 
» Timeout testClusterCli...
Jun 22 08:57:50 [ERROR]   
YARNHighAvailabilityITCase.testKillYarnSessionClusterEntrypoint:156->YarnTestBase.runTest:288->lambda$testKillYarnSessionClusterEntrypoint$0:182->waitForJobTermination:325
 » Execution
Jun 22 08:57:50 [INFO] 
Jun 22 08:57:50 [ERROR] Tests run: 27, Failures: 0, Errors: 2, Skipped: 0
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=37037&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=29523





[jira] [Created] (FLINK-28198) CassandraConnectorITCase#testRaiseCassandraRequestsTimeouts fails with timeout

2022-06-22 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-28198:
--

 Summary: 
CassandraConnectorITCase#testRaiseCassandraRequestsTimeouts fails with timeout
 Key: FLINK-28198
 URL: https://issues.apache.org/jira/browse/FLINK-28198
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Cassandra
Affects Versions: 1.16.0
Reporter: Martijn Visser


{code:java}
Jun 22 07:57:37 [ERROR] 
org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase.testRaiseCassandraRequestsTimeouts
  Time elapsed: 12.067 s  <<< ERROR!
Jun 22 07:57:37 com.datastax.driver.core.exceptions.OperationTimedOutException: 
[/172.17.0.1:59915] Timed out waiting for server response
Jun 22 07:57:37 at 
com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:43)
Jun 22 07:57:37 at 
com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:25)
Jun 22 07:57:37 at 
com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
Jun 22 07:57:37 at 
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293)
Jun 22 07:57:37 at 
com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)
Jun 22 07:57:37 at 
com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39)
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=37037&view=logs&j=4eda0b4a-bd0d-521a-0916-8285b9be9bb5&t=2ff6d5fa-53a6-53ac-bff7-fa524ea361a9&l=13736





[jira] [Created] (FLINK-28197) Flink doesn't delete the YARN application files when the submission fails in application mode

2022-06-22 Thread zlzhang0122 (Jira)
zlzhang0122 created FLINK-28197:
---

 Summary: Flink doesn't delete the YARN application files when the 
submission fails in application mode
 Key: FLINK-28197
 URL: https://issues.apache.org/jira/browse/FLINK-28197
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.14.2
Reporter: zlzhang0122
 Fix For: 1.16.0


When users submit a Flink job to YARN and the submission fails in YARN 
application mode, the YARN application files such as Flink binaries, libraries, 
etc. won't be deleted and will exist permanently unless users delete them manually.





[jira] [Created] (FLINK-28196) Rename hadoop.version property

2022-06-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28196:


 Summary: Rename hadoop.version property
 Key: FLINK-28196
 URL: https://issues.apache.org/jira/browse/FLINK-28196
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


Maven 3.8.5 had a change (as I understand it for consistency purposes) where 
properties set on the command-line are also applied to upstream dependencies.

See https://issues.apache.org/jira/browse/MNG-7417

In other words, since Hadoop has a {{hadoop.version}} property in its parent 
pom, when we set this property on CI (e.g. {{mvn ... -Dhadoop.version=...}}) it 
not only sets _our_ property, but also the one from Hadoop.

We should prefix our property with "flink" to prevent such conflicts.





[jira] [Created] (FLINK-28195) Annotate Python3.6 as deprecated in PyFlink 1.16

2022-06-22 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-28195:


 Summary: Annotate Python3.6 as deprecated in PyFlink 1.16
 Key: FLINK-28195
 URL: https://issues.apache.org/jira/browse/FLINK-28195
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.16.0
Reporter: Huang Xingbo
Assignee: Huang Xingbo
 Fix For: 1.16.0


Python 3.6 extended support ended on 23 December 2021. We plan for PyFlink 1.16 
to be the last version that supports Python 3.6.





Re: Re: [ANNOUNCE] New Apache Flink Committers: Qingsheng Ren, Shengkai Fang

2022-06-22 Thread Jiangang Liu
Congratulations!

Best,
Jiangang Liu

Mason Chen  wrote on Wed, Jun 22, 2022 at 00:37:

> Awesome work Qingsheng and Shengkai!
>
> Best,
> Mason
>
> On Tue, Jun 21, 2022 at 4:53 AM Zhipeng Zhang 
> wrote:
>
> > Congratulations, Qingsheng and ShengKai.
> >
> > Yang Wang  wrote on Tue, Jun 21, 2022 at 19:43:
> >
> > > Congratulations, Qingsheng and ShengKai.
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > Benchao Li  wrote on Tue, Jun 21, 2022 at 19:33:
> > >
> > > > Congratulations!
> > > >
> > > > weijie guo  wrote on Tue, Jun 21, 2022 at 13:44:
> > > >
> > > > > Congratulations, Qingsheng and ShengKai!
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Weijie
> > > > >
> > > > >
> > > > > Yuan Mei  wrote on Tue, Jun 21, 2022 at 13:07:
> > > > >
> > > > > > Congrats Qingsheng and ShengKai!
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Yuan
> > > > > >
> > > > > > On Tue, Jun 21, 2022 at 11:27 AM Terry Wang 
> > > > wrote:
> > > > > >
> > > > > > > Congratulations, Qingsheng and ShengKai!
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best,
> > > > Benchao Li
> > > >
> > >
> >
> >
> > --
> > best,
> > Zhipeng
> >
>


[jira] [Created] (FLINK-28194) Remove workaround around avro sql jar

2022-06-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28194:


 Summary: Remove workaround around avro sql jar
 Key: FLINK-28194
 URL: https://issues.apache.org/jira/browse/FLINK-28194
 Project: Flink
  Issue Type: Technical Debt
  Components: API / Python
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


Because of FLINK-17417, flink-python contains a workaround that manually 
assembles a makeshift Avro SQL jar.
We should rely on the sql-avro jar instead and remove the workaround.





Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-22 Thread Jingsong Li
Qingsheng Thanks for the update,

Looks good to me!

Best,
Jingsong

On Wed, Jun 22, 2022 at 5:00 PM Qingsheng Ren  wrote:
>
> Hi Jingsong,
>
> 1. Updated and thanks for the reminder!
>
> 2. We could do so for the implementation, but as a public interface I prefer not to 
> introduce another layer and expose too much, since this FLIP is already a huge 
> one with a bunch of classes and interfaces.
>
> Best,
> Qingsheng
>
> > On Jun 22, 2022, at 11:16, Jingsong Li  wrote:
> >
> > Thanks Qingsheng and all.
> >
> > I like this design.
> >
> > Some comments:
> >
> > 1. LookupCache implements Serializable?
> >
> > 2. Minor: After FLIP-234 [1], there should be many connectors that
> > implement both PartialCachingLookupProvider and
> > PartialCachingAsyncLookupProvider. Can we extract a common interface
> > for `LookupCache getCache();` to ensure consistency?
> >
> > [1] 
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> >
> > Best,
> > Jingsong
> >
> > On Tue, Jun 21, 2022 at 4:09 PM Qingsheng Ren  wrote:
> >>
> >> Hi devs,
> >>
> >> I’d like to push FLIP-221 forward a little bit. Recently we had some 
> >> offline discussions and updated the FLIP. Here’s the diff compared to the 
> >> previous version:
> >>
> >> 1. (Async)LookupFunctionProvider is designed as a base interface for 
> >> constructing lookup functions.
> >> 2. From the LookupFunction we extend PartialCaching / 
> >> FullCachingLookupProvider for partial and full caching mode.
> >> 3. Introduce CacheReloadTrigger for specifying the reload strategy in full 
> >> caching mode, and provide 2 default implementations (Periodic / 
> >> TimedCacheReloadTrigger)
> >>
> >> Looking forward to your replies~
> >>
> >> Best,
> >> Qingsheng
> >>
> >>> On Jun 2, 2022, at 17:15, Qingsheng Ren  wrote:
> >>>
> >>> Hi Becket,
> >>>
> >>> Thanks for your feedback!
> >>>
> >>> 1. An alternative way is to let the implementation of cache to decide
> >>> whether to store a missing key in the cache instead of the framework.
> >>> This sounds more reasonable and makes the LookupProvider interface
> >>> cleaner. I can update the FLIP and clarify in the JavaDoc of
> >>> LookupCache#put that the cache should decide whether to store an empty
> >>> collection.
> >>>
> >>> 2. Initially the builder pattern is for the extensibility of
> >>> LookupProvider interfaces that we could need to add more
> >>> configurations in the future. We can remove the builder now as we have
> >>> resolved the issue in 1. As for the builder in DefaultLookupCache I
> >>> prefer to keep it because we have a lot of arguments in the
> >>> constructor.
> >>>
> >>> 3. I think this might overturn the overall design. I agree with
> >>> Becket's idea that the API design should be layered considering
> >>> extensibility and it'll be great to have one unified interface
> >>> supporting both partial, full and even mixed custom strategies, but we
> >>> have some issues to resolve. The original purpose of treating full
> >>> caching separately is that we'd like to reuse the ability of
> >>> ScanRuntimeProvider. Developers just need to hand over Source /
> >>> SourceFunction / InputFormat so that the framework could be able to
> >>> compose the underlying topology and control the reload (maybe in a
> >>> distributed way). Under your design we leave the reload operation
> >>> totally to the CacheStrategy and I think it will be hard for
> >>> developers to reuse the source in the initializeCache method.
> >>>
> >>> Best regards,
> >>>
> >>> Qingsheng
> >>>
> >>> On Thu, Jun 2, 2022 at 1:50 PM Becket Qin  wrote:
> 
>  Thanks for updating the FLIP, Qingsheng. A few more comments:
> 
>  1. I am still not sure about what is the use case for cacheMissingKey().
>  More specifically, when would users want to have getCache() return a
>  non-empty value and cacheMissingKey() returns false?
> 
>  2. The builder pattern. Usually the builder pattern is used when there 
>  are
>  a lot of variations of constructors. For example, if a class has three
>  variables and all of them are optional, so there could potentially be 
>  many
>  combinations of the variables. But in this FLIP, I don't see such case.
>  What is the reason we have builders for all the classes?
> 
>  3. Should the caching strategy be excluded from the top level provider 
>  API?
>  Technically speaking, the Flink framework should only have two interfaces
>  to deal with:
>    A) LookupFunction
>    B) AsyncLookupFunction
>  Orthogonally, we *believe* there are two different strategies people can 
>  do
>  caching. Note that the Flink framework does not care what is the caching
>  strategy here.
>    a) partial caching
>    b) full caching
> 
>  Putting them together, we end up with 3 combinations that we think are
>  valid:
> Aa) 

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-22 Thread Qingsheng Ren
Hi Jingsong,

1. Updated and thanks for the reminder!

2. We could do so for the implementation, but as a public interface I prefer not to 
introduce another layer and expose too much, since this FLIP is already a huge 
one with a bunch of classes and interfaces.
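
To make the discussion concrete for readers skimming the thread, here is a
minimal sketch of how a connector might wire the drafted pieces together in
partial caching mode. Interface and package names follow the current FLIP
draft and may still change before the FLIP is finalized; the concrete lookup
function is left as a parameter:

{code:java}
import java.time.Duration;

import org.apache.flink.table.connector.source.LookupTableSource;
import org.apache.flink.table.connector.source.lookup.PartialCachingLookupProvider;
import org.apache.flink.table.connector.source.lookup.cache.DefaultLookupCache;
import org.apache.flink.table.functions.LookupFunction;

public class PartialCachingSketch {

    /** How a lookup source could choose partial caching for its runtime provider. */
    public static LookupTableSource.LookupRuntimeProvider provider(LookupFunction fn) {
        // Framework-provided default cache: entries are added lazily on lookup
        // misses and evicted by TTL / size, as described in the FLIP.
        DefaultLookupCache cache =
                DefaultLookupCache.newBuilder()
                        .expireAfterWrite(Duration.ofMinutes(10))
                        .maximumSize(10_000)
                        .cacheMissingKey(true) // also cache keys that return no rows
                        .build();
        return PartialCachingLookupProvider.of(fn, cache);
    }
}
{code}

A full caching provider would instead hand over a scan source plus a
CacheReloadTrigger, reusing ScanRuntimeProvider as discussed earlier in the
thread.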

Best,
Qingsheng

> On Jun 22, 2022, at 11:16, Jingsong Li  wrote:
> 
> Thanks Qingsheng and all.
> 
> I like this design.
> 
> Some comments:
> 
> 1. LookupCache implements Serializable?
> 
> 2. Minor: After FLIP-234 [1], there should be many connectors that
> implement both PartialCachingLookupProvider and
> PartialCachingAsyncLookupProvider. Can we extract a common interface
> for `LookupCache getCache();` to ensure consistency?
> 
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> 
> Best,
> Jingsong
> 
> On Tue, Jun 21, 2022 at 4:09 PM Qingsheng Ren  wrote:
>> 
>> Hi devs,
>> 
>> I’d like to push FLIP-221 forward a little bit. Recently we had some offline 
>> discussions and updated the FLIP. Here’s the diff compared to the previous 
>> version:
>> 
>> 1. (Async)LookupFunctionProvider is designed as a base interface for 
>> constructing lookup functions.
>> 2. From the LookupFunction we extend PartialCaching / 
>> FullCachingLookupProvider for partial and full caching mode.
>> 3. Introduce CacheReloadTrigger for specifying the reload strategy in full 
>> caching mode, and provide 2 default implementations (Periodic / 
>> TimedCacheReloadTrigger)
>> 
>> Looking forward to your replies~
>> 
>> Best,
>> Qingsheng
>> 
>>> On Jun 2, 2022, at 17:15, Qingsheng Ren  wrote:
>>> 
>>> Hi Becket,
>>> 
>>> Thanks for your feedback!
>>> 
>>> 1. An alternative way is to let the implementation of cache to decide
>>> whether to store a missing key in the cache instead of the framework.
>>> This sounds more reasonable and makes the LookupProvider interface
>>> cleaner. I can update the FLIP and clarify in the JavaDoc of
>>> LookupCache#put that the cache should decide whether to store an empty
>>> collection.
>>> 
>>> 2. Initially the builder pattern is for the extensibility of
>>> LookupProvider interfaces that we could need to add more
>>> configurations in the future. We can remove the builder now as we have
>>> resolved the issue in 1. As for the builder in DefaultLookupCache I
>>> prefer to keep it because we have a lot of arguments in the
>>> constructor.
>>> 
>>> 3. I think this might overturn the overall design. I agree with
>>> Becket's idea that the API design should be layered considering
>>> extensibility and it'll be great to have one unified interface
>>> supporting both partial, full and even mixed custom strategies, but we
>>> have some issues to resolve. The original purpose of treating full
>>> caching separately is that we'd like to reuse the ability of
>>> ScanRuntimeProvider. Developers just need to hand over Source /
>>> SourceFunction / InputFormat so that the framework could be able to
>>> compose the underlying topology and control the reload (maybe in a
>>> distributed way). Under your design we leave the reload operation
>>> totally to the CacheStrategy and I think it will be hard for
>>> developers to reuse the source in the initializeCache method.
>>> 
>>> Best regards,
>>> 
>>> Qingsheng
>>> 
>>> On Thu, Jun 2, 2022 at 1:50 PM Becket Qin  wrote:
 
 Thanks for updating the FLIP, Qingsheng. A few more comments:
 
 1. I am still not sure about what is the use case for cacheMissingKey().
 More specifically, when would users want to have getCache() return a
 non-empty value and cacheMissingKey() returns false?
 
 2. The builder pattern. Usually the builder pattern is used when there are
 a lot of variations of constructors. For example, if a class has three
 variables and all of them are optional, so there could potentially be many
 combinations of the variables. But in this FLIP, I don't see such case.
 What is the reason we have builders for all the classes?
 
 3. Should the caching strategy be excluded from the top level provider API?
 Technically speaking, the Flink framework should only have two interfaces
 to deal with:
   A) LookupFunction
   B) AsyncLookupFunction
 Orthogonally, we *believe* there are two different strategies people can do
 caching. Note that the Flink framework does not care what is the caching
 strategy here.
   a) partial caching
   b) full caching
 
 Putting them together, we end up with 3 combinations that we think are
 valid:
Aa) PartialCachingLookupFunctionProvider
Ba) PartialCachingAsyncLookupFunctionProvider
Ab) FullCachingLookupFunctionProvider
 
 However, the caching strategy could actually be quite flexible. E.g. an
 initial full cache load followed by some partial updates. Also, I am not
 100% sure if the full caching will always use ScanTableSource. Including
 the 

Re: [DISCUSS] Flink Kubernetes Operator release cadence proposal

2022-06-22 Thread Márton Balassi
Thanks for the proposal, Matyas.

+1 for 2 month release cycles with the breakdown Gyula suggested.

@Yang: we could start tagging features with 1.1 / 1.2 version then, good
call.

On Wed, Jun 22, 2022 at 4:58 AM Yang Wang  wrote:

> +1 for 2 month release cycles.
>
> Since we have promised backward compatibility for the CRD, I think it is
> also reasonable for us to maintain the latest two minor versions with patch
> releases.
>
> Given that we only have 5~6 weeks for feature development, maybe we need to
> pin down the features as soon as possible in the release cycle.
> Otherwise, we are at great risk of delay. If we are targeting the 1.1.0
> release for Aug 1, it is time to determine which features we want to
> include in this release.
>
> I agree with Gyula that we need to continuously improve the test coverage,
> especially the e2e tests. Users are very welcome to share their
> production use cases,
> and we could consider whether they can be covered by e2e tests. Benefiting
> from this, we could ship a stable release more quickly and easily.
>
>
>
> Best,
> Yang
>
> Gyula Fóra  wrote on Tue, Jun 21, 2022 at 19:57:
>
> > Hi Matyas!
> >
> > Thanks for starting the discussion. I think the 2 month release cycle
> > sounds reasonable.
> >
> > I think it's important for users to have frequent operator releases as
> > these affect all Flink jobs running in the environment. We should also
> only
> > adopt a schedule that we can most likely keep.
> > If we want to be successful with the proposed schedule we have to ensure
> > that each release has a relatively small scope of new features and we
> have
> > good test coverage.
> >
> > In addition to your suggestion I would like to add a feature-freeze for
> > bigger new features after 5 weeks (1 week before cutting the release
> > branch).
> >
> > So in practice for 1.1.0 this would look like:
> >
> > - Jun 6 : 1.0.0 was released
> > - July 11: Feature Freeze
> > - July 18: Cut release-1.1 branch
> > - Aug 1: Target 1.1.0 release date
> >
> > Cheers,
> > Gyula
> >
> > On Tue, Jun 21, 2022 at 9:04 AM Őrhidi Mátyás 
> > wrote:
> >
> > > Hi Devs,
> > >
> > > After the successful Kubernetes Operator 1.0.0 release, which is
> > > considered to be the first production grade one, it is probably a good
> > time
> > > now to also agree on a predictable release cadence for the Operator
> too,
> > > similarly to the time-based release plan we have for the Flink core
> > project.
> > >
> > > Given that the Operator itself is not strictly bound to Flink versions
> it
> > > can be upgraded independently from the runtime versions it manages. It
> > > would benefit the community to have frequent minor releases until the
> > > majority of the roadmap items are complete; that also encourages users
> to
> > > upgrade regularly within reasonable boundaries. Based on some offline
> > > discussion with Gyula Fora I would like to propose the following
> > > operational model for Operator releases:
> > >
> > > - time-based release cadence with 2 month release cycles ( This would
> > give
> > > us roughly 6 weeks pure dev time and leave 2 weeks for the release
> > process
> > > to finish)
> > > - on-demand patch releases for critical issues only
> > > - support the current and previous minor releases with bug fixes
> > >
> > > I am looking forward to your feedback and suggestions on this topic.
> Once
> > > we have an agreement I will formalize it on a Wiki page.
> > >
> > > Thanks,
> > > Matyas
> > >
> >
>


[jira] [Created] (FLINK-28193) Enable to identify whether a job vertex contains source/sink operators

2022-06-22 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-28193:
---

 Summary: Enable to identify whether a job vertex contains 
source/sink operators
 Key: FLINK-28193
 URL: https://issues.apache.org/jira/browse/FLINK-28193
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Zhu Zhu
 Fix For: 1.16.0


Speculative execution does not support sources/sinks in the first version. 
Therefore, it will not create speculative instances for vertices which contain 
source/sink operators.

Note that a job vertex with no input/output edges does not necessarily mean it is 
a source/sink vertex: multi-input sources can have inputs, and a vertex with no 
output edge may still not contain any sink operator. Besides that, a new sink 
with a topology can spread the sink logic across multiple job vertices 
connected by job edges.





Re: Re: [DISCUSS] FLIP-240: Introduce "ANALYZE TABLE" Syntax

2022-06-22 Thread Jing Ge
sounds good to me. Thanks!
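
For readers skimming the thread, here is a minimal sketch of how the discussed
statements could be issued, assuming the grammar currently proposed in the FLIP
(still subject to change) and a made-up partitioned table `orders` with
partition keys `ds` and `hr` that is already registered in the catalog:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class AnalyzeTableSketch {
    public static void main(String[] args) {
        // ANALYZE TABLE is a batch-only statement per the current proposal.
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Gather table-level statistics for one specific partition.
        tEnv.executeSql(
                "ANALYZE TABLE orders PARTITION (ds = '2022-06-01') COMPUTE STATISTICS");

        // Name the partition keys without values to cover all partitions
        // explicitly, and gather column-level statistics as well.
        tEnv.executeSql(
                "ANALYZE TABLE orders PARTITION (ds, hr) "
                        + "COMPUTE STATISTICS FOR ALL COLUMNS");
    }
}
{code}

The second statement is the explicit all-partitions form discussed below, which
avoids accidentally triggering a huge scan by simply omitting the partition
clause.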

Best regards,
Jing

On Fri, Jun 17, 2022 at 5:37 AM godfrey he  wrote:

> Hi, Jing.
>
> Thanks for the feedback.
>
> >When will the converted SELECT statement of the ANALYZE TABLE be
> > submitted? right after the CREATE TABLE?
> The SELECT job will be submitted only when `ANALYZE TABLE` is executed;
> it has nothing to do with CREATE TABLE, because `ANALYZE TABLE`
> is triggered manually as needed.
>
> >Will it be submitted periodically to keep the statistical data
> >up-to-date, since the data might be mutable?
> the `ANALYZE TABLE` is triggered manually as needed.
> I will update the doc.
>
> >It might not be strong enough to avoid human error
> > I would suggest using FOR ALL PARTITIONS explicitly
> > just like FOR ALL COLUMNS.
> Agreed, specifying `PARTITION` explicitly is friendlier
> and safer. I prefer to use `PARTITION(ds, hr)` without
> specific partition values; Hive has similar syntax.
> WDYT?
>
> Best,
> Godfrey
>
> Jing Ge  wrote on Thu, Jun 16, 2022 at 03:53:
> >
> > Hi Godfrey,
> >
> > Thanks for driving this! There are some areas where I couldn't find
> enough
> > information in the FLIP, just wondering if I could get more
> > explanation from you w.r.t. the following questions:
> >
> > 1. When will the converted SELECT statement of the ANALYZE TABLE be
> > submitted? right after the CREATE TABLE?
> >
> > 2. Will it be submitted periodically to keep the statistical data
> > up-to-date, since the data might be mutable?
> >
> > 3. " If no partition is specified, the statistics will be gathered for
> all
> > partitions"  - I think this is fine for multi-level partitions, e.g.
> PARTITION
> > (ds='2022-06-01') means two partitions: PARTITION (ds='2022-06-01', hr=1)
> > and PARTITION (ds='2022-06-01', hr=2), because it will save a lot of code
> > and therefore help developer work more efficiently. If we use this rule
> for
> > top level partitions, It might not be strong enough to avoid human
> > error, e.g. developer might trigger huge selection on the table with many
> > partitions, when he forgot to write the partition in the ANALYZE TABLE
> > script. In this case, I would suggest using FOR ALL PARTITIONS explicitly
> > just like FOR ALL COLUMNS.
> >
> > Best regards,
> > Jing
> >
> >
> > On Wed, Jun 15, 2022 at 10:16 AM godfrey he  wrote:
> >
> > > Hi Jark,
> > >
> > > Thanks for the inputs.
> > >
> > > >Do we need to provide DESC EXTENDED  statement like
> Spark[1]
> > > to
> > > >show statistic for table/partition/columns?
> > > We do already support the `DESC EXTENDED` syntax, but currently only the table
> > > schema
> > > will be displayed; I think we just need a JIRA to support it.
> > >
> > > > is it possible to ignore execution mode and force using batch mode
> for
> > > the statement?
> > > As I replied above, the semantics of `ANALYZE TABLE` do not
> > > distinguish between batch and streaming;
> > > it works for both batch and streaming, but the result for unbounded
> > > sources is meaningless.
> > > Currently, an exception is thrown in streaming mode,
> > > and we can support streaming mode with bounded sources in the future.
> > >
> > > Best,
> > > Godfrey
> > >
> > > Jark Wu  wrote on Tue, Jun 14, 2022 at 17:56:
> > > >
> > > > Hi Godfrey, thanks for starting this discussion, this is a great
> feature
> > > > for batch users.
> > > >
> > > > The FLIP looks good to me in general.
> > > >
> > > > I only have 2 comments:
> > > >
> > > > 1) How do users know whether the given table or partition contains
> > > required
> > > > statistics?
> > > > Do we need to provide DESC EXTENDED  statement like
> Spark[1]
> > > to
> > > > show statistic for table/partition/columns?
> > > >
> > > > 2) If ANALYZE TABLE can only run in batch mode, is it possible to
> ignore
> > > > execution mode
> > > > and force using batch mode for the statement? From my perspective,
> > > ANALYZE
> > > > TABLE
> > > > is an auxiliary statement similar to SHOW TABLES but heavier, which
> > > doesn't
> > > > care about
> > > > environment execution mode.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > [1]:
> > > >
> > >
> https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-aux-analyze-table.html
> > > >
> > > > On Tue, 14 Jun 2022 at 13:52, Jing Ge  wrote:
> > > >
> > > > > Hi 华宗
> > > > >
> > > > > To unsubscribe, please send any message to dev-unsubscr...@flink.apache.org
> > > > > In order to unsubscribe, please send an email to
> > > > > dev-unsubscr...@flink.apache.org
> > > > >
> > > > > Thanks
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > >
> > > > > On Tue, Jun 14, 2022 at 2:05 AM 华宗  wrote:
> > > > >
> > > > > > Unsubscribe
> > > > > >
> > > > > > At 2022-06-13 22:44:24, "cao zou"  wrote:
> > > > > > >Hi godfrey, thanks for your detailed explanation.
> > > > > > >After explaining and glancing over the FLIP-231, I think it is
> > > > 

[DISCUSS] FLIP-244: Support IterativeCondition with Accumulator in CEP Pattern

2022-06-22 Thread md peng
Hi everyone,

IterativeCondition defines a user-defined condition that decides whether an
element should be accepted into the pattern or not. The condition iterates
over the previously accepted elements in the pattern and decides whether to
accept a new element based on some statistic over those elements. In certain
accumulation scenarios, for example filtering goods with more than 1,000
orders within 10 minutes, an accumulation operation needs to be performed
inside the IterativeCondition. This accumulation behavior causes the
accumulated state to be computed repeatedly, because a state may go through
multiple transitions guarded by conditions, and each condition invocation
runs the accumulation again.
I would like to start a discussion about FLIP-244[1], in which
AccumulationStateCondition is proposed to define the IterativeCondition
with accumulation and filter the accumulation state with accumulator. The
accumulation state is consistent within the lifecycle of a matching NFA, on
other words, user doesn't need to pay attention to when the accumulation
state is initialized and cleaned up.

Please take a look at the FLIP page [1] to get more details. Any feedback
about the FLIP-244 would be appreciated!

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-244%3A+Support+IterativeCondition+with+Accumulator+in+CEP+Pattern

Best regards,

Mingde Peng


[jira] [Created] (FLINK-28192) RescaleBucketITCase is not stable

2022-06-22 Thread Jane Chan (Jira)
Jane Chan created FLINK-28192:
-

 Summary: RescaleBucketITCase is not stable
 Key: FLINK-28192
 URL: https://issues.apache.org/jira/browse/FLINK-28192
 Project: Flink
  Issue Type: Bug
  Components: Table Store
Affects Versions: table-store-0.2.0
Reporter: Jane Chan
 Fix For: table-store-0.2.0
 Attachments: image-2022-06-22-15-06-14-271.png

[https://github.com/apache/flink-table-store/runs/6996213019?check_suite_focus=true]

The job's status is not stable

!image-2022-06-22-15-06-14-271.png|width=760,height=88!





Re: [ANNOUNCE] Apache Flink 1.14.5 released

2022-06-22 Thread Martijn Visser
Thank you Xingbo and our community for creating this release!
On Wed, 22 Jun 2022 at 05:51, Xingbo Huang  wrote:

> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.14.5, which is the fourth bugfix release for the Apache Flink 1.14
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2022/06/22/release-1.14.5.html
>
> The full release notes are available in Jira:
>
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351388
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Regards,
> Xingbo
>