Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Jingsong Li
Hi Yingjie,

Thanks for your explanation. I have no more questions. +1

On Tue, Dec 14, 2021 at 3:31 PM Yingjie Cao  wrote:
>
> Hi Jingsong,
>
> Thanks for your feedback.
>
> >>> My question is, what is the maximum parallelism a job can have with the 
> >>> default configuration? (Does this break the out-of-the-box experience?)
>
> Yes, you are right, these two options are related to network memory and 
> framework off-heap memory. Generally, these changes will not break the 
> out-of-the-box experience, but in some extreme cases, for example when there 
> are too many ResultPartitions per task, users may need to increase the network 
> memory to avoid the "insufficient network buffer" error. For the framework 
> off-heap memory, I believe users do not need to change the default value.
>
> In fact, I have a basic goal when changing these config values: when running 
> TPC-DS at medium parallelism with the default values, all queries must pass 
> without any error and, at the same time, the performance should be improved. 
> I think if we achieve this goal, most common use cases can be covered.
>
> Currently, with the default configuration, the number of exclusive buffers 
> required at the input gate side still depends on the parallelism (though since 
> 1.14, we can decouple the network buffer consumption from parallelism by 
> setting a config value, at a slight performance cost for streaming jobs), 
> which means the default configuration cannot support very large parallelism. 
> Roughly, I would say the default value can support jobs with a parallelism of 
> several hundred.
>
> >>> I do feel that this correspondence is a bit difficult to control at the 
> >>> moment, and it would be best if a rough table could be provided.
>
> I think this is a good suggestion; we can provide such a table in the 
> documentation.
>
> Best,
> Yingjie
>
> Jingsong Li wrote on Tue, Dec 14, 2021 at 14:39:
>>
>> Hi  Yingjie,
>>
>> +1 for this FLIP. I'm pretty sure this will greatly improve the ease
>> of use of batch jobs.
>>
>> Looks like "taskmanager.memory.framework.off-heap.batch-shuffle.size"
>> and "taskmanager.network.sort-shuffle.min-buffers" are related to
>> network memory and framework.off-heap.size.
>>
>> My question is, what is the maximum parallelism a job can have with
>> the default configuration? (Does this break the out-of-the-box experience?)
>>
>> How much network memory and framework.off-heap.size are required for
>> how much parallelism in the default configuration?
>>
>> I do feel that this correspondence is a bit difficult to control at
>> the moment, and it would be best if a rough table could be provided.
>>
>> Best,
>> Jingsong
>>
>> On Tue, Dec 14, 2021 at 2:16 PM Yingjie Cao  wrote:
>> >
>> > Hi Jiangang,
>> >
>> > Thanks for your suggestion.
>> >
>> > >>> The config can affect the memory usage. Will the related memory 
>> > >>> configs be changed?
>> >
>> > I think we will not change the default network memory settings. My best 
>> > expectation is that the default values can work for most cases (though maybe 
>> > not the best) and for other cases, users may need to tune the memory 
>> > settings.
>> >
>> > >>> Can you share the tpcds results for different configs? Although we 
>> > >>> change the default values, it is helpful to change them for different 
>> > >>> users. In this case, the experience can help a lot.
>> >
>> > I did not keep all previous TPCDS results, but from the results, I can 
>> > tell that on HDD, always using the sort-shuffle is a good choice. For 
>> > small jobs, using sort-shuffle may not bring much performance gain, 
>> > possibly because all shuffle data can be cached in memory (page cache); 
>> > this is the case if the cluster has enough resources. However, if the 
>> > whole cluster is under heavy load or you are running large-scale jobs, 
>> > the performance of those small jobs can also be influenced. For 
>> > large-scale jobs, the configurations suggested to be tuned are 
>> > taskmanager.network.sort-shuffle.min-buffers and 
>> > taskmanager.memory.framework.off-heap.batch-shuffle.size; you can increase 
>> > these values for large-scale batch jobs.
>> >
>> > BTW, I am still running TPCDS tests these days and I can share these 
>> > results soon.
>> >
>> > Best,
>> > Yingjie
>> >
>> > 刘建刚 (Liu Jiangang) wrote on Fri, Dec 10, 2021 at 18:30:
>> >>
>> >> Glad to see the suggestion. In our test, we found that for small jobs, 
>> >> changing these configs does not improve the performance much, just as in 
>> >> your test. I have some suggestions:
>> >>
>> >> The config can affect the memory usage. Will the related memory configs 
>> >> be changed?
>> >> Can you share the tpcds results for different configs? Although we change 
>> >> the default values, it is helpful to change them for different users. In 
>> >> this case, the experience can help a lot.
>> >>
>> >> Best,
>> >> Liu Jiangang
>> >>
>> >> Yun Gao wrote on Fri, Dec 10, 2021 at 17:20:
>> >>>
>> >>> Hi Yingjie,
>> >>>
>> >>> Many thanks for drafting the FLIP and initiating the discussion!
>> >>>
>> >>> May I have a double 

Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)

2021-12-13 Thread Konstantin Knauf
Hi Nicholas,

I understand that a Rule contains more than the Pattern. Still, I favor
DynamicPattern[Holder] over Rule, because the term "Rule" does not exist in
Flink's CEP implementation so far and "dynamic" seems to be the important
bit here.

Cheers,

Konstantin

On Tue, Dec 14, 2021 at 4:46 AM Nicholas Jiang 
wrote:

> Hi DianFu,
>
>  Thanks for your feedback on the FLIP.
>
>  About the question regarding `getLatestRules`: IMO, this doesn't need
> to be renamed to `getRuleChanges`, because this method is used to get the
> complete set of the latest rules once they have been updated.
>
>  About the CEP.rule method, renaming it to CEP.dynamicPattern would be
> confusing for users. A dynamic pattern only creates the PatternStream, not
> the DataStream. Conceptually, a dynamic pattern is still a pattern and does
> not contain a PatternProcessFunction. If CEP.rule were renamed to
> CEP.dynamicPattern, the return value of the method could not include the
> PatternProcessFunction and would only return the PatternStream. I think the
> difference between a Rule and a Pattern is that a Rule contains the
> PatternProcessFunction, while a Pattern or DynamicPattern doesn't contain
> the function.
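[Editor's note] To make the distinction concrete, here is a rough structural sketch in Python. The `Pattern` and `Rule` types below are placeholder stand-ins, not Flink's actual CEP classes; they only illustrate the point that a Rule bundles a Pattern with its processing function.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Placeholder stand-ins for Flink CEP's Pattern and PatternProcessFunction,
# used only to illustrate the structural difference under discussion.
@dataclass
class Pattern:
    definition: str

@dataclass
class Rule:
    """A Rule bundles a Pattern with the function that processes its matches;
    a Pattern (or DynamicPattern) alone carries no processing function."""
    pattern: Pattern
    process_fn: Callable[[Any], Any]

login_fail = Pattern("begin('fail').times(3)")
rule = Rule(pattern=login_fail, process_fn=lambda match: f"alert: {match}")
print(rule.process_fn("user42"))  # alert: user42
```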
>
> Best
> Nicholas Jiang
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Yingjie Cao
Hi Jingsong,

Thanks for your feedback.

>>> My question is, what is the maximum parallelism a job can have with the
default configuration? (Does this break the out-of-the-box experience?)

Yes, you are right, these two options are related to network memory and
framework off-heap memory. Generally, these changes will not break the
out-of-the-box experience, but in some extreme cases, for example when there
are too many ResultPartitions per task, users may need to increase the
network memory to avoid the "insufficient network buffer" error. For the
framework off-heap memory, I believe users do not need to change the default
value.

In fact, I have a basic goal when changing these config values: when
running TPC-DS at medium parallelism with the default values, all queries
must pass without any error and, at the same time, the performance should
be improved. I think if we achieve this goal, most common use cases can be
covered.

Currently, with the default configuration, the number of exclusive buffers
required at the input gate side still depends on the parallelism (though
since 1.14, we can decouple the network buffer consumption from parallelism
by setting a config value, at a slight performance cost for streaming jobs),
which means the default configuration cannot support very large parallelism.
Roughly, I would say the default value can support jobs with a parallelism
of several hundred.
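[Editor's note] The relation can be made concrete with a rough back-of-envelope estimate. The constants below assume Flink's documented defaults (32 KB network buffers, 2 exclusive buffers per input channel, 8 floating buffers per gate); this is a sketch of the scaling behavior, not an exact accounting of Flink's allocator.

```python
BUFFER_SIZE = 32 * 1024       # taskmanager.memory.segment-size default (32 KB)
EXCLUSIVE_PER_CHANNEL = 2     # taskmanager.network.memory.buffers-per-channel
FLOATING_PER_GATE = 8         # taskmanager.network.memory.floating-buffers-per-gate

def input_gate_memory_bytes(parallelism: int, num_gates: int = 1) -> int:
    """Approximate network memory one task needs on the input side."""
    channels = parallelism * num_gates
    buffers = channels * EXCLUSIVE_PER_CHANNEL + num_gates * FLOATING_PER_GATE
    return buffers * BUFFER_SIZE

# A task reading one blocking shuffle from 500 upstream subtasks needs
# roughly (500 * 2 + 8) * 32 KiB, i.e. about 31.5 MiB of network memory.
print(input_gate_memory_bytes(500) // 1024 // 1024)  # 31
```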

>>> I do feel that this correspondence is a bit difficult to control at the
moment, and it would be best if a rough table could be provided.

I think this is a good suggestion; we can provide such a table in the
documentation.

Best,
Yingjie

Jingsong Li wrote on Tue, Dec 14, 2021 at 14:39:

> Hi  Yingjie,
>
> +1 for this FLIP. I'm pretty sure this will greatly improve the ease
> of use of batch jobs.
>
> Looks like "taskmanager.memory.framework.off-heap.batch-shuffle.size"
> and "taskmanager.network.sort-shuffle.min-buffers" are related to
> network memory and framework.off-heap.size.
>
> My question is, what is the maximum parallelism a job can have with
> the default configuration? (Does this break the out-of-the-box experience?)
>
> How much network memory and framework.off-heap.size are required for
> how much parallelism in the default configuration?
>
> I do feel that this correspondence is a bit difficult to control at
> the moment, and it would be best if a rough table could be provided.
>
> Best,
> Jingsong
>
> On Tue, Dec 14, 2021 at 2:16 PM Yingjie Cao 
> wrote:
> >
> > Hi Jiangang,
> >
> > Thanks for your suggestion.
> >
> > >>> The config can affect the memory usage. Will the related memory
> configs be changed?
> >
> > I think we will not change the default network memory settings. My best
> expectation is that the default values can work for most cases (though maybe
> not the best) and for other cases, users may need to tune the memory
> settings.
> >
> > >>> Can you share the tpcds results for different configs? Although we
> change the default values, it is helpful to change them for different
> users. In this case, the experience can help a lot.
> >
> > I did not keep all previous TPCDS results, but from the results, I can
> tell that on HDD, always using the sort-shuffle is a good choice. For small
> jobs, using sort-shuffle may not bring much performance gain, possibly
> because all shuffle data can be cached in memory (page cache); this is
> the case if the cluster has enough resources. However, if the whole
> cluster is under heavy load or you are running large-scale jobs, the
> performance of those small jobs can also be influenced. For large-scale
> jobs, the configurations suggested to be tuned are
> taskmanager.network.sort-shuffle.min-buffers and
> taskmanager.memory.framework.off-heap.batch-shuffle.size; you can increase
> these values for large-scale batch jobs.
> >
> > BTW, I am still running TPCDS tests these days and I can share these
> results soon.
> >
> > Best,
> > Yingjie
> >
> > 刘建刚 (Liu Jiangang) wrote on Fri, Dec 10, 2021 at 18:30:
> >>
> >> Glad to see the suggestion. In our test, we found that for small jobs,
> >> changing these configs does not improve the performance much, just as in
> >> your test. I have some suggestions:
> >>
> >> The config can affect the memory usage. Will the related memory configs
> be changed?
> >> Can you share the tpcds results for different configs? Although we
> change the default values, it is helpful to change them for different
> users. In this case, the experience can help a lot.
> >>
> >> Best,
> >> Liu Jiangang
> >>
> >> Yun Gao wrote on Fri, Dec 10, 2021 at 17:20:
> >>>
> >>> Hi Yingjie,
> >>>
> >>> Many thanks for drafting the FLIP and initiating the discussion!
> >>>
> >>> May I double-check one point about
> >>> taskmanager.network.sort-shuffle.min-parallelism: since other frameworks
> >>> like Spark use sort-based shuffle for all cases, is our current
> >>> circumstance still different from theirs?
> >>>
> >>> Best,
> >>> Yun
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> From:Yingjie Cao 
> 

[jira] [Created] (FLINK-25294) Incorrect cloudpickle import

2021-12-13 Thread arya (Jira)
arya created FLINK-25294:


 Summary: Incorrect cloudpickle import
 Key: FLINK-25294
 URL: https://issues.apache.org/jira/browse/FLINK-25294
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.14.0
Reporter: arya


In flink-python/pyflink/fn_execution/coder_impl_fast.pyx line 30
{code:python}
from cloudpickle import cloudpickle
{code}
should simply be
{code:python}
import cloudpickle{code}
otherwise I get "AttributeError: module 'cloudpickle.cloudpickle' has no 
attribute 'dumps'" when using keyed state.

I assume this is left over from when cloudpickle was incorrectly packaged, then 
fixed in [FLINK-14556|https://issues.apache.org/jira/browse/FLINK-14556], so 
this might have been a problem since 1.10.0



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Jingsong Li
Hi  Yingjie,

+1 for this FLIP. I'm pretty sure this will greatly improve the ease
of use of batch jobs.

Looks like "taskmanager.memory.framework.off-heap.batch-shuffle.size"
and "taskmanager.network.sort-shuffle.min-buffers" are related to
network memory and framework.off-heap.size.

My question is, what is the maximum parallelism a job can have with
the default configuration? (Does this break the out-of-the-box experience?)

How much network memory and framework.off-heap.size are required for
how much parallelism in the default configuration?

I do feel that this correspondence is a bit difficult to control at
the moment, and it would be best if a rough table could be provided.

Best,
Jingsong

On Tue, Dec 14, 2021 at 2:16 PM Yingjie Cao  wrote:
>
> Hi Jiangang,
>
> Thanks for your suggestion.
>
> >>> The config can affect the memory usage. Will the related memory configs 
> >>> be changed?
>
> I think we will not change the default network memory settings. My best 
> expectation is that the default values can work for most cases (though maybe 
> not the best) and for other cases, users may need to tune the memory settings.
>
> >>> Can you share the tpcds results for different configs? Although we change 
> >>> the default values, it is helpful to change them for different users. In 
> >>> this case, the experience can help a lot.
>
> I did not keep all previous TPCDS results, but from the results, I can tell 
> that on HDD, always using the sort-shuffle is a good choice. For small jobs, 
> using sort-shuffle may not bring much performance gain, possibly because all 
> shuffle data can be cached in memory (page cache); this is the case if 
> the cluster has enough resources. However, if the whole cluster is under 
> heavy load or you are running large-scale jobs, the performance of those 
> small jobs can also be influenced. For large-scale jobs, the configurations 
> suggested to be tuned are taskmanager.network.sort-shuffle.min-buffers and 
> taskmanager.memory.framework.off-heap.batch-shuffle.size; you can increase 
> these values for large-scale batch jobs.
>
> BTW, I am still running TPCDS tests these days and I can share these results 
> soon.
>
> Best,
> Yingjie
>
> 刘建刚 (Liu Jiangang) wrote on Fri, Dec 10, 2021 at 18:30:
>>
>> Glad to see the suggestion. In our test, we found that for small jobs, 
>> changing these configs does not improve the performance much, just as in 
>> your test. I have some suggestions:
>>
>> The config can affect the memory usage. Will the related memory configs be 
>> changed?
>> Can you share the tpcds results for different configs? Although we change 
>> the default values, it is helpful to change them for different users. In 
>> this case, the experience can help a lot.
>>
>> Best,
>> Liu Jiangang
>>
>> Yun Gao wrote on Fri, Dec 10, 2021 at 17:20:
>>>
>>> Hi Yingjie,
>>>
>>> Many thanks for drafting the FLIP and initiating the discussion!
>>>
>>> May I double-check one point about 
>>> taskmanager.network.sort-shuffle.min-parallelism: since other frameworks 
>>> like Spark use sort-based shuffle for all cases, is our current 
>>> circumstance still different from theirs?
>>>
>>> Best,
>>> Yun
>>>
>>>
>>>
>>>
>>> --
>>> From:Yingjie Cao 
>>> Send Time:2021 Dec. 10 (Fri.) 16:17
>>> To:dev ; user ; user-zh 
>>> 
>>> Subject:Re: [DISCUSS] Change some default config values of blocking shuffle
>>>
>>> Hi dev & users:
>>>
>>> I have created a FLIP [1] for it; feedback is highly appreciated.
>>>
>>> Best,
>>> Yingjie
>>>
>>> [1] 
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-199%3A+Change+some+default+config+values+of+blocking+shuffle+for+better+usability
>>> Yingjie Cao wrote on Fri, Dec 3, 2021 at 17:02:
>>>
>>> Hi dev & users,
>>>
>>> We propose to change some default values of blocking shuffle to improve the 
>>> out-of-the-box user experience (streaming is not affected). The default 
>>> values we want to change are as follows:
>>>
>>> 1. Data compression 
>>> (taskmanager.network.blocking-shuffle.compression.enabled): Currently, the 
>>> default value is 'false'.  Usually, data compression can reduce both disk 
>>> and network IO which is good for performance. At the same time, it can save 
>>> storage space. We propose to change the default value to true.
>>>
>>> 2. Default shuffle implementation 
>>> (taskmanager.network.sort-shuffle.min-parallelism): Currently, the default 
>>> value is 'Integer.MAX', which means by default, Flink jobs will always use 
>>> hash-shuffle. In fact, for high parallelism, sort-shuffle is better for 
>>> both stability and performance. So we propose to reduce the default value 
>>> to a proper smaller one, for example, 128. (We tested 128, 256, 512 and 
>>> 1024 with TPC-DS, and 128 was the best one.)
>>>
>>> 3. Read buffer of sort-shuffle 
>>> (taskmanager.memory.framework.off-heap.batch-shuffle.size): Currently, the 
>>> default value is '32M'. Previously, when choosing the default value, both 
>>> ‘32M' and '64M' 

[jira] [Created] (FLINK-25293) Option to let fail if KafkaSource keeps failing to commit offset

2021-12-13 Thread rerorero (Jira)
rerorero created FLINK-25293:


 Summary: Option to let fail if KafkaSource keeps failing to commit 
offset
 Key: FLINK-25293
 URL: https://issues.apache.org/jira/browse/FLINK-25293
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Affects Versions: 1.14.0
 Environment: Flink 1.14.0
Reporter: rerorero


Is it possible to let KafkaSource fail if it keeps failing to commit offsets?

 

I faced an issue where KafkaSource keeps failing and never recovers, while 
logging messages like the following:

 
{code:java}
2021-12-08 22:18:34,155 INFO  
org.apache.kafka.clients.consumer.internals.AbstractCoordinator [] - [Consumer 
clientId=dbz-mercari-contact-tool-jp-cg-1, 
groupId=dbz-mercari-contact-tool-jp-cg] Group coordinator 
b4-pkc-xmj7g.asia-northeast1.gcp.confluent.cloud:9092 (id: 2147483643 rack: 
null) is unavailable or invalid due to cause: null.isDisconnected: true. 
Rediscovery will be attempted.
2021-12-08 22:18:34,157 WARN  
org.apache.flink.connector.kafka.source.reader.KafkaSourceReader [] - Failed to 
commit consumer offsets for checkpoint 13 {code}
 

This is happening not just once, but a couple of times a week. I found [other 
people reporting the same 
thing](https://lists.apache.org/thread/8l4f2yb4qwysdn1cj1wjk99tfb79kgs2). This 
could possibly be a problem with the Kafka client. It can be resolved by 
restarting the Flink Job.

 

However, the Flink Kafka connector doesn't provide an automatic way to recover 
from this situation. KafkaSource [keeps retrying forever when a retriable error 
occurs](https://github.com/apache/flink/blob/afb29d92c4e76ec6a453459c3d8a08304efec549/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaSourceReader.java#L144-L148), 
even if the error is not actually retriable.

Since it exposes a metric for the number of commit failures, this could be 
automated by monitoring the metric and restarting the job, but that would mean 
another process to manage.

Does it make sense to give KafkaSource an option to fail the source task if it 
keeps failing to commit offsets more than X times?
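[Editor's note] As a sketch of what such an option could look like (hypothetical; this API does not exist in Flink today), the reader would track consecutive commit failures and signal task failure once a configured threshold is crossed:

```python
class CommitFailureTracker:
    """Hypothetical helper: fail the source task after N consecutive
    offset-commit failures. Illustrative only, not actual Flink code."""

    def __init__(self, max_consecutive_failures: int):
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def on_commit_success(self) -> None:
        # Any successful commit resets the failure streak.
        self.consecutive_failures = 0

    def on_commit_failure(self) -> bool:
        """Returns True when the source task should be failed."""
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.max_consecutive_failures
```

Counting only consecutive failures (rather than total) avoids failing a healthy job that sees occasional transient commit errors, which matches the "keeps failing" condition described above.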



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25292) Azure failed due to unable to fetch some archives

2021-12-13 Thread Yun Gao (Jira)
Yun Gao created FLINK-25292:
---

 Summary: Azure failed due to unable to fetch some archives
 Key: FLINK-25292
 URL: https://issues.apache.org/jira/browse/FLINK-25292
 Project: Flink
  Issue Type: Bug
  Components: Build System / Azure Pipelines
Affects Versions: 1.13.3
Reporter: Yun Gao


{code:java}
/bin/bash --noprofile --norc /__w/_temp/ba0f8961-8595-4ace-b13f-d60e17df8803.sh
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  libio-pty-perl libipc-run-perl
Suggested packages:
  libtime-duration-perl libtimedate-perl
The following NEW packages will be installed:
  libio-pty-perl libipc-run-perl moreutils
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
Need to get 177 kB of archives.
After this operation, 573 kB of additional disk space will be used.
Err:1 http://archive.ubuntu.com/ubuntu xenial/main amd64 libio-pty-perl amd64 
1:1.08-1.1build1
  Could not connect to archive.ubuntu.com:80 (91.189.88.152), connection timed 
out [IP: 91.189.88.152 80]
Err:2 http://archive.ubuntu.com/ubuntu xenial/main amd64 libipc-run-perl all 
0.94-1
  Unable to connect to archive.ubuntu.com:http: [IP: 91.189.88.152 80]
Err:3 http://archive.ubuntu.com/ubuntu xenial/universe amd64 moreutils amd64 
0.57-1
  Unable to connect to archive.ubuntu.com:http: [IP: 91.189.88.152 80]
E: Failed to fetch 
http://archive.ubuntu.com/ubuntu/pool/main/libi/libio-pty-perl/libio-pty-perl_1.08-1.1build1_amd64.deb
  Could not connect to archive.ubuntu.com:80 (91.189.88.152), connection timed 
out [IP: 91.189.88.152 80]

E: Failed to fetch 
http://archive.ubuntu.com/ubuntu/pool/main/libi/libipc-run-perl/libipc-run-perl_0.94-1_all.deb
  Unable to connect to archive.ubuntu.com:http: [IP: 91.189.88.152 80]

E: Failed to fetch 
http://archive.ubuntu.com/ubuntu/pool/universe/m/moreutils/moreutils_0.57-1_amd64.deb
  Unable to connect to archive.ubuntu.com:http: [IP: 91.189.88.152 80]

E: Unable to fetch some archives, maybe run apt-get update or try with 
--fix-missing?
Running command './tools/ci/test_controller.sh kafka/gelly' with a timeout of 
234 minutes.
./tools/azure-pipelines/uploading_watchdog.sh: line 76: ts: command not found
The STDIO streams did not close within 10 seconds of the exit event from 
process '/bin/bash'. This may indicate a child process inherited the STDIO 
streams and has not yet exited.
##[error]Bash exited with code '141'.
 {code}
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=28064&view=logs&j=72d4811f-9f0d-5fd0-014a-0bc26b72b642&t=e424005a-b16e-540f-196d-da062cc19bdf&l=13



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Yingjie Cao
Hi Jiangang,

Thanks for your suggestion.

>>> The config can affect the memory usage. Will the related memory configs
be changed?

I think we will not change the default network memory settings. My best
expectation is that the default values can work for most cases (though maybe
not the best) and for other cases, users may need to tune the memory
settings.

>>> Can you share the tpcds results for different configs? Although we
change the default values, it is helpful to change them for different
users. In this case, the experience can help a lot.

I did not keep all previous TPCDS results, but from the results, I can tell
that on HDD, always using the sort-shuffle is a good choice. For small
jobs, using sort-shuffle may not bring much performance gain, possibly
because all shuffle data can be cached in memory (page cache); this is
the case if the cluster has enough resources. However, if the whole
cluster is under heavy load or you are running large-scale jobs, the
performance of those small jobs can also be influenced. For large-scale
jobs, the configurations suggested to be tuned are
taskmanager.network.sort-shuffle.min-buffers and
taskmanager.memory.framework.off-heap.batch-shuffle.size; you can increase
these values for large-scale batch jobs.

BTW, I am still running TPCDS tests these days and I can share these
results soon.

Best,
Yingjie

刘建刚 (Liu Jiangang) wrote on Fri, Dec 10, 2021 at 18:30:

> Glad to see the suggestion. In our test, we found that for small jobs,
> changing these configs does not improve the performance much, just as in
> your test. I have some suggestions:
>
>- The config can affect the memory usage. Will the related memory
>configs be changed?
>- Can you share the tpcds results for different configs? Although we
>change the default values, it is helpful to change them for different
>users. In this case, the experience can help a lot.
>
> Best,
> Liu Jiangang
>
> Yun Gao wrote on Fri, Dec 10, 2021 at 17:20:
>
>> Hi Yingjie,
>>
>> Many thanks for drafting the FLIP and initiating the discussion!
>>
>> May I double-check one point about
>> taskmanager.network.sort-shuffle.min-parallelism: since other frameworks
>> like Spark use sort-based shuffle for all cases, is our current
>> circumstance still different from theirs?
>>
>> Best,
>> Yun
>>
>>
>>
>>
>> --
>> From:Yingjie Cao 
>> Send Time:2021 Dec. 10 (Fri.) 16:17
>> To:dev ; user ; user-zh <
>> user...@flink.apache.org>
>> Subject:Re: [DISCUSS] Change some default config values of blocking
>> shuffle
>>
>> Hi dev & users:
>>
>> I have created a FLIP [1] for it; feedback is highly appreciated.
>>
>> Best,
>> Yingjie
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-199%3A+Change+some+default+config+values+of+blocking+shuffle+for+better+usability
>> Yingjie Cao wrote on Fri, Dec 3, 2021 at 17:02:
>>
>> Hi dev & users,
>>
>> We propose to change some default values of blocking shuffle to improve
>> the out-of-the-box user experience (streaming is not affected). The default
>> values we want to change are as follows:
>>
>> 1. Data compression
>> (taskmanager.network.blocking-shuffle.compression.enabled): Currently, the
>> default value is 'false'.  Usually, data compression can reduce both disk
>> and network IO which is good for performance. At the same time, it can save
>> storage space. We propose to change the default value to true.
>>
>> 2. Default shuffle implementation
>> (taskmanager.network.sort-shuffle.min-parallelism): Currently, the default
>> value is 'Integer.MAX', which means by default, Flink jobs will always use
>> hash-shuffle. In fact, for high parallelism, sort-shuffle is better for
>> both stability and performance. So we propose to reduce the default value
>> to a proper smaller one, for example, 128. (We tested 128, 256, 512 and
>> 1024 with TPC-DS, and 128 was the best one.)
>>
>> 3. Read buffer of sort-shuffle
>> (taskmanager.memory.framework.off-heap.batch-shuffle.size): Currently, the
>> default value is '32M'. Previously, when choosing the default value, both
>> ‘32M' and '64M' are OK for tests and we chose the smaller one in a cautious
>> way. However, recently, it is reported in the mailing list that the default
>> value is not enough which caused a buffer request timeout issue. We already
>> created a ticket to improve the behavior. At the same time, we propose to
>> increase this default value to '64M' which can also help.
>>
>> 4. Sort buffer size of sort-shuffle
>> (taskmanager.network.sort-shuffle.min-buffers): Currently, the default
>> value is '64' which means '64' network buffers (32k per buffer by default).
>> This default value is quite modest and the performance can be influenced.
>> We propose to increase this value to a larger one, for example, 512 (the
>> default TM and network buffer configuration can serve more than 10 result
>> partitions concurrently).
>>
>> We already tested these default values together with 

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Yingjie Cao
Hi Yun,

Thanks for your feedback.

I think setting taskmanager.network.sort-shuffle.min-parallelism to 1 and
using sort-shuffle for all cases by default is a good suggestion. I did not
choose this value mainly for two reasons:

1. It increases the usage of network memory, which may cause an
"insufficient network buffer" exception, and users may have to increase
the total network buffers.
2. Several (not many) TPC-DS queries suffer some performance regression
on SSD.

For the first issue, I will test more settings on TPC-DS and see the
influence. For the second issue, I will try to find the cause and solve it
in 1.15.

I am open to your suggestion, but I still need some more tests and
analysis to guarantee that it works well.
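[Editor's note] For reference, the decision this threshold controls can be sketched as follows. The option name is real; the function itself is illustrative only, not Flink code:

```python
INTEGER_MAX = 2**31 - 1  # current default of taskmanager.network.sort-shuffle.min-parallelism

def blocking_shuffle_implementation(parallelism: int,
                                    min_parallelism: int = INTEGER_MAX) -> str:
    # Sort-based blocking shuffle kicks in once the consumer parallelism
    # reaches the configured threshold; below it, hash-shuffle is used.
    return "sort-shuffle" if parallelism >= min_parallelism else "hash-shuffle"

print(blocking_shuffle_implementation(200))       # hash-shuffle (today's default)
print(blocking_shuffle_implementation(200, 128))  # sort-shuffle (proposed default of 128)
print(blocking_shuffle_implementation(200, 1))    # sort-shuffle (Yun's suggestion of 1)
```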

Best,
Yingjie

Yun Gao wrote on Fri, Dec 10, 2021 at 17:19:

> Hi Yingjie,
>
> Many thanks for drafting the FLIP and initiating the discussion!
>
> May I double-check one point about
> taskmanager.network.sort-shuffle.min-parallelism: since other frameworks
> like Spark use sort-based shuffle for all cases, is our current
> circumstance still different from theirs?
>
> Best,
> Yun
>
>
>
> --
> From:Yingjie Cao 
> Send Time:2021 Dec. 10 (Fri.) 16:17
> To:dev ; user ; user-zh <
> user...@flink.apache.org>
> Subject:Re: [DISCUSS] Change some default config values of blocking shuffle
>
> Hi dev & users:
>
> I have created a FLIP [1] for it; feedback is highly appreciated.
>
> Best,
> Yingjie
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-199%3A+Change+some+default+config+values+of+blocking+shuffle+for+better+usability
>
> Yingjie Cao wrote on Fri, Dec 3, 2021 at 17:02:
> Hi dev & users,
>
> We propose to change some default values of blocking shuffle to improve
> the out-of-the-box user experience (streaming is not affected). The default
> values we want to change are as follows:
>
> 1. Data compression
> (taskmanager.network.blocking-shuffle.compression.enabled): Currently, the
> default value is 'false'.  Usually, data compression can reduce both disk
> and network IO which is good for performance. At the same time, it can save
> storage space. We propose to change the default value to true.
>
> 2. Default shuffle implementation
> (taskmanager.network.sort-shuffle.min-parallelism): Currently, the default
> value is 'Integer.MAX', which means by default, Flink jobs will always use
> hash-shuffle. In fact, for high parallelism, sort-shuffle is better for
> both stability and performance. So we propose to reduce the default value
> to a proper smaller one, for example, 128. (We tested 128, 256, 512 and
> 1024 with TPC-DS, and 128 was the best one.)
>
> 3. Read buffer of sort-shuffle
> (taskmanager.memory.framework.off-heap.batch-shuffle.size): Currently, the
> default value is '32M'. Previously, when choosing the default value, both
> ‘32M' and '64M' are OK for tests and we chose the smaller one in a cautious
> way. However, recently, it is reported in the mailing list that the default
> value is not enough which caused a buffer request timeout issue. We already
> created a ticket to improve the behavior. At the same time, we propose to
> increase this default value to '64M' which can also help.
>
> 4. Sort buffer size of sort-shuffle
> (taskmanager.network.sort-shuffle.min-buffers): Currently, the default
> value is '64' which means '64' network buffers (32k per buffer by default).
> This default value is quite modest and the performance can be influenced.
> We propose to increase this value to a larger one, for example, 512 (the
> default TM and network buffer configuration can serve more than 10
> result partitions concurrently).
>
> We already tested these default values together with tpc-ds benchmark in a
> cluster and both the performance and stability improved a lot. These
> changes can help to improve the out-of-box experience of blocking shuffle.
> What do you think about these changes? Is there any concern? If there are
> no objections, I will make these changes soon.
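[Editor's note] Taken together, the four proposed defaults above correspond to the following flink-conf.yaml fragment, shown here for convenience (a sketch of the values named in the proposal, not an official recommendation):

```yaml
# Proposed blocking-shuffle defaults from this thread
taskmanager.network.blocking-shuffle.compression.enabled: true
taskmanager.network.sort-shuffle.min-parallelism: 128
taskmanager.memory.framework.off-heap.batch-shuffle.size: 64m
taskmanager.network.sort-shuffle.min-buffers: 512
```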
>
> Best,
> Yingjie
>
>
>


[jira] [Created] (FLINK-25291) Add failure cases in DataStream source and sink suite of connector testing framework

2021-12-13 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-25291:
-

 Summary: Add failure cases in DataStream source and sink suite of 
connector testing framework
 Key: FLINK-25291
 URL: https://issues.apache.org/jira/browse/FLINK-25291
 Project: Flink
  Issue Type: Sub-task
  Components: Test Infrastructure
Reporter: Qingsheng Ren
 Fix For: 1.15.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25290) Add table source and sink test suite in connector testing framework

2021-12-13 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-25290:
-

 Summary: Add table source and sink test suite in connector testing 
framework
 Key: FLINK-25290
 URL: https://issues.apache.org/jira/browse/FLINK-25290
 Project: Flink
  Issue Type: Sub-task
Reporter: Qingsheng Ren
 Fix For: 1.15.0








[jira] [Created] (FLINK-25288) Add savepoint and metric cases in DataStream source suite of connector testing framework

2021-12-13 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-25288:
-

 Summary: Add savepoint and metric cases in DataStream source suite 
of connector testing framework
 Key: FLINK-25288
 URL: https://issues.apache.org/jira/browse/FLINK-25288
 Project: Flink
  Issue Type: Sub-task
  Components: Test Infrastructure
Reporter: Qingsheng Ren
 Fix For: 1.15.0








[jira] [Created] (FLINK-25289) Add DataStream sink test suite in connector testing framework

2021-12-13 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-25289:
-

 Summary: Add DataStream sink test suite in connector testing 
framework
 Key: FLINK-25289
 URL: https://issues.apache.org/jira/browse/FLINK-25289
 Project: Flink
  Issue Type: Sub-task
  Components: Test Infrastructure
Reporter: Qingsheng Ren








[jira] [Created] (FLINK-25287) Refactor connector testing framework interfaces

2021-12-13 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-25287:
-

 Summary: Refactor connector testing framework interfaces
 Key: FLINK-25287
 URL: https://issues.apache.org/jira/browse/FLINK-25287
 Project: Flink
  Issue Type: Sub-task
  Components: Test Infrastructure
Reporter: Qingsheng Ren
 Fix For: 1.15.0


A refactor of the connector testing framework interfaces is required to support
more test scenarios and cases, such as sinks and the Table / SQL API.





[jira] [Created] (FLINK-25286) Improve connector testing framework to support more scenarios

2021-12-13 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-25286:
-

 Summary: Improve connector testing framework to support more 
scenarios
 Key: FLINK-25286
 URL: https://issues.apache.org/jira/browse/FLINK-25286
 Project: Flink
  Issue Type: Improvement
  Components: Test Infrastructure
Reporter: Qingsheng Ren
 Fix For: 1.15.0


Currently the connector testing framework only supports tests for DataStream 
sources, and the available scenarios are quite limited by the current interface 
design. 

This ticket proposes to make improvements to the connector testing framework to 
support more test scenarios, and to add test suites for sinks and the Table/SQL API.





Re: [Proposal] It is hoped that Apache APISIX and Apache Flink will carry out diversified community cooperation

2021-12-13 Thread Ming Wen
Hi, yeliang,
Apache mailing lists do not support pictures or attachments, so we cannot
see the architecture diagram.
From your description, I think the integration between Flink and APISIX will work.

Thanks,
Ming Wen, Apache APISIX PMC Chair
Twitter: _WenMing


yeliang wang wrote on Tue, Dec 14, 2021 at 12:05:

> Hi, community,
>
> My name is Yeliang Wang, and I am Apache APISIX Committer.
>
> As far as I know, Apache Flink, as a distributed real-time processing
> engine for stream and batch data with high throughput and low latency, has
> very rich application scenarios in enterprises.
>
> For example, in combination with OpenResty or Nginx, there are two
> application scenarios:
>
>1. client -> gateway -> kafka -> flink. This is the mainstream
>application scenario (data analysis) in the industry.
>2. client -> gateway -> flink (REST API). Flink has some REST APIs,
>which can be used to implement rate limiting, authentication and other
>scenarios.
>
> [image: 飞书20211214-105403.png]
> Compared with OpenResty or Nginx, Apache APISIX provides rich traffic
> management features like Load Balancing, Dynamic Upstream, Canary Release,
> Circuit Breaking, Authentication, Observability, and more. Apache APISIX
> already supports Kafka via a plugin (
> https://apisix.apache.org/docs/apisix/plugins/kafka-logger/ )
>
> So, we hope to develop more community cooperation with Apache Flink, for example:
>
>1. Hold a technical meetup together
>2. Collaborate on technical blog posts to share with more people
>3. Carry out publicity activities on the official websites and media
>channels together
>4. Jointly develop a Flink plugin for APISIX
>
>
> I believe that doing so can not only meet the diversified needs of
> users, but also enrich the surrounding ecosystems of Apache Flink and
> Apache APISIX.
>
> Looking forward to more discussion.
>
> Thanks,
> Github: wang-yeliang, Twitter: @WYeliang
>


[Proposal] It is hoped that Apache APISIX and Apache Flink will carry out diversified community cooperation

2021-12-13 Thread yeliang wang
Hi, community,

My name is Yeliang Wang, and I am Apache APISIX Committer.

As far as I know, Apache Flink, as a distributed real-time processing
engine for stream and batch data with high throughput and low latency,
has very rich application scenarios in enterprises.

For example, in combination with OpenResty or Nginx, there are two
application scenarios:

   1. client -> gateway -> kafka -> flink. This is the mainstream
   application scenario (data analysis) in the industry.
   2. client -> gateway -> flink (REST API). Flink has some REST APIs,
   which can be used to implement rate limiting, authentication and other
   scenarios.

[image: 飞书20211214-105403.png]
Compared with OpenResty or Nginx, Apache APISIX provides rich traffic
management features like Load Balancing, Dynamic Upstream, Canary Release,
Circuit Breaking, Authentication, Observability, and more. Apache APISIX
already supports Kafka via a plugin (
https://apisix.apache.org/docs/apisix/plugins/kafka-logger/ )
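As an illustration of the existing Kafka integration mentioned above, an APISIX route could attach the kafka-logger plugin roughly as follows. This is a sketch based on the linked plugin documentation; the field names (`broker_list`, `kafka_topic`, `batch_max_size`) and the example addresses/topic are assumptions to be verified against your APISIX version:

```json
{
  "uri": "/events/*",
  "plugins": {
    "kafka-logger": {
      "broker_list": { "127.0.0.1:9092": 1 },
      "kafka_topic": "flink-input",
      "batch_max_size": 100
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": { "127.0.0.1:8080": 1 }
  }
}
```

With such a route, request data forwarded by the gateway lands in a Kafka topic, which a Flink job can then consume — the "client -> gateway -> kafka -> flink" scenario described above.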

So, we hope to develop more community cooperation with Apache Flink, for example:

   1. Hold a technical meetup together
   2. Collaborate on technical blog posts to share with more people
   3. Carry out publicity activities on the official websites and media
   channels together
   4. Jointly develop a Flink plugin for APISIX


I believe that doing so can not only meet the diversified needs of users,
but also enrich the surrounding ecosystems of Apache Flink and Apache APISIX.

Looking forward to more discussion.

Thanks,
Github: wang-yeliang, Twitter: @WYeliang


Re: [VOTE] Release 1.11.5/1.12.6/1.13.4/1.14.1, release candidate #1

2021-12-13 Thread Yun Tang
Hi Chesnay,

Thanks a lot for driving these emergency patch releases!

I just noticed that the current flink-1.11.4 release offers Python wheels for 
macOS [1]. Is it okay to release flink-1.11.5 and flink-1.12.6 without those 
Python binaries for macOS?


[1] https://pypi.org/project/apache-flink/1.11.4/#files

Best
Yun Tang

From: Zhu Zhu 
Sent: Tuesday, December 14, 2021 11:00
To: dev 
Subject: Re: [VOTE] Release 1.11.5/1.12.6/1.13.4/1.14.1, release candidate #1

+1 (binding)

- verified the differences of source releases to the corresponding latest
releases, there are only dependency updates and release version update
commits
- verified versions of log4j dependencies in the all binary releases are
2.15.0
- ran example jobs against all the binary releases, logs look good
- release notes and blogpost look good

Thanks,
Zhu

Xintong Song wrote on Tue, Dec 14, 2021 at 10:23:

> +1 (binding)
>
> - verified checksum and signature
> - verified that release candidates only contain the log4j dependency
> changes compared to previous releases.
> - release notes and blogpost LGTM
>
> Thanks a lot for driving these emergency patch releases, Chesnay!
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Dec 14, 2021 at 7:45 AM Chesnay Schepler 
> wrote:
>
> > I forgot to mention something important:
> >
> > The 1.11/1.12 releases do *NOT* contain flink-python releases for *mac*
> > due to compile problems.
> >
> > On 13/12/2021 20:28, Chesnay Schepler wrote:
> > > Hi everyone,
> > >
> > > This vote is for the emergency patch releases for 1.11, 1.12, 1.13 and
> > > 1.14 to address CVE-2021-44228.
> > > It covers all 4 releases as they contain the same changes (upgrading
> > > Log4j to 2.15.0) and were prepared simultaneously by the same person.
> > > (Hence, if something is broken, it likely applies to all releases)
> > >
> > > Please review and vote on the release candidate #1 for the versions
> > > 1.11.5, 1.12.6, 1.13.4 and 1.14.1, as follows:
> > > [ ] +1, Approve the releases
> > > [ ] -1, Do not approve the releases (please provide specific comments)
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source releases and binary convenience releases
> > > to be deployed to dist.apache.org [2], which are signed with the key
> > > with fingerprint C2EED7B111D464BA [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * *the jars for 1.13/1.14 are still being built*
> > > * source code tags [5],
> > > * website pull request listing the new releases and adding
> > > announcement blog post [6].
> > >
> > > The vote will be open for at least 24 hours. The minimum vote time has
> > > been shortened as the changes are minimal and the matter is urgent.
> > > It is adopted by majority approval, with at least 3 PMC affirmative
> > > votes.
> > >
> > > Thanks,
> > > Chesnay
> > >
> > > [1]
> > > 1.11:
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350327
> > > 1.12:
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350328
> > > 1.13:
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350686
> > > 1.14:
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350512
> > > [2]
> > > 1.11: https://dist.apache.org/repos/dist/dev/flink/flink-1.11.5-rc1/
> > > 1.12: https://dist.apache.org/repos/dist/dev/flink/flink-1.12.6-rc1/
> > > 1.13: https://dist.apache.org/repos/dist/dev/flink/flink-1.13.4-rc1/
> > > 1.14: https://dist.apache.org/repos/dist/dev/flink/flink-1.14.1-rc1/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > > 1.11/1.12:
> > > https://repository.apache.org/content/repositories/orgapacheflink-1455
> > > 1.13:
> > > https://repository.apache.org/content/repositories/orgapacheflink-1457
> > > 1.14:
> > > https://repository.apache.org/content/repositories/orgapacheflink-1456
> > > [5]
> > > 1.11: https://github.com/apache/flink/releases/tag/release-1.11.5-rc1
> > > 1.12: https://github.com/apache/flink/releases/tag/release-1.12.6-rc1
> > > 1.13: https://github.com/apache/flink/releases/tag/release-1.13.4-rc1
> > > 1.14: https://github.com/apache/flink/releases/tag/release-1.14.1-rc1
> > > [6] https://github.com/apache/flink-web/pull/489
> > >
> >
>


Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)

2021-12-13 Thread Nicholas Jiang
Hi DianFu,
 
 Thanks for your feedback on the FLIP.

 Regarding the question about `getLatestRules`: IMO, this method does not need 
to be renamed to `getRuleChanges`, because it is used to fetch the full set of 
the latest rules that have been updated.

 Regarding the CEP.rule method: renaming it to CEP.dynamicPattern would be 
confusing for users. A dynamic pattern only creates the PatternStream, not the 
DataStream. Conceptually, a dynamic pattern is still a pattern and does not 
contain the PatternProcessFunction. If CEP.rule were renamed to 
CEP.dynamicPattern, the return value of the method could not include the 
PatternProcessFunction and would only return the PatternStream. I think the 
difference between a Rule and a Pattern is that a Rule contains the 
PatternProcessFunction, while a Pattern or dynamic pattern does not.

Best
Nicholas Jiang


Re: [DISCUSS] Immediate dedicated Flink releases for log4j vulnerability

2021-12-13 Thread Prasanna kumar
It would be good if Docker images were released too.

Prasanna.

On Mon, 13 Dec 2021, 16:16 Jing Zhang,  wrote:

> +1 for the quick release.
>
> Till Rohrmann wrote on Mon, Dec 13, 2021 at 17:54:
>
> > +1
> >
> > Cheers,
> > Till
> >
> > On Mon, Dec 13, 2021 at 10:42 AM Jing Ge  wrote:
> >
> > > +1
> > >
> > > As I suggested to publish the blog post last week asap, users have been
> > > keen to have such urgent releases. Many thanks for it.
> > >
> > >
> > >
> > > On Mon, Dec 13, 2021 at 8:29 AM Konstantin Knauf 
> > > wrote:
> > >
> > > > +1
> > > >
> > > > I didn't think this was necessary when I published the blog post on
> > > Friday,
> > > > but this has made higher waves than I expected over the weekend.
> > > >
> > > >
> > > >
> > > > On Mon, Dec 13, 2021 at 8:23 AM Yuan Mei 
> > wrote:
> > > >
> > > > > +1 for quick release.
> > > > >
> > > > > On Mon, Dec 13, 2021 at 2:55 PM Martijn Visser <
> > mart...@ververica.com>
> > > > > wrote:
> > > > >
> > > > > > +1 to address the issue like this
> > > > > >
> > > > > > On Mon, 13 Dec 2021 at 07:46, Jingsong Li <
> jingsongl...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > +1 for fixing it in these versions and doing quick releases.
> > Looks
> > > > good
> > > > > > to
> > > > > > > me.
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
> > > > > > >
> > > > > > > On Mon, Dec 13, 2021 at 2:18 PM Becket Qin <
> becket@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > +1. The solution sounds good to me. There have been a lot of
> > > > > inquiries
> > > > > > > > about how to react to this.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Mon, Dec 13, 2021 at 12:40 PM Prasanna kumar <
> > > > > > > > prasannakumarram...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > 1+ for making Updates for 1.12.5 .
> > > > > > > > > We are looking for fix in 1.12 version.
> > > > > > > > > Please notify once the fix is done.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Dec 13, 2021 at 9:45 AM Leonard Xu <
> > xbjt...@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 for the quick release and the special vote period 24h.
> > > > > > > > > >
> > > > > > > > > > > On Dec 13, 2021 at 11:49 AM, Dian Fu wrote:
> > > > > > > > > > >
> > > > > > > > > > > +1 for the proposal and creating a quick release.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Dian
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 13, 2021 at 11:15 AM Kyle Bendickson <
> > > > > > k...@tabular.io>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> +1 to doing a release for this widely publicized
> > > > > vulnerability.
> > > > > > > > > > >>
> > > > > > > > > > >> In my experience, users will often update to the
> latest
> > > > minor
> > > > > > > patch
> > > > > > > > > > version
> > > > > > > > > > >> without much fuss. Plus, users have also likely heard
> > > about
> > > > > this
> > > > > > > and
> > > > > > > > > > will
> > > > > > > > > > >> appreciate a simple fix (updating their version where
> > > > > possible).
> > > > > > > > > > >>
> > > > > > > > > > >> The work-around will need to still be noted for users
> > who
> > > > > can’t
> > > > > > > > > upgrade
> > > > > > > > > > for
> > > > > > > > > > >> whatever reason (EMR hasn’t caught up, etc).
> > > > > > > > > > >>
> > > > > > > > > > >> I also agree with your assessment to apply a patch on
> > each
> > > > of
> > > > > > > those
> > > > > > > > > > >> previous versions with only the log4j commit, so that
> > they
> > > > > don’t
> > > > > > > need
> > > > > > > > > > to be
> > > > > > > > > > >> as rigorously tested.
> > > > > > > > > > >>
> > > > > > > > > > >> Best,
> > > > > > > > > > >> Kyle (GitHub @kbendick)
> > > > > > > > > > >>
> > > > > > > > > > >> On Sun, Dec 12, 2021 at 2:23 PM Stephan Ewen <
> > > > > se...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > > >>
> > > > > > > > > > >>> Hi all!
> > > > > > > > > > >>>
> > > > > > > > > > >>> Without doubt, you heard about the log4j
> vulnerability
> > > [1].
> > > > > > > > > > >>>
> > > > > > > > > > >>> There is an advisory blog post on how to mitigate
> this
> > in
> > > > > > Apache
> > > > > > > > > Flink
> > > > > > > > > > >> [2],
> > > > > > > > > > >>> which involves setting a config option and restarting
> > the
> > > > > > > processes.
> > > > > > > > > > That
> > > > > > > > > > >>> is fortunately a relatively simple fix.
> > > > > > > > > > >>>
> > > > > > > > > > >>> Despite this workaround, I think we should do an
> > > immediate
> > > > > > > release
> > > > > > > > > with
> > > > > > > > > > >> the
> > > > > > > > > > >>> updated dependency. Meaning not waiting for the next
> > bug
> > > > fix
> > > > > > > releases
> > > > > > > > > > >>> coming in a few weeks, but releasing asap.
> > > > > > > > > > >>> The mood I perceive in the industry is 

Re: [VOTE] Release 1.11.5/1.12.6/1.13.4/1.14.1, release candidate #1

2021-12-13 Thread Zhu Zhu
+1 (binding)

- verified the differences of source releases to the corresponding latest
releases, there are only dependency updates and release version update
commits
- verified versions of log4j dependencies in the all binary releases are
2.15.0
- ran example jobs against all the binary releases, logs look good
- release notes and blogpost look good

Thanks,
Zhu

Xintong Song wrote on Tue, Dec 14, 2021 at 10:23:

> +1 (binding)
>
> - verified checksum and signature
> - verified that release candidates only contain the log4j dependency
> changes compared to previous releases.
> - release notes and blogpost LGTM
>
> Thanks a lot for driving these emergency patch releases, Chesnay!
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Dec 14, 2021 at 7:45 AM Chesnay Schepler 
> wrote:
>
> > I forgot to mention something important:
> >
> > The 1.11/1.12 releases do *NOT* contain flink-python releases for *mac*
> > due to compile problems.
> >
> > On 13/12/2021 20:28, Chesnay Schepler wrote:
> > > Hi everyone,
> > >
> > > This vote is for the emergency patch releases for 1.11, 1.12, 1.13 and
> > > 1.14 to address CVE-2021-44228.
> > > It covers all 4 releases as they contain the same changes (upgrading
> > > Log4j to 2.15.0) and were prepared simultaneously by the same person.
> > > (Hence, if something is broken, it likely applies to all releases)
> > >
> > > Please review and vote on the release candidate #1 for the versions
> > > 1.11.5, 1.12.6, 1.13.4 and 1.14.1, as follows:
> > > [ ] +1, Approve the releases
> > > [ ] -1, Do not approve the releases (please provide specific comments)
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source releases and binary convenience releases
> > > to be deployed to dist.apache.org [2], which are signed with the key
> > > with fingerprint C2EED7B111D464BA [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * *the jars for 1.13/1.14 are still being built*
> > > * source code tags [5],
> > > * website pull request listing the new releases and adding
> > > announcement blog post [6].
> > >
> > > The vote will be open for at least 24 hours. The minimum vote time has
> > > been shortened as the changes are minimal and the matter is urgent.
> > > It is adopted by majority approval, with at least 3 PMC affirmative
> > > votes.
> > >
> > > Thanks,
> > > Chesnay
> > >
> > > [1]
> > > 1.11:
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350327
> > > 1.12:
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350328
> > > 1.13:
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350686
> > > 1.14:
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350512
> > > [2]
> > > 1.11: https://dist.apache.org/repos/dist/dev/flink/flink-1.11.5-rc1/
> > > 1.12: https://dist.apache.org/repos/dist/dev/flink/flink-1.12.6-rc1/
> > > 1.13: https://dist.apache.org/repos/dist/dev/flink/flink-1.13.4-rc1/
> > > 1.14: https://dist.apache.org/repos/dist/dev/flink/flink-1.14.1-rc1/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > > 1.11/1.12:
> > > https://repository.apache.org/content/repositories/orgapacheflink-1455
> > > 1.13:
> > > https://repository.apache.org/content/repositories/orgapacheflink-1457
> > > 1.14:
> > > https://repository.apache.org/content/repositories/orgapacheflink-1456
> > > [5]
> > > 1.11: https://github.com/apache/flink/releases/tag/release-1.11.5-rc1
> > > 1.12: https://github.com/apache/flink/releases/tag/release-1.12.6-rc1
> > > 1.13: https://github.com/apache/flink/releases/tag/release-1.13.4-rc1
> > > 1.14: https://github.com/apache/flink/releases/tag/release-1.14.1-rc1
> > > [6] https://github.com/apache/flink-web/pull/489
> > >
> >
>


Re: [VOTE] Release 1.11.5/1.12.6/1.13.4/1.14.1, release candidate #1

2021-12-13 Thread Xintong Song
+1 (binding)

- verified checksum and signature
- verified that release candidates only contain the log4j dependency
changes compared to previous releases.
- release notes and blogpost LGTM

Thanks a lot for driving these emergency patch releases, Chesnay!

Thank you~

Xintong Song



On Tue, Dec 14, 2021 at 7:45 AM Chesnay Schepler  wrote:

> I forgot to mention something important:
>
> The 1.11/1.12 releases do *NOT* contain flink-python releases for *mac*
> due to compile problems.
>
> On 13/12/2021 20:28, Chesnay Schepler wrote:
> > Hi everyone,
> >
> > This vote is for the emergency patch releases for 1.11, 1.12, 1.13 and
> > 1.14 to address CVE-2021-44228.
> > It covers all 4 releases as they contain the same changes (upgrading
> > Log4j to 2.15.0) and were prepared simultaneously by the same person.
> > (Hence, if something is broken, it likely applies to all releases)
> >
> > Please review and vote on the release candidate #1 for the versions
> > 1.11.5, 1.12.6, 1.13.4 and 1.14.1, as follows:
> > [ ] +1, Approve the releases
> > [ ] -1, Do not approve the releases (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source releases and binary convenience releases
> > to be deployed to dist.apache.org [2], which are signed with the key
> > with fingerprint C2EED7B111D464BA [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * *the jars for 1.13/1.14 are still being built*
> > * source code tags [5],
> > * website pull request listing the new releases and adding
> > announcement blog post [6].
> >
> > The vote will be open for at least 24 hours. The minimum vote time has
> > been shortened as the changes are minimal and the matter is urgent.
> > It is adopted by majority approval, with at least 3 PMC affirmative
> > votes.
> >
> > Thanks,
> > Chesnay
> >
> > [1]
> > 1.11:
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350327
> > 1.12:
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350328
> > 1.13:
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350686
> > 1.14:
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350512
> > [2]
> > 1.11: https://dist.apache.org/repos/dist/dev/flink/flink-1.11.5-rc1/
> > 1.12: https://dist.apache.org/repos/dist/dev/flink/flink-1.12.6-rc1/
> > 1.13: https://dist.apache.org/repos/dist/dev/flink/flink-1.13.4-rc1/
> > 1.14: https://dist.apache.org/repos/dist/dev/flink/flink-1.14.1-rc1/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > 1.11/1.12:
> > https://repository.apache.org/content/repositories/orgapacheflink-1455
> > 1.13:
> > https://repository.apache.org/content/repositories/orgapacheflink-1457
> > 1.14:
> > https://repository.apache.org/content/repositories/orgapacheflink-1456
> > [5]
> > 1.11: https://github.com/apache/flink/releases/tag/release-1.11.5-rc1
> > 1.12: https://github.com/apache/flink/releases/tag/release-1.12.6-rc1
> > 1.13: https://github.com/apache/flink/releases/tag/release-1.13.4-rc1
> > 1.14: https://github.com/apache/flink/releases/tag/release-1.14.1-rc1
> > [6] https://github.com/apache/flink-web/pull/489
> >
>


[jira] [Created] (FLINK-25285) CoGroupedStreams has inner Maps without easy ways to set uid

2021-12-13 Thread Daniel Bosnic Hill (Jira)
Daniel Bosnic Hill created FLINK-25285:
--

 Summary: CoGroupedStreams has inner Maps without easy ways to set 
uid
 Key: FLINK-25285
 URL: https://issues.apache.org/jira/browse/FLINK-25285
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.14.0
Reporter: Daniel Bosnic Hill


I tried to use CoGroupedStreams w/ disableAutoGeneratedUIDs.  CoGroupedStreams 
creates two map operators without the ability to set uids on them.  These 
appear as "Map" in my operator graph.  I noticed that the 
CoGroupedStreams.apply function has two map calls without setting uids.  If I 
try to run with disableAutoGeneratedUIDs, I get the following error 
"java.lang.IllegalStateException: Auto generated UIDs have been disabled but no 
UID or hash has been assigned to operator Map".

From the Flink user group email thread "CoGroupedStreams and 
disableAutoGeneratedUIDs".

https://github.com/apache/flink/blob/release-1.14/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/CoGroupedStreams.java#L379





Re: [VOTE] Release 1.11.5/1.12.6/1.13.4/1.14.1, release candidate #1

2021-12-13 Thread Chesnay Schepler

I forgot to mention something important:

The 1.11/1.12 releases do *NOT* contain flink-python releases for *mac* 
due to compile problems.


On 13/12/2021 20:28, Chesnay Schepler wrote:

Hi everyone,

This vote is for the emergency patch releases for 1.11, 1.12, 1.13 and 
1.14 to address CVE-2021-44228.
It covers all 4 releases as they contain the same changes (upgrading 
Log4j to 2.15.0) and were prepared simultaneously by the same person.

(Hence, if something is broken, it likely applies to all releases)

Please review and vote on the release candidate #1 for the versions 
1.11.5, 1.12.6, 1.13.4 and 1.14.1, as follows:

[ ] +1, Approve the releases
[ ] -1, Do not approve the releases (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source releases and binary convenience releases 
to be deployed to dist.apache.org [2], which are signed with the key 
with fingerprint C2EED7B111D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
    * *the jars for 1.13/1.14 are still being built*
* source code tags [5],
* website pull request listing the new releases and adding 
announcement blog post [6].


The vote will be open for at least 24 hours. The minimum vote time has 
been shortened as the changes are minimal and the matter is urgent.
It is adopted by majority approval, with at least 3 PMC affirmative 
votes.


Thanks,
Chesnay

[1]
1.11: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350327
1.12: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350328
1.13: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350686
1.14: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350512

[2]
1.11: https://dist.apache.org/repos/dist/dev/flink/flink-1.11.5-rc1/
1.12: https://dist.apache.org/repos/dist/dev/flink/flink-1.12.6-rc1/
1.13: https://dist.apache.org/repos/dist/dev/flink/flink-1.13.4-rc1/
1.14: https://dist.apache.org/repos/dist/dev/flink/flink-1.14.1-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4]
1.11/1.12: 
https://repository.apache.org/content/repositories/orgapacheflink-1455
1.13: 
https://repository.apache.org/content/repositories/orgapacheflink-1457
1.14: 
https://repository.apache.org/content/repositories/orgapacheflink-1456

[5]
1.11: https://github.com/apache/flink/releases/tag/release-1.11.5-rc1
1.12: https://github.com/apache/flink/releases/tag/release-1.12.6-rc1
1.13: https://github.com/apache/flink/releases/tag/release-1.13.4-rc1
1.14: https://github.com/apache/flink/releases/tag/release-1.14.1-rc1
[6] https://github.com/apache/flink-web/pull/489



Re: [VOTE] Release 1.11.5/1.12.6/1.13.4/1.14.1, release candidate #1

2021-12-13 Thread Chesnay Schepler

Update: All jars are now available.

On 13/12/2021 20:28, Chesnay Schepler wrote:

Hi everyone,

This vote is for the emergency patch releases for 1.11, 1.12, 1.13 and 
1.14 to address CVE-2021-44228.
It covers all 4 releases as they contain the same changes (upgrading 
Log4j to 2.15.0) and were prepared simultaneously by the same person.

(Hence, if something is broken, it likely applies to all releases)

Please review and vote on the release candidate #1 for the versions 
1.11.5, 1.12.6, 1.13.4 and 1.14.1, as follows:

[ ] +1, Approve the releases
[ ] -1, Do not approve the releases (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source releases and binary convenience releases 
to be deployed to dist.apache.org [2], which are signed with the key 
with fingerprint C2EED7B111D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
    * *the jars for 1.13/1.14 are still being built*
* source code tags [5],
* website pull request listing the new releases and adding 
announcement blog post [6].


The vote will be open for at least 24 hours. The minimum vote time has 
been shortened as the changes are minimal and the matter is urgent.
It is adopted by majority approval, with at least 3 PMC affirmative 
votes.


Thanks,
Chesnay

[1]
1.11: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350327
1.12: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350328
1.13: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350686
1.14: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350512

[2]
1.11: https://dist.apache.org/repos/dist/dev/flink/flink-1.11.5-rc1/
1.12: https://dist.apache.org/repos/dist/dev/flink/flink-1.12.6-rc1/
1.13: https://dist.apache.org/repos/dist/dev/flink/flink-1.13.4-rc1/
1.14: https://dist.apache.org/repos/dist/dev/flink/flink-1.14.1-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4]
1.11/1.12: 
https://repository.apache.org/content/repositories/orgapacheflink-1455
1.13: 
https://repository.apache.org/content/repositories/orgapacheflink-1457
1.14: 
https://repository.apache.org/content/repositories/orgapacheflink-1456

[5]
1.11: https://github.com/apache/flink/releases/tag/release-1.11.5-rc1
1.12: https://github.com/apache/flink/releases/tag/release-1.12.6-rc1
1.13: https://github.com/apache/flink/releases/tag/release-1.13.4-rc1
1.14: https://github.com/apache/flink/releases/tag/release-1.14.1-rc1
[6] https://github.com/apache/flink-web/pull/489





[jira] [Created] (FLINK-25284) Support nulls in DataGen

2021-12-13 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-25284:
---

 Summary: Support nulls in DataGen
 Key: FLINK-25284
 URL: https://issues.apache.org/jira/browse/FLINK-25284
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Sergey Nuyanzin


Currently it is impossible to specify that some values should sometimes be null.
It would be nice to have a property, something like {{null-rate}}, telling how 
often a {{null}} value should be generated.
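
A hypothetical usage of such an option in a DataGen DDL might look as follows; the {{null-rate}} key (and the table/field names) are purely illustrative of the proposal here, not an existing connector option:

```sql
CREATE TABLE orders_gen (
  order_id BIGINT,
  note STRING
) WITH (
  'connector' = 'datagen',
  -- proposed (not yet existing) option: produce NULL for this field
  -- in roughly 20% of the generated rows
  'fields.note.null-rate' = '0.2'
);
```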



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] Release 1.11.5/1.12.6/1.13.4/1.14.1, release candidate #1

2021-12-13 Thread Stephan Ewen
+1 (binding)

 - Verified that commit history is identical to previous release (except
dependency upgrade and release version commit)
 - Verified that the source releases reference updated dependency and
binary releases contain updated dependency
 - Blog post looks good
 - ran bundled examples against 1.14.1 binary release, worked as expected.

On Mon, Dec 13, 2021 at 9:22 PM Seth Wiesman  wrote:

> +1 (non-binding)
>
> - Checked Log4J version and updated license preambles on all releases
> - Verified signatures on sources
> - Reviewed blog post
>
> Seth
>
> On Mon, Dec 13, 2021 at 1:42 PM Jing Ge  wrote:
>
> > +1   LGTM. Many thanks for your effort!
> >
> > On Mon, Dec 13, 2021 at 8:28 PM Chesnay Schepler 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > This vote is for the emergency patch releases for 1.11, 1.12, 1.13 and
> > > 1.14 to address CVE-2021-44228.
> > > It covers all 4 releases as they contain the same changes (upgrading
> > > Log4j to 2.15.0) and were prepared simultaneously by the same person.
> > > (Hence, if something is broken, it likely applies to all releases)
> > >
> > > Please review and vote on the release candidate #1 for the versions
> > > 1.11.5, 1.12.6, 1.13.4 and 1.14.1, as follows:
> > > [ ] +1, Approve the releases
> > > [ ] -1, Do not approve the releases (please provide specific comments)
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source releases and binary convenience releases
> to
> > > be deployed to dist.apache.org [2], which are signed with the key with
> > > fingerprint C2EED7B111D464BA [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > >  * *the jars for 1.13/1.14 are still being built*
> > > * source code tags [5],
> > > * website pull request listing the new releases and adding announcement
> > > blog post [6].
> > >
> > > The vote will be open for at least 24 hours. The minimum vote time has
> > > been shortened as the changes are minimal and the matter is urgent.
> > > It is adopted by majority approval, with at least 3 PMC affirmative
> > votes.
> > >
> > > Thanks,
> > > Chesnay
> > >
> > > [1]
> > > 1.11:
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350327
> > > 1.12:
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350328
> > > 1.13:
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350686
> > > 1.14:
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350512
> > > [2]
> > > 1.11: https://dist.apache.org/repos/dist/dev/flink/flink-1.11.5-rc1/
> > > 1.12: https://dist.apache.org/repos/dist/dev/flink/flink-1.12.6-rc1/
> > > 1.13: https://dist.apache.org/repos/dist/dev/flink/flink-1.13.4-rc1/
> > > 1.14: https://dist.apache.org/repos/dist/dev/flink/flink-1.14.1-rc1/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > > 1.11/1.12:
> > > https://repository.apache.org/content/repositories/orgapacheflink-1455
> > > 1.13:
> > > https://repository.apache.org/content/repositories/orgapacheflink-1457
> > > 1.14:
> > > https://repository.apache.org/content/repositories/orgapacheflink-1456
> > > [5]
> > > 1.11: https://github.com/apache/flink/releases/tag/release-1.11.5-rc1
> > > 1.12: https://github.com/apache/flink/releases/tag/release-1.12.6-rc1
> > > 1.13: https://github.com/apache/flink/releases/tag/release-1.13.4-rc1
> > > 1.14: https://github.com/apache/flink/releases/tag/release-1.14.1-rc1
> > > [6] https://github.com/apache/flink-web/pull/489
> > >
> >
>


Re: [VOTE] Release 1.11.5/1.12.6/1.13.4/1.14.1, release candidate #1

2021-12-13 Thread Seth Wiesman
+1 (non-binding)

- Checked Log4J version and updated license preambles on all releases
- Verified signatures on sources
- Reviewed blog post

Seth

On Mon, Dec 13, 2021 at 1:42 PM Jing Ge  wrote:

> +1   LGTM. Many thanks for your effort!
>
> On Mon, Dec 13, 2021 at 8:28 PM Chesnay Schepler 
> wrote:
>
> > Hi everyone,
> >
> > This vote is for the emergency patch releases for 1.11, 1.12, 1.13 and
> > 1.14 to address CVE-2021-44228.
> > It covers all 4 releases as they contain the same changes (upgrading
> > Log4j to 2.15.0) and were prepared simultaneously by the same person.
> > (Hence, if something is broken, it likely applies to all releases)
> >
> > Please review and vote on the release candidate #1 for the versions
> > 1.11.5, 1.12.6, 1.13.4 and 1.14.1, as follows:
> > [ ] +1, Approve the releases
> > [ ] -1, Do not approve the releases (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source releases and binary convenience releases to
> > be deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint C2EED7B111D464BA [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> >  * *the jars for 1.13/1.14 are still being built*
> > * source code tags [5],
> > * website pull request listing the new releases and adding announcement
> > blog post [6].
> >
> > The vote will be open for at least 24 hours. The minimum vote time has
> > been shortened as the changes are minimal and the matter is urgent.
> > It is adopted by majority approval, with at least 3 PMC affirmative
> votes.
> >
> > Thanks,
> > Chesnay
> >
> > [1]
> > 1.11:
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350327
> > 1.12:
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350328
> > 1.13:
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350686
> > 1.14:
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350512
> > [2]
> > 1.11: https://dist.apache.org/repos/dist/dev/flink/flink-1.11.5-rc1/
> > 1.12: https://dist.apache.org/repos/dist/dev/flink/flink-1.12.6-rc1/
> > 1.13: https://dist.apache.org/repos/dist/dev/flink/flink-1.13.4-rc1/
> > 1.14: https://dist.apache.org/repos/dist/dev/flink/flink-1.14.1-rc1/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > 1.11/1.12:
> > https://repository.apache.org/content/repositories/orgapacheflink-1455
> > 1.13:
> > https://repository.apache.org/content/repositories/orgapacheflink-1457
> > 1.14:
> > https://repository.apache.org/content/repositories/orgapacheflink-1456
> > [5]
> > 1.11: https://github.com/apache/flink/releases/tag/release-1.11.5-rc1
> > 1.12: https://github.com/apache/flink/releases/tag/release-1.12.6-rc1
> > 1.13: https://github.com/apache/flink/releases/tag/release-1.13.4-rc1
> > 1.14: https://github.com/apache/flink/releases/tag/release-1.14.1-rc1
> > [6] https://github.com/apache/flink-web/pull/489
> >
>


[jira] [Created] (FLINK-25283) End-to-end application modules create oversized jars

2021-12-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-25283:


 Summary: End-to-end application modules create oversized jars
 Key: FLINK-25283
 URL: https://issues.apache.org/jira/browse/FLINK-25283
 Project: Flink
  Issue Type: Technical Debt
  Components: Tests
Affects Versions: 1.13.0
Reporter: Chesnay Schepler
 Fix For: 1.15.0


Various modules that create jars for e2e tests (e.g., 
flink-streaming-kinesis-test) create oversized jars (100mb+) because they 
bundle their entire dependency tree, including many parts of Flink and even 
test resources.
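
One common mitigation for such fat-jar bloat, shown here only as an illustration of the general approach (not necessarily the fix chosen for this ticket), is to let the shade plugin drop unreachable classes and keep Flink's own artifacts out of the bundle:

```xml
<!-- Illustrative maven-shade-plugin configuration: minimizeJar removes
     classes that are not reachable from the module's own code, and an
     explicit exclude keeps a Flink artifact out of the shaded jar. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <minimizeJar>true</minimizeJar>
    <artifactSet>
      <excludes>
        <exclude>org.apache.flink:flink-streaming-java_*</exclude>
      </excludes>
    </artifactSet>
  </configuration>
</plugin>
```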



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] Release 1.11.5/1.12.6/1.13.4/1.14.1, release candidate #1

2021-12-13 Thread Jing Ge
+1   LGTM. Many thanks for your effort!

On Mon, Dec 13, 2021 at 8:28 PM Chesnay Schepler  wrote:

> Hi everyone,
>
> This vote is for the emergency patch releases for 1.11, 1.12, 1.13 and
> 1.14 to address CVE-2021-44228.
> It covers all 4 releases as they contain the same changes (upgrading
> Log4j to 2.15.0) and were prepared simultaneously by the same person.
> (Hence, if something is broken, it likely applies to all releases)
>
> Please review and vote on the release candidate #1 for the versions
> 1.11.5, 1.12.6, 1.13.4 and 1.14.1, as follows:
> [ ] +1, Approve the releases
> [ ] -1, Do not approve the releases (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source releases and binary convenience releases to
> be deployed to dist.apache.org [2], which are signed with the key with
> fingerprint C2EED7B111D464BA [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
>  * *the jars for 1.13/1.14 are still being built*
> * source code tags [5],
> * website pull request listing the new releases and adding announcement
> blog post [6].
>
> The vote will be open for at least 24 hours. The minimum vote time has
> been shortened as the changes are minimal and the matter is urgent.
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Chesnay
>
> [1]
> 1.11:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350327
> 1.12:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350328
> 1.13:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350686
> 1.14:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350512
> [2]
> 1.11: https://dist.apache.org/repos/dist/dev/flink/flink-1.11.5-rc1/
> 1.12: https://dist.apache.org/repos/dist/dev/flink/flink-1.12.6-rc1/
> 1.13: https://dist.apache.org/repos/dist/dev/flink/flink-1.13.4-rc1/
> 1.14: https://dist.apache.org/repos/dist/dev/flink/flink-1.14.1-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> 1.11/1.12:
> https://repository.apache.org/content/repositories/orgapacheflink-1455
> 1.13:
> https://repository.apache.org/content/repositories/orgapacheflink-1457
> 1.14:
> https://repository.apache.org/content/repositories/orgapacheflink-1456
> [5]
> 1.11: https://github.com/apache/flink/releases/tag/release-1.11.5-rc1
> 1.12: https://github.com/apache/flink/releases/tag/release-1.12.6-rc1
> 1.13: https://github.com/apache/flink/releases/tag/release-1.13.4-rc1
> 1.14: https://github.com/apache/flink/releases/tag/release-1.14.1-rc1
> [6] https://github.com/apache/flink-web/pull/489
>


[VOTE] Release 1.11.5/1.12.6/1.13.4/1.14.1, release candidate #1

2021-12-13 Thread Chesnay Schepler

Hi everyone,

This vote is for the emergency patch releases for 1.11, 1.12, 1.13 and 
1.14 to address CVE-2021-44228.
It covers all 4 releases as they contain the same changes (upgrading 
Log4j to 2.15.0) and were prepared simultaneously by the same person.

(Hence, if something is broken, it likely applies to all releases)

Please review and vote on the release candidate #1 for the versions 
1.11.5, 1.12.6, 1.13.4 and 1.14.1, as follows:

[ ] +1, Approve the releases
[ ] -1, Do not approve the releases (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source releases and binary convenience releases to 
be deployed to dist.apache.org [2], which are signed with the key with 
fingerprint C2EED7B111D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
    * *the jars for 1.13/1.14 are still being built*
* source code tags [5],
* website pull request listing the new releases and adding announcement 
blog post [6].


The vote will be open for at least 24 hours. The minimum vote time has 
been shortened as the changes are minimal and the matter is urgent.

It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Chesnay

[1]
1.11: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350327
1.12: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350328
1.13: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350686
1.14: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350512

[2]
1.11: https://dist.apache.org/repos/dist/dev/flink/flink-1.11.5-rc1/
1.12: https://dist.apache.org/repos/dist/dev/flink/flink-1.12.6-rc1/
1.13: https://dist.apache.org/repos/dist/dev/flink/flink-1.13.4-rc1/
1.14: https://dist.apache.org/repos/dist/dev/flink/flink-1.14.1-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4]
1.11/1.12: 
https://repository.apache.org/content/repositories/orgapacheflink-1455

1.13: https://repository.apache.org/content/repositories/orgapacheflink-1457
1.14: https://repository.apache.org/content/repositories/orgapacheflink-1456
[5]
1.11: https://github.com/apache/flink/releases/tag/release-1.11.5-rc1
1.12: https://github.com/apache/flink/releases/tag/release-1.12.6-rc1
1.13: https://github.com/apache/flink/releases/tag/release-1.13.4-rc1
1.14: https://github.com/apache/flink/releases/tag/release-1.14.1-rc1
[6] https://github.com/apache/flink-web/pull/489


Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-12-13 Thread Marios Trivyzas
@Timo Walther 

> But the question is why a user wants to run COMPILE multiple times. If
> it is during development, then running EXECUTE (or just the statement
> itself) without calling COMPILE should be sufficient. The file can also
> manually be deleted if necessary.

Sorry for the delayed response. Yes, it sounds like it is not necessary,
and if one, for any reason, needs to force compilation of only specific
plans, the config option can be used.

Thank you,
Marios

On Thu, Dec 2, 2021 at 5:42 PM Timo Walther  wrote:

> Response to Marios's feedback:
>
>  > there should be some good logging in place when the upgrade is taking
> place
>
> Yes, I agree. I added this part to the FLIP.
>
>  > config option instead that doesn't provide the flexibility to
> overwrite certain plans
>
> One can set the config option also around sections of the
> multi-statement SQL script.
>
> SET 'table.plan.force-recompile'='true';
>
> COMPILE ...
>
> SET 'table.plan.force-recompile'='false';
>
> But the question is why a user wants to run COMPILE multiple times. If
> it is during development, then running EXECUTE (or just the statement
> itself) without calling COMPILE should be sufficient. The file can also
> manually be deleted if necessary.
>
> What do you think?
>
> Regards,
> Timo
>
>
>
> On 02.12.21 16:09, Timo Walther wrote:
> > Hi Till,
> >
> > Yes, you might have to. But not a new plan from the SQL query but a
> > migration from the old plan to the new plan. This will not happen often.
> > But we need a way to evolve the format of the JSON plan itself.
> >
> > Maybe this confuses a bit, so let me clarify it again: Mostly ExecNode
> > versions and operator state layouts will evolve. Not the plan files,
> > those will be pretty stable. But also not infinitely.
> >
> > Regards,
> > Timo
> >
> >
> > On 02.12.21 16:01, Till Rohrmann wrote:
> >> Then for migrating from Flink 1.10 to 1.12, I might have to create a new
> >> plan using Flink 1.11 in order to migrate from Flink 1.11 to 1.12,
> right?
> >>
> >> Cheers,
> >> Till
> >>
> >> On Thu, Dec 2, 2021 at 3:39 PM Timo Walther  wrote:
> >>
> >>> Response to Till's feedback:
> >>>
> >>>   > compiled plan won't be changed after being written initially
> >>>
> >>> This is not entirely correct. We give guarantees for keeping the query
> >>> up and running. We reserve us the right to force plan migrations. In
> >>> this case, the plan might not be created from the SQL statement but
> from
> >>> the old plan. I have added an example in section 10.1.1. In general,
> >>> both persisted entities "plan" and "savepoint" can evolve independently
> >>> from each other.
> >>>
> >>> Thanks,
> >>> Timo
> >>>
> >>> On 02.12.21 15:10, Timo Walther wrote:
>  Response to Godfrey's feedback:
> 
>    > "EXPLAIN PLAN EXECUTE STATEMENT SET BEGIN ... END" is missing.
> 
>  Thanks for the hint. I added a dedicated section 7.1.3.
> 
> 
>    > it's hard to maintain the supported versions for
>  "supportedPlanChanges" and "supportedSavepointChanges"
> 
>  Actually, I think we are mostly on the same page.
> 
>  The annotation does not need to be updated for every Flink version. As
>  the name suggests it is about "Changes" (in other words:
>  incompatibilities) that require some kind of migration. Either plan
>  migration (= PlanChanges) or savepoint migration (=SavepointChanges,
>  using operator migration or savepoint migration).
> 
>  Let's assume we introduced two ExecNodes A and B in Flink 1.15.
> 
>  The annotations are:
> 
>  @ExecNodeMetadata(name=A, supportedPlanChanges=1.15,
>  supportedSavepointChanges=1.15)
> 
>  @ExecNodeMetadata(name=B, supportedPlanChanges=1.15,
>  supportedSavepointChanges=1.15)
> 
>  We change an operator state of B in Flink 1.16.
> 
>  We perform the change in the operator of B in a way to support both
>  state layouts. Thus, no need for a new ExecNode version.
> 
>  The annotations in 1.16 are:
> 
>  @ExecNodeMetadata(name=A, supportedPlanChanges=1.15,
>  supportedSavepointChanges=1.15)
> 
>  @ExecNodeMetadata(name=B, supportedPlanChanges=1.15,
>  supportedSavepointChanges=1.15, 1.16)
> 
>  So the versions in the annotations are "start version"s.
> 
>  I don't think we need end versions? End version would mean that we
> drop
>  the ExecNode from the code base?
> 
>  Please check the section 10.1.1 again. I added a more complex example.
> 
> 
>  Thanks,
>  Timo
> 
> 
> 
>  On 01.12.21 16:29, Timo Walther wrote:
> > Response to Francesco's feedback:
> >
> >   > *Proposed changes #6*: Other than defining this rule of thumb, we
> > must also make sure that compiling plans with these objects that
> > cannot be serialized in the plan must fail hard
> >
> > Yes, I totally agree. We will fail hard with a helpful exception. Any
> > mistake 

Re: [DISCUSS] FLIP-191: Extend unified Sink interface to support small file compaction

2021-12-13 Thread Fabian Paul
Hi all,

After a lot of discussions with different parties, we received very
fruitful feedback and reworked the ideas behind this FLIP. Initially,
we had the impression that the compaction problem is solvable by a
single topology that we can reuse across different sinks. We now have
a better understanding that different external systems require
different compaction mechanisms, e.g., Hive requires compaction before
finally registering the file in the metastore, while Iceberg registers
the files first and compacts them lazily afterwards.

Considering all these different views, we came up with a design that
builds upon what @guowei@gmail.com and @yungao...@aliyun.com have
proposed at the beginning. We allow inserting custom topologies before
and after the SinkWriters and Committers. We do not see this as a
downside: the Sink interfaces that expose the DataStream to the user
reside in flink-streaming-java, in contrast to the basic Sink
interfaces in flink-core, which we deem to be used only by expert
users.

Moreover, we also wanted to remove the global committer from the
unified Sink interfaces and replace it with a custom post-commit
topology. Unfortunately, we cannot do it without breaking the Sink
interface since the GlobalCommittables are part of the parameterized
Sink interface. Thus, we propose building a new Sink V2 interface
consisting of composable interfaces that do not offer the
GlobalCommitter anymore. We will implement a utility to extend a Sink
with a post-commit topology that mimics the behavior of the GlobalCommitter.
The new Sink V2 provides the same sort of methods as the Sink V1
interface, so a migration of sinks that do not use the GlobalCommitter
should be very easy.
We plan to keep the existing Sink V1 interfaces to not break
externally built sinks. As part of this FLIP, we migrate all the
connectors inside of the main repository to the new Sink V2 API.

The FLIP document is also updated and includes the proposed changes.

Looking forward to your feedback,
Fabian

https://cwiki.apache.org/confluence/display/FLINK/FLIP-191%3A+Extend+unified+Sink+interface+to+support+small+file+compaction
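
To make the idea of composable interfaces more concrete, here is a rough, self-contained sketch of what "a base Sink plus an optional committing mixin, without a GlobalCommitter" could look like. All names and signatures are hypothetical illustrations for this discussion, not the final Sink V2 API:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch: the base Sink only creates writers; exactly-once
// committing support is added via a separate, composable interface.
interface Sink<InputT> {
    SinkWriter<InputT> createWriter();
}

interface SinkWriter<InputT> {
    void write(InputT element);
}

// Optional mixin for sinks that produce committables; note there is no
// GlobalCommitter in the type parameters anymore.
interface TwoPhaseCommittingSink<InputT, CommT> extends Sink<InputT> {
    Committer<CommT> createCommitter();
}

interface Committer<CommT> {
    void commit(Collection<CommT> committables);
}

// Minimal in-memory example implementing both interfaces.
class CollectingSink implements TwoPhaseCommittingSink<String, String> {
    final List<String> pending = new ArrayList<>();
    final List<String> committed = new ArrayList<>();

    public SinkWriter<String> createWriter() {
        return pending::add;          // writer buffers elements
    }

    public Committer<String> createCommitter() {
        return committed::addAll;     // committer finalizes them
    }

    public static void main(String[] args) {
        CollectingSink sink = new CollectingSink();
        SinkWriter<String> writer = sink.createWriter();
        writer.write("a");
        writer.write("b");
        sink.createCommitter().commit(sink.pending);
        System.out.println(sink.committed); // prints [a, b]
    }
}
```

A post-commit topology in this model would simply be another composable piece consuming the committer's output, which is how a GlobalCommitter-like step could be mimicked.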


On Thu, Dec 2, 2021 at 10:15 AM Roman Khachatryan  wrote:
>
> Thanks for clarifying (I was initially confused by merging state files
> rather than output files).
>
> > At some point, Flink will definitely have some WAL adapter that can turn 
> > any sink into an exactly-once sink (with some caveats). For now, we keep 
> > that as an orthogonal solution as it has a rather high price (bursty 
> > workload with high latency). Ideally, we can keep the compaction 
> > asynchronously...
>
> Yes, that would be something like a WAL. I agree that it would have a
> different set of trade-offs.
>
>
> Regards,
> Roman
>
> On Mon, Nov 29, 2021 at 3:33 PM Arvid Heise  wrote:
> >>
> >> > One way to avoid write-read-merge is by wrapping SinkWriter with
> >> > another one, which would buffer input elements in a temporary storage
> >> > (e.g. local file) until a threshold is reached; after that, it would
> >> > invoke the original SinkWriter. And if a checkpoint barrier comes in
> >> > earlier, it would send written data to some aggregator.
> >>
> >> I think perhaps this seems to be a kind of WAL method? Namely we first
> >> write the elements to some WAL logs and persist them on checkpoint
> >> (in snapshot or remote FS), or we directly write WAL logs to the remote
> >> FS eagerly.
> >>
> > At some point, Flink will definitely have some WAL adapter that can turn 
> > any sink into an exactly-once sink (with some caveats). For now, we keep 
> > that as an orthogonal solution as it has a rather high price (bursty 
> > workload with high latency). Ideally, we can keep the compaction 
> > asynchronously...
> >
> > On Mon, Nov 29, 2021 at 8:52 AM Yun Gao  
> > wrote:
> >>
> >> Hi,
> >>
> >> @Roman very sorry for the late response for a long time,
> >>
> >> > Merging artifacts from multiple checkpoints would apparently
> >> require multiple concurrent checkpoints
> >>
> >> I think it might not need concurrent checkpoints: suppose some
> >> operators (like the committer aggregator in the option 2) maintains
> >> the list of files to merge, it could stores the lists of files to merge
> >> in the states, then after several checkpoints are done and we have
> >> enough files, we could merge all the files in the list.
> >>
> >> > Asynchronous merging in an aggregator would require some resolution
> >> > logic on recovery, so that a merged artifact can be used if the
> >> > original one was deleted. Otherwise, wouldn't recovery fail because
> >> > some artifacts are missing?
> >> > We could also defer deletion until the "compacted" checkpoint is
> >> > subsumed - but isn't it too late, as it will be deleted anyways once
> >> > subsumed?
> >>
> >> I think logically we could delete the original files once the "compacted" 
> >> checkpoint
> >> (which finish merging the compacted files and record it in the 

[jira] [Created] (FLINK-25282) Move runtime dependencies from table-planner to table-runtime

2021-12-13 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25282:
---

 Summary: Move runtime dependencies from table-planner to 
table-runtime
 Key: FLINK-25282
 URL: https://issues.apache.org/jira/browse/FLINK-25282
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


There are several runtime dependencies (e.g. functions used in codegen) that 
are shipped by table-planner and calcite-core. We should move these 
dependencies to table-runtime.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25281) Azure failed due to python tests "tox check failed"

2021-12-13 Thread Yun Gao (Jira)
Yun Gao created FLINK-25281:
---

 Summary: Azure failed due to python tests "tox check failed"
 Key: FLINK-25281
 URL: https://issues.apache.org/jira/browse/FLINK-25281
 Project: Flink
  Issue Type: Bug
  Components: API / Python, Build System / Azure Pipelines
Affects Versions: 1.14.0
Reporter: Yun Gao


{code:java}
Dec 13 03:03:08 pip_test_code.py success!
Dec 13 03:03:09 ___ summary 

Dec 13 03:03:09   py36-cython: commands succeeded
Dec 13 03:03:09 ERROR:   py37-cython: commands failed
Dec 13 03:03:09   py38-cython: commands succeeded
Dec 13 03:03:09 tox checks... [FAILED]
Dec 13 03:03:09 Process exited with EXIT CODE: 1.
Dec 13 03:03:09 Trying to KILL watchdog (2760).
/__w/1/s/tools/ci/watchdog.sh: line 100:  2760 Terminated  watchdog
Dec 13 03:03:09 Searching for .dump, .dumpstream and related files in '/__w/1/s'
The STDIO streams did not close within 10 seconds of the exit event from 
process '/bin/bash'. This may indicate a child process inherited the STDIO 
streams and has not yet exited.
##[error]Bash exited with code '1'.
Finishing: Test - python
 {code}
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=28010&view=logs&j=bdd9ea51-4de2-506a-d4d9-f3930e4d2355&t=dd50312f-73b5-56b5-c172-4d81d03e2ef1&l=24236



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25280) KafkaPartitionSplitReaderTest failed on azure due to Offsets out of range with no configured reset policy for partitions

2021-12-13 Thread Yun Gao (Jira)
Yun Gao created FLINK-25280:
---

 Summary: KafkaPartitionSplitReaderTest failed on azure due to 
Offsets out of range with no configured reset policy for partitions
 Key: FLINK-25280
 URL: https://issues.apache.org/jira/browse/FLINK-25280
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.14.0
Reporter: Yun Gao


{code:java}
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
Dec 13 03:30:12 at java.util.ArrayList.forEach(ArrayList.java:1259)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
Dec 13 03:30:12 at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51)
Dec 13 03:30:12 at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:220)
Dec 13 03:30:12 at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$6(DefaultLauncher.java:188)
Dec 13 03:30:12 at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:202)
Dec 13 03:30:12 at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:181)
Dec 13 03:30:12 at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128)
Dec 13 03:30:12 at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:150)
Dec 13 03:30:12 at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:124)
Dec 13 03:30:12 at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
Dec 13 03:30:12 at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
Dec 13 03:30:12 at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
Dec 13 03:30:12 at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Dec 13 03:30:12
{code}
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=28010&view=logs&j=c5612577-f1f7-5977-6ff6-7432788526f7&t=ffa8837a-b445-534e-cdf4-db364cf8235d&l=7010]

testNumBytesInCounter and testPendingRecordsGauge also failed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25279) KafkaSourceLegacyITCase.testKeyValueSupport failed on azure due to topic already exists

2021-12-13 Thread Yun Gao (Jira)
Yun Gao created FLINK-25279:
---

 Summary: KafkaSourceLegacyITCase.testKeyValueSupport failed on 
azure due to topic already exists
 Key: FLINK-25279
 URL: https://issues.apache.org/jira/browse/FLINK-25279
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.15.0
Reporter: Yun Gao


{code:java}
Dec 12 03:00:12 [ERROR] Failures: 
Dec 12 03:00:12 [ERROR]   
KafkaSourceLegacyITCase.testKeyValueSupport:58->KafkaConsumerTestBase.runKeyValueTest:1528->KafkaTestBase.createTestTopic:222
 Create test topic : keyvaluetest failed, 
org.apache.kafka.common.errors.TopicExistsException: Topic 'keyvaluetest' 
already exists.
Dec 12 03:00:12 [INFO] 
Dec 12 03:00:12 [ERROR] Tests run: 177, Failures: 1, Errors: 0, Skipped: 0
 {code}
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=27995&view=logs&j=72d4811f-9f0d-5fd0-014a-0bc26b72b642&t=e424005a-b16e-540f-196d-da062cc19bdf&l=35697



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25278) Azure failed due to unable to transfer jar from confluent maven repo

2021-12-13 Thread Yun Gao (Jira)
Yun Gao created FLINK-25278:
---

 Summary: Azure failed due to unable to transfer jar from confluent 
maven repo
 Key: FLINK-25278
 URL: https://issues.apache.org/jira/browse/FLINK-25278
 Project: Flink
  Issue Type: Bug
  Components: Build System / Azure Pipelines
Affects Versions: 1.13.3
Reporter: Yun Gao


{code:java}
Dec 12 00:46:45 [ERROR] Failed to execute goal on project 
flink-avro-confluent-registry: Could not resolve dependencies for project 
org.apache.flink:flink-avro-confluent-registry:jar:1.13-SNAPSHOT: Could not 
transfer artifact io.confluent:common-utils:jar:5.5.2 from/to confluent 
(https://packages.confluent.io/maven/): transfer failed for 
https://packages.confluent.io/maven/io/confluent/common-utils/5.5.2/common-utils-5.5.2.jar:
 Connection reset -> [Help 1]
Dec 12 00:46:45 [ERROR] 
Dec 12 00:46:45 [ERROR] To see the full stack trace of the errors, re-run Maven 
with the -e switch.
Dec 12 00:46:45 [ERROR] Re-run Maven using the -X switch to enable full debug 
logging.
Dec 12 00:46:45 [ERROR] 
Dec 12 00:46:45 [ERROR] For more information about the errors and possible 
solutions, please read the following articles:
Dec 12 00:46:45 [ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
Dec 12 00:46:45 [ERROR] 
Dec 12 00:46:45 [ERROR] After correcting the problems, you can resume the build 
with the command
Dec 12 00:46:45 [ERROR]   mvn  -rf :flink-avro-confluent-registry
 {code}
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=27994&view=logs&j=a5ef94ef-68c2-57fd-3794-dc108ed1c495&t=9c1ddabe-d186-5a2c-5fcc-f3cafb3ec699&l=8812





[jira] [Created] (FLINK-25277) Introduce explicit shutdown signalling between TaskManager and JobManager

2021-12-13 Thread Niklas Semmler (Jira)
Niklas Semmler created FLINK-25277:
--

 Summary: Introduce explicit shutdown signalling between 
TaskManager and JobManager 
 Key: FLINK-25277
 URL: https://issues.apache.org/jira/browse/FLINK-25277
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.14.0, 1.13.0
Reporter: Niklas Semmler
 Fix For: 1.15.0


We need to introduce shutdown signalling between TaskManager and JobManager for 
fast & graceful shutdown in reactive scheduler mode.

In Flink 1.14 and earlier versions, the JobManager tracks the availability of a 
TaskManager using a heartbeat. This heartbeat is by default configured with an 
interval of 10 seconds and a timeout of 50 seconds [1]. Hence, the shutdown of 
a TaskManager is recognized only after about 50-60 seconds. This works fine for 
the static scheduling mode, where a TaskManager only disappears as part of a 
cluster shutdown or a job failure. However, in the reactive scheduler mode 
(FLINK-10407), TaskManagers are regularly added and removed from a running job. 
Here, the heartbeat mechanism incurs additional delays.
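The defaults mentioned above correspond to the following configuration options 
[1] (shown here as flink-conf.yaml entries; values are in milliseconds):

```yaml
# Defaults in Flink 1.14 (milliseconds)
heartbeat.interval: 10000   # a heartbeat is sent every 10 s
heartbeat.timeout: 50000    # the peer is declared dead after 50 s without heartbeats
```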

To remove these delays, we add an explicit shutdown signal from the TaskManager 
to the JobManager. Additionally, to avoid data loss in a running job, the 
TaskManager will wait for a shutdown confirmation from the JobManager before 
shutting down.
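At a high level, the proposed handshake could be sketched as follows. This is a 
hypothetical illustration, not Flink's actual RPC interfaces; all class and 
method names here are made up for the sketch:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Sketch of an explicit TaskManager -> JobManager shutdown handshake. */
public class ShutdownHandshake {

    /** Stand-in for the JobManager side: deregister the TaskManager immediately. */
    static CompletableFuture<Void> notifyJobManagerOfShutdown(String taskManagerId) {
        // In a real implementation this would be an RPC that removes the
        // TaskManager right away instead of waiting for a heartbeat timeout.
        return CompletableFuture.completedFuture(null);
    }

    /** Returns true if the JobManager confirmed the shutdown in time. */
    static boolean shutDownTaskManager(String taskManagerId) {
        // 1. Tell the JobManager we are going away.
        CompletableFuture<Void> ack = notifyJobManagerOfShutdown(taskManagerId);
        try {
            // 2. Wait (bounded) for the confirmation to avoid losing in-flight data.
            ack.get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            return false; // fall back to unilateral shutdown after the timeout
        }
        // 3. Only then release slots and terminate the process.
        return true;
    }
}
```

The bounded wait in step 2 preserves the existing behavior as a fallback: if the 
JobManager never answers, the TaskManager still shuts down eventually.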

 

[1]https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#heartbeat-timeout





[jira] [Created] (FLINK-25276) Support native and incremental savepoints

2021-12-13 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-25276:
--

 Summary: Support native and incremental savepoints
 Key: FLINK-25276
 URL: https://issues.apache.org/jira/browse/FLINK-25276
 Project: Flink
  Issue Type: New Feature
Reporter: Piotr Nowojski


Motivation: with non-incremental, canonical-format savepoints and very large 
state, both taking a savepoint and recovering from one can take a very long 
time. Providing options to take native-format and incremental savepoints would 
alleviate this problem.

In the past, the main challenge lay in the ownership semantics and file cleanup 
of such incremental savepoints. However, with FLINK-25154 implemented, some of 
those concerns can be addressed. Incremental savepoints could leverage the "force 
full snapshot" mode provided by FLINK-25192 to duplicate/copy all of the 
savepoint files out of Flink's ownership scope.





[jira] [Created] (FLINK-25275) Weighted KeyGroup assignment

2021-12-13 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-25275:
--

 Summary: Weighted KeyGroup assignment
 Key: FLINK-25275
 URL: https://issues.apache.org/jira/browse/FLINK-25275
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Network
Affects Versions: 1.14.0
Reporter: Piotr Nowojski


Currently, key groups are split into key group ranges in the simplest possible 
way: equally sized contiguous ranges (number of ranges = parallelism = number of 
key groups / size of a single key group). Flink could avoid data skew between 
key groups by assigning them to tasks based on their "weight", where "weight" 
could be defined as the frequency of access for the given key group.

Arbitrary, non-contiguous key group assignment (for example, TM1 processing kg1 
and kg3 while TM2 processes only kg2) would require extensive changes to the 
state backends, among other things. However, the data skew could be mitigated to 
some extent by creating key group ranges in a more clever way while keeping each 
range contiguous: for example, TM1 processes the range [kg1, kg9], while TM2 
processes just [kg10, kg11].

[This branch shows a PoC of such 
approach.|https://github.com/pnowojski/flink/commits/antiskew]
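The contiguous, weight-aware splitting described above can be sketched as 
follows. This is a hypothetical illustration, not Flink's actual key group 
assignment code or the linked PoC; the greedy per-task target heuristic is an 
assumption of the sketch:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split key groups into contiguous ranges of roughly equal total weight. */
public class WeightedRanges {

    /** Returns [startInclusive, endInclusive] ranges, one per task, in key group order. */
    public static List<int[]> split(double[] weights, int parallelism) {
        double total = 0;
        for (double w : weights) {
            total += w;
        }
        double target = total / parallelism;

        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        double acc = 0;
        for (int kg = 0; kg < weights.length; kg++) {
            acc += weights[kg];
            // Close the current range once it reaches the per-task target,
            // but keep enough remaining key groups for the remaining tasks.
            int remainingTasks = parallelism - ranges.size() - 1;
            int remainingGroups = weights.length - kg - 1;
            if (ranges.size() < parallelism - 1
                    && acc >= target
                    && remainingGroups >= remainingTasks) {
                ranges.add(new int[] {start, kg});
                start = kg + 1;
                acc = 0;
            }
        }
        ranges.add(new int[] {start, weights.length - 1});
        return ranges;
    }
}
```

With uniform weights this degenerates to the current equally sized ranges; a 
heavily accessed key group simply closes its range sooner, shrinking the range 
handled by that task.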





Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)

2021-12-13 Thread Dian Fu
Thanks Yunfeng for bringing up this discussion. I have seen that there were
many users asking for this feature in the past. So big +1 for this proposal.

Regarding this FLIP, I have the following question:
- Does it make sense to rename the method `getLatestRules` into
`getRuleChanges`? I assume we want to know the change history of the rules,
not only the latest status.

Regarding Table API & SQL support:
- Generally I agree that we should consider both the Table API and the
DataStream API for new features; it avoids introducing APIs that would have
to change immediately. However, I think it depends. For this feature, I
doubt that it could realistically be supported in SQL. There are so many
things which may be changed dynamically, e.g. the rule definitions, the
partitioning columns, etc. If we want to change them dynamically, we may
need to make a lot of extensions to the MATCH_RECOGNIZE statement, which
comes from the SQL standard. Personally, I tend to limit the scope of
dynamically changing patterns to the DataStream API only, at least for now.

Regarding CEP.rule vs CEP.dynamicPatterns:
- I also slightly prefer CEP.dynamicPatterns, which is more descriptive.
Personally, I think it's difficult to tell the difference between CEP.rule
and CEP.pattern unless users read the documentation carefully.

Regards,
Dian


On Mon, Dec 13, 2021 at 5:41 PM Nicholas Jiang 
wrote:

> Hi Konstantin,
>
>About the renaming of Rule: the difference between a Rule and a Pattern
> is that the Rule not only contains the Pattern, but also how to match the
> Pattern and how to process the match afterwards. If renamed to
> DynamicPattern, I'm concerned that the name couldn't convey the matching
> and processing aspects, whereas Rule can. Therefore I prefer the name Rule
> over DynamicPattern.
>
> Best,
> Nicholas Jiang
>
>
> On 2021/12/13 08:23:23 Konstantin Knauf wrote:
> > Hi Nicholas,
> >
> > I am not sure I understand your question about renaming. I think the most
> > important member of the current Rule class is the Pattern, the
> KeySelector
> > and PatternProcessFunction are more auxiliary if you will. That's why, I
> > think, it would be ok to rename Rule to DynamicPatternHolder although it
> > contains more than just a Pattern.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Mon, Dec 13, 2021 at 9:16 AM Nicholas Jiang  >
> > wrote:
> >
> > > Hi Konstantin,
> > >
> > >    Thanks for your feedback. The point about adding a timestamp to each
> > > rule that determines the start time from which the rule is in effect
> > > makes sense to me. At present, the timestamp defaults to the current
> > > time, so no timestamp field represents the rule's start time. Regarding
> > > the renaming, your suggestion looks good to me and introduces no new
> > > concept. But does this introduce a Rule concept or reuse the Pattern
> > > concept for the DynamicPattern renaming?
> > >
> > > Best,
> > > Nicholas Jiang
> > >
> > > On 2021/12/13 07:45:04 Konstantin Knauf wrote:
> > > > Thanks, Yufeng, for starting this discussion. I think this will be a
> very
> > > > popular feature. I've seen a lot of users asking for this in the
> past.
> > > So,
> > > > generally big +1.
> > > >
> > > > I think we should have a rough idea on how to expose this feature in
> the
> > > > other APIs.
> > > >
> > > > Two ideas:
> > > >
> > > > 1. In order to make this more deterministic in case of reprocessing
> and
> > > > out-of-orderness, I am wondering if we can add a timestamp to each
> rule
> > > > that determines the start time from which the rule should be in
> effect.
> > > > This can be an event or a processing time depending on the
> > > characteristics
> > > > of the pipeline. The timestamp would default to Long.MIN_TIMESTAMP
> if not
> > > > provided, which means effectively immediately. This could also be a
> > > follow
> > > > up, if you think it will make the implementation too complicated
> > > initially.
> > > >
> > > > 2. I am wondering, if we should name Rule->DynamicPatternHolder or
> so and
> > > > CEP.rule-> CEP.dynamicPatterns instead (other classes
> correspondingly)?
> > > > Rule is quite ambiguous and DynamicPattern seems more descriptive to
> me.
> > > >
> > > > On Mon, Dec 13, 2021 at 4:30 AM Nicholas Jiang <
> nicholasji...@apache.org
> > > >
> > > > wrote:
> > > >
> > > > > Hi Martijn,
> > > > >
> > > > >IMO, in this FLIP, we only need to introduce the general design
> of
> > > the
> > > > > Table API/SQL level. As for the design details, you can create a
> new
> > > FLIP.
> > > > > And do we need to take into account the support for Batch mode if
> you
> > > > > expand the MATCH_RECOGNIZE function? About the dynamic rule engine
> > > design,
> > > > > do you have any comments? This core of the FLIP is about the
> multiple
> > > rule
> > > > > and dynamic rule changing mechanism.
> > > > >
> > > > > Best,
> > > > > Nicholas Jiang
> > > > >
> > > >
> > > >
> > > > 

Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)

2021-12-13 Thread Yunfeng Zhou
Hi Konstantin,

Thanks for your suggestion. For the first idea, I agree that adding a
timestamp field and making users able to schedule a rule is a useful
feature. This might not require too much implementation work and I believe
it can be achieved in this FLIP.

As for the second idea, a Rule is a concept that contains a Pattern; the two
are different. Consider the following example use cases:
- while one pattern requires input data to be grouped by user id, another
pattern might require data to be grouped by item id
- users might want to process matched results differently, according to
which pattern the result matches.
In these use cases, simply making patterns dynamically changeable is not
enough. Therefore I introduced the concept "rule" to include the pattern and
all other functions around it, like the key selector and the pattern process
function; by dynamically changing rules, the example cases above can be
supported. Thus "rule" may be the better name for what this FLIP proposes to
achieve.
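To make the bundling concrete, such a container could be sketched like this.
All names here are illustrative assumptions, not the FLIP's final API:

```java
import java.io.Serializable;

/** Sketch: a rule bundles the pattern plus how to key the input and handle matches. */
public interface Rule<IN, KEY, OUT> extends Serializable {
    /** Identifier used when dynamically adding, updating, or removing the rule. */
    String getId();

    /** What to match (corresponds to a Flink CEP Pattern). */
    Object getPattern();

    /** How the input stream is keyed before matching, e.g. by user id or item id. */
    KEY selectKey(IN input);

    /** How a successful match is turned into output (a pattern process function). */
    OUT processMatch(Object match);
}
```

Each rule can thus carry its own key selector and match handling, which a bare 
dynamically changing Pattern could not express.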

Best regards,
Yunfeng

On Mon, Dec 13, 2021 at 3:45 PM Konstantin Knauf  wrote:

> Thanks, Yufeng, for starting this discussion. I think this will be a very
> popular feature. I've seen a lot of users asking for this in the past. So,
> generally big +1.
>
> I think we should have a rough idea on how to expose this feature in the
> other APIs.
>
> Two ideas:
>
> 1. In order to make this more deterministic in case of reprocessing and
> out-of-orderness, I am wondering if we can add a timestamp to each rule
> that determines the start time from which the rule should be in effect.
> This can be an event or a processing time depending on the characteristics
> of the pipeline. The timestamp would default to Long.MIN_TIMESTAMP if not
> provided, which means effectively immediately. This could also be a follow
> up, if you think it will make the implementation too complicated initially.
>
> 2. I am wondering, if we should name Rule->DynamicPatternHolder or so and
> CEP.rule-> CEP.dynamicPatterns instead (other classes correspondingly)?
> Rule is quite ambiguous and DynamicPattern seems more descriptive to me.
>
> On Mon, Dec 13, 2021 at 4:30 AM Nicholas Jiang 
> wrote:
>
> > Hi Martijn,
> >
> >IMO, in this FLIP, we only need to introduce the general design of the
> > Table API/SQL level. As for the design details, you can create a new
> FLIP.
> > And do we need to take into account the support for Batch mode if you
> > expand the MATCH_RECOGNIZE function? About the dynamic rule engine
> design,
> > do you have any comments? This core of the FLIP is about the multiple
> rule
> > and dynamic rule changing mechanism.
> >
> > Best,
> > Nicholas Jiang
> >
>
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>


[jira] [Created] (FLINK-25274) Use ResolvedSchema in DataGen instead of TableSchema

2021-12-13 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-25274:
---

 Summary: Use ResolvedSchema in DataGen instead of TableSchema
 Key: FLINK-25274
 URL: https://issues.apache.org/jira/browse/FLINK-25274
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Sergey Nuyanzin


{{TableSchema}} is deprecated. Its javadoc recommends using {{ResolvedSchema}} 
and {{Schema}} instead.





Re: [DISCUSS] Immediate dedicated Flink releases for log4j vulnerability

2021-12-13 Thread Jing Zhang
+1 for the quick release.

Till Rohrmann  于2021年12月13日周一 17:54写道:

> +1
>
> Cheers,
> Till
>
> On Mon, Dec 13, 2021 at 10:42 AM Jing Ge  wrote:
>
> > +1
> >
> > As I suggested to publish the blog post last week asap, users have been
> > keen to have such urgent releases. Many thanks for it.
> >
> >
> >
> > On Mon, Dec 13, 2021 at 8:29 AM Konstantin Knauf 
> > wrote:
> >
> > > +1
> > >
> > > I didn't think this was necessary when I published the blog post on
> > Friday,
> > > but this has made higher waves than I expected over the weekend.
> > >
> > >
> > >
> > > On Mon, Dec 13, 2021 at 8:23 AM Yuan Mei 
> wrote:
> > >
> > > > +1 for quick release.
> > > >
> > > > On Mon, Dec 13, 2021 at 2:55 PM Martijn Visser <
> mart...@ververica.com>
> > > > wrote:
> > > >
> > > > > +1 to address the issue like this
> > > > >
> > > > > On Mon, 13 Dec 2021 at 07:46, Jingsong Li 
> > > > wrote:
> > > > >
> > > > > > +1 for fixing it in these versions and doing quick releases.
> Looks
> > > good
> > > > > to
> > > > > > me.
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Mon, Dec 13, 2021 at 2:18 PM Becket Qin  >
> > > > wrote:
> > > > > > >
> > > > > > > +1. The solution sounds good to me. There have been a lot of
> > > > inquiries
> > > > > > > about how to react to this.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Mon, Dec 13, 2021 at 12:40 PM Prasanna kumar <
> > > > > > > prasannakumarram...@gmail.com> wrote:
> > > > > > >
> > > > > > > > 1+ for making Updates for 1.12.5 .
> > > > > > > > We are looking for fix in 1.12 version.
> > > > > > > > Please notify once the fix is done.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 13, 2021 at 9:45 AM Leonard Xu <
> xbjt...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 for the quick release and the special vote period 24h.
> > > > > > > > >
> > > > > > > > > > 2021年12月13日 上午11:49,Dian Fu  写道:
> > > > > > > > > >
> > > > > > > > > > +1 for the proposal and creating a quick release.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Dian
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 13, 2021 at 11:15 AM Kyle Bendickson <
> > > > > k...@tabular.io>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> +1 to doing a release for this widely publicized
> > > > vulnerability.
> > > > > > > > > >>
> > > > > > > > > >> In my experience, users will often update to the latest
> > > minor
> > > > > > patch
> > > > > > > > > version
> > > > > > > > > >> without much fuss. Plus, users have also likely heard
> > about
> > > > this
> > > > > > and
> > > > > > > > > will
> > > > > > > > > >> appreciate a simple fix (updating their version where
> > > > possible).
> > > > > > > > > >>
> > > > > > > > > >> The work-around will need to still be noted for users
> who
> > > > can’t
> > > > > > > > upgrade
> > > > > > > > > for
> > > > > > > > > >> whatever reason (EMR hasn’t caught up, etc).
> > > > > > > > > >>
> > > > > > > > > >> I also agree with your assessment to apply a patch on
> each
> > > of
> > > > > > those
> > > > > > > > > >> previous versions with only the log4j commit, so that
> they
> > > > don’t
> > > > > > need
> > > > > > > > > to be
> > > > > > > > > >> as rigorously tested.
> > > > > > > > > >>
> > > > > > > > > >> Best,
> > > > > > > > > >> Kyle (GitHub @kbendick)
> > > > > > > > > >>
> > > > > > > > > >> On Sun, Dec 12, 2021 at 2:23 PM Stephan Ewen <
> > > > se...@apache.org>
> > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >>> Hi all!
> > > > > > > > > >>>
> > > > > > > > > >>> Without doubt, you heard about the log4j vulnerability
> > [1].
> > > > > > > > > >>>
> > > > > > > > > >>> There is an advisory blog post on how to mitigate this
> in
> > > > > Apache
> > > > > > > > Flink
> > > > > > > > > >> [2],
> > > > > > > > > >>> which involves setting a config option and restarting
> the
> > > > > > processes.
> > > > > > > > > That
> > > > > > > > > >>> is fortunately a relatively simple fix.
> > > > > > > > > >>>
> > > > > > > > > >>> Despite this workaround, I think we should do an
> > immediate
> > > > > > release
> > > > > > > > with
> > > > > > > > > >> the
> > > > > > > > > >>> updated dependency. Meaning not waiting for the next
> bug
> > > fix
> > > > > > releases
> > > > > > > > > >>> coming in a few weeks, but releasing asap.
> > > > > > > > > >>> The mood I perceive in the industry is pretty much
> > panicky
> > > > over
> > > > > > this,
> > > > > > > > > >> and I
> > > > > > > > > >>> expect we will see many requests for a patched release
> > and
> > > > many
> > > > > > > > > >> discussions
> > > > > > > > > >>> why the workaround alone would not be enough due to
> > certain
> > > > > > > > guidelines.
> > > > > > > > > >>>
> > > > > > > > > >>> I suggest that we preempt those discussions and create
> > > > releases
> > > > > > the
> > > > > > > > > >>> 

Re: [DISCUSS] Immediate dedicated Flink releases for log4j vulnerability

2021-12-13 Thread Ada Luna
+1
I need 1.12.6, thanks

Till Rohrmann  于2021年12月13日周一 17:54写道:
>
> +1
>
> Cheers,
> Till
>
> On Mon, Dec 13, 2021 at 10:42 AM Jing Ge  wrote:
>
> > +1
> >
> > As I suggested to publish the blog post last week asap, users have been
> > keen to have such urgent releases. Many thanks for it.
> >
> >
> >
> > On Mon, Dec 13, 2021 at 8:29 AM Konstantin Knauf 
> > wrote:
> >
> > > +1
> > >
> > > I didn't think this was necessary when I published the blog post on
> > Friday,
> > > but this has made higher waves than I expected over the weekend.
> > >
> > >
> > >
> > > On Mon, Dec 13, 2021 at 8:23 AM Yuan Mei  wrote:
> > >
> > > > +1 for quick release.
> > > >
> > > > On Mon, Dec 13, 2021 at 2:55 PM Martijn Visser 
> > > > wrote:
> > > >
> > > > > +1 to address the issue like this
> > > > >
> > > > > On Mon, 13 Dec 2021 at 07:46, Jingsong Li 
> > > > wrote:
> > > > >
> > > > > > +1 for fixing it in these versions and doing quick releases. Looks
> > > good
> > > > > to
> > > > > > me.
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Mon, Dec 13, 2021 at 2:18 PM Becket Qin 
> > > > wrote:
> > > > > > >
> > > > > > > +1. The solution sounds good to me. There have been a lot of
> > > > inquiries
> > > > > > > about how to react to this.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Mon, Dec 13, 2021 at 12:40 PM Prasanna kumar <
> > > > > > > prasannakumarram...@gmail.com> wrote:
> > > > > > >
> > > > > > > > 1+ for making Updates for 1.12.5 .
> > > > > > > > We are looking for fix in 1.12 version.
> > > > > > > > Please notify once the fix is done.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 13, 2021 at 9:45 AM Leonard Xu 
> > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 for the quick release and the special vote period 24h.
> > > > > > > > >
> > > > > > > > > > 2021年12月13日 上午11:49,Dian Fu  写道:
> > > > > > > > > >
> > > > > > > > > > +1 for the proposal and creating a quick release.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Dian
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 13, 2021 at 11:15 AM Kyle Bendickson <
> > > > > k...@tabular.io>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> +1 to doing a release for this widely publicized
> > > > vulnerability.
> > > > > > > > > >>
> > > > > > > > > >> In my experience, users will often update to the latest
> > > minor
> > > > > > patch
> > > > > > > > > version
> > > > > > > > > >> without much fuss. Plus, users have also likely heard
> > about
> > > > this
> > > > > > and
> > > > > > > > > will
> > > > > > > > > >> appreciate a simple fix (updating their version where
> > > > possible).
> > > > > > > > > >>
> > > > > > > > > >> The work-around will need to still be noted for users who
> > > > can’t
> > > > > > > > upgrade
> > > > > > > > > for
> > > > > > > > > >> whatever reason (EMR hasn’t caught up, etc).
> > > > > > > > > >>
> > > > > > > > > >> I also agree with your assessment to apply a patch on each
> > > of
> > > > > > those
> > > > > > > > > >> previous versions with only the log4j commit, so that they
> > > > don’t
> > > > > > need
> > > > > > > > > to be
> > > > > > > > > >> as rigorously tested.
> > > > > > > > > >>
> > > > > > > > > >> Best,
> > > > > > > > > >> Kyle (GitHub @kbendick)
> > > > > > > > > >>
> > > > > > > > > >> On Sun, Dec 12, 2021 at 2:23 PM Stephan Ewen <
> > > > se...@apache.org>
> > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >>> Hi all!
> > > > > > > > > >>>
> > > > > > > > > >>> Without doubt, you heard about the log4j vulnerability
> > [1].
> > > > > > > > > >>>
> > > > > > > > > >>> There is an advisory blog post on how to mitigate this in
> > > > > Apache
> > > > > > > > Flink
> > > > > > > > > >> [2],
> > > > > > > > > >>> which involves setting a config option and restarting the
> > > > > > processes.
> > > > > > > > > That
> > > > > > > > > >>> is fortunately a relatively simple fix.
> > > > > > > > > >>>
> > > > > > > > > >>> Despite this workaround, I think we should do an
> > immediate
> > > > > > release
> > > > > > > > with
> > > > > > > > > >> the
> > > > > > > > > >>> updated dependency. Meaning not waiting for the next bug
> > > fix
> > > > > > releases
> > > > > > > > > >>> coming in a few weeks, but releasing asap.
> > > > > > > > > >>> The mood I perceive in the industry is pretty much
> > panicky
> > > > over
> > > > > > this,
> > > > > > > > > >> and I
> > > > > > > > > >>> expect we will see many requests for a patched release
> > and
> > > > many
> > > > > > > > > >> discussions
> > > > > > > > > >>> why the workaround alone would not be enough due to
> > certain
> > > > > > > > guidelines.
> > > > > > > > > >>>
> > > > > > > > > >>> I suggest that we preempt those discussions and create
> > > > releases
> > > > > > the
> > > > > > > > > >>> following way:
> > > > > > > > > >>>
> > > > > > > > > >>>  - we take 

Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-12-13 Thread Jing Zhang
Hi Timo,

+1 for the improvement too.
Please count me in when assign subtasks.

Best,
Jing Zhang

Timo Walther  于2021年12月13日周一 17:00写道:

> Hi everyone,
>
> *last call for feedback* on this FLIP. Otherwise I would start a VOTE by
> tomorrow.
>
> @Wenlong: Thanks for offering your help. Once the FLIP has been
> accepted. I will create a list of subtasks that we can split among
> contributors. Many can be implemented in parallel.
>
> Regards,
> Timo
>
>
> On 13.12.21 09:20, wenlong.lwl wrote:
> > Hi, Timo, +1 for the improvement too. Thanks for the great job.
> >
> > Looking forward to the next progress of the FLIP, I could help on the
> > development of some of the specific improvements.
> >
> > Best,
> > Wenlong
> >
> > On Mon, 13 Dec 2021 at 14:43, godfrey he  wrote:
> >
> >> Hi Timo,
> >>
> >> +1 for the improvement.
> >>
> >> Best,
> >> Godfrey
> >>
> >> Timo Walther  于2021年12月10日周五 20:37写道:
> >>>
> >>> Hi Wenlong,
> >>>
> >>> yes it will. Sorry for the confusion. This is a logical consequence of
> >>> the assumption:
> >>>
> >>> The JSON plan contains no implementation details (i.e. no classes) and
> >>> is fully declarative.
> >>>
> >>> I will add a remark.
> >>>
> >>> Thanks,
> >>> Timo
> >>>
> >>>
> >>> On 10.12.21 11:43, wenlong.lwl wrote:
>  hi, Timo, thanks for the explanation. I totally agree with what you
> >> said.
>  My actual question is: Will the version of an exec node be serialised
> >> in
>  the Json Plan? In my understanding, it is not in the former design. If
> >> it
>  is yes, my question is solved already.
> 
> 
>  Best,
>  Wenlong
> 
> 
>  On Fri, 10 Dec 2021 at 18:15, Timo Walther 
> wrote:
> 
> > Hi Wenlong,
> >
> > also thought about adding a `flinkVersion` field per ExecNode. But
> >> this
> > is not necessary, because the `version` of the ExecNode has the same
> > purpose.
> >
> > The plan version just encodes that:
> > "plan has been updated in Flink 1.17" / "plan is entirely valid for
> > Flink 1.17"
> >
> > The ExecNode version maps to `minStateVersion` to verify state
> > compatibility.
> >
> > So even if the plan version is 1.17, some ExecNodes use state layout
> >> of
> > 1.15.
> >
> > It is totally fine to only update the ExecNode to version 2 and not 3
> >> in
> > your example.
> >
> > Regards,
> > Timo
> >
> >
> >
> > On 10.12.21 06:02, wenlong.lwl wrote:
> >> Hi, Timo, thanks for updating the doc.
> >>
> >> I have a comment on plan migration:
> >> I think we may need to add a version field for every exec node when
> >> serialising. In earlier discussions, I think we have a conclusion
> >> that
> >> treating the version of plan as the version of node, but in this
> >> case it
> >> would be broken.
> >> Take the following example in FLIP into consideration, there is a
> bad
> > case:
> >> when in 1.17, we introduced an incompatible version 3 and dropped
> >> version
> >> 1, we can only update the version to 2, so the version should be per
> >> exec
> >> node.
> >>
> >> ExecNode version *1* is not supported anymore. Even though the state
> >> is
> >> actually compatible. The plan restore will fail with a helpful
> >> exception
> >> that forces users to perform plan migration.
> >>
> >> COMPILE PLAN '/mydir/plan_new.json' FROM '/mydir/plan_old.json';
> >>
> >> The plan migration will safely replace the old version *1* with *2.
> >> The
> >> JSON plan flinkVersion changes to 1.17.*
> >>
> >>
> >> Best,
> >>
> >> Wenlong
> >>
> >> On Thu, 9 Dec 2021 at 18:36, Timo Walther 
> >> wrote:
> >>
> >>> Hi Jing and Godfrey,
> >>>
> >>> I had another iteration over the document. There are two major
> >> changes:
> >>>
> >>> 1. Supported Flink Upgrade Versions
> >>>
> >>> I got the feedback via various channels that a step size of one
> >> minor
> >>> version is not very convenient. As you said, "because upgrading to
> >> a new
> >>> version is a time-consuming process". I rephrased this section:
> >>>
> >>> Upgrading usually involves work which is why many users perform
> this
> >>> task rarely (e.g. only once per year). Also, skipping versions is
> >>> common until a new feature has been introduced for which it is
> >>> worth upgrading. We will support the upgrade to the most recent Flink
> >>> upgrade. We will support the upgrade to the most recent Flink
> >> version
> >>> from a set of previous versions. We aim to support upgrades from
> the
> >>> last 2-3 releases on a best-effort basis; maybe even more depending
> >> on
> >>> the maintenance overhead. However, in order to not grow the testing
> >>> matrix infinitely and to perform important refactoring if
> >> necessary, we
> >>> only guarantee upgrades with a step size of a single minor version
> >> (i.e.
> >>> a cascade of upgrades).
> >>>
> 
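For illustration only, a serialized plan carrying both a top-level flinkVersion
and a per-ExecNode version, as discussed above, might look like the following;
all field names here are hypothetical, not the FLIP's concrete JSON layout:

```json
{
  "flinkVersion": "1.17",
  "nodes": [
    { "id": 1, "type": "stream-exec-calc", "version": 2 },
    { "id": 2, "type": "stream-exec-join", "version": 1 }
  ]
}
```

With such a layout, the plan version and the version of each individual
ExecNode can evolve independently, as in Wenlong's example.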

Re: [DISCUSS] Immediate dedicated Flink releases for log4j vulnerability

2021-12-13 Thread Till Rohrmann
+1

Cheers,
Till

On Mon, Dec 13, 2021 at 10:42 AM Jing Ge  wrote:

> +1
>
> As I suggested to publish the blog post last week asap, users have been
> keen to have such urgent releases. Many thanks for it.
>
>
>
> On Mon, Dec 13, 2021 at 8:29 AM Konstantin Knauf 
> wrote:
>
> > +1
> >
> > I didn't think this was necessary when I published the blog post on
> Friday,
> > but this has made higher waves than I expected over the weekend.
> >
> >
> >
> > On Mon, Dec 13, 2021 at 8:23 AM Yuan Mei  wrote:
> >
> > > +1 for quick release.
> > >
> > > On Mon, Dec 13, 2021 at 2:55 PM Martijn Visser 
> > > wrote:
> > >
> > > > +1 to address the issue like this
> > > >
> > > > On Mon, 13 Dec 2021 at 07:46, Jingsong Li 
> > > wrote:
> > > >
> > > > > +1 for fixing it in these versions and doing quick releases. Looks
> > good
> > > > to
> > > > > me.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Mon, Dec 13, 2021 at 2:18 PM Becket Qin 
> > > wrote:
> > > > > >
> > > > > > +1. The solution sounds good to me. There have been a lot of
> > > inquiries
> > > > > > about how to react to this.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Mon, Dec 13, 2021 at 12:40 PM Prasanna kumar <
> > > > > > prasannakumarram...@gmail.com> wrote:
> > > > > >
> > > > > > > 1+ for making Updates for 1.12.5 .
> > > > > > > We are looking for fix in 1.12 version.
> > > > > > > Please notify once the fix is done.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Dec 13, 2021 at 9:45 AM Leonard Xu 
> > > > wrote:
> > > > > > >
> > > > > > > > +1 for the quick release and the special vote period 24h.
> > > > > > > >
> > > > > > > > > 2021年12月13日 上午11:49,Dian Fu  写道:
> > > > > > > > >
> > > > > > > > > +1 for the proposal and creating a quick release.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Dian
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Dec 13, 2021 at 11:15 AM Kyle Bendickson <
> > > > k...@tabular.io>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> +1 to doing a release for this widely publicized
> > > vulnerability.
> > > > > > > > >>
> > > > > > > > >> In my experience, users will often update to the latest
> > minor
> > > > > patch
> > > > > > > > version
> > > > > > > > >> without much fuss. Plus, users have also likely heard
> about
> > > this
> > > > > and
> > > > > > > > will
> > > > > > > > >> appreciate a simple fix (updating their version where
> > > possible).
> > > > > > > > >>
> > > > > > > > >> The work-around will need to still be noted for users who
> > > can’t
> > > > > > > upgrade
> > > > > > > > for
> > > > > > > > >> whatever reason (EMR hasn’t caught up, etc).
> > > > > > > > >>
> > > > > > > > >> I also agree with your assessment to apply a patch on each
> > of
> > > > > those
> > > > > > > > >> previous versions with only the log4j commit, so that they
> > > don’t
> > > > > need
> > > > > > > > to be
> > > > > > > > >> as rigorously tested.
> > > > > > > > >>
> > > > > > > > >> Best,
> > > > > > > > >> Kyle (GitHub @kbendick)
> > > > > > > > >>
> > > > > > > > >> On Sun, Dec 12, 2021 at 2:23 PM Stephan Ewen <
> > > se...@apache.org>
> > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >>> Hi all!
> > > > > > > > >>>
> > > > > > > > >>> Without doubt, you heard about the log4j vulnerability
> [1].
> > > > > > > > >>>
> > > > > > > > >>> There is an advisory blog post on how to mitigate this in
> > > > Apache
> > > > > > > Flink
> > > > > > > > >> [2],
> > > > > > > > >>> which involves setting a config option and restarting the
> > > > > processes.
> > > > > > > > That
> > > > > > > > >>> is fortunately a relatively simple fix.
> > > > > > > > >>>
> > > > > > > > >>> Despite this workaround, I think we should do an
> immediate
> > > > > release
> > > > > > > with
> > > > > > > > >> the
> > > > > > > > >>> updated dependency. Meaning not waiting for the next bug
> > fix
> > > > > releases
> > > > > > > > >>> coming in a few weeks, but releasing asap.
> > > > > > > > >>> The mood I perceive in the industry is pretty much
> panicky
> > > over
> > > > > this,
> > > > > > > > >> and I
> > > > > > > > >>> expect we will see many requests for a patched release
> and
> > > many
> > > > > > > > >> discussions
> > > > > > > > >>> why the workaround alone would not be enough due to
> certain
> > > > > > > guidelines.
> > > > > > > > >>>
> > > > > > > > >>> I suggest that we preempt those discussions and create
> > > releases
> > > > > the
> > > > > > > > >>> following way:
> > > > > > > > >>>
> > > > > > > > >>>  - we take the latest already released versions from each
> > > > release
> > > > > > > > >> branch:
> > > > > > > > >>> ==> 1.14.0, 1.13.3, 1.12.5, 1.11.4
> > > > > > > > >>>  - we add a single commit to those that just updates the
> > > log4j
> > > > > > > > >> dependency
> > > > > > > > >>>  - we release those as 1.14.1, 1.13.4, 1.12.6, 1.11.5,
> etc.
> > > > > > > > >>>  - that way we 

Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)

2021-12-13 Thread Nicholas Jiang
Hi Konstantin,

   About the renaming of Rule, I mean that the difference between Rule and 
Pattern is that a Rule not only contains the Pattern, but also specifies how 
to match the Pattern and how to process the matches. If we rename it to 
DynamicPattern, I'm concerned that this name couldn't convey the matching and 
post-match processing aspects, whereas Rule does. 
Therefore I prefer to keep the name Rule rather than DynamicPattern.

Best,
Nicholas Jiang


On 2021/12/13 08:23:23 Konstantin Knauf wrote:
> Hi Nicholas,
> 
> I am not sure I understand your question about renaming. I think the most
> important member of the current Rule class is the Pattern, the KeySelector
> and PatternProcessFunction are more auxiliary if you will. That's why, I
> think, it would be ok to rename Rule to DynamicPatternHolder although it
> contains more than just a Pattern.
> 
> Cheers,
> 
> Konstantin
> 
> On Mon, Dec 13, 2021 at 9:16 AM Nicholas Jiang 
> wrote:
> 
> > Hi Konstantin,
> >
> >Thanks for your feedback. The point that add a timestamp to each rule
> > that determines the start time from which the rule makes sense to me. At
> > present, The timestamp is current time at default, so no timestamp field
> > represents the start time from which the rule. And about the renaming rule,
> > your suggestion looks good to me and no any new concept introduces. But
> > does this introduce Rule concept or reuse the Pattern concept for the
> > DynamicPattern renaming?
> >
> > Best,
> > Nicholas Jiang
> >
> > On 2021/12/13 07:45:04 Konstantin Knauf wrote:
> > > Thanks, Yufeng, for starting this discussion. I think this will be a very
> > > popular feature. I've seen a lot of users asking for this in the past.
> > So,
> > > generally big +1.
> > >
> > > I think we should have a rough idea on how to expose this feature in the
> > > other APIs.
> > >
> > > Two ideas:
> > >
> > > 1. In order to make this more deterministic in case of reprocessing and
> > > out-of-orderness, I am wondering if we can add a timestamp to each rule
> > > that determines the start time from which the rule should be in effect.
> > > This can be an event or a processing time depending on the
> > characteristics
> > > of the pipeline. The timestamp would default to Long.MIN_TIMESTAMP if not
> > > provided, which means effectively immediately. This could also be a
> > follow
> > > up, if you think it will make the implementation too complicated
> > initially.
> > >
> > > 2. I am wondering, if we should name Rule->DynamicPatternHolder or so and
> > > CEP.rule-> CEP.dynamicPatterns instead (other classes correspondingly)?
> > > Rule is quite ambiguous and DynamicPattern seems more descriptive to me.
> > >
> > > On Mon, Dec 13, 2021 at 4:30 AM Nicholas Jiang  > >
> > > wrote:
> > >
> > > > Hi Martijn,
> > > >
> > > >IMO, in this FLIP, we only need to introduce the general design of
> > the
> > > > Table API/SQL level. As for the design details, you can create a new
> > FLIP.
> > > > And do we need to take into account the support for Batch mode if you
> > > > expand the MATCH_RECOGNIZE function? About the dynamic rule engine
> > design,
> > > > do you have any comments? This core of the FLIP is about the multiple
> > rule
> > > > and dynamic rule changing mechanism.
> > > >
> > > > Best,
> > > > Nicholas Jiang
> > > >
> > >
> > >
> > > --
> > >
> > > Konstantin Knauf
> > >
> > > https://twitter.com/snntrable
> > >
> > > https://github.com/knaufk
> > >
> >
> 
> 
> -- 
> 
> Konstantin Knauf
> 
> https://twitter.com/snntrable
> 
> https://github.com/knaufk
> 


Re: [DISCUSS] Immediate dedicated Flink releases for log4j vulnerability

2021-12-13 Thread Jing Ge
+1

As I suggested last week that we publish the blog post asap, users have
been keen to have such urgent releases. Many thanks for it.



On Mon, Dec 13, 2021 at 8:29 AM Konstantin Knauf  wrote:

> +1
>
> I didn't think this was necessary when I published the blog post on Friday,
> but this has made higher waves than I expected over the weekend.
>
>
>
> On Mon, Dec 13, 2021 at 8:23 AM Yuan Mei  wrote:
>
> > +1 for quick release.
> >
> > On Mon, Dec 13, 2021 at 2:55 PM Martijn Visser 
> > wrote:
> >
> > > +1 to address the issue like this
> > >
> > > On Mon, 13 Dec 2021 at 07:46, Jingsong Li 
> > wrote:
> > >
> > > > +1 for fixing it in these versions and doing quick releases. Looks
> good
> > > to
> > > > me.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, Dec 13, 2021 at 2:18 PM Becket Qin 
> > wrote:
> > > > >
> > > > > +1. The solution sounds good to me. There have been a lot of
> > inquiries
> > > > > about how to react to this.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Mon, Dec 13, 2021 at 12:40 PM Prasanna kumar <
> > > > > prasannakumarram...@gmail.com> wrote:
> > > > >
> > > > > > +1 for making updates for 1.12.5.
> > > > > > We are looking for a fix in the 1.12 version.
> > > > > > Please notify once the fix is done.
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 13, 2021 at 9:45 AM Leonard Xu 
> > > wrote:
> > > > > >
> > > > > > > +1 for the quick release and the special vote period 24h.
> > > > > > >
> > > > > > > > On 2021-12-13 at 11:49 AM, Dian Fu wrote:
> > > > > > > >
> > > > > > > > +1 for the proposal and creating a quick release.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Dian
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 13, 2021 at 11:15 AM Kyle Bendickson <
> > > k...@tabular.io>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> +1 to doing a release for this widely publicized
> > vulnerability.
> > > > > > > >>
> > > > > > > >> In my experience, users will often update to the latest
> minor
> > > > patch
> > > > > > > version
> > > > > > > >> without much fuss. Plus, users have also likely heard about
> > this
> > > > and
> > > > > > > will
> > > > > > > >> appreciate a simple fix (updating their version where
> > possible).
> > > > > > > >>
> > > > > > > >> The work-around will need to still be noted for users who
> > can’t
> > > > > > upgrade
> > > > > > > for
> > > > > > > >> whatever reason (EMR hasn’t caught up, etc).
> > > > > > > >>
> > > > > > > >> I also agree with your assessment to apply a patch on each
> of
> > > > those
> > > > > > > >> previous versions with only the log4j commit, so that they
> > don’t
> > > > need
> > > > > > > to be
> > > > > > > >> as rigorously tested.
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Kyle (GitHub @kbendick)
> > > > > > > >>
> > > > > > > >> On Sun, Dec 12, 2021 at 2:23 PM Stephan Ewen <
> > se...@apache.org>
> > > > > > wrote:
> > > > > > > >>
> > > > > > > >>> Hi all!
> > > > > > > >>>
> > > > > > > >>> Without doubt, you heard about the log4j vulnerability [1].
> > > > > > > >>>
> > > > > > > >>> There is an advisory blog post on how to mitigate this in
> > > Apache
> > > > > > Flink
> > > > > > > >> [2],
> > > > > > > >>> which involves setting a config option and restarting the
> > > > processes.
> > > > > > > That
> > > > > > > >>> is fortunately a relatively simple fix.
> > > > > > > >>>
> > > > > > > >>> Despite this workaround, I think we should do an immediate
> > > > release
> > > > > > with
> > > > > > > >> the
> > > > > > > >>> updated dependency. Meaning not waiting for the next bug
> fix
> > > > releases
> > > > > > > >>> coming in a few weeks, but releasing asap.
> > > > > > > >>> The mood I perceive in the industry is pretty much panicky
> > over
> > > > this,
> > > > > > > >> and I
> > > > > > > >>> expect we will see many requests for a patched release and
> > many
> > > > > > > >> discussions
> > > > > > > >>> why the workaround alone would not be enough due to certain
> > > > > > guidelines.
> > > > > > > >>>
> > > > > > > >>> I suggest that we preempt those discussions and create
> > releases
> > > > the
> > > > > > > >>> following way:
> > > > > > > >>>
> > > > > > > >>>  - we take the latest already released versions from each
> > > release
> > > > > > > >> branch:
> > > > > > > >>> ==> 1.14.0, 1.13.3, 1.12.5, 1.11.4
> > > > > > > >>>  - we add a single commit to those that just updates the
> > log4j
> > > > > > > >> dependency
> > > > > > > >>>  - we release those as 1.14.1, 1.13.4, 1.12.6, 1.11.5, etc.
> > > > > > > >>>  - that way we don't need to do functional release tests,
> > > > because the
> > > > > > > >>> released code is identical to the previous release, except
> > for
> > > > the
> > > > > > > log4j
> > > > > > > >>> dependency
> > > > > > > >>>  - we can then continue the work on the upcoming bugfix
> > > releases
> > > > as
> > > > > > > >>> planned, without high pressure
> > > > > > > >>>
> > > > > 

Re: [DISCUSS] Immediate dedicated Flink releases for log4j vulnerability

2021-12-13 Thread Chesnay Schepler

I will start preparing the release candidates.

On 12/12/2021 23:23, Stephan Ewen wrote:

Hi all!

Without doubt, you heard about the log4j vulnerability [1].

There is an advisory blog post on how to mitigate this in Apache Flink [2],
which involves setting a config option and restarting the processes. That
is fortunately a relatively simple fix.

Despite this workaround, I think we should do an immediate release with the
updated dependency. Meaning not waiting for the next bug fix releases
coming in a few weeks, but releasing asap.
The mood I perceive in the industry is pretty much panicky over this, and I
expect we will see many requests for a patched release and many discussions
why the workaround alone would not be enough due to certain guidelines.

I suggest that we preempt those discussions and create releases the
following way:

   - we take the latest already released versions from each release branch:
  ==> 1.14.0, 1.13.3, 1.12.5, 1.11.4
   - we add a single commit to those that just updates the log4j dependency
   - we release those as 1.14.1, 1.13.4, 1.12.6, 1.11.5, etc.
   - that way we don't need to do functional release tests, because the
released code is identical to the previous release, except for the log4j
dependency
   - we can then continue the work on the upcoming bugfix releases as
planned, without high pressure

I would suggest creating those RCs immediately and release them with a
special voting period (24h or so).

WDYT?

Best,
Stephan

[1] https://nvd.nist.gov/vuln/detail/CVE-2021-44228
[2] https://flink.apache.org/2021/12/10/log4j-cve.html





Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-12-13 Thread Timo Walther

Hi everyone,

*last call for feedback* on this FLIP. Otherwise I would start a VOTE by 
tomorrow.


@Wenlong: Thanks for offering your help. Once the FLIP has been 
accepted. I will create a list of subtasks that we can split among 
contributors. Many can be implemented in parallel.


Regards,
Timo


On 13.12.21 09:20, wenlong.lwl wrote:

Hi, Timo, +1 for the improvement too. Thanks for the great job.

Looking forward to the next progress of the FLIP, I could help on the
development of some of the specific improvements.

Best,
Wenlong

On Mon, 13 Dec 2021 at 14:43, godfrey he  wrote:


Hi Timo,

+1 for the improvement.

Best,
Godfrey

Timo Walther wrote on Fri, Dec 10, 2021 at 20:37:


Hi Wenlong,

yes it will. Sorry for the confusion. This is a logical consequence of
the assumption:

The JSON plan contains no implementation details (i.e. no classes) and
is fully declarative.

I will add a remark.

Thanks,
Timo


On 10.12.21 11:43, wenlong.lwl wrote:

hi, Timo, thanks for the explanation. I totally agree with what you said.

My actual question is: Will the version of an exec node be serialised in
the Json Plan? In my understanding, it is not in the former design. If it
is yes, my question is solved already.


Best,
Wenlong


On Fri, 10 Dec 2021 at 18:15, Timo Walther  wrote:


Hi Wenlong,

also thought about adding a `flinkVersion` field per ExecNode. But this
is not necessary, because the `version` of the ExecNode has the same
purpose.

The plan version just encodes that:
"plan has been updated in Flink 1.17" / "plan is entirely valid for
Flink 1.17"

The ExecNode version maps to `minStateVersion` to verify state
compatibility.

So even if the plan version is 1.17, some ExecNodes use state layout of
1.15.

It is totally fine to only update the ExecNode to version 2 and not 3 in
your example.

Regards,
Timo
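The compatibility rule described above can be sketched in a few lines of Java. The class and method names below are illustrative stand-ins, not Flink's actual planner classes: the plan carries a single flinkVersion, while each ExecNode carries its own version whose minStateVersion decides whether state written by an older node version can still be restored.

```java
// Hypothetical sketch: per-node version metadata and the state-compatibility
// check it implies. Not Flink's real API; names are invented for illustration.
public class PlanCompatibilitySketch {

    // A node's current version and the minimum node version whose
    // state layout it can still read.
    record NodeMeta(String name, int version, int minStateVersion) {}

    // State written by an older node version is restorable only if that
    // version is at or above the current node's minStateVersion.
    static boolean stateCompatible(NodeMeta current, int savepointNodeVersion) {
        return savepointNodeVersion >= current.minStateVersion();
    }

    public static void main(String[] args) {
        // Plan compiled with Flink 1.17, but this node still reads the
        // state layout introduced in node version 1 (e.g. the 1.15 layout).
        NodeMeta join = new NodeMeta("stream-exec-join", 2, 1);
        System.out.println(stateCompatible(join, 1)); // prints true
        System.out.println(stateCompatible(join, 0)); // prints false: plan migration required
    }
}
```

Under this reading, a plan whose flinkVersion is 1.17 can happily contain nodes whose state is still in the 1.15 layout, which is exactly the decoupling Timo describes.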



On 10.12.21 06:02, wenlong.lwl wrote:

Hi, Timo, thanks for updating the doc.

I have a comment on plan migration:
I think we may need to add a version field for every exec node when
serialising. In earlier discussions, I think we reached the conclusion to
treat the version of the plan as the version of the node, but in this case
that would be broken.
Take the following example in the FLIP into consideration, there is a bad
case: when in 1.17, we introduced an incompatible version 3 and dropped
version 1, we can only update the version to 2, so the version should be
per exec node.

ExecNode version *1* is not supported anymore. Even though the state is
actually compatible. The plan restore will fail with a helpful exception
that forces users to perform plan migration.

COMPILE PLAN '/mydir/plan_new.json' FROM '/mydir/plan_old.json';

The plan migration will safely replace the old version *1* with *2*. The
JSON plan flinkVersion changes to *1.17*.


Best,

Wenlong

On Thu, 9 Dec 2021 at 18:36, Timo Walther wrote:



Hi Jing and Godfrey,

I had another iteration over the document. There are two major

changes:


1. Supported Flink Upgrade Versions

I got the feedback via various channels that a step size of one minor
version is not very convenient. As you said, "because upgrading to a new
version is a time-consuming process". I rephrased this section:

Upgrading usually involves work, which is why many users perform this
task rarely (e.g. only once per year). Also, skipping versions is common
until a new feature has been introduced for which it is worth upgrading.
We will support the upgrade to the most recent Flink version from a set
of previous versions. We aim to support upgrades from the last 2-3
releases on a best-effort basis; maybe even more depending on the
maintenance overhead. However, in order to not grow the testing matrix
infinitely and to perform important refactoring if necessary, we only
guarantee upgrades with a step size of a single minor version (i.e. a
cascade of upgrades).

2. Annotation Design

I also adopted the multiple annotations design for the previous
supportPlanFormat. So no array of versions anymore. I reworked the
section, please have a look with updated examples:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI

I also got the feedback offline that `savepoint` might not be the right
terminology for the annotation. I changed that to minPlanVersion and
minStateVersion.

Let me know what you think.

Regards,
Timo
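The "multiple annotations instead of an array of versions" design can be sketched with Java's repeatable annotations: one metadata annotation per supported node version. The annotation and node classes below are simplified hypothetical stand-ins for the FLIP's proposal, not Flink's real implementation.

```java
// Hypothetical sketch of per-version repeatable ExecNode metadata.
// Field names mirror the FLIP's minPlanVersion/minStateVersion wording,
// but the types here are invented for illustration.
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Repeatable(ExecNodeMetadatas.class)
    @interface ExecNodeMetadata {
        int version();
        String minPlanVersion();
        String minStateVersion();
    }

    // Container annotation required by @Repeatable.
    @Retention(RetentionPolicy.RUNTIME)
    @interface ExecNodeMetadatas {
        ExecNodeMetadata[] value();
    }

    // One annotation per supported version instead of an array of versions.
    @ExecNodeMetadata(version = 1, minPlanVersion = "1.15", minStateVersion = "1.15")
    @ExecNodeMetadata(version = 2, minPlanVersion = "1.17", minStateVersion = "1.15")
    static class StreamExecSomeNode {}

    public static void main(String[] args) {
        // getAnnotationsByType unwraps the container automatically.
        ExecNodeMetadata[] metas =
            StreamExecSomeNode.class.getAnnotationsByType(ExecNodeMetadata.class);
        for (ExecNodeMetadata m : metas) {
            System.out.println("version=" + m.version()
                + " minPlanVersion=" + m.minPlanVersion()
                + " minStateVersion=" + m.minStateVersion());
        }
    }
}
```

Note how version 2 can keep minStateVersion = "1.15", which matches the point above that a newer node version may still read an older state layout.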



On 09.12.21 08:44, Jing Zhang wrote:

Hi Timo,
Thanks a lot for driving this discussion. I believe it could solve many
problems that we are suffering in upgrading.

I only have a little complaint about the following point:

"For simplification of the design, we assume that upgrades use a step
size of a single minor version. We don't guarantee skipping minor
versions (e.g. 1.11 to 1.14)."

In our internal production environment, we follow up with the
community's latest stable release version almost once a year because
upgrading to a new version is a 

Re: [VOTE] FLIP-196: Source API stability guarantees

2021-12-13 Thread Chesnay Schepler

+1

On 10/12/2021 18:07, Till Rohrmann wrote:

Hi everyone,

I'd like to start a vote on FLIP-196: Source API stability guarantees [1]
which has been discussed in this thread [2].

The vote will be open for at least 72 hours unless there is an objection or
not enough votes.

[1] https://cwiki.apache.org/confluence/x/IJeqCw
[2] https://lists.apache.org/thread/gkczh583ovlo1fpj7l61cnr2zl695xkp

Cheers,
Till





Re: [VOTE] FLIP-197: API stability graduation process

2021-12-13 Thread Chesnay Schepler

+1

On 10/12/2021 18:09, Till Rohrmann wrote:

Hi everyone,

I'd like to start a vote on FLIP-197: API stability graduation process [1]
which has been discussed in this thread [2].

The vote will be open for at least 72 hours unless there is an objection or
not enough votes.

[1] https://cwiki.apache.org/confluence/x/J5eqCw
[2] https://lists.apache.org/thread/gjgr3b3w2c379ny7on3khjgwyjp2gyq1

Cheers,
Till





Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)

2021-12-13 Thread Konstantin Knauf
Hi Nicholas,

I am not sure I understand your question about renaming. I think the most
important member of the current Rule class is the Pattern, the KeySelector
and PatternProcessFunction are more auxiliary if you will. That's why, I
think, it would be ok to rename Rule to DynamicPatternHolder although it
contains more than just a Pattern.

Cheers,

Konstantin

On Mon, Dec 13, 2021 at 9:16 AM Nicholas Jiang 
wrote:

> Hi Konstantin,
>
>Thanks for your feedback. The point that add a timestamp to each rule
> that determines the start time from which the rule makes sense to me. At
> present, The timestamp is current time at default, so no timestamp field
> represents the start time from which the rule. And about the renaming rule,
> your suggestion looks good to me and no any new concept introduces. But
> does this introduce Rule concept or reuse the Pattern concept for the
> DynamicPattern renaming?
>
> Best,
> Nicholas Jiang
>
> On 2021/12/13 07:45:04 Konstantin Knauf wrote:
> > Thanks, Yufeng, for starting this discussion. I think this will be a very
> > popular feature. I've seen a lot of users asking for this in the past.
> So,
> > generally big +1.
> >
> > I think we should have a rough idea on how to expose this feature in the
> > other APIs.
> >
> > Two ideas:
> >
> > 1. In order to make this more deterministic in case of reprocessing and
> > out-of-orderness, I am wondering if we can add a timestamp to each rule
> > that determines the start time from which the rule should be in effect.
> > This can be an event or a processing time depending on the
> characteristics
> > of the pipeline. The timestamp would default to Long.MIN_TIMESTAMP if not
> > provided, which means effectively immediately. This could also be a
> follow
> > up, if you think it will make the implementation too complicated
> initially.
> >
> > 2. I am wondering, if we should name Rule->DynamicPatternHolder or so and
> > CEP.rule-> CEP.dynamicPatterns instead (other classes correspondingly)?
> > Rule is quite ambiguous and DynamicPattern seems more descriptive to me.
> >
> > On Mon, Dec 13, 2021 at 4:30 AM Nicholas Jiang  >
> > wrote:
> >
> > > Hi Martijn,
> > >
> > >IMO, in this FLIP, we only need to introduce the general design of
> the
> > > Table API/SQL level. As for the design details, you can create a new
> FLIP.
> > > And do we need to take into account the support for Batch mode if you
> > > expand the MATCH_RECOGNIZE function? About the dynamic rule engine
> design,
> > > do you have any comments? This core of the FLIP is about the multiple
> rule
> > > and dynamic rule changing mechanism.
> > >
> > > Best,
> > > Nicholas Jiang
> > >
> >
> >
> > --
> >
> > Konstantin Knauf
> >
> > https://twitter.com/snntrable
> >
> > https://github.com/knaufk
> >
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk
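The timestamp-per-rule idea discussed in this thread can be sketched as follows. Rule here is a simplified hypothetical stand-in, not Flink CEP's actual API nor the FLIP-200 classes; only the Long.MIN_VALUE default and the activation check are illustrated.

```java
// Hypothetical sketch: a dynamically added rule carries a timestamp from
// which it takes effect, defaulting to Long.MIN_VALUE ("effective
// immediately"). All names are invented for illustration.
public class RuleTimestampSketch {

    record Rule(String id, String pattern, long effectiveFromMillis) {
        // No timestamp given: the rule applies to all events.
        Rule(String id, String pattern) {
            this(id, pattern, Long.MIN_VALUE);
        }

        // A rule only matches events at or after its start timestamp, which
        // makes reprocessing deterministic: replayed old events are not
        // matched by rules that were added later.
        boolean isActiveAt(long eventTimeMillis) {
            return eventTimeMillis >= effectiveFromMillis;
        }
    }

    public static void main(String[] args) {
        Rule immediate = new Rule("r1", "a -> b");
        Rule later = new Rule("r2", "a -> b", 1_700_000_000_000L);

        System.out.println(immediate.isActiveAt(0L));                // prints true
        System.out.println(later.isActiveAt(1_600_000_000_000L));    // prints false: event predates rule
        System.out.println(later.isActiveAt(1_700_000_000_001L));    // prints true
    }
}
```

Whether eventTimeMillis is event time or processing time would depend on the pipeline's characteristics, as suggested above.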


Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-12-13 Thread wenlong.lwl
Hi, Timo, +1 for the improvement too. Thanks for the great job.

Looking forward to the next progress of the FLIP, I could help on the
development of some of the specific improvements.

Best,
Wenlong

On Mon, 13 Dec 2021 at 14:43, godfrey he  wrote:

> Hi Timo,
>
> +1 for the improvement.
>
> Best,
> Godfrey
>
> Timo Walther wrote on Fri, Dec 10, 2021 at 20:37:
> >
> > Hi Wenlong,
> >
> > yes it will. Sorry for the confusion. This is a logical consequence of
> > the assumption:
> >
> > The JSON plan contains no implementation details (i.e. no classes) and
> > is fully declarative.
> >
> > I will add a remark.
> >
> > Thanks,
> > Timo
> >
> >
> > On 10.12.21 11:43, wenlong.lwl wrote:
> > > hi, Timo, thanks for the explanation. I totally agree with what you
> said.
> > > My actual question is: Will the version of an exec node be serialised
> in
> > > the Json Plan? In my understanding, it is not in the former design. If
> it
> > > is yes, my question is solved already.
> > >
> > >
> > > Best,
> > > Wenlong
> > >
> > >
> > > On Fri, 10 Dec 2021 at 18:15, Timo Walther  wrote:
> > >
> > >> Hi Wenlong,
> > >>
> > >> also thought about adding a `flinkVersion` field per ExecNode. But
> this
> > >> is not necessary, because the `version` of the ExecNode has the same
> > >> purpose.
> > >>
> > >> The plan version just encodes that:
> > >> "plan has been updated in Flink 1.17" / "plan is entirely valid for
> > >> Flink 1.17"
> > >>
> > >> The ExecNode version maps to `minStateVersion` to verify state
> > >> compatibility.
> > >>
> > >> So even if the plan version is 1.17, some ExecNodes use state layout
> of
> > >> 1.15.
> > >>
> > >> It is totally fine to only update the ExecNode to version 2 and not 3
> in
> > >> your example.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >>
> > >>
> > >> On 10.12.21 06:02, wenlong.lwl wrote:
> > >>> Hi, Timo, thanks for updating the doc.
> > >>>
> > >>> I have a comment on plan migration:
> > >>> I think we may need to add a version field for every exec node when
> > >>> serialising. In earlier discussions, I think we have a conclusion
> that
> > >>> treating the version of plan as the version of node, but in this
> case it
> > >>> would be broken.
> > >>> Take the following example in FLIP into consideration, there is a bad
> > >> case:
> > >>> when in 1.17, we introduced an incompatible version 3 and dropped
> version
> > >>> 1, we can only update the version to 2, so the version should be per
> exec
> > >>> node.
> > >>>
> > >>> ExecNode version *1* is not supported anymore. Even though the state
> is
> > >>> actually compatible. The plan restore will fail with a helpful
> exception
> > >>> that forces users to perform plan migration.
> > >>>
> > >>> COMPILE PLAN '/mydir/plan_new.json' FROM '/mydir/plan_old.json';
> > >>>
> > >>> The plan migration will safely replace the old version *1* with *2.
> The
> > >>> JSON plan flinkVersion changes to 1.17.*
> > >>>
> > >>>
> > >>> Best,
> > >>>
> > >>> Wenlong
> > >>>
> > >>> On Thu, 9 Dec 2021 at 18:36, Timo Walther 
> wrote:
> > >>>
> >  Hi Jing and Godfrey,
> > 
> >  I had another iteration over the document. There are two major
> changes:
> > 
> >  1. Supported Flink Upgrade Versions
> > 
> >  I got the feedback via various channels that a step size of one
> minor
> >  version is not very convenient. As you said, "because upgrading to
> a new
> >  version is a time-consuming process". I rephrased this section:
> > 
> >  Upgrading usually involves work which is why many users perform this
> >  task rarely (e.g. only once per year). Also skipping a versions is
> >  common until a new feature has been introduced for which is it
> worth to
> >  upgrade. We will support the upgrade to the most recent Flink
> version
> >  from a set of previous versions. We aim to support upgrades from the
> >  last 2-3 releases on a best-effort basis; maybe even more depending
> on
> >  the maintenance overhead. However, in order to not grow the testing
> >  matrix infinitely and to perform important refactoring if
> necessary, we
> >  only guarantee upgrades with a step size of a single minor version
> (i.e.
> >  a cascade of upgrades).
> > 
> >  2. Annotation Design
> > 
> >  I also adopted the multiple annotations design for the previous
> >  supportPlanFormat. So no array of versions anymore. I reworked the
> >  section, please have a look with updated examples:
> > 
> > 
> > 
> > >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI
> > 
> >  I also got the feedback offline that `savepoint` might not be the
> right
> >  terminology for the annotation. I changed that to minPlanVersion and
> >  minStateVersion.
> > 
> >  Let me know what you think.
> > 
> >  Regards,
> >  Timo
> > 
> > 
> > 
> >  On 09.12.21 08:44, Jing Zhang 

Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)

2021-12-13 Thread Nicholas Jiang
Hi Konstantin,

   Thanks for your feedback. Adding a timestamp to each rule that determines 
the start time from which the rule takes effect makes sense to me. At present, 
the timestamp defaults to the current time, so there is no timestamp field 
representing the rule's start time. Regarding the renaming, your suggestion 
looks good to me and introduces no new concept. But does this introduce a Rule 
concept, or reuse the Pattern concept for the DynamicPattern renaming?

Best,
Nicholas Jiang

On 2021/12/13 07:45:04 Konstantin Knauf wrote:
> Thanks, Yufeng, for starting this discussion. I think this will be a very
> popular feature. I've seen a lot of users asking for this in the past. So,
> generally big +1.
> 
> I think we should have a rough idea on how to expose this feature in the
> other APIs.
> 
> Two ideas:
> 
> 1. In order to make this more deterministic in case of reprocessing and
> out-of-orderness, I am wondering if we can add a timestamp to each rule
> that determines the start time from which the rule should be in effect.
> This can be an event or a processing time depending on the characteristics
> of the pipeline. The timestamp would default to Long.MIN_TIMESTAMP if not
> provided, which means effectively immediately. This could also be a follow
> up, if you think it will make the implementation too complicated initially.
> 
> 2. I am wondering, if we should name Rule->DynamicPatternHolder or so and
> CEP.rule-> CEP.dynamicPatterns instead (other classes correspondingly)?
> Rule is quite ambiguous and DynamicPattern seems more descriptive to me.
> 
> On Mon, Dec 13, 2021 at 4:30 AM Nicholas Jiang 
> wrote:
> 
> > Hi Martijn,
> >
> >IMO, in this FLIP, we only need to introduce the general design of the
> > Table API/SQL level. As for the design details, you can create a new FLIP.
> > And do we need to take into account the support for Batch mode if you
> > expand the MATCH_RECOGNIZE function? About the dynamic rule engine design,
> > do you have any comments? This core of the FLIP is about the multiple rule
> > and dynamic rule changing mechanism.
> >
> > Best,
> > Nicholas Jiang
> >
> 
> 
> -- 
> 
> Konstantin Knauf
> 
> https://twitter.com/snntrable
> 
> https://github.com/knaufk
>