[VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #2

2020-03-27 Thread Tzu-Li (Gordon) Tai
Hi everyone,

Please review and vote on release candidate #2 for version 2.0.0 of
Apache Flink Stateful Functions, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

**Testing Guideline**

You can find here [1] a doc that we can use to collaborate on testing
efforts.
The testing tasks listed in the doc also serve as a guideline for what to
test for this release.
If you wish to take ownership of a testing task, simply put your name down
in the "Checked by" field of the task.

**Release Overview**

As an overview, the release consists of the following:
a) Stateful Functions canonical source distribution, to be deployed to the
release repository at dist.apache.org
b) Stateful Functions Python SDK distributions to be deployed to PyPI
c) Maven artifacts to be deployed to the Maven Central Repository

**Staging Areas to Review**

The staging areas containing the above-mentioned artifacts are as follows,
for your review:
* All artifacts for a) and b) can be found in the corresponding dev
repository at dist.apache.org [2]
* All artifacts for c) can be found at the Apache Nexus Repository [3]

All artifacts are signed with the
key 1C1E2394D3194E1944613488F320986D35C33D6A [4]

Other links for your review:
* JIRA release notes [5]
* source code tag "release-2.0.0-rc2" [6] [7]

**Extra Remarks**

* Part of the release is also the official Docker images for Stateful
Functions. This can be a separate process, since creating those images
relies on the distribution jars already being deployed to Maven. I will
follow up on this after these artifacts are officially released.
In the meantime, there is an ongoing discussion [8] about where to host
the StateFun Dockerfiles.
* The Flink website and blog post are also being worked on (by Marta) as
part of the release, to incorporate the new Stateful Functions project. We
can follow up with a link to those changes afterwards in this vote thread,
but that should not block you from testing and casting your votes already.
* Since the Flink website changes are still being worked on, you will not
yet be able to find the Stateful Functions docs from there. Here are the
links [9] [10].

**Vote Duration**

The vote will be open for at least 72 hours *(target end date is next
Tuesday, March 31).*
It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Gordon

[1]
https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing
[2] https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.0.0-rc2/
[3] https://repository.apache.org/content/repositories/orgapacheflink-1340/
[4] https://dist.apache.org/repos/dist/release/flink/KEYS
[5]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878
[6]
https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=14ce58048a3dda792f2329cf14d30aa952f6cb24
[7] https://github.com/apache/flink-statefun/tree/release-2.0.0-rc2
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-a-new-repo-to-host-Stateful-Functions-Dockerfiles-td39342.html
[9] https://ci.apache.org/projects/flink/flink-statefun-docs-master/
[10] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/

TIP: You can create a `settings.xml` file with these contents:

"""
<settings>
  <activeProfiles>
    <activeProfile>flink-statefun-2.0.0</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>flink-statefun-2.0.0</id>
      <repositories>
        <repository>
          <id>flink-statefun-2.0.0</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1340/</url>
        </repository>
        <repository>
          <id>archetype</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1340/</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
</settings>
"""

And reference that in your Maven commands via `--settings
path/to/settings.xml`.
This is useful for creating a quickstart based on the staged release and
for building against the staged jars.


Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #1

2020-03-27 Thread Tzu-Li (Gordon) Tai
This vote is cancelled.

Please see the voting thread for the new candidate, RC2:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Apache-Flink-Stateful-Functions-Release-2-0-0-release-candidate-2-td39379.html

On Fri, Mar 27, 2020 at 2:39 PM Tzu-Li (Gordon) Tai 
wrote:

> -1
>
> Already discovered that the source distribution NOTICE file is missing
> mentions for font-awesome.
> The source of that is bundled under "docs/page/font-awesome/fonts", and is
> licensed with SIL OFL 1.1 license, which makes it a requirement to be
> listed in the NOTICE file.
>
> I'll open a new RC2 with only the changes to the source NOTICE and LICENSE
> files.
>
>
>
> On Fri, Mar 27, 2020 at 10:37 AM Tzu-Li (Gordon) Tai 
> wrote:
>
>> Also, here is the documentation for Stateful Functions for those who were
>> wondering:
>> master - https://ci.apache.org/projects/flink/flink-statefun-docs-master/
>> release-2.0 -
>> https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/
>>
>> This is not yet visible directly from the Flink website, since the
>> efforts for incorporating Stateful Functions in the website is still
>> ongoing.
>>
>> On Fri, Mar 27, 2020 at 12:48 AM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Please review and vote on release candidate #1 for version 2.0.0
>>> of Apache Flink Stateful Functions,
>>> as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> **Testing Guideline**
>>>
>>> You can find here [1] a doc that we can use for collaborating testing
>>> efforts.
>>> The listed testing tasks in the doc also serve as a guideline in what to
>>> test for this release.
>>> If you wish to take ownership of a testing task, simply put your name
>>> down in the "Checked by" field of the task.
>>>
>>> **Release Overview**
>>>
>>> As an overview, the release consists of the following:
>>> a) Stateful Functions canonical source distribution, to be deployed to
>>> the release repository at dist.apache.org
>>> b) Stateful Functions Python SDK distributions to be deployed to PyPI
>>> c) Maven artifacts to be deployed to the Maven Central Repository
>>>
>>> **Staging Areas to Review**
>>>
>>> The staging areas containing the above mentioned artifacts are as
>>> follows, for your review:
>>> * All artifacts for a) and b) can be found in the corresponding dev
>>> repository at dist.apache.org [2]
>>> * All artifacts for c) can be found at the Apache Nexus Repository [3]
>>>
>>> All artifacts are signed with the
>>> key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
>>>
>>> Other links for your review:
>>> * JIRA release notes [5]
>>> * source code tag "release-2.0.0-rc1" [6] [7]
>>>
>>> **Extra Remarks**
>>>
>>> * Part of the release is also official Docker images for Stateful
>>> Functions. This can be a separate process, since the creation of those
>>> relies on the fact that we have distribution jars already deployed to
>>> Maven. I will follow-up with this after these artifacts are officially
>>> released.
>>> In the meantime, there is this discussion [8] ongoing about where to
>>> host the StateFun Dockerfiles.
>>> * The Flink Website and blog post is also being worked on (by Marta) as
>>> part of the release, to incorporate the new Stateful Functions project. We
>>> can follow up with a link to those changes afterwards in this vote thread,
>>> but that should not block you from testing and casting your votes already.
>>>
>>> **Vote Duration**
>>>
>>> The vote will be open for at least 72 hours *(target end date is next
>>> Tuesday, March 31).*
>>> It is adopted by majority approval, with at least 3 PMC
>>> affirmative votes.
>>>
>>> Thanks,
>>> Gordon
>>>
>>> [1]
>>> https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing
>>> [2]
>>> https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.0.0-rc1/
>>> [3]
>>> https://repository.apache.org/content/repositories/orgapacheflink-1339/
>>> [4] https://dist.apache.org/repos/dist/release/flink/KEYS
>>> [5]
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878
>>> [6]
>>> https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=ebd7ca866f7d11fa43c7a5bb36861ee1b24b0980
>>> [7] https://github.com/apache/flink-statefun/tree/release-2.0.0-rc1
>>> [8]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-a-new-repo-to-host-Stateful-Functions-Dockerfiles-td39342.html
>>>
>>> TIP: You can create a `settings.xml` file with these contents:
>>>
>>> """
>>> <settings>
>>>   <activeProfiles>
>>>     <activeProfile>flink-statefun-2.0.0</activeProfile>
>>>   </activeProfiles>
>>>   <profiles>
>>>     <profile>
>>>       <id>flink-statefun-2.0.0</id>
>>>       <repositories>
>>>         <repository>
>>>           <id>flink-statefun-2.0.0</id>
>>>           <url>https://repository.apache.org/content/repositories/orgapacheflink-1339/</url>
>>>         </repository>
>>>         <repository>
>>>           <id>archetype</id>
>>>           <url>https://repository.apache.org/content/repositories/orgapacheflink-1339/</url>
>>>         </repository>
>>>

Re: [DISCUSS] FLIP-119: Pipelined Region Scheduling

2020-03-27 Thread Yangze Guo
Thanks for updating!

+1 for supporting the pipelined region scheduling. Although we could
not prevent resource deadlock in all scenarios, it is really a big
step.

The design generally LGTM.

One minor thing I want to confirm. If I understand correctly, the
blocking edge will not be consumable before the upstream is finished.
Without that, when a failure occurs in the upstream region, it is
still possible to have a resource deadlock. I don't know whether it is
an explicit protocol now, but after this FLIP, I think it should not
be broken.
I'm also wondering whether we could execute the upstream and downstream
regions at the same time if we have enough resources. It could shorten
the running time of large jobs. We should not break the protocol of
the blocking edge, but would it be possible to change the data exchange
mode of two regions dynamically?

Best,
Yangze Guo

On Fri, Mar 27, 2020 at 1:15 PM Zhu Zhu  wrote:
>
> Thanks for reporting this Yangze.
> I have updated the permissions on those images. Everyone is able to view them
> now.
>
> Thanks,
> Zhu Zhu
>
> Yangze Guo  于2020年3月27日周五 上午11:25写道:
>>
>> Thanks for driving this discussion, Zhu Zhu & Gary.
>>
>> I found that the image link in this FLIP is not working well. When I
>> open that link, Google doc told me that I have no access privilege.
>> Could you take a look at that issue?
>>
>> Best,
>> Yangze Guo
>>
>> On Fri, Mar 27, 2020 at 1:38 AM Gary Yao  wrote:
>> >
>> > Hi community,
>> >
>> > In the past releases, we have been working on refactoring Flink's scheduler
>> > with the goal of making the scheduler extensible [1]. We have rolled out
>> > most of the intended refactoring in Flink 1.10, and we think it is now time
>> > to leverage our newly introduced abstractions to implement a new resource
>> > optimized scheduling strategy: Pipelined Region Scheduling.
>> >
>> > This scheduling strategy aims at:
>> >
>> > * avoidance of resource deadlocks when running batch jobs
>> >
>> > * tunable with respect to resource consumption and throughput
>> >
>> > More details can be found in the Wiki [2]. We are looking forward to your
>> > feedback.
>> >
>> > Best,
>> >
>> > Zhu Zhu & Gary
>> >
>> > [1] https://issues.apache.org/jira/browse/FLINK-10429
>> >
>> > [2]
>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling


[jira] [Created] (FLINK-16824) Creating Temporal Table Function via DDL

2020-03-27 Thread Konstantin Knauf (Jira)
Konstantin Knauf created FLINK-16824:


 Summary: Creating Temporal Table Function via DDL
 Key: FLINK-16824
 URL: https://issues.apache.org/jira/browse/FLINK-16824
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Konstantin Knauf


Currently, a Temporal Table Function can only be created via the Table API or 
indirectly via the configuration file of the SQL Client. 

It would be great, if this was also possible in pure SQL via a DDL statement. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Creating a new repo to host Stateful Functions Dockerfiles

2020-03-27 Thread Gary Yao
+1 for a separate repository.

Best,
Gary

On Fri, Mar 27, 2020 at 2:46 AM Hequn Cheng  wrote:

> +1 for a separate repository.
> The dedicated `flink-docker` repo works fine now. We can do it similarly.
>
> Best,
> Hequn
>
> On Fri, Mar 27, 2020 at 1:16 AM Till Rohrmann 
> wrote:
>
> > +1 for a separate repository.
> >
> > Cheers,
> > Till
> >
> > On Thu, Mar 26, 2020 at 5:13 PM Ufuk Celebi  wrote:
> >
> > > +1.
> > >
> > > The repo creation process is a light-weight, automated process on the
> ASF
> > > side. When Patrick Lucas contributed docker-flink back to the Flink
> > > community (as flink-docker), there was virtually no overhead in
> creating
> > > the repository. Reusing build scripts should still be possible at the
> > cost
> > > of some duplication which is fine imo.
> > >
> > > – Ufuk
> > >
> > > On Thu, Mar 26, 2020 at 4:18 PM Stephan Ewen  wrote:
> > > >
> > > > +1 to a separate repository.
> > > >
> > > > It seems to be best practice in the docker community.
> > > > And since it does not add overhead, why not go with the best
> practice?
> > > >
> > > > Best,
> > > > Stephan
> > > >
> > > >
> > > > On Thu, Mar 26, 2020 at 4:15 PM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org
> > > >
> > > wrote:
> > > >>
> > > >> Hi Flink devs,
> > > >>
> > > >> As part of a Stateful Functions release, we would like to publish
> > > Stateful
> > > >> Functions Docker images to Dockerhub as an official image.
> > > >>
> > > >> Some background context on Stateful Function images, for those who
> are
> > > not
> > > >> familiar with the project yet:
> > > >>
> > > >>- Stateful Function images are built on top of the Flink official
> > > >>images, with additional StateFun dependencies being added.
> > > >>You can take a look at the scripts we currently use to build the
> > > images
> > > >>locally for development purposes [1].
> > > >>- They are quite important for user experience, since building a
> > > Docker
> > > >>image is the recommended go-to deployment mode for StateFun user
> > > >>applications [2].
> > > >>
> > > >>
> > > >> A prerequisite for all of this is to first decide where we host the
> > > >> Stateful Functions Dockerfiles,
> > > >> before we can proceed with the process of requesting a new official
> > > image
> > > >> repository at Dockerhub.
> > > >>
> > > >> We’re proposing to create a new dedicated repo for this purpose,
> > > >> with the name `apache/flink-statefun-docker`.
> > > >>
> > > >> While we did initially consider integrating the StateFun Dockerfiles
> > to
> > > be
> > > >> hosted together with the Flink ones in the existing
> > > `apache/flink-docker`
> > > >> repo, we had the following concerns:
> > > >>
> > > >>- In general, it is a convention that each official Dockerhub
> image
> > > is
> > > >>backed by a dedicated source repo hosting the Dockerfiles.
> > > >>- The `apache/flink-docker` repo already has quite a few
> dedicated
> > > >>tooling and CI smoke tests specific for the Flink images.
> > > >>- Flink and StateFun have separate versioning schemes and
> > independent
> > > >>release cycles. A new Flink release does not necessarily require
> a
> > > >>“lock-step” to release new StateFun images as well.
> > > >>- Considering the above all-together, and the fact that creating
> a
> > > new
> > > >>repo is rather low-effort, having a separate repo would probably
> > make
> > > more
> > > >>sense here.
> > > >>
> > > >>
> > > >> What do you think?
> > > >>
> > > >> Cheers,
> > > >> Gordon
> > > >>
> > > >> [1]
> > > >>
> > >
> > >
> >
> https://github.com/apache/flink-statefun/blob/master/tools/docker/build-stateful-functions.sh
> > > >> [2]
> > > >>
> > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-statefun-docs-master/deployment-and-operations/packaging.html
> > >
> >
>


Re: Dynamic Flink SQL

2020-03-27 Thread Krzysztof Zarzycki
I want to do a somewhat different hacky PoC:
* I will write a sink that caches the results in "JVM global" memory. Then
I will write a source that reads this cache.
* I will launch one job that reads from the Kafka source, shuffles the data
to the desired partitioning, and then sinks to that cache.
* Then I will launch multiple jobs (DataStream-based or Flink SQL-based)
that use the source from the cache to read the data out and then reinterpret
it as a keyed stream [1].
* Using JVM global memory is necessary because, AFAIK, the jobs use
different classloaders. The class of the cached object also needs to be
available in the parent classloader, i.e. on the cluster's classpath.
This is just to prove the idea, its performance, and its usefulness. All
the problems of checkpointing this data I will leave for later.
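The core of the idea can be sketched without Flink at all. Below is a hypothetical, stdlib-only Java sketch of such a "JVM global" cache (the class and method names are invented for illustration; a real PoC would wrap calls like these in a Flink SinkFunction/SourceFunction):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the "JVM global" cache idea above (names invented).
// For the trick to work across Flink jobs, this class must be loaded by the
// parent classloader (i.e. be on the cluster's classpath, not in a job jar),
// so that every job's classloader resolves the same static field.
public class JvmGlobalCache {
    // A single shared FIFO queue; a real PoC would likely keep one queue
    // per logical partition to preserve the desired partitioning.
    private static final Queue<String> CACHE = new ConcurrentLinkedQueue<>();

    public static void put(String record) {
        CACHE.add(record);
    }

    public static String poll() {
        return CACHE.poll(); // null when the cache is empty
    }

    public static void main(String[] args) {
        // "Job 1" (the caching sink) writes records...
        put("event-1");
        put("event-2");
        // ..."Job 2" (the cache-backed source) later reads them out, FIFO.
        System.out.println(poll()); // prints event-1
        System.out.println(poll()); // prints event-2
    }
}
```

This also makes the classloader constraint visible: if `JvmGlobalCache` were packaged inside each job jar, each job would get its own copy of the static field and nothing would be shared.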

I'm very very interested in your, community, comments about this idea and
later productization of it.
Thanks!

Answering your comments:

> Unless you need reprocessing for newly added rules, I'd probably just
> cancel with savepoint and restart the application with the new rules. Of
> course, it depends on the rules themselves and how much state they require
> if a restart is viable. That's up to a POC.
>
No, I don't need reprocessing (yet). A rule starts processing the data
from the moment it is defined.
Cancellation with savepoint was considered, but because the number of
rules defined or changed daily might be large enough, that would generate
too much downtime. There is a lot of state kept in those rules, making
the restart heavy. What's worse, that would be cross-tenant downtime,
unless the job was somehow per team/tenant. Therefore we rejected this option.
BTW, the current design of our system is similar to the one from the blog
series by Alexander Fedulov about the dynamic rules pattern [2] that he's
just publishing.


> They will consume the same high intensive source(s) therefore I want to
>> optimize for that by consuming the messages in Flink only once.
>>
> That's why I proposed to run one big query instead of 500 small ones. Have
> a POC where you add two of your rules manually to a Table and see how the
> optimized logical plan looks like. I'd bet that the source is only tapped
> once.
>

I can do that PoC, no problem. But AFAIK it will only work with the
"restart with savepoint" pattern discussed above.


[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/experimental.html#reinterpreting-a-pre-partitioned-data-stream-as-keyed-stream
[2] https://flink.apache.org/news/2020/03/24/demo-fraud-detection-2.html



> On Wed, Mar 25, 2020 at 6:15 PM Krzysztof Zarzycki 
> wrote:
>
>> Hello Arvid,
>> Thanks for joining to the thread!
>> First, did you take into consideration that I would like to dynamically
>> add queries on the same source? That means first define one query, later
>> the day add another one , then another one, and so on. A Week later kill
>> one of those, start yet another one, etc... There will be hundreds of these
>> queries running at once, but the set of queries change several times a day.
>> They will consume the same high intensive source(s) therefore I want to
>> optimize for that by consuming the messages in Flink only once.
>>
>> Regarding the temporary tables AFAIK they are only the metadata (let's
>> say Kafka topic detals) and store it in the scope of a SQL session.
>> Therefore multiple queries against that temp table will behave the same way
>> as querying normal table, that is will read the datasource multiple times.
>>
>> It looks like the feature I want or could use is defined by the way of
>> FLIP-36 about Interactive Programming, more precisely caching the stream
>> table [1].
>> While I wouldn't like to limit the discussion to that not-yet-existing
>> feature, maybe there are other ways of achieving this dynamic querying
>> capability.
>>
>> Kind Regards,
>> Krzysztof
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink#FLIP-36:SupportInteractiveProgramminginFlink-Cacheastreamtable
>>
>>
>>
>> * You want to use primary Table API as that allows you to
>>> programmatically introduce structural variance (changing rules).
>>>
>> * You start by registering the source as temporary table.
>>>
>> * Then you add your rules as SQL through `TableEnvironment#sqlQuery`.
>>> * Lastly you unionAll the results.
>>>
>>> Then I'd perform some experiment if indeed the optimizer figured out
>>> that it needs to only read the source once. The resulting code would be
>>> minimal and easy to maintain. If the performance is not satisfying, you can
>>> always make it more complicated.
>>>
>>> Best,
>>>
>>> Arvid
>>>
>>>
>>> On Mon, Mar 23, 2020 at 7:02 PM Krzysztof Zarzycki 
>>> wrote:
>>>
 Dear Flink community!

 In our company we have implemented a system that realize the dynamic
 business rules pattern. We spoke about it during Flink Forward 2019
 https://www.youtube.com/watch?v=CyrQ5B0exqU.
 

Re: [DISCUSS] Creating a new repo to host Stateful Functions Dockerfiles

2020-03-27 Thread Zhijiang
+1 for this proposal. Very reasonable analysis!

Best,
Zhijiang 


--
From:Hequn Cheng 
Send Time:2020 Mar. 27 (Fri.) 09:46
To:dev 
Cc:private 
Subject:Re: [DISCUSS] Creating a new repo to host Stateful Functions Dockerfiles

+1 for a separate repository.
The dedicated `flink-docker` repo works fine now. We can do it similarly.

Best,
Hequn

On Fri, Mar 27, 2020 at 1:16 AM Till Rohrmann  wrote:

> +1 for a separate repository.
>
> Cheers,
> Till
>
> On Thu, Mar 26, 2020 at 5:13 PM Ufuk Celebi  wrote:
>
> > +1.
> >
> > The repo creation process is a light-weight, automated process on the ASF
> > side. When Patrick Lucas contributed docker-flink back to the Flink
> > community (as flink-docker), there was virtually no overhead in creating
> > the repository. Reusing build scripts should still be possible at the
> cost
> > of some duplication which is fine imo.
> >
> > – Ufuk
> >
> > On Thu, Mar 26, 2020 at 4:18 PM Stephan Ewen  wrote:
> > >
> > > +1 to a separate repository.
> > >
> > > It seems to be best practice in the docker community.
> > > And since it does not add overhead, why not go with the best practice?
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > > On Thu, Mar 26, 2020 at 4:15 PM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org
> > >
> > wrote:
> > >>
> > >> Hi Flink devs,
> > >>
> > >> As part of a Stateful Functions release, we would like to publish
> > Stateful
> > >> Functions Docker images to Dockerhub as an official image.
> > >>
> > >> Some background context on Stateful Function images, for those who are
> > not
> > >> familiar with the project yet:
> > >>
> > >>- Stateful Function images are built on top of the Flink official
> > >>images, with additional StateFun dependencies being added.
> > >>You can take a look at the scripts we currently use to build the
> > images
> > >>locally for development purposes [1].
> > >>- They are quite important for user experience, since building a
> > Docker
> > >>image is the recommended go-to deployment mode for StateFun user
> > >>applications [2].
> > >>
> > >>
> > >> A prerequisite for all of this is to first decide where we host the
> > >> Stateful Functions Dockerfiles,
> > >> before we can proceed with the process of requesting a new official
> > image
> > >> repository at Dockerhub.
> > >>
> > >> We’re proposing to create a new dedicated repo for this purpose,
> > >> with the name `apache/flink-statefun-docker`.
> > >>
> > >> While we did initially consider integrating the StateFun Dockerfiles
> to
> > be
> > >> hosted together with the Flink ones in the existing
> > `apache/flink-docker`
> > >> repo, we had the following concerns:
> > >>
> > >>- In general, it is a convention that each official Dockerhub image
> > is
> > >>backed by a dedicated source repo hosting the Dockerfiles.
> > >>- The `apache/flink-docker` repo already has quite a few dedicated
> > >>tooling and CI smoke tests specific for the Flink images.
> > >>- Flink and StateFun have separate versioning schemes and
> independent
> > >>release cycles. A new Flink release does not necessarily require a
> > >>“lock-step” to release new StateFun images as well.
> > >>- Considering the above all-together, and the fact that creating a
> > new
> > >>repo is rather low-effort, having a separate repo would probably
> make
> > more
> > >>sense here.
> > >>
> > >>
> > >> What do you think?
> > >>
> > >> Cheers,
> > >> Gordon
> > >>
> > >> [1]
> > >>
> >
> >
> https://github.com/apache/flink-statefun/blob/master/tools/docker/build-stateful-functions.sh
> > >> [2]
> > >>
> >
> >
> https://ci.apache.org/projects/flink/flink-statefun-docs-master/deployment-and-operations/packaging.html
> >
>



Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-27 Thread Márton Balassi
Hi Jack,

Yes, we know how to do it and even have the implementation ready; it is
being reviewed by the Atlas community at the moment. :-)
Would you be interested in having a look?

On Thu, Mar 19, 2020 at 12:56 PM jackylau  wrote:

> Hi:
>   I think the Flink Atlas integration also needs to add catalog information,
> as the Spark Atlas project does:
> https://github.com/hortonworks-spark/spark-atlas-connector
> When a user uses a catalog such as JDBCCatalog/HiveCatalog, the Flink Atlas
> project will sync this information to Atlas.
>   But I can't find any event interface in Flink to implement it the way
> spark-atlas-connector does. Does anyone know how to do it?
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>


Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-27 Thread jackylau
Hi Márton Balassi:
  I would be very glad to look at it. Where can I find it?
  And it is my issue, which you can see here:
https://issues.apache.org/jira/browse/FLINK-16774



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/


Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-27 Thread Gyula Fóra
Hi Jack!

You can find the document here:
https://docs.google.com/document/d/1wSgzPdhcwt-SlNBBqL-Zb7g8fY6bN8JwHEg7GCdsBG8/edit?usp=sharing

The document links to an already working Atlas hook prototype (and
accompanying flink fork). The links for that are also here:
Flink: https://github.com/gyfora/flink/tree/atlas-changes
Atlas: https://github.com/gyfora/atlas/tree/flink-bridge

We need to adapt this according to the above discussion; this was an early
prototype.
Gyula

On Fri, Mar 27, 2020 at 11:07 AM jackylau  wrote:

> Hi Márton Balassi:
>I am very glad to look at it and where to find .
>And it is my issue , which you can see
> https://issues.apache.org/jira/browse/FLINK-16774
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>


Re: [DISCUSS] FLIP-119: Pipelined Region Scheduling

2020-03-27 Thread Zhu Zhu
To Yangze,

>> the blocking edge will not be consumable before the upstream is finished.
Yes. This is how we define a BLOCKING result partition, "Blocking
partitions represent blocking data exchanges, where the data stream is
first fully produced and then consumed".

>> I'm also wondering could we execute the upstream and downstream regions
at the same time if we have enough resources
It may lead to resource waste, since the tasks in downstream regions cannot
read any data before the upstream region finishes. It saves a bit of time on
scheduling, but usually it does not make much difference for large jobs,
since data processing takes much more time. For small jobs, one can make
all edges PIPELINED so that all the tasks can be scheduled at the same time.
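To make the region notion above concrete, here is a hedged, self-contained sketch (all names invented; this is not the actual FLIP-119 code) of the core grouping step: vertices connected by PIPELINED edges fall into one region, BLOCKING edges separate regions, and regions are then scheduled one after another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative-only sketch: pipelined regions are the connected components
// of the job graph when only PIPELINED edges are considered.
public class PipelinedRegions {
    private static Map<String, String> parent = new HashMap<>();

    // Union-find with path compression over vertex names.
    private static String find(String v) {
        parent.putIfAbsent(v, v);
        if (!parent.get(v).equals(v)) {
            parent.put(v, find(parent.get(v)));
        }
        return parent.get(v);
    }

    private static void union(String a, String b) {
        parent.put(find(a), find(b));
    }

    // Groups vertices into regions; each pipelined edge is a {src, dst} pair.
    // BLOCKING edges are simply not passed in, so they separate regions.
    public static Map<String, List<String>> regions(
            List<String> vertices, List<String[]> pipelinedEdges) {
        parent = new HashMap<>();
        for (String v : vertices) {
            find(v);
        }
        for (String[] e : pipelinedEdges) {
            union(e[0], e[1]);
        }
        Map<String, List<String>> groups = new TreeMap<>();
        for (String v : vertices) {
            groups.computeIfAbsent(find(v), k -> new ArrayList<>()).add(v);
        }
        return groups;
    }

    public static void main(String[] args) {
        // A -> B pipelined, B -> C blocking, C -> D pipelined:
        // region {A, B} must finish before region {C, D} is scheduled.
        List<String> vs = Arrays.asList("A", "B", "C", "D");
        List<String[]> pipelined = Arrays.asList(
                new String[]{"A", "B"}, new String[]{"C", "D"});
        System.out.println(regions(vs, pipelined));
    }
}
```

In this toy example the scheduler would deploy region {A, B} first and deploy region {C, D} only once {A, B} has fully produced its blocking result partitions, which is exactly why downstream slots are not wasted while upstream work is still running.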

>> is it possible to change the data exchange mode of two regions
dynamically?
This is not in the scope of this FLIP. But we are moving toward a more
extensible scheduler (FLINK-10429) and resource-aware scheduling
(FLINK-10407).
So I think it is possible that in the future we can have a scheduler which
dynamically changes the shuffle type wisely with regard to available
resources.

Thanks,
Zhu Zhu

Yangze Guo  于2020年3月27日周五 下午4:49写道:

> Thanks for updating!
>
> +1 for supporting the pipelined region scheduling. Although we could
> not prevent resource deadlock in all scenarios, it is really a big
> step.
>
> The design generally LGTM.
>
> One minor thing I want to make sure. If I understand correctly, the
> blocking edge will not be consumable before the upstream is finished.
> Without it, when the failure occurs in the upstream region, there is
> still possible to have a resource deadlock. I don't know whether it is
> an explicit protocol now. But after this FLIP, I think it should not
> be broken.
> I'm also wondering could we execute the upstream and downstream
> regions at the same time if we have enough resources. It can shorten
> the running time of large job. We should not break the protocol of
> blocking edge. But if it is possible to change the data exchange mode
> of two regions dynamically?
>
> Best,
> Yangze Guo
>
> On Fri, Mar 27, 2020 at 1:15 PM Zhu Zhu  wrote:
> >
> > Thanks for reporting this Yangze.
> > I have updated the permissions on those images. Everyone is able to view
> > them now.
> >
> > Thanks,
> > Zhu Zhu
> >
> > Yangze Guo  于2020年3月27日周五 上午11:25写道:
> >>
> >> Thanks for driving this discussion, Zhu Zhu & Gary.
> >>
> >> I found that the image link in this FLIP is not working well. When I
> >> open that link, Google doc told me that I have no access privilege.
> >> Could you take a look at that issue?
> >>
> >> Best,
> >> Yangze Guo
> >>
> >> On Fri, Mar 27, 2020 at 1:38 AM Gary Yao  wrote:
> >> >
> >> > Hi community,
> >> >
> >> > In the past releases, we have been working on refactoring Flink's
> scheduler
> >> > with the goal of making the scheduler extensible [1]. We have rolled
> out
> >> > most of the intended refactoring in Flink 1.10, and we think it is
> now time
> >> > to leverage our newly introduced abstractions to implement a new
> resource
> >> > optimized scheduling strategy: Pipelined Region Scheduling.
> >> >
> >> > This scheduling strategy aims at:
> >> >
> >> > * avoidance of resource deadlocks when running batch jobs
> >> >
> >> > * tunable with respect to resource consumption and throughput
> >> >
> >> > More details can be found in the Wiki [2]. We are looking forward to
> your
> >> > feedback.
> >> >
> >> > Best,
> >> >
> >> > Zhu Zhu & Gary
> >> >
> >> > [1] https://issues.apache.org/jira/browse/FLINK-10429
> >> >
> >> > [2]
> >> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling
>


[jira] [Created] (FLINK-16825) PrometheusReporterEndToEndITCase should rely on path returned by DownloadCache

2020-03-27 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-16825:


 Summary: PrometheusReporterEndToEndITCase should rely on path 
returned by DownloadCache
 Key: FLINK-16825
 URL: https://issues.apache.org/jira/browse/FLINK-16825
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics, Tests
Affects Versions: 1.10.0
Reporter: Chesnay Schepler
Assignee: Alexander Fedulov
 Fix For: 1.10.1, 1.11.0


The {{PrometheusReporterEndToEndITCase}} uses 
{{DownloadCache#getOrDownload}} to download a file, but ignores the returned 
{{Path}} and simply assumes the file name.
This assumption can fail if there was an error during the download, where the 
cache appends an attempt index.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-119: Pipelined Region Scheduling

2020-03-27 Thread Xintong Song
Gary & Zhu Zhu,

Thanks for preparing this FLIP, and a BIG +1 from my side. The trade-off
between resource utilization and potential deadlock problems has always
been a pain. Despite not solving all the deadlock cases, this FLIP is
definitely a big improvement. IIUC, it already covers all the existing
single-job cases, and all the mentioned non-covered cases involve either
multi-job session clusters or the diverse slot resources planned for the
future.

I've read through the FLIP, and it looks really good to me. Good job! All
the concerns and limitations that I can think of have already been clearly
stated, with reasonable potential future solutions. From the perspective of
fine-grained resource management, I do not see any serious/irresolvable
conflict at this time.

nit: The in-page links are not working. I guess those are copied from
google docs directly?


Thank you~

Xintong Song



On Fri, Mar 27, 2020 at 6:26 PM Zhu Zhu  wrote:

> To Yangze,
>
> >> the blocking edge will not be consumable before the upstream is
> finished.
> Yes. This is how we define a BLOCKING result partition, "Blocking
> partitions represent blocking data exchanges, where the data stream is
> first fully produced and then consumed".
>
> >> I'm also wondering could we execute the upstream and downstream regions
> at the same time if we have enough resources
> It may lead to resource waste since the tasks in downstream regions cannot
> read any data before the upstream region finishes. It saves a bit of time
> on scheduling, but usually it does not make much difference for large jobs,
> since data processing takes much more time. For small jobs, one can make
> all edges PIPELINED so that all the tasks can be scheduled at the same
> time.
>
> >> is it possible to change the data exchange mode of two regions
> dynamically?
> This is not in the scope of the FLIP. But we are moving forward to a more
> extensible scheduler (FLINK-10429) and resource aware scheduling
> (FLINK-10407).
> So I think it's possible we can have a scheduler in the future which
> dynamically changes the shuffle type wisely regarding available resources.
>
> Thanks,
> Zhu Zhu
>
> Yangze Guo  于2020年3月27日周五 下午4:49写道:
>
> > Thanks for updating!
> >
> > +1 for supporting the pipelined region scheduling. Although we could
> > not prevent resource deadlock in all scenarios, it is really a big
> > step.
> >
> > The design generally LGTM.
> >
> > One minor thing I want to make sure. If I understand correctly, the
> > blocking edge will not be consumable before the upstream is finished.
> > Without it, when a failure occurs in the upstream region, it is
> > still possible to have a resource deadlock. I don't know whether it is
> > an explicit protocol now. But after this FLIP, I think it should not
> > be broken.
> > I'm also wondering could we execute the upstream and downstream
> > regions at the same time if we have enough resources. It can shorten
> > the running time of a large job. We should not break the protocol of
> > the blocking edge. But is it possible to change the data exchange mode
> > of two regions dynamically?
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Mar 27, 2020 at 1:15 PM Zhu Zhu  wrote:
> > >
> > > Thanks for reporting this Yangze.
> > > I have updated the permissions on those images. Everyone is able to view
> > them now.
> > >
> > > Thanks,
> > > Zhu Zhu
> > >
> > > Yangze Guo  于2020年3月27日周五 上午11:25写道:
> > >>
> > >> Thanks for driving this discussion, Zhu Zhu & Gary.
> > >>
> > >> I found that the image link in this FLIP is not working well. When I
> > >> open that link, Google doc told me that I have no access privilege.
> > >> Could you take a look at that issue?
> > >>
> > >> Best,
> > >> Yangze Guo
> > >>
> > >> On Fri, Mar 27, 2020 at 1:38 AM Gary Yao  wrote:
> > >> >
> > >> > Hi community,
> > >> >
> > >> > In the past releases, we have been working on refactoring Flink's
> > scheduler
> > >> > with the goal of making the scheduler extensible [1]. We have rolled
> > out
> > >> > most of the intended refactoring in Flink 1.10, and we think it is
> > now time
> > >> > to leverage our newly introduced abstractions to implement a new
> > resource
> > >> > optimized scheduling strategy: Pipelined Region Scheduling.
> > >> >
> > >> > This scheduling strategy aims at:
> > >> >
> > >> > * avoidance of resource deadlocks when running batch jobs
> > >> >
> > >> > * tunable with respect to resource consumption and throughput
> > >> >
> > >> > More details can be found in the Wiki [2]. We are looking forward to
> > your
> > >> > feedback.
> > >> >
> > >> > Best,
> > >> >
> > >> > Zhu Zhu & Gary
> > >> >
> > >> > [1] https://issues.apache.org/jira/browse/FLINK-10429
> > >> >
> > >> > [2]
> > >> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling
> >
>


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

2020-03-27 Thread Stephan Ewen
Maybe one final comment: It is probably not an issue, but let's try and
keep user code (via user code classloader) out of the ResourceManager, if
possible.

As background:

There were thoughts in the past to support setups where the RM must run
with "superuser" credentials, but we cannot run JM/TM with these
credentials, as the user code might access them otherwise.
This is actually possible today, you can run the RM in a different JVM or
in a different container, and give it more credentials than JMs / TMs. But
for this to be feasible, we cannot allow any user-defined code to be in the
JVM, because that instantaneously breaks the isolation of credentials.



On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo  wrote:

> Thanks for the feedback, @Till and @Xintong.
>
> Regarding separating the interface, I'm also +1 with it.
>
> Regarding the resource allocation interface, true, it's dangerous to
> give too much access to user code. Changing the return type to Map<String key, String/Long value> makes sense to me. AFAIK, it is compatible
> with all the first-party supported resources for Yarn/Kubernetes. It
> could also free us from the potential dependency issue as well.
>
> Best,
> Yangze Guo
>
> On Fri, Mar 27, 2020 at 10:42 AM Xintong Song 
> wrote:
> >
> > Thanks for updating the FLIP, Yangze.
> >
> > I agree with Till that we probably want to separate the K8s/Yarn
> decorator
> > calls. Users can still configure one driver class, and we can use
> > `instanceof` to check whether the driver implemented K8s/Yarn specific
> > interfaces.
> >
> > Moreover, I'm not sure about exposing entire `ContainerRequest` / `Pod`
> > (`AbstractKubernetesStepDecorator` directly manipulates on `Pod`) to user
> > codes. It gives more access to user codes than needed for defining
> external
> > resource, which might cause problems. Instead, I would suggest to have
> > interface like `Map<String, String/Long>
> > getYarn/KubernetesExternalResource()` and assemble them into
> > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann 
> wrote:
> >
> > > Hi everyone,
> > >
> > > I'm a bit late to the party. I think the current proposal looks good.
> > >
> > > Concerning the ExternalResourceDriver interface defined in the FLIP
> [1], I
> > > would suggest to not include the decorator calls for Kubernetes and
> Yarn in
> > > the base interface. Instead I would suggest to segregate the deployment
> > > specific decorator calls into separate interfaces. That way an
> > > ExternalResourceDriver does not have to support all deployments from
> the
> > > very beginning. Moreover, some resources might not be supported by a
> > > specific deployment target and the natural way to express this would
> be to
> > > not implement the respective deployment specific interface.
> > >
> > > Moreover, having void
> > > addExternalResourceToRequest(AMRMClient.ContainerRequest
> containerRequest)
> > > in the ExternalResourceDriver interface would require Hadoop on Flink's
> > > classpath whenever the external resource driver is being used.
> > >
> > > [1]
> > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen 
> wrote:
> > >
> > > > Nice, thanks a lot!
> > > >
> > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo 
> wrote:
> > > >
> > > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > > > >
> > > > > I've updated the FLIP accordingly. I do not add a
> > > > > ResourceInfoProvider. Instead, I introduce the
> ExternalResourceDriver,
> > > > > which takes the responsibility of all relevant operations on both
> RM
> > > > > and TM sides.
> > > > > After a rethink about decoupling the management of external
> resources
> > > > > from TaskExecutor, I think we could do the same thing on the
> > > > > ResourceManager side. We do not need to add a specific allocation
> > > > > logic to the ResourceManager each time we add a specific external
> > > > > resource.
> > > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > > containerRequest.
> > > > > - For Kubenetes, ExternalResourceDriver could provide a decorator
> for
> > > > > the TM pod.
> > > > >
> > > > > In this way, just like MetricReporter, we allow users to define
> their
> > > > > custom ExternalResourceDriver. It is more extensible and fits the
> > > > > separation of concerns. For more details, please take a look at
> [1].
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen 
> wrote:
> > > > > >
> > > > > > This sounds good to go ahead from my side.
> > > > > >
> > > > > > I like the approach that Becket suggested - in that case the core
> > > > > > abstraction that ev

[jira] [Created] (FLINK-16826) Support copying of jars

2020-03-27 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-16826:


 Summary: Support copying of jars
 Key: FLINK-16826
 URL: https://issues.apache.org/jira/browse/FLINK-16826
 Project: Flink
  Issue Type: Improvement
  Components: Test Infrastructure
Reporter: Chesnay Schepler
Assignee: Alexander Fedulov
 Fix For: 1.11.0


The {{FlinkResourceSetup}} currently allows moving jars around, e.g., from opt 
to lib.
For some tests this isn't sufficient as they may want a jar to be in lib and 
the plugins directory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Releasing Flink 1.10.1

2020-03-27 Thread Yu Li
Here comes the latest status of issues in 1.10.1 watch list:

* Blockers (4 left)
  - [Under Discussion] FLINK-16018 Improve error reporting when submitting
batch job (instead of AskTimeoutException)
  - [Closed] FLINK-16142 Memory Leak causes Metaspace OOM error on repeated
job submission
  - [Closed] FLINK-16170 SearchTemplateRequest ClassNotFoundException when
use flink-sql-connector-elasticsearch7
  - [PR Approved] FLINK-16262 Class loader problem with
FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
  - [Closed] FLINK-16406 Increase default value for JVM Metaspace to
minimise its OutOfMemoryError
  - [Closed] FLINK-16454 Update the copyright year in NOTICE files
  - [In Progress] FLINK-16576 State inconsistency on restore with memory
state backends
  - [New] [PR Submitted] FLINK-16705 LocalExecutor tears down MiniCluster
before client can retrieve JobResult

* Critical (2 left)
  - [Closed] FLINK-16047 Blink planner produces wrong aggregate results
with state clean up
  - [PR Submitted] FLINK-16070 Blink planner can not extract correct unique
key for UpsertStreamTableSink
  - [Fix for 1.10.1 Done] FLINK-16225 Metaspace Out Of Memory should be
handled as Fatal Error in TaskManager
  - [Open] FLINK-16408 Bind user code class loader to lifetime of a slot

There's a new blocker added. Please let me know if you find any other
issues that should be put into the watch list. Thanks.

Best Regards,
Yu


On Sat, 21 Mar 2020 at 14:37, Yu Li  wrote:

> Hi All,
>
> Here is the status update of issues in 1.10.1 watch list:
>
> * Blockers (7)
>
>   - [Under Discussion] FLINK-16018 Improve error reporting when submitting
> batch job (instead of AskTimeoutException)
>
>   - [Under Discussion] FLINK-16142 Memory Leak causes Metaspace OOM error
> on repeated job submission
>
>   - [PR Submitted] FLINK-16170 SearchTemplateRequest
> ClassNotFoundException when use flink-sql-connector-elasticsearch7
>
>   - [PR Submitted] FLINK-16262 Class loader problem with
> FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
>
>   - [Closed] FLINK-16406 Increase default value for JVM Metaspace to
> minimise its OutOfMemoryError
>
>   - [Closed] FLINK-16454 Update the copyright year in NOTICE files
>
>   - [Open] FLINK-16576 State inconsistency on restore with memory state
> backends
>
>
> * Critical (4)
>
>   - [Closed] FLINK-16047 Blink planner produces wrong aggregate results
> with state clean up
>
>   - [PR Submitted] FLINK-16070 Blink planner can not extract correct
> unique key for UpsertStreamTableSink
>
>   - [Under Discussion] FLINK-16225 Metaspace Out Of Memory should be
> handled as Fatal Error in TaskManager
>
>   - [Open] FLINK-16408 Bind user code class loader to lifetime of a slot
>
> Please let me know if you see any new blockers to add to the list. Thanks.
>
> Best Regards,
> Yu
>
>
> On Wed, 18 Mar 2020 at 00:11, Yu Li  wrote:
>
>> Thanks for the updates Till!
>>
>> For FLINK-16018, maybe we could create two sub-tasks for easy and
>> complete fix separately, and only include the easy one in 1.10.1? Or please
>> just feel free to postpone the whole task to 1.10.2 if "all or nothing"
>> policy is preferred (smile). Thanks.
>>
>> Best Regards,
>> Yu
>>
>>
>> On Tue, 17 Mar 2020 at 23:33, Till Rohrmann  wrote:
>>
>>> +1 for a soonish bug fix release. Thanks for volunteering as our release
>>> manager Yu.
>>>
>>> I think we can soon merge the increase of metaspace size and improving
>>> the
>>> error message. The assumption is that we currently don't have too many
>>> small Flink 1.10 deployments with a process size <= 1GB. Of course, the
>>> sooner we release the bug fix release, the fewer deployments will be
>>> affected by this change.
>>>
>>> For FLINK-16018, I think there would be an easy solution which we could
>>> include in the bug fix release. The proper fix will most likely take a
>>> bit
>>> longer, though.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Mar 13, 2020 at 8:08 PM Andrey Zagrebin 
>>> wrote:
>>>
>>> > > @Andrey and @Xintong - could we have a quick poll on the user mailing
>>> > list
>>> > > about increasing the metaspace size in Flink 1.10.1? Specifically
>>> asking
>>> > > for who has very small TM setups?
>>> >
>>> > There has been a survey about this topic since 10 days:
>>> >
>>> > `[Survey] Default size for the new JVM Metaspace limit in 1.10`
>>> > I can ask there about the small TM setups.
>>> >
>>> > On Fri, Mar 13, 2020 at 5:19 PM Yu Li  wrote:
>>> >
>>> > > Another blocker for 1.10.1: FLINK-16576 State inconsistency on
>>> restore
>>> > with
>>> > > memory state backends
>>> > >
>>> > > Let me recompile the watching list with recent feedback. There're
>>> > totally
>>> > > 45 issues with Blocker/Critical priority for 1.10.1, out of which 14
>>> > > already resolved and 31 left, and the below ones are watched (meaning
>>> > that
>>> > > once the below ones got fixed, we will kick off the release with
>>> left
>>> > ones
>>> > > postponed to 1.10.2 unless objections):
>>> > >
>

[jira] [Created] (FLINK-16827) Flink 1.9.1: sorting a Kafka stream by a time field throws a NullPointerException with multiple parallelism

2020-03-27 Thread wuchangjun (Jira)
wuchangjun created FLINK-16827:
--

 Summary: Flink 1.9.1: sorting a Kafka stream by a time field throws a 
NullPointerException with multiple parallelism
 Key: FLINK-16827
 URL: https://issues.apache.org/jira/browse/FLINK-16827
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.9.1
 Environment: flink on yarn

!image-2020-03-27-21-23-13-648.png!
Reporter: wuchangjun
 Attachments: image-2020-03-27-21-22-21-122.png, 
image-2020-03-27-21-22-44-191.png, image-2020-03-27-21-23-13-648.png

flink reads kafka data and sorts by time field. In the case of multiple 
concurrency, it throws the following null pointer exception. One concurrent 
processing is normal.

PS: Flink reads Kafka data and sorts it by a time field; with multiple parallel instances it throws the null pointer exception below, while a single parallel instance works fine.

!image-2020-03-27-21-22-21-122.png!

 

!image-2020-03-27-21-22-44-191.png!

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16828) PrometheusReporters support factories

2020-03-27 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-16828:


 Summary: PrometheusReporters support factories
 Key: FLINK-16828
 URL: https://issues.apache.org/jira/browse/FLINK-16828
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics
Reporter: Chesnay Schepler
Assignee: Alexander Fedulov
 Fix For: 1.11.0


Refactor the PrometheusReporter instantiation to go through a 
MetricReporterFactory instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16829) Pass PrometheusReporter arguments via constructor

2020-03-27 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-16829:


 Summary: Pass PrometheusReporter arguments via constructor
 Key: FLINK-16829
 URL: https://issues.apache.org/jira/browse/FLINK-16829
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.11.0


With FLINK-16828 Prometheus reporters are now instantiated via factories. We 
can now refactor the initialization to pass all parameter through the 
constructor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16830) Let users use Row/List/Map/Seq directly in Expression DSL

2020-03-27 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-16830:


 Summary: Let users use Row/List/Map/Seq directly in Expression DSL
 Key: FLINK-16830
 URL: https://issues.apache.org/jira/browse/FLINK-16830
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.11.0


To improve the usability of the Expression DSL we should provide conversions 
from Row/List/Map/Seq classes to corresponding expressions.

This includes updating the {{ApiExpressionUtils#objectToExpression}} as well as 
adding Scala's implicit conversions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16831) Support plugin directory as JarLocation

2020-03-27 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-16831:


 Summary: Support plugin directory as JarLocation
 Key: FLINK-16831
 URL: https://issues.apache.org/jira/browse/FLINK-16831
 Project: Flink
  Issue Type: Improvement
  Components: Test Infrastructure
Reporter: Chesnay Schepler
Assignee: Alexander Fedulov
 Fix For: 1.11.0


To test the plugin mechanism we need to be able to move/copy jars into the 
plugins directory, including a parent directory as expected by the plugin 
system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #2

2020-03-27 Thread Hequn Cheng
Thanks Gordon for the release and the nice release checking guide!

It seems the NOTICE file is missing in the
`statefun-ridesharing-example-simulator` module while it bundles
dependencies like
`org.springframework.boot:spring-boot-loader:2.1.6.RELEASE`.

Best,
Hequn

On Fri, Mar 27, 2020 at 3:35 PM Tzu-Li (Gordon) Tai 
wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #2 for the version 2.0.0 of
> Apache Flink Stateful Functions,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> **Testing Guideline**
>
> You can find here [1] a doc that we can use for collaborating testing
> efforts.
> The listed testing tasks in the doc also serve as a guideline in what to
> test for this release.
> If you wish to take ownership of a testing task, simply put your name down
> in the "Checked by" field of the task.
>
> **Release Overview**
>
> As an overview, the release consists of the following:
> a) Stateful Functions canonical source distribution, to be deployed to the
> release repository at dist.apache.org
> b) Stateful Functions Python SDK distributions to be deployed to PyPI
> c) Maven artifacts to be deployed to the Maven Central Repository
>
> **Staging Areas to Review**
>
> The staging areas containing the above mentioned artifacts are as follows,
> for your review:
> * All artifacts for a) and b) can be found in the corresponding dev
> repository at dist.apache.org [2]
> * All artifacts for c) can be found at the Apache Nexus Repository [3]
>
> All artifacts are signed with the
> key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
>
> Other links for your review:
> * JIRA release notes [5]
> * source code tag "release-2.0.0-rc2" [6] [7]
>
> **Extra Remarks**
>
> * Part of the release is also official Docker images for Stateful
> Functions. This can be a separate process, since the creation of those
> relies on the fact that we have distribution jars already deployed to
> Maven. I will follow-up with this after these artifacts are officially
> released.
> In the meantime, there is this discussion [8] ongoing about where to host
> the StateFun Dockerfiles.
> * The Flink Website and blog post is also being worked on (by Marta) as
> part of the release, to incorporate the new Stateful Functions project. We
> can follow up with a link to those changes afterwards in this vote thread,
> but that would not block you to test and cast your votes already.
> * Since the Flink website changes are still being worked on, you will not
> yet be able to find the Stateful Functions docs from there. Here are the
> links [9] [10].
>
> **Vote Duration**
>
> The vote will be open for at least 72 hours *(target end date is next
> Tuesday, March 31).*
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Gordon
>
> [1]
>
> https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.0.0-rc2/
> [3]
> https://repository.apache.org/content/repositories/orgapacheflink-1340/
> [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> [5]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878
> [6]
>
> https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=14ce58048a3dda792f2329cf14d30aa952f6cb24
> [7] https://github.com/apache/flink-statefun/tree/release-2.0.0-rc2
> [8]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-a-new-repo-to-host-Stateful-Functions-Dockerfiles-td39342.html
> [9] https://ci.apache.org/projects/flink/flink-statefun-docs-master/
> [10] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/
>
> TIP: You can create a `settings.xml` file with these contents:
>
> """
> <settings>
>   <activeProfiles>
>     <activeProfile>flink-statefun-2.0.0</activeProfile>
>   </activeProfiles>
>   <profiles>
>     <profile>
>       <id>flink-statefun-2.0.0</id>
>       <repositories>
>         <repository>
>           <id>flink-statefun-2.0.0</id>
>           <url>
> https://repository.apache.org/content/repositories/orgapacheflink-1340/
>           </url>
>         </repository>
>         <repository>
>           <id>archetype</id>
>           <url>
> https://repository.apache.org/content/repositories/orgapacheflink-1340/
>           </url>
>         </repository>
>       </repositories>
>     </profile>
>   </profiles>
> </settings>
> """
>
> And reference that in your maven commands via `--settings
> path/to/settings.xml`.
> This is useful for creating a quickstart based on the staged release and
> for building against the staged jars.
>


Re: [DISCUSS] FLIP-119: Pipelined Region Scheduling

2020-03-27 Thread Till Rohrmann
Thanks for creating this FLIP Zhu Zhu and Gary!

+1 for adding pipelined region scheduling.

Concerning the extended SlotProvider interface I have an idea how we could
further improve it. If I am not mistaken, then you have proposed to
introduce the two timeouts in order to distinguish between batch and
streaming jobs and to encode that batch job requests can wait if there are
enough resources in the SlotPool (not necessarily being available right
now). I think what we actually need to tell the SlotProvider is whether a
request will use the slot only for a limited time or not. This is exactly
the difference between processing bounded and unbounded streams. If the
SlotProvider knows this difference, then it can tell which slots will
eventually be reusable and which not. Based on this it can tell whether a
slot request can be fulfilled eventually or whether we fail after the
specified timeout. Another benefit of this approach would be that we can
easily support mixed bounded/unbounded workloads. What we would need to
know for this approach is whether a pipelined region is processing a
bounded or unbounded stream.

To give an example let's assume we request the following sets of slots
where each pipelined region requires the same slots:

slotProvider.allocateSlots(pr1_bounded, timeout);
slotProvider.allocateSlots(pr2_unbounded, timeout);
slotProvider.allocateSlots(pr3_bounded, timeout);

Let's assume we receive slots for pr1_bounded in < timeout and can then
fulfill the request. Then we request pr2_unbounded. Since we know that
pr1_bounded will complete eventually, we don't fail this request after
timeout. Next we request pr3_bounded after pr2_unbounded has been
completed. In this case, we see that we need to request new resources
because pr2_unbounded won't release its slots. Hence, if we cannot allocate
new resources within timeout, we fail this request.
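The eventual-fulfillability check described above can be modeled in a few lines. This is a toy model under stated assumptions, not Flink's actual SlotProvider: a request may wait past the timeout only if enough capacity will eventually be free, i.e. free slots plus slots held by bounded (finite) pipelined regions.

```java
// Toy model of the reasoning above. Unbounded regions never return their
// slots, so only capacity not pinned by unbounded work frees up eventually.
public class SlotFulfillability {

    static boolean canEventuallyFulfill(int totalSlots,
                                        int slotsHeldByUnbounded,
                                        int requestedSlots) {
        // Everything not pinned by an unbounded region is eventually returned.
        return totalSlots - slotsHeldByUnbounded >= requestedSlots;
    }

    public static void main(String[] args) {
        int total = 4;

        // pr1_bounded holds all 4 slots: pr2_unbounded (needs 4) may wait,
        // because bounded regions eventually return their slots.
        System.out.println(canEventuallyFulfill(total, 0, 4)); // true

        // pr2_unbounded now pins all 4 slots: pr3_bounded (needs 4) can
        // never be fulfilled without new resources -> fail after timeout.
        System.out.println(canEventuallyFulfill(total, 4, 4)); // false
    }
}
```

This mirrors the example in the mail: pr2_unbounded waits behind pr1_bounded, while pr3_bounded behind pr2_unbounded fails after the timeout unless new resources arrive.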

A small comment concerning "Resource deadlocks when slot allocation
competition happens between multiple jobs in a session cluster": Another
idea to solve this situation would be to give the ResourceManager the right
to revoke slot assignments in order to change the mapping between requests
and available slots.

Cheers,
Till

On Fri, Mar 27, 2020 at 12:44 PM Xintong Song  wrote:

> Gary & Zhu Zhu,
>
> Thanks for preparing this FLIP, and a BIG +1 from my side. The trade-off
> between resource utilization and potential deadlock problems has always
> been a pain. Despite not solving all the deadlock cases, this FLIP is
> definitely a big improvement. IIUC, it has already covered all the existing
> single job cases, and all the mentioned non-covered cases are either in
> multi-job session clusters or with diverse slot resources in future.
>
> I've read through the FLIP, and it looks really good to me. Good job! All
> the concerns and limitations that I can think of have already been clearly
> stated, with reasonable potential future solutions. From the perspective of
> fine-grained resource management, I do not see any serious/irresolvable
> conflict at this time.
>
> nit: The in-page links are not working. I guess those are copied from
> google docs directly?
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Mar 27, 2020 at 6:26 PM Zhu Zhu  wrote:
>
> > To Yangze,
> >
> > >> the blocking edge will not be consumable before the upstream is
> > finished.
> > Yes. This is how we define a BLOCKING result partition, "Blocking
> > partitions represent blocking data exchanges, where the data stream is
> > first fully produced and then consumed".
> >
> > >> I'm also wondering could we execute the upstream and downstream
> regions
> > at the same time if we have enough resources
> > It may lead to resource waste since the tasks in downstream regions
> cannot
> > read any data before the upstream region finishes. It saves a bit of time
> > on scheduling, but usually it does not make much difference for large jobs,
> > since data processing takes much more time. For small jobs, one can make
> > all edges PIPELINED so that all the tasks can be scheduled at the same
> > time.
> >
> > >> is it possible to change the data exchange mode of two regions
> > dynamically?
> > This is not in the scope of the FLIP. But we are moving forward to a more
> > extensible scheduler (FLINK-10429) and resource aware scheduling
> > (FLINK-10407).
> > So I think it's possible we can have a scheduler in the future which
> > dynamically changes the shuffle type wisely regarding available
> resources.
> >
> > Thanks,
> > Zhu Zhu
> >
> > Yangze Guo  于2020年3月27日周五 下午4:49写道:
> >
> > > Thanks for updating!
> > >
> > > +1 for supporting the pipelined region scheduling. Although we could
> > > not prevent resource deadlock in all scenarios, it is really a big
> > > step.
> > >
> > > The design generally LGTM.
> > >
> > > One minor thing I want to make sure. If I understand correctly, the
> > > blocking edge will not be consumable before the upstream is finished.
> > > Without it, when the failure occurs in

[jira] [Created] (FLINK-16832) Refactor ReporterSetup

2020-03-27 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-16832:


 Summary: Refactor ReporterSetup
 Key: FLINK-16832
 URL: https://issues.apache.org/jira/browse/FLINK-16832
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics
Reporter: Chesnay Schepler
Assignee: Alexander Fedulov
 Fix For: 1.11.0


To improve readability we can split up {{ReporterSetup#fromConfiguration}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16833) Make JDBCDialect pluggable for JDBC SQL connector

2020-03-27 Thread Jark Wu (Jira)
Jark Wu created FLINK-16833:
---

 Summary: Make JDBCDialect pluggable for JDBC SQL connector
 Key: FLINK-16833
 URL: https://issues.apache.org/jira/browse/FLINK-16833
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / JDBC, Table SQL / Ecosystem
Reporter: Jark Wu
 Fix For: 1.11.0


Currently, we only natively support very limited JDBC dialects in flink-jdbc. 
However, there are many JDBC drivers in the world. We should expose this 
ability to users to make it pluggable. 

Some initial ideas:
 - expose a connector configuration to accept a JDBCDialect class name.
 - find supported JDBCDialect via SPI.

A more detailed design should be proposed for discussion. 
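The SPI idea can be sketched with Java's built-in `ServiceLoader`. The `JdbcDialect` interface and its methods below are illustrative stand-ins, not the eventual flink-jdbc API; third-party jars would register implementations under `META-INF/services`:

```java
import java.util.ServiceLoader;

// Sketch of SPI-based dialect discovery with a built-in fallback dialect.
public class DialectLoaderSketch {

    // Hypothetical dialect contract (not Flink's actual interface).
    public interface JdbcDialect {
        boolean canHandle(String jdbcUrl);
        String quoteIdentifier(String identifier);
    }

    // Fallback used when no SPI implementation matches the URL.
    static final JdbcDialect DEFAULT = new JdbcDialect() {
        public boolean canHandle(String jdbcUrl) { return true; }
        public String quoteIdentifier(String id) { return "\"" + id + "\""; }
    };

    static JdbcDialect loadDialect(String jdbcUrl) {
        // ServiceLoader scans META-INF/services entries on the classpath.
        for (JdbcDialect dialect : ServiceLoader.load(JdbcDialect.class)) {
            if (dialect.canHandle(jdbcUrl)) {
                return dialect;
            }
        }
        return DEFAULT;
    }

    public static void main(String[] args) {
        // No providers are registered in this sketch, so the fallback is used.
        JdbcDialect d = loadDialect("jdbc:postgresql://localhost/db");
        System.out.println(d.quoteIdentifier("user")); // "user"
    }
}
```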



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16834) Examples cannot be run from IDE

2020-03-27 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-16834:
-

 Summary: Examples cannot be run from IDE
 Key: FLINK-16834
 URL: https://issues.apache.org/jira/browse/FLINK-16834
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission, Examples
Affects Versions: 1.11.0
Reporter: Till Rohrmann
 Fix For: 1.11.0


Due to removing the dependency {{flink-clients}} from {{flink-streaming-java}}, 
the examples can no longer be executed from the IDE. The problem is that the 
{{flink-clients}} dependency is missing.

In order to solve this problem, we need to add the {{flink-clients}} dependency 
to all modules which need it and previously obtained it transitively from 
{{flink-streaming-java}}.
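For a module that previously obtained `flink-clients` transitively, the fix amounts to declaring the dependency explicitly in that module's `pom.xml`. A sketch (the Scala suffix and version property are illustrative and depend on the module):

```xml
<!-- Explicit dependency, previously pulled in via flink-streaming-java -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_${scala.binary.version}</artifactId>
  <version>${project.version}</version>
</dependency>
```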



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16835) Replace TableConfig with Configuration

2020-03-27 Thread Timo Walther (Jira)
Timo Walther created FLINK-16835:


 Summary: Replace TableConfig with Configuration
 Key: FLINK-16835
 URL: https://issues.apache.org/jira/browse/FLINK-16835
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Timo Walther


In order to allow reading and writing of configuration from a file or 
string-based properties, we should consider removing {{TableConfig}} and fully 
relying on a Configuration-based object with {{ConfigOption}}s.

This effort was partially already started which is why 
{{TableConfig.getConfiguration}} exists.

However, we should clarify if we would like to have control and traceability 
over layered configurations such as {{flink-conf.yaml < 
StreamExecutionEnvironment < TableEnvironment}}. Maybe the {{Configuration}} 
class is not the right abstraction for this. 
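The layered lookup mentioned above can be sketched as a toy precedence chain (names are illustrative, not Flink's Configuration API): higher layers override lower ones, and a key falls through to the next layer when unset.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of layered configuration lookup:
// TableEnvironment > StreamExecutionEnvironment > flink-conf.yaml.
public class LayeredConfigSketch {

    static String lookup(String key, Map<String, String>... layersHighToLow) {
        for (Map<String, String> layer : layersHighToLow) {
            if (layer.containsKey(key)) {
                return layer.get(key);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> flinkConf = new HashMap<>();
        flinkConf.put("parallelism.default", "1");

        Map<String, String> envConf = new HashMap<>();
        envConf.put("parallelism.default", "4");

        Map<String, String> tableConf = new HashMap<>(); // no override

        // The environment layer wins because the table layer has no entry.
        System.out.println(
            lookup("parallelism.default", tableConf, envConf, flinkConf)); // 4
    }
}
```

Tracing *which* layer supplied a value is exactly the traceability concern the ticket raises; a plain flat `Configuration` loses that information.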



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16836) Losing leadership does not clear rpc connection in JobManagerLeaderListener

2020-03-27 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-16836:
-

 Summary: Losing leadership does not clear rpc connection in 
JobManagerLeaderListener
 Key: FLINK-16836
 URL: https://issues.apache.org/jira/browse/FLINK-16836
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.11.0


When losing the leadership the {{JobManagerLeaderListener}} closes the current 
{{rpcConnection}} but does not clear the field. This can lead to a failure of 
{{JobManagerLeaderListener#reconnect}} if this method is called after the 
{{JobMaster}} has lost its leadership.

I propose to clear the field so that {{RegisteredRpcConnection#tryReconnect}} 
won't be called on a closed rpc connection.
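
A minimal sketch of the proposed fix (class and method names are modeled after the 
report, not the actual Flink sources): close *and* null out the connection on 
leadership loss, so a later reconnect builds a fresh connection instead of touching 
a closed one.

```java
// Sketch only: illustrates clearing the field so reconnect() never calls
// tryReconnect() on a closed rpc connection.
class JobManagerLeaderListenerSketch {
    private RpcConnection rpcConnection;

    void notifyLeadershipLoss() {
        if (rpcConnection != null) {
            rpcConnection.close();
            rpcConnection = null; // the proposed fix: clear the field after closing
        }
    }

    void reconnect() {
        if (rpcConnection == null) {
            rpcConnection = new RpcConnection(); // start a fresh connection
        } else {
            rpcConnection.tryReconnect(); // would throw on a closed connection
        }
    }

    boolean isConnected() {
        return rpcConnection != null && !rpcConnection.closed;
    }

    static class RpcConnection {
        boolean closed;

        void close() {
            closed = true;
        }

        void tryReconnect() {
            if (closed) {
                throw new IllegalStateException("Cannot reconnect a closed rpc connection");
            }
        }
    }
}
```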



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #2

2020-03-27 Thread Tzu-Li (Gordon) Tai
Hi Hequn,

That's a good catch.

Unfortunately, the spring boot dependency there, while itself being ASLv2
licensed, pulls in other dependencies that are not ASLv2.
That would indeed make this problem a blocker.

I'll do a thorough check again on the Maven artifacts that do bundle
dependencies, before creating a new RC. AFAIK, there should be no more
other than:
- statefun-flink-distribution
- statefun-ridesharing-example-simulator

BR,
Gordon

On Fri, Mar 27, 2020 at 10:41 PM Hequn Cheng  wrote:

> Thanks Gordon for the release and the nice release checking guide!
>
> It seems the NOTICE file is missing in the
> `statefun-ridesharing-example-simulator` module while it bundles
> dependencies like
> `org.springframework.boot:spring-boot-loader:2.1.6.RELEASE`.
>
> Best,
> Hequn
>
> On Fri, Mar 27, 2020 at 3:35 PM Tzu-Li (Gordon) Tai 
> wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the release candidate #2 for the version 2.0.0
> of
> > Apache Flink Stateful Functions,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > **Testing Guideline**
> >
> > You can find here [1] a doc that we can use for collaborating testing
> > efforts.
> > The listed testing tasks in the doc also serve as a guideline in what to
> > test for this release.
> > If you wish to take ownership of a testing task, simply put your name
> down
> > in the "Checked by" field of the task.
> >
> > **Release Overview**
> >
> > As an overview, the release consists of the following:
> > a) Stateful Functions canonical source distribution, to be deployed to
> the
> > release repository at dist.apache.org
> > b) Stateful Functions Python SDK distributions to be deployed to PyPI
> > c) Maven artifacts to be deployed to the Maven Central Repository
> >
> > **Staging Areas to Review**
> >
> > The staging areas containing the above mentioned artifacts are as
> follows,
> > for your review:
> > * All artifacts for a) and b) can be found in the corresponding dev
> > repository at dist.apache.org [2]
> > * All artifacts for c) can be found at the Apache Nexus Repository [3]
> >
> > All artifacts are signed with the
> > key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
> >
> > Other links for your review:
> > * JIRA release notes [5]
> > * source code tag "release-2.0.0-rc2" [6] [7]
> >
> > **Extra Remarks**
> >
> > * Part of the release is also official Docker images for Stateful
> > Functions. This can be a separate process, since the creation of those
> > relies on the fact that we have distribution jars already deployed to
> > Maven. I will follow-up with this after these artifacts are officially
> > released.
> > In the meantime, there is this discussion [8] ongoing about where to host
> > the StateFun Dockerfiles.
> > * The Flink Website and blog post is also being worked on (by Marta) as
> > part of the release, to incorporate the new Stateful Functions project.
> We
> > can follow up with a link to those changes afterwards in this vote
> thread,
> > but that would not block you to test and cast your votes already.
> > * Since the Flink website changes are still being worked on, you will not
> > yet be able to find the Stateful Functions docs from there. Here are the
> > links [9] [10].
> >
> > **Vote Duration**
> >
> > The vote will be open for at least 72 hours *(target end date is next
> > Tuesday, March 31).*
> > It is adopted by majority approval, with at least 3 PMC affirmative
> votes.
> >
> > Thanks,
> > Gordon
> >
> > [1]
> >
> >
> https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing
> > [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.0.0-rc2/
> > [3]
> > https://repository.apache.org/content/repositories/orgapacheflink-1340/
> > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [5]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878
> > [6]
> >
> >
> https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=14ce58048a3dda792f2329cf14d30aa952f6cb24
> > [7] https://github.com/apache/flink-statefun/tree/release-2.0.0-rc2
> > [8]
> >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-a-new-repo-to-host-Stateful-Functions-Dockerfiles-td39342.html
> > [9] https://ci.apache.org/projects/flink/flink-statefun-docs-master/
> > [10]
> https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/
> >
> > TIP: You can create a `settings.xml` file with these contents:
> >
> > """
> > <settings>
> >   <activeProfiles>
> >     <activeProfile>flink-statefun-2.0.0</activeProfile>
> >   </activeProfiles>
> >   <profiles>
> >     <profile>
> >       <id>flink-statefun-2.0.0</id>
> >       <repositories>
> >         <repository>
> >           <id>flink-statefun-2.0.0</id>
> >           <url>https://repository.apache.org/content/repositories/orgapacheflink-1340/</url>
> >         </repository>
> >         <repository>
> >           <id>archetype</id>
> >           <url>https://repository.apache.org/content/repositories/orgapacheflink-1340/</url>
> >         </repository>
> >       </repositories>
> >     </profile>
> >   </profiles>
> > </settings>
> > """

[jira] [Created] (FLINK-16837) Disable trimStackTrace in surefire plugin

2020-03-27 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-16837:


 Summary: Disable trimStackTrace in surefire plugin
 Key: FLINK-16837
 URL: https://issues.apache.org/jira/browse/FLINK-16837
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.11.0


Surefire has a {{trimStackTrace}} option (enabled by default) that is supposed 
to cut off the JUnit parts of stack traces.

However, this has various unfortunate side-effects: it is overzealous, hiding 
important parts of the stack trace, and it also hides suppressed exceptions.
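
The change amounts to a one-line surefire configuration ({{trimStackTrace}} is a 
standard maven-surefire-plugin parameter), e.g. in the parent pom:

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <configuration>
        <!-- Keep full stack traces, including suppressed exceptions. -->
        <trimStackTrace>false</trimStackTrace>
    </configuration>
</plugin>
```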



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


contribute to Apache Flink

2020-03-27 Thread Leping Huang
Hi Guys,

I want to contribute to Apache Flink.
Would you please give me the permission as a contributor?
My JIRA ID is soundhearer.


Re: contribute to Apache Flink

2020-03-27 Thread Hequn Cheng
Hi Leping,

Welcome to the community!

You no longer need contributor permissions. You can simply create a JIRA
ticket and ask to be assigned to work on it. For the reasons explained in [1],
only committers can assign JIRA tickets now. You can also take a look at
Flink's contribution guidelines [2] for more information.

Best,
Hequn

[1]
https://flink.apache.org/contributing/contribute-code.html#create-jira-ticket-and-reach-consensus
[2] https://flink.apache.org/contributing/how-to-contribute.html

On Sat, Mar 28, 2020 at 7:09 AM Leping Huang 
wrote:

> Hi Guys,
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is soundhearer.
>


[jira] [Created] (FLINK-16838) Stateful Functions Quickstart archetype Dockerfile should reference a specific version tag

2020-03-27 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-16838:
---

 Summary: Stateful Functions Quickstart archetype Dockerfile should 
reference a specific version tag
 Key: FLINK-16838
 URL: https://issues.apache.org/jira/browse/FLINK-16838
 Project: Flink
  Issue Type: Bug
  Components: Stateful Functions
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai


Currently, the quickstart archetype provides a skeleton Dockerfile that always 
builds on top of the latest image:
{code}
FROM statefun
{code}

While this happens to work for the first release, since the `latest` tag will 
(coincidentally) point to the correct version, it will no longer be correct once 
we have multiple releases.
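
A sketch of the intended fix: pin the base image to the release's version tag 
instead of the implicit `latest` (the exact image name and tag scheme are 
assumptions here):

```dockerfile
# Pin to a concrete release tag rather than the implicit "latest".
FROM statefun:2.0.0
```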



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16839) Connectors Kinesis metrics can be disabled

2020-03-27 Thread Muchl (Jira)
Muchl created FLINK-16839:
-

 Summary: Connectors Kinesis metrics can be disabled
 Key: FLINK-16839
 URL: https://issues.apache.org/jira/browse/FLINK-16839
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kinesis
Affects Versions: 1.10.0
Reporter: Muchl


Currently, the Kinesis connector registers 9 metrics, each of which is recorded 
per Kinesis shard. If there are enough shards, the TaskManager metrics become 
unusable.

In our production environment, we have a job that reads a DynamoDB stream through 
the Kinesis adapter. The stream has more than 10,000 shards; multiplied by 9 
metrics, that yields more than 100,000 Kinesis metrics, and the entire metrics 
output grows to tens of MB, so Prometheus cannot collect the metrics.

We should have a configuration option that can disable the Kinesis metrics.
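
As a self-contained sketch of the idea (not the actual connector API; flag and 
metric names are illustrative), per-shard metric registration could be guarded by 
a single boolean option:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: register per-shard metrics only when a hypothetical
// "shard metrics enabled" flag is set, so jobs reading streams with tens of
// thousands of shards can opt out entirely.
class ShardMetricsRegistry {
    // Illustrative subset of per-shard metric names.
    private static final String[] PER_SHARD_METRICS = {
        "millisBehindLatest", "numberOfAggregatedRecords", "bytesRequestedPerFetch"
    };

    private final boolean shardMetricsEnabled;
    private final List<String> registered = new ArrayList<>();

    ShardMetricsRegistry(boolean shardMetricsEnabled) {
        this.shardMetricsEnabled = shardMetricsEnabled;
    }

    void registerShard(String shardId) {
        if (!shardMetricsEnabled) {
            return; // disabled: skip registration, keeping the metrics output small
        }
        for (String metric : PER_SHARD_METRICS) {
            registered.add(shardId + "." + metric);
        }
    }

    int metricCount() {
        return registered.size();
    }
}
```

With the flag off, a 10,000-shard stream registers zero per-shard metrics instead 
of 10,000 times the per-shard metric count.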



--
This message was sent by Atlassian Jira
(v8.3.4#803005)