[VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #2
Hi everyone,

Please review and vote on release candidate #2 for version 2.0.0 of Apache Flink Stateful Functions, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

**Testing Guideline**

You can find here [1] a doc that we can use to collaborate on testing efforts. The testing tasks listed in the doc also serve as a guideline on what to test for this release. If you wish to take ownership of a testing task, simply put your name down in the "Checked by" field of the task.

**Release Overview**

As an overview, the release consists of the following:
a) Stateful Functions canonical source distribution, to be deployed to the release repository at dist.apache.org
b) Stateful Functions Python SDK distributions, to be deployed to PyPI
c) Maven artifacts, to be deployed to the Maven Central Repository

**Staging Areas to Review**

The staging areas containing the above-mentioned artifacts are as follows, for your review:
* All artifacts for a) and b) can be found in the corresponding dev repository at dist.apache.org [2]
* All artifacts for c) can be found at the Apache Nexus Repository [3]

All artifacts are signed with the key 1C1E2394D3194E1944613488F320986D35C33D6A [4]

Other links for your review:
* JIRA release notes [5]
* source code tag "release-2.0.0-rc2" [6] [7]

**Extra Remarks**

* Part of the release is also official Docker images for Stateful Functions. This can be a separate process, since the creation of those images relies on the distribution jars already being deployed to Maven. I will follow up on this after these artifacts are officially released. In the meantime, there is an ongoing discussion [8] about where to host the StateFun Dockerfiles.
* The Flink website and blog post are also being worked on (by Marta) as part of the release, to incorporate the new Stateful Functions project. We can follow up with a link to those changes afterwards in this vote thread, but that should not block you from testing and casting your votes already.
* Since the Flink website changes are still being worked on, you will not yet be able to find the Stateful Functions docs from there. Here are the links [9] [10].

**Vote Duration**

The vote will be open for at least 72 hours *(target end date is next Tuesday, April 31).* It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Gordon

[1] https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing
[2] https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.0.0-rc2/
[3] https://repository.apache.org/content/repositories/orgapacheflink-1340/
[4] https://dist.apache.org/repos/dist/release/flink/KEYS
[5] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878
[6] https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=14ce58048a3dda792f2329cf14d30aa952f6cb24
[7] https://github.com/apache/flink-statefun/tree/release-2.0.0-rc2
[8] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-a-new-repo-to-host-Stateful-Functions-Dockerfiles-td39342.html
[9] https://ci.apache.org/projects/flink/flink-statefun-docs-master/
[10] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/

TIP: You can create a `settings.xml` file with these contents:

"""
<settings>
  <activeProfiles>
    <activeProfile>flink-statefun-2.0.0</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>flink-statefun-2.0.0</id>
      <repositories>
        <repository>
          <id>flink-statefun-2.0.0</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1340/</url>
        </repository>
        <repository>
          <id>archetype</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1340/</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
</settings>
"""

And reference that in your maven commands via `--settings path/to/settings.xml`. This is useful for creating a quickstart based on the staged release and for building against the staged jars.
Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #1
This vote is cancelled. Please see the voting thread for the new candidate, RC2: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Apache-Flink-Stateful-Functions-Release-2-0-0-release-candidate-2-td39379.html On Fri, Mar 27, 2020 at 2:39 PM Tzu-Li (Gordon) Tai wrote: > -1 > > Already discovered that the source distribution NOTICE file is missing > mentions for font-awesome. > The source of that is bundled under "docs/page/font-awesome/fonts", and is > licensed with SIL OFL 1.1 license, which makes it a requirement to be > listed in the NOTICE file. > > I'll open a new RC2 with only the changes to the source NOTICE and LICENSE > files. > > > > On Fri, Mar 27, 2020 at 10:37 AM Tzu-Li (Gordon) Tai > wrote: > >> Also, here is the documentation for Stateful Functions for those who were >> wondering: >> master - https://ci.apache.org/projects/flink/flink-statefun-docs-master/ >> release-2.0 - >> https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/ >> >> This is not yet visible directly from the Flink website, since the >> efforts for incorporating Stateful Functions in the website is still >> ongoing. >> >> On Fri, Mar 27, 2020 at 12:48 AM Tzu-Li (Gordon) Tai >> wrote: >> >>> Hi everyone, >>> >>> Please review and vote on the release candidate #0 for the version 2.0.0 >>> of Apache Flink Stateful Functions, >>> as follows: >>> [ ] +1, Approve the release >>> [ ] -1, Do not approve the release (please provide specific comments) >>> >>> **Testing Guideline** >>> >>> You can find here [1] a doc that we can use for collaborating testing >>> efforts. >>> The listed testing tasks in the doc also serve as a guideline in what to >>> test for this release. >>> If you wish to take ownership of a testing task, simply put your name >>> down in the "Checked by" field of the task. 
>>>
>>> **Release Overview**
>>>
>>> As an overview, the release consists of the following:
>>> a) Stateful Functions canonical source distribution, to be deployed to
>>> the release repository at dist.apache.org
>>> b) Stateful Functions Python SDK distributions to be deployed to PyPI
>>> c) Maven artifacts to be deployed to the Maven Central Repository
>>>
>>> **Staging Areas to Review**
>>>
>>> The staging areas containing the above mentioned artifacts are as
>>> follows, for your review:
>>> * All artifacts for a) and b) can be found in the corresponding dev
>>> repository at dist.apache.org [2]
>>> * All artifacts for c) can be found at the Apache Nexus Repository [3]
>>>
>>> All artifacts are signed with the
>>> key 1C1E2394D3194E1944613488F320986D35C33D6A [4]
>>>
>>> Other links for your review:
>>> * JIRA release notes [5]
>>> * source code tag "release-2.0.0-rc0" [6] [7]
>>>
>>> **Extra Remarks**
>>>
>>> * Part of the release is also official Docker images for Stateful
>>> Functions. This can be a separate process, since the creation of those
>>> relies on the fact that we have distribution jars already deployed to
>>> Maven. I will follow-up with this after these artifacts are officially
>>> released.
>>> In the meantime, there is this discussion [8] ongoing about where to
>>> host the StateFun Dockerfiles.
>>> * The Flink Website and blog post is also being worked on (by Marta) as
>>> part of the release, to incorporate the new Stateful Functions project. We
>>> can follow up with a link to those changes afterwards in this vote thread,
>>> but that would not block you to test and cast your votes already.
>>>
>>> **Vote Duration**
>>>
>>> The vote will be open for at least 72 hours *(target end date is next
>>> Tuesday, April 31).*
>>> It is adopted by majority approval, with at least 3 PMC
>>> affirmative votes.
>>> >>> Thanks, >>> Gordon >>> >>> [1] >>> https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing >>> [2] >>> https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.0.0-rc1/ >>> [3] >>> https://repository.apache.org/content/repositories/orgapacheflink-1339/ >>> [4] https://dist.apache.org/repos/dist/release/flink/KEYS >>> [5] >>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878 >>> [6] >>> https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=ebd7ca866f7d11fa43c7a5bb36861ee1b24b0980 >>> [7] https://github.com/apache/flink-statefun/tree/release-2.0.0-rc1 >>> [8] >>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-a-new-repo-to-host-Stateful-Functions-Dockerfiles-td39342.html >>> >>> TIP: You can create a `settings.xml` file with these contents: >>> >>> """ >>> >>> >>> flink-statefun-2.0.0 >>> >>> >>> >>> flink-statefun-2.0.0 >>> >>> >>> flink-statefun-2.0.0 >>> >>> https://repository.apache.org/content/repositories/orgapacheflink-1339/ >>> >>> >>> >>> archetype >>> >>> https://repository.apache.org/content/repositories/orgapacheflink-1339/ >>> >>>
Re: [DISCUSS] FLIP-119: Pipelined Region Scheduling
Thanks for updating!

+1 for supporting the pipelined region scheduling. Although we cannot prevent resource deadlocks in all scenarios, it is really a big step.

The design generally LGTM.

One minor thing I want to make sure of: if I understand correctly, the blocking edge will not be consumable before the upstream is finished. Without that, when a failure occurs in the upstream region, it is still possible to have a resource deadlock. I don't know whether it is an explicit protocol now, but after this FLIP, I think it should not be broken.

I'm also wondering whether we could execute the upstream and downstream regions at the same time if we have enough resources. It could shorten the running time of a large job. We should not break the protocol of the blocking edge, but would it be possible to change the data exchange mode of two regions dynamically?

Best,
Yangze Guo

On Fri, Mar 27, 2020 at 1:15 PM Zhu Zhu wrote:
>
> Thanks for reporting this Yangze.
> I have update the permission to those images. Everyone are able to view them
> now.
>
> Thanks,
> Zhu Zhu
>
> Yangze Guo 于2020年3月27日周五 上午11:25写道:
>>
>> Thanks for driving this discussion, Zhu Zhu & Gary.
>>
>> I found that the image link in this FLIP is not working well. When I
>> open that link, Google doc told me that I have no access privilege.
>> Could you take a look at that issue?
>>
>> Best,
>> Yangze Guo
>>
>> On Fri, Mar 27, 2020 at 1:38 AM Gary Yao wrote:
>> >
>> > Hi community,
>> >
>> > In the past releases, we have been working on refactoring Flink's scheduler
>> > with the goal of making the scheduler extensible [1]. We have rolled out
>> > most of the intended refactoring in Flink 1.10, and we think it is now time
>> > to leverage our newly introduced abstractions to implement a new resource
>> > optimized scheduling strategy: Pipelined Region Scheduling.
>> > >> > This scheduling strategy aims at: >> > >> > * avoidance of resource deadlocks when running batch jobs >> > >> > * tunable with respect to resource consumption and throughput >> > >> > More details can be found in the Wiki [2]. We are looking forward to your >> > feedback. >> > >> > Best, >> > >> > Zhu Zhu & Gary >> > >> > [1] https://issues.apache.org/jira/browse/FLINK-10429 >> > >> > [2] >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling
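The deadlock-avoidance idea discussed in this thread can be illustrated with a toy model. This is a hedged sketch only: `schedulable_regions` and the region/edge representation are invented for illustration and are not Flink's scheduler API.

```python
# Toy model of pipelined region scheduling: a region becomes schedulable
# only once every region feeding it over a blocking edge has finished,
# so slots are never held by tasks that cannot consume any data yet.

def schedulable_regions(regions, blocking_edges, finished):
    """Return the regions that may be scheduled now.

    regions:        iterable of region ids
    blocking_edges: (upstream, downstream) pairs of region ids
    finished:       set of region ids that have completed
    """
    blocked = {down for (up, down) in blocking_edges if up not in finished}
    return [r for r in regions if r not in finished and r not in blocked]

# r1 and r2 both feed r3 over blocking edges.
regions = ["r1", "r2", "r3"]
blocking_edges = [("r1", "r3"), ("r2", "r3")]

assert schedulable_regions(regions, blocking_edges, set()) == ["r1", "r2"]
assert schedulable_regions(regions, blocking_edges, {"r1"}) == ["r2"]
assert schedulable_regions(regions, blocking_edges, {"r1", "r2"}) == ["r3"]
```

The sketch also shows why a fully PIPELINED job degenerates to scheduling everything at once: with no blocking edges, nothing is ever blocked.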
[jira] [Created] (FLINK-16824) Creating Temporal Table Function via DDL
Konstantin Knauf created FLINK-16824: Summary: Creating Temporal Table Function via DDL Key: FLINK-16824 URL: https://issues.apache.org/jira/browse/FLINK-16824 Project: Flink Issue Type: Improvement Components: Table SQL / API Reporter: Konstantin Knauf Currently, a Temporal Table Function can only be created via the Table API or indirectly via the configuration file of the SQL Client. It would be great if this were also possible in pure SQL via a DDL statement. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Creating a new repo to host Stateful Functions Dockerfiles
+1 for a separate repository. Best, Gary On Fri, Mar 27, 2020 at 2:46 AM Hequn Cheng wrote: > +1 for a separate repository. > The dedicated `flink-docker` repo works fine now. We can do it similarly. > > Best, > Hequn > > On Fri, Mar 27, 2020 at 1:16 AM Till Rohrmann > wrote: > > > +1 for a separate repository. > > > > Cheers, > > Till > > > > On Thu, Mar 26, 2020 at 5:13 PM Ufuk Celebi wrote: > > > > > +1. > > > > > > The repo creation process is a light-weight, automated process on the > ASF > > > side. When Patrick Lucas contributed docker-flink back to the Flink > > > community (as flink-docker), there was virtually no overhead in > creating > > > the repository. Reusing build scripts should still be possible at the > > cost > > > of some duplication which is fine imo. > > > > > > – Ufuk > > > > > > On Thu, Mar 26, 2020 at 4:18 PM Stephan Ewen wrote: > > > > > > > > +1 to a separate repository. > > > > > > > > It seems to be best practice in the docker community. > > > > And since it does not add overhead, why not go with the best > practice? > > > > > > > > Best, > > > > Stephan > > > > > > > > > > > > On Thu, Mar 26, 2020 at 4:15 PM Tzu-Li (Gordon) Tai < > > tzuli...@apache.org > > > > > > > wrote: > > > >> > > > >> Hi Flink devs, > > > >> > > > >> As part of a Stateful Functions release, we would like to publish > > > Stateful > > > >> Functions Docker images to Dockerhub as an official image. > > > >> > > > >> Some background context on Stateful Function images, for those who > are > > > not > > > >> familiar with the project yet: > > > >> > > > >>- Stateful Function images are built on top of the Flink official > > > >>images, with additional StateFun dependencies being added. > > > >>You can take a look at the scripts we currently use to build the > > > images > > > >>locally for development purposes [1]. 
> > > >>- They are quite important for user experience, since building a > > > Docker > > > >>image is the recommended go-to deployment mode for StateFun user > > > >>applications [2]. > > > >> > > > >> > > > >> A prerequisite for all of this is to first decide where we host the > > > >> Stateful Functions Dockerfiles, > > > >> before we can proceed with the process of requesting a new official > > > image > > > >> repository at Dockerhub. > > > >> > > > >> We’re proposing to create a new dedicated repo for this purpose, > > > >> with the name `apache/flink-statefun-docker`. > > > >> > > > >> While we did initially consider integrating the StateFun Dockerfiles > > to > > > be > > > >> hosted together with the Flink ones in the existing > > > `apache/flink-docker` > > > >> repo, we had the following concerns: > > > >> > > > >>- In general, it is a convention that each official Dockerhub > image > > > is > > > >>backed by a dedicated source repo hosting the Dockerfiles. > > > >>- The `apache/flink-docker` repo already has quite a few > dedicated > > > >>tooling and CI smoke tests specific for the Flink images. > > > >>- Flink and StateFun have separate versioning schemes and > > independent > > > >>release cycles. A new Flink release does not necessarily require > a > > > >>“lock-step” to release new StateFun images as well. > > > >>- Considering the above all-together, and the fact that creating > a > > > new > > > >>repo is rather low-effort, having a separate repo would probably > > make > > > more > > > >>sense here. > > > >> > > > >> > > > >> What do you think? > > > >> > > > >> Cheers, > > > >> Gordon > > > >> > > > >> [1] > > > >> > > > > > > > > > https://github.com/apache/flink-statefun/blob/master/tools/docker/build-stateful-functions.sh > > > >> [2] > > > >> > > > > > > > > > https://ci.apache.org/projects/flink/flink-statefun-docs-master/deployment-and-operations/packaging.html > > > > > >
Re: Dynamic Flink SQL
I want to do a slightly different, hacky PoC:
* I will write a sink that caches the results in "JVM global" memory. Then I will write a source that reads this cache.
* I will launch one job that reads from the Kafka source, shuffles the data to the desired partitioning and then sinks to that cache.
* Then I will launch multiple jobs (DataStream-based or Flink SQL-based) that use the cache source to read the data out and then reinterpret it as a keyed stream [1].
* Using JVM global memory is necessary because, AFAIK, the jobs use different classloaders. The class of the cached objects also needs to be available in the parent classloader, i.e. in the cluster's classpath.

This is just to prove the idea and its performance and usefulness. All the problems of checkpointing this data I will leave for later. I'm very interested in your comments, community, about this idea and its later productization. Thanks!

Answering your comments:

> Unless you need reprocessing for newly added rules, I'd probably just
> cancel with savepoint and restart the application with the new rules. Of
> course, it depends on the rules themselves and how much state they require
> if a restart is viable. That's up to a POC.
>
No, I don't need reprocessing (yet). The rule starts processing the data from the moment it is defined. Cancellation with savepoint was considered, but because the number of rules defined or changed daily might be large, that would generate too much downtime. There is a lot of state kept in those rules, making the restart heavy. What's worse, that would be cross-tenant downtime, unless the job was somehow per team/tenant. Therefore we rejected this option. BTW, the current design of our system is similar to the one from the blog series by Alexander Fedulov about the dynamic rules pattern [2] that he's just publishing.

> They will consume the same high intensive source(s) therefore I want to
>> optimize for that by consuming the messages in Flink only once.
>>
> That's why I proposed to run one big query instead of 500 small ones. Have
> a POC where you add two of your rules manually to a Table and see how the
> optimized logical plan looks like. I'd bet that the source is only tapped
> once.
>
I can do that PoC, no problem. But AFAIK it will only work with the "restart with savepoint" pattern discussed above.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/experimental.html#reinterpreting-a-pre-partitioned-data-stream-as-keyed-stream
[2] https://flink.apache.org/news/2020/03/24/demo-fraud-detection-2.html

> On Wed, Mar 25, 2020 at 6:15 PM Krzysztof Zarzycki
> wrote:
>
>> Hello Arvid,
>> Thanks for joining to the thread!
>> First, did you take into consideration that I would like to dynamically
>> add queries on the same source? That means first define one query, later
>> the day add another one , then another one, and so on. A Week later kill
>> one of those, start yet another one, etc... There will be hundreds of these
>> queries running at once, but the set of queries change several times a day.
>> They will consume the same high intensive source(s) therefore I want to
>> optimize for that by consuming the messages in Flink only once.
>>
>> Regarding the temporary tables AFAIK they are only the metadata (let's
>> say Kafka topic detals) and store it in the scope of a SQL session.
>> Therefore multiple queries against that temp table will behave the same way
>> as querying normal table, that is will read the datasource multiple times.
>>
>> It looks like the feature I want or could use is defined by the way of
>> FLIP-36 about Interactive Programming, more precisely caching the stream
>> table [1].
>> While I wouldn't like to limit the discussion to that non-existing yet
>> feature. Maybe there are other ways of achieving this dynamic querying
>> capability.
>> >> Kind Regards, >> Krzysztof >> >> [1] >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink#FLIP-36:SupportInteractiveProgramminginFlink-Cacheastreamtable >> >> >> >> * You want to use primary Table API as that allows you to >>> programmatically introduce structural variance (changing rules). >>> >> * You start by registering the source as temporary table. >>> >> * Then you add your rules as SQL through `TableEnvironment#sqlQuery`. >>> * Lastly you unionAll the results. >>> >>> Then I'd perform some experiment if indeed the optimizer figured out >>> that it needs to only read the source once. The resulting code would be >>> minimal and easy to maintain. If the performance is not satisfying, you can >>> always make it more complicated. >>> >>> Best, >>> >>> Arvid >>> >>> >>> On Mon, Mar 23, 2020 at 7:02 PM Krzysztof Zarzycki >>> wrote: >>> Dear Flink community! In our company we have implemented a system that realize the dynamic business rules pattern. We spoke about it during Flink Forward 2019 https://www.youtube.com/watch?v=CyrQ5B0exqU.
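For what it's worth, the cache-based PoC described in this thread can be sketched in miniature, with Python standing in for the JVM. `GLOBAL_CACHE`, `cache_sink` and `cache_source` are illustrative names invented here, not Flink APIs.

```python
# Miniature of the PoC: a process-global dict plays the role of the
# "JVM global" cache. One writer "job" shuffles records to a fixed
# partitioning and sinks them; later "jobs" read a partition back out with
# the keying already in place (the Flink analogue would be
# reinterpretAsKeyedStream on the reader side).

GLOBAL_CACHE = {}  # partition id -> list of records, shared process-wide

def cache_sink(records, key_fn, num_partitions):
    """Writer job: shuffle records to the desired partitioning and cache them."""
    for rec in records:
        part = hash(key_fn(rec)) % num_partitions
        GLOBAL_CACHE.setdefault(part, []).append(rec)

def cache_source(partition):
    """Reader job: consume one pre-partitioned slice of the cache."""
    return GLOBAL_CACHE.get(partition, [])

cache_sink([("a", 1), ("b", 2), ("a", 3)], key_fn=lambda r: r[0], num_partitions=2)

# Every reader sees the data without re-reading the original source, and
# records sharing a key always sit in the same partition.
all_records = [r for p in sorted(GLOBAL_CACHE) for r in cache_source(p)]
assert sorted(all_records) == [("a", 1), ("a", 3), ("b", 2)]
```

The real PoC would additionally need the cached class on the cluster classpath and some answer to checkpointing, as noted in the email above.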
Re: [DISCUSS] Creating a new repo to host Stateful Functions Dockerfiles
+1 for this proposal. Very reasonable analysis! Best, Zhijiang -- From:Hequn Cheng Send Time:2020 Mar. 27 (Fri.) 09:46 To:dev Cc:private Subject:Re: [DISCUSS] Creating a new repo to host Stateful Functions Dockerfiles +1 for a separate repository. The dedicated `flink-docker` repo works fine now. We can do it similarly. Best, Hequn On Fri, Mar 27, 2020 at 1:16 AM Till Rohrmann wrote: > +1 for a separate repository. > > Cheers, > Till > > On Thu, Mar 26, 2020 at 5:13 PM Ufuk Celebi wrote: > > > +1. > > > > The repo creation process is a light-weight, automated process on the ASF > > side. When Patrick Lucas contributed docker-flink back to the Flink > > community (as flink-docker), there was virtually no overhead in creating > > the repository. Reusing build scripts should still be possible at the > cost > > of some duplication which is fine imo. > > > > – Ufuk > > > > On Thu, Mar 26, 2020 at 4:18 PM Stephan Ewen wrote: > > > > > > +1 to a separate repository. > > > > > > It seems to be best practice in the docker community. > > > And since it does not add overhead, why not go with the best practice? > > > > > > Best, > > > Stephan > > > > > > > > > On Thu, Mar 26, 2020 at 4:15 PM Tzu-Li (Gordon) Tai < > tzuli...@apache.org > > > > > wrote: > > >> > > >> Hi Flink devs, > > >> > > >> As part of a Stateful Functions release, we would like to publish > > Stateful > > >> Functions Docker images to Dockerhub as an official image. > > >> > > >> Some background context on Stateful Function images, for those who are > > not > > >> familiar with the project yet: > > >> > > >>- Stateful Function images are built on top of the Flink official > > >>images, with additional StateFun dependencies being added. > > >>You can take a look at the scripts we currently use to build the > > images > > >>locally for development purposes [1]. 
> > >>- They are quite important for user experience, since building a > > Docker > > >>image is the recommended go-to deployment mode for StateFun user > > >>applications [2]. > > >> > > >> > > >> A prerequisite for all of this is to first decide where we host the > > >> Stateful Functions Dockerfiles, > > >> before we can proceed with the process of requesting a new official > > image > > >> repository at Dockerhub. > > >> > > >> We’re proposing to create a new dedicated repo for this purpose, > > >> with the name `apache/flink-statefun-docker`. > > >> > > >> While we did initially consider integrating the StateFun Dockerfiles > to > > be > > >> hosted together with the Flink ones in the existing > > `apache/flink-docker` > > >> repo, we had the following concerns: > > >> > > >>- In general, it is a convention that each official Dockerhub image > > is > > >>backed by a dedicated source repo hosting the Dockerfiles. > > >>- The `apache/flink-docker` repo already has quite a few dedicated > > >>tooling and CI smoke tests specific for the Flink images. > > >>- Flink and StateFun have separate versioning schemes and > independent > > >>release cycles. A new Flink release does not necessarily require a > > >>“lock-step” to release new StateFun images as well. > > >>- Considering the above all-together, and the fact that creating a > > new > > >>repo is rather low-effort, having a separate repo would probably > make > > more > > >>sense here. > > >> > > >> > > >> What do you think? > > >> > > >> Cheers, > > >> Gordon > > >> > > >> [1] > > >> > > > > > https://github.com/apache/flink-statefun/blob/master/tools/docker/build-stateful-functions.sh > > >> [2] > > >> > > > > > https://ci.apache.org/projects/flink/flink-statefun-docs-master/deployment-and-operations/packaging.html > > >
Re: [Discussion] Job generation / submission hooks & Atlas integration
Hi Jack, Yes, we know how to do it and even have the implementation ready and being reviewed by the Atlas community at the moment. :-) Would you be interested in having a look? On Thu, Mar 19, 2020 at 12:56 PM jackylau wrote: > Hi: > i think flink integrate atlas also need add catalog information such as > spark atlas project > .https://github.com/hortonworks-spark/spark-atlas-connector > when user use catalog such as JDBCCatalog/HiveCatalog, flink atlas project > will sync this information to atlas. > But i don't find any Event Interface for flink to implement it as > spark-atlas-connector does. Does anyone know how to do it > > > > -- > Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ >
Re: [Discussion] Job generation / submission hooks & Atlas integration
Hi Márton Balassi:
I would be glad to take a look at it; where can I find it?
It is my issue, which you can see at https://issues.apache.org/jira/browse/FLINK-16774

--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
Re: [Discussion] Job generation / submission hooks & Atlas integration
Hi Jack!

You can find the document here:
https://docs.google.com/document/d/1wSgzPdhcwt-SlNBBqL-Zb7g8fY6bN8JwHEg7GCdsBG8/edit?usp=sharing

The document links to an already working Atlas hook prototype (and accompanying flink fork). The links for that are also here:
Flink: https://github.com/gyfora/flink/tree/atlas-changes
Atlas: https://github.com/gyfora/atlas/tree/flink-bridge

We will need to adapt this according to the above discussion, as this was an early prototype.

Gyula

On Fri, Mar 27, 2020 at 11:07 AM jackylau wrote:
> Hi Márton Balassi:
>I am very glad to look at it and where to find .
>And it is my issue , which you can see
> https://issues.apache.org/jira/browse/FLINK-16774
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>
Re: [DISCUSS] FLIP-119: Pipelined Region Scheduling
To Yangze,

>> the blocking edge will not be consumable before the upstream is finished.
Yes. This is how we define a BLOCKING result partition: "Blocking partitions represent blocking data exchanges, where the data stream is first fully produced and then consumed".

>> I'm also wondering could we execute the upstream and downstream regions at the same time if we have enough resources
It may lead to resource waste, since the tasks in downstream regions cannot read any data before the upstream region finishes. It saves a bit of time on scheduling, but usually it does not make much difference for large jobs, since data processing takes much more time. For small jobs, one can make all edges PIPELINED so that all the tasks can be scheduled at the same time.

>> is it possible to change the data exchange mode of two regions dynamically?
This is not in the scope of the FLIP. But we are moving forward to a more extensible scheduler (FLINK-10429) and resource aware scheduling (FLINK-10407). So I think it's possible we can have a scheduler in the future which dynamically changes the shuffle type wisely regarding available resources.

Thanks,
Zhu Zhu

Yangze Guo 于2020年3月27日周五 下午4:49写道:
> Thanks for updating!
>
> +1 for supporting the pipelined region scheduling. Although we could
> not prevent resource deadlock in all scenarios, it is really a big
> step.
>
> The design generally LGTM.
>
> One minor thing I want to make sure. If I understand correctly, the
> blocking edge will not be consumable before the upstream is finished.
> Without it, when the failure occurs in the upstream region, there is
> still possible to have a resource deadlock. I don't know whether it is
> an explicit protocol now. But after this FLIP, I think it should not
> be broken.
> I'm also wondering could we execute the upstream and downstream
> regions at the same time if we have enough resources. It can shorten
> the running time of large job. We should not break the protocol of
> blocking edge.
But if it is possible to change the data exchange mode > of two regions dynamically? > > Best, > Yangze Guo > > On Fri, Mar 27, 2020 at 1:15 PM Zhu Zhu wrote: > > > > Thanks for reporting this Yangze. > > I have update the permission to those images. Everyone are able to view > them now. > > > > Thanks, > > Zhu Zhu > > > > Yangze Guo 于2020年3月27日周五 上午11:25写道: > >> > >> Thanks for driving this discussion, Zhu Zhu & Gary. > >> > >> I found that the image link in this FLIP is not working well. When I > >> open that link, Google doc told me that I have no access privilege. > >> Could you take a look at that issue? > >> > >> Best, > >> Yangze Guo > >> > >> On Fri, Mar 27, 2020 at 1:38 AM Gary Yao wrote: > >> > > >> > Hi community, > >> > > >> > In the past releases, we have been working on refactoring Flink's > scheduler > >> > with the goal of making the scheduler extensible [1]. We have rolled > out > >> > most of the intended refactoring in Flink 1.10, and we think it is > now time > >> > to leverage our newly introduced abstractions to implement a new > resource > >> > optimized scheduling strategy: Pipelined Region Scheduling. > >> > > >> > This scheduling strategy aims at: > >> > > >> > * avoidance of resource deadlocks when running batch jobs > >> > > >> > * tunable with respect to resource consumption and throughput > >> > > >> > More details can be found in the Wiki [2]. We are looking forward to > your > >> > feedback. > >> > > >> > Best, > >> > > >> > Zhu Zhu & Gary > >> > > >> > [1] https://issues.apache.org/jira/browse/FLINK-10429 > >> > > >> > [2] > >> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling >
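Zhu Zhu's remark about a future scheduler picking the shuffle type based on available resources can be caricatured in a few lines. This is purely hypothetical; the function and its slot-counting inputs are invented here and are not part of FLIP-119 or any Flink API.

```python
# Hypothetical sketch of a resource-aware shuffle-type choice: run two
# regions pipelined only when the cluster can host both at once; otherwise
# fall back to a blocking exchange so the downstream region waits and no
# slots are wasted on tasks with nothing to read.

def choose_exchange_mode(upstream_slots, downstream_slots, free_slots):
    """Pick PIPELINED only if upstream and downstream regions fit together."""
    if upstream_slots + downstream_slots <= free_slots:
        return "PIPELINED"
    return "BLOCKING"

assert choose_exchange_mode(4, 4, 10) == "PIPELINED"  # both regions fit
assert choose_exchange_mode(4, 4, 6) == "BLOCKING"    # downstream must wait
```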
[jira] [Created] (FLINK-16825) PrometheusReporterEndToEndITCase should rely on path returned by DownloadCache
Chesnay Schepler created FLINK-16825: Summary: PrometheusReporterEndToEndITCase should rely on path returned by DownloadCache Key: FLINK-16825 URL: https://issues.apache.org/jira/browse/FLINK-16825 Project: Flink Issue Type: Bug Components: Runtime / Metrics, Tests Affects Versions: 1.10.0 Reporter: Chesnay Schepler Assignee: Alexander Fedulov Fix For: 1.10.1, 1.11.0 The {{PrometheusReporterEndToEndITCase}} uses {{DownloadCache#getOrDownload}} to download a file, but ignores the returned {{Path}} and simply assumes the file name. This assumption can fail if there was an error during the download, where the cache appends an attempt index.
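The failure mode described in this issue can be reproduced with a toy cache. The code below is illustrative only; it is not Flink's `DownloadCache` implementation, just a stand-in that decorates the file name on retried downloads the way the issue describes.

```python
import os
import tempfile

# Toy reproduction: a download cache may decorate the file name (here with
# an attempt index after a retry), so callers must use the returned path
# rather than re-deriving the name from the URL themselves.

def get_or_download(cache_dir, url, attempt=0):
    name = os.path.basename(url)
    if attempt > 0:  # a retried download gets a decorated name
        name = f"{name}.{attempt}"
    path = os.path.join(cache_dir, name)
    with open(path, "w") as f:
        f.write("data")
    return path  # the returned path is the only reliable source of truth

with tempfile.TemporaryDirectory() as d:
    returned = get_or_download(d, "https://example.org/flink.tgz", attempt=1)
    assumed = os.path.join(d, "flink.tgz")  # what the test wrongly assumed
    assert os.path.exists(returned)
    assert not os.path.exists(assumed)  # the assumption breaks after a retry
```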
Re: [DISCUSS] FLIP-119: Pipelined Region Scheduling
Gary & Zhu Zhu, Thanks for preparing this FLIP, and a BIG +1 from my side. The trade-off between resource utilization and potential deadlock problems has always been a pain. Despite not solving all the deadlock cases, this FLIP is definitely a big improvement. IIUC, it has already covered all the existing single job cases, and all the mentioned non-covered cases are either in multi-job session clusters or with diverse slot resources in future. I've read through the FLIP, and it looks really good to me. Good job! All the concerns and limitations that I can think of have already been clearly stated, with reasonable potential future solutions. From the perspective of fine-grained resource management, I do not see any serious/irresolvable conflict at this time. nit: The in-page links are not working. I guess those are copied from google docs directly? Thank you~ Xintong Song On Fri, Mar 27, 2020 at 6:26 PM Zhu Zhu wrote: > To Yangze, > > >> the blocking edge will not be consumable before the upstream is > finished. > Yes. This is how we define a BLOCKING result partition, "Blocking > partitions represent blocking data exchanges, where the data stream is > first fully produced and then consumed". > > >> I'm also wondering could we execute the upstream and downstream regions > at the same time if we have enough resources > It may lead to resource waste since the tasks in downstream regions cannot > read any data before the upstream region finishes. It saves a bit time on > schedule, but usually it does not make much difference for large jobs, > since data processing takes much more time. For small jobs, one can make > all edges PIPELINED so that all the tasks can be scheduled at the same > time. > > >> is it possible to change the data exchange mode of two regions > dynamically? > This is not in the scope of the FLIP. But we are moving forward to a more > extensible scheduler (FLINK-10429) and resource aware scheduling > (FLINK-10407). 
> So I think it's possible we can have a scheduler in the future which > dynamically changes the shuffle type wisely regarding available resources. > > Thanks, > Zhu Zhu > > Yangze Guo 于2020年3月27日周五 下午4:49写道: > > > Thanks for updating! > > > > +1 for supporting the pipelined region scheduling. Although we could > > not prevent resource deadlock in all scenarios, it is really a big > > step. > > > > The design generally LGTM. > > > > One minor thing I want to make sure. If I understand correctly, the > > blocking edge will not be consumable before the upstream is finished. > > Without it, when the failure occurs in the upstream region, there is > > still possible to have a resource deadlock. I don't know whether it is > > an explicit protocol now. But after this FLIP, I think it should not > > be broken. > > I'm also wondering could we execute the upstream and downstream > > regions at the same time if we have enough resources. It can shorten > > the running time of large job. We should not break the protocol of > > blocking edge. But if it is possible to change the data exchange mode > > of two regions dynamically? > > > > Best, > > Yangze Guo > > > > On Fri, Mar 27, 2020 at 1:15 PM Zhu Zhu wrote: > > > > > > Thanks for reporting this Yangze. > > > I have update the permission to those images. Everyone are able to view > > them now. > > > > > > Thanks, > > > Zhu Zhu > > > > > > Yangze Guo 于2020年3月27日周五 上午11:25写道: > > >> > > >> Thanks for driving this discussion, Zhu Zhu & Gary. > > >> > > >> I found that the image link in this FLIP is not working well. When I > > >> open that link, Google doc told me that I have no access privilege. > > >> Could you take a look at that issue? > > >> > > >> Best, > > >> Yangze Guo > > >> > > >> On Fri, Mar 27, 2020 at 1:38 AM Gary Yao wrote: > > >> > > > >> > Hi community, > > >> > > > >> > In the past releases, we have been working on refactoring Flink's > > scheduler > > >> > with the goal of making the scheduler extensible [1]. 
We have rolled > > out > > >> > most of the intended refactoring in Flink 1.10, and we think it is > > now time > > >> > to leverage our newly introduced abstractions to implement a new > > resource > > >> > optimized scheduling strategy: Pipelined Region Scheduling. > > >> > > > >> > This scheduling strategy aims at: > > >> > > > >> > * avoidance of resource deadlocks when running batch jobs > > >> > > > >> > * tunable with respect to resource consumption and throughput > > >> > > > >> > More details can be found in the Wiki [2]. We are looking forward to > > your > > >> > feedback. > > >> > > > >> > Best, > > >> > > > >> > Zhu Zhu & Gary > > >> > > > >> > [1] https://issues.apache.org/jira/browse/FLINK-10429 > > >> > > > >> > [2] > > >> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling > > >
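The core rule of pipelined region scheduling discussed in this thread can be condensed into a few lines. Everything below (`Region`, `canSchedule`) is an illustrative toy model, not Flink's actual scheduler API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the strategy: a pipelined region only becomes schedulable
// once every region feeding it over a BLOCKING edge has fully finished.
// Names here are illustrative, not part of the Flink scheduler interfaces.
public class RegionSchedulingSketch {

    static final class Region {
        final String name;
        final List<Region> blockingInputs = new ArrayList<>();
        boolean finished;

        Region(String name) {
            this.name = name;
        }
    }

    // A region may request slots once all of its BLOCKING producers are done.
    static boolean canSchedule(Region region) {
        return region.blockingInputs.stream().allMatch(in -> in.finished);
    }

    public static void main(String[] args) {
        Region upstream = new Region("map-region");
        Region downstream = new Region("reduce-region");
        downstream.blockingInputs.add(upstream);

        System.out.println(canSchedule(upstream));   // sources are schedulable
        System.out.println(canSchedule(downstream)); // waits for upstream
    }
}
```

Vertices connected only by PIPELINED edges would be fused into a single region and scheduled together; the BLOCKING-edge check above is what avoids holding slots for consumers that cannot read any data yet.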
Re: [DISCUSS] FLIP-108: Add GPU support in Flink
Maybe one final comment: It is probably not an issue, but let's try and keep user code (via user code classloader) out of the ResourceManager, if possible. As background: There were thoughts in the past to support setups where the RM must run with "superuser" credentials, but we cannot run JM/TM with these credentials, as the user code might access them otherwise. This is actually possible today, you can run the RM in a different JVM or in a different container, and give it more credentials than JMs / TMs. But for this to be feasible, we cannot allow any user-defined code to be in the JVM, because that instantaneously breaks the isolation of credentials. On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo wrote: > Thanks for the feedback, @Till and @Xintong. > > Regarding separating the interface, I'm also +1 with it. > > Regarding the resource allocation interface, true, it's dangerous to > give much access to user codes. Changing the return type to Map<String key, String/Long value> makes sense > with all the first-party supported resources for Yarn/Kubernetes. It > could also free us from the potential dependency issue as well. > > Best, > Yangze Guo > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song > wrote: > > > > Thanks for updating the FLIP, Yangze. > > > > I agree with Till that we probably want to separate the K8s/Yarn > decorator > > calls. Users can still configure one driver class, and we can use > > `instanceof` to check whether the driver implemented K8s/Yarn specific > > interfaces. > > > > Moreover, I'm not sure about exposing entire `ContainerRequest` / `Pod` > > (`AbstractKubernetesStepDecorator` directly manipulates on `Pod`) to user > > codes. It gives more access to user codes than needed for defining > external > > resource, which might cause problems. 
Instead, I would suggest to have > > interface like `Map > > getYarn/KubernetesExternalResource()` and assemble them into > > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager. > > > > Thank you~ > > > > Xintong Song > > > > > > > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann > wrote: > > > > > Hi everyone, > > > > > > I'm a bit late to the party. I think the current proposal looks good. > > > > > > Concerning the ExternalResourceDriver interface defined in the FLIP > [1], I > > > would suggest to not include the decorator calls for Kubernetes and > Yarn in > > > the base interface. Instead I would suggest to segregate the deployment > > > specific decorator calls into separate interfaces. That way an > > > ExternalResourceDriver does not have to support all deployments from > the > > > very beginning. Moreover, some resources might not be supported by a > > > specific deployment target and the natural way to express this would > be to > > > not implement the respective deployment specific interface. > > > > > > Moreover, having void > > > addExternalResourceToRequest(AMRMClient.ContainerRequest > containerRequest) > > > in the ExternalResourceDriver interface would require Hadoop on Flink's > > > classpath whenever the external resource driver is being used. > > > > > > [1] > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink > > > > > > Cheers, > > > Till > > > > > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen > wrote: > > > > > > > Nice, thanks a lot! > > > > > > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo > wrote: > > > > > > > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong. > > > > > > > > > > I've updated the FLIP accordingly. I do not add a > > > > > ResourceInfoProvider. Instead, I introduce the > ExternalResourceDriver, > > > > > which takes the responsibility of all relevant operations on both > RM > > > > > and TM sides. 
> > > > > After a rethink about decoupling the management of external > resources > > > > > from TaskExecutor, I think we could do the same thing on the > > > > > ResourceManager side. We do not need to add a specific allocation > > > > > logic to the ResourceManager each time we add a specific external > > > > > resource. > > > > > - For Yarn, we need the ExternalResourceDriver to edit the > > > > > containerRequest. > > > > > - For Kubenetes, ExternalResourceDriver could provide a decorator > for > > > > > the TM pod. > > > > > > > > > > In this way, just like MetricReporter, we allow users to define > their > > > > > custom ExternalResourceDriver. It is more extensible and fits the > > > > > separation of concerns. For more details, please take a look at > [1]. > > > > > > > > > > [1] > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink > > > > > > > > > > Best, > > > > > Yangze Guo > > > > > > > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen > wrote: > > > > > > > > > > > > This sounds good to go ahead from my side. > > > > > > > > > > > > I like the approach that Becket suggested - in that case the core > > > > > > abstraction that ev
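To make the shape of the proposal concrete, here is a hypothetical sketch of the interface segregation discussed above (a deployment-agnostic driver plus optional Yarn/Kubernetes-specific interfaces checked via `instanceof`), with external resources expressed as plain name-to-amount pairs. All type names and signatures below are assumptions based on this thread, not the final FLIP-108 API:

```java
import java.util.Map;

// Deployment-agnostic part: resources reported as name -> amount pairs,
// so no user code needs access to ContainerRequest / Pod objects.
interface ExternalResourceDriverSketch {
    Map<String, Long> retrieveResourceInfo(long amount);
}

// Deployment-specific part lives in a separate, optional interface.
interface YarnResourceDriverSketch {
    Map<String, Long> getYarnExternalResources();
}

// A driver that supports Yarn implements both; one that does not simply
// omits the Yarn interface.
final class GpuDriverSketch implements ExternalResourceDriverSketch, YarnResourceDriverSketch {
    @Override
    public Map<String, Long> retrieveResourceInfo(long amount) {
        return Map.of("gpu", amount);
    }

    @Override
    public Map<String, Long> getYarnExternalResources() {
        return Map.of("yarn.io/gpu", 2L);
    }
}

public class ExternalResourceExample {
    public static void main(String[] args) {
        ExternalResourceDriverSketch driver = new GpuDriverSketch();
        // The ResourceManager side checks deployment-specific support:
        if (driver instanceof YarnResourceDriverSketch) {
            System.out.println(((YarnResourceDriverSketch) driver).getYarnExternalResources());
        }
    }
}
```

The assembly into `ContainerRequest` / `Pod` then happens inside the Yarn/Kubernetes ResourceManager, keeping user code out of those classes as Stephan suggests.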
[jira] [Created] (FLINK-16826) Support copying of jars
Chesnay Schepler created FLINK-16826: Summary: Support copying of jars Key: FLINK-16826 URL: https://issues.apache.org/jira/browse/FLINK-16826 Project: Flink Issue Type: Improvement Components: Test Infrastructure Reporter: Chesnay Schepler Assignee: Alexander Fedulov Fix For: 1.11.0 The {{FlinkResourceSetup}} currently allows moving jars around, e.g., from opt to lib. For some tests this isn't sufficient, as they may want a jar to be in both lib and the plugins directory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Releasing Flink 1.10.1
Here comes the latest status of issues in the 1.10.1 watch list:

* Blockers (4 left)
- [Under Discussion] FLINK-16018 Improve error reporting when submitting batch job (instead of AskTimeoutException)
- [Closed] FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job submission
- [Closed] FLINK-16170 SearchTemplateRequest ClassNotFoundException when use flink-sql-connector-elasticsearch7
- [PR Approved] FLINK-16262 Class loader problem with FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
- [Closed] FLINK-16406 Increase default value for JVM Metaspace to minimise its OutOfMemoryError
- [Closed] FLINK-16454 Update the copyright year in NOTICE files
- [In Progress] FLINK-16576 State inconsistency on restore with memory state backends
- [New] [PR Submitted] FLINK-16705 LocalExecutor tears down MiniCluster before client can retrieve JobResult

* Critical (2 left)
- [Closed] FLINK-16047 Blink planner produces wrong aggregate results with state clean up
- [PR Submitted] FLINK-16070 Blink planner can not extract correct unique key for UpsertStreamTableSink
- [Fix for 1.10.1 Done] FLINK-16225 Metaspace Out Of Memory should be handled as Fatal Error in TaskManager
- [Open] FLINK-16408 Bind user code class loader to lifetime of a slot

There's a new blocker added. Please let me know if you find any other issues that should be put into the watch list. Thanks. 
Best Regards, Yu On Sat, 21 Mar 2020 at 14:37, Yu Li wrote: > Hi All, > > Here is the status update of issues in 1.10.1 watch list: > > * Blockers (7) > > - [Under Discussion] FLINK-16018 Improve error reporting when submitting > batch job (instead of AskTimeoutException) > > - [Under Discussion] FLINK-16142 Memory Leak causes Metaspace OOM error > on repeated job submission > > - [PR Submitted] FLINK-16170 SearchTemplateRequest > ClassNotFoundException when use flink-sql-connector-elasticsearch7 > > - [PR Submitted] FLINK-16262 Class loader problem with > FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory > > - [Closed] FLINK-16406 Increase default value for JVM Metaspace to > minimise its OutOfMemoryError > > - [Closed] FLINK-16454 Update the copyright year in NOTICE files > > - [Open] FLINK-16576 State inconsistency on restore with memory state > backends > > > * Critical (4) > > - [Closed] FLINK-16047 Blink planner produces wrong aggregate results > with state clean up > > - [PR Submitted] FLINK-16070 Blink planner can not extract correct > unique key for UpsertStreamTableSink > > - [Under Discussion] FLINK-16225 Metaspace Out Of Memory should be > handled as Fatal Error in TaskManager > > - [Open] FLINK-16408 Bind user code class loader to lifetime of a slot > > Please let me know if you see any new blockers to add to the list. Thanks. > > Best Regards, > Yu > > > On Wed, 18 Mar 2020 at 00:11, Yu Li wrote: > >> Thanks for the updates Till! >> >> For FLINK-16018, maybe we could create two sub-tasks for easy and >> complete fix separately, and only include the easy one in 1.10.1? Or please >> just feel free to postpone the whole task to 1.10.2 if "all or nothing" >> policy is preferred (smile). Thanks. >> >> Best Regards, >> Yu >> >> >> On Tue, 17 Mar 2020 at 23:33, Till Rohrmann wrote: >> >>> +1 for a soonish bug fix release. Thanks for volunteering as our release >>> manager Yu. 
>>> >>> I think we can soon merge the increase of metaspace size and improving >>> the >>> error message. The assumption is that we currently don't have too many >>> small Flink 1.10 deployments with a process size <= 1GB. Of course, the >>> sooner we release the bug fix release, the fewer deployments will be >>> affected by this change. >>> >>> For FLINK-16018, I think there would be an easy solution which we could >>> include in the bug fix release. The proper fix will most likely take a >>> bit >>> longer, though. >>> >>> Cheers, >>> Till >>> >>> On Fri, Mar 13, 2020 at 8:08 PM Andrey Zagrebin >>> wrote: >>> >>> > > @Andrey and @Xintong - could we have a quick poll on the user mailing >>> > list >>> > > about increasing the metaspace size in Flink 1.10.1? Specifically >>> asking >>> > > for who has very small TM setups? >>> > >>> > There has been a survey about this topic since 10 days: >>> > >>> > `[Survey] Default size for the new JVM Metaspace limit in 1.10` >>> > I can ask there about the small TM setups. >>> > >>> > On Fri, Mar 13, 2020 at 5:19 PM Yu Li wrote: >>> > >>> > > Another blocker for 1.10.1: FLINK-16576 State inconsistency on >>> restore >>> > with >>> > > memory state backends >>> > > >>> > > Let me recompile the watching list with recent feedbacks. There're >>> > totally >>> > > 45 issues with Blocker/Critical priority for 1.10.1, out of which 14 >>> > > already resolved and 31 left, and the below ones are watched (meaning >>> > that >>> > > once the below ones got fixed, we will kick of the releasing with >>> left >>> > ones >>> > > postponed to 1.10.2 unless objections): >>> > > >
[jira] [Created] (FLINK-16827) Flink 1.9.1: sorting Kafka stream data by a time field throws NullPointerException with multiple parallel instances
wuchangjun created FLINK-16827: -- Summary: Flink 1.9.1: sorting Kafka stream data by a time field throws NullPointerException with multiple parallel instances Key: FLINK-16827 URL: https://issues.apache.org/jira/browse/FLINK-16827 Project: Flink Issue Type: Bug Components: API / Core Affects Versions: 1.9.1 Environment: flink on yarn !image-2020-03-27-21-23-13-648.png! Reporter: wuchangjun Attachments: image-2020-03-27-21-22-21-122.png, image-2020-03-27-21-22-44-191.png, image-2020-03-27-21-23-13-648.png Flink reads Kafka data and sorts it by a time field. With multiple parallel instances, it throws the following NullPointerException; with a single parallel instance, processing is normal. !image-2020-03-27-21-22-21-122.png! !image-2020-03-27-21-22-44-191.png!
[jira] [Created] (FLINK-16828) PrometheusReporters support factories
Chesnay Schepler created FLINK-16828: Summary: PrometheusReporters support factories Key: FLINK-16828 URL: https://issues.apache.org/jira/browse/FLINK-16828 Project: Flink Issue Type: Improvement Components: Runtime / Metrics Reporter: Chesnay Schepler Assignee: Alexander Fedulov Fix For: 1.11.0 Refactor the PrometheusReporter instantiation to go through a MetricReporterFactory instead.
[jira] [Created] (FLINK-16829) Pass PrometheusReporter arguments via constructor
Chesnay Schepler created FLINK-16829: Summary: Pass PrometheusReporter arguments via constructor Key: FLINK-16829 URL: https://issues.apache.org/jira/browse/FLINK-16829 Project: Flink Issue Type: Improvement Components: Runtime / Metrics Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.11.0 With FLINK-16828, Prometheus reporters are now instantiated via factories. We can now refactor the initialization to pass all parameters through the constructor.
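Taken together, FLINK-16828 and FLINK-16829 amount to the following shape, sketched here with hypothetical minimal classes (the real `PrometheusReporter` and `MetricReporterFactory` have a larger surface; 9249 is only assumed here as the reporter's default port):

```java
import java.util.Properties;

// FLINK-16829: all parameters arrive via the constructor, no post-construction
// open()/configure() step. The class is an illustrative stand-in.
final class PrometheusReporterSketch {
    final int port;

    PrometheusReporterSketch(int port) {
        this.port = port;
    }
}

// FLINK-16828: instantiation goes through a factory that reads the
// reporter configuration and hands it to the constructor.
public class PrometheusReporterFactorySketch {

    static PrometheusReporterSketch createMetricReporter(Properties properties) {
        int port = Integer.parseInt(properties.getProperty("port", "9249"));
        return new PrometheusReporterSketch(port);
    }

    public static void main(String[] args) {
        System.out.println(createMetricReporter(new Properties()).port);
    }
}
```

Concentrating configuration parsing in the factory keeps the reporter itself immutable and trivially testable, which is the usual motivation for this kind of refactoring.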
[jira] [Created] (FLINK-16830) Let users use Row/List/Map/Seq directly in Expression DSL
Dawid Wysakowicz created FLINK-16830: Summary: Let users use Row/List/Map/Seq directly in Expression DSL Key: FLINK-16830 URL: https://issues.apache.org/jira/browse/FLINK-16830 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dawid Wysakowicz Assignee: Dawid Wysakowicz Fix For: 1.11.0 To improve the usability of the Expression DSL we should provide conversions from Row/List/Map/Seq classes to corresponding expressions. This includes updating the {{ApiExpressionUtils#objectToExpression}} as well as adding Scala's implicit conversions.
[jira] [Created] (FLINK-16831) Support plugin directory as JarLocation
Chesnay Schepler created FLINK-16831: Summary: Support plugin directory as JarLocation Key: FLINK-16831 URL: https://issues.apache.org/jira/browse/FLINK-16831 Project: Flink Issue Type: Improvement Components: Test Infrastructure Reporter: Chesnay Schepler Assignee: Alexander Fedulov Fix For: 1.11.0 To test the plugin mechanism we need to be able to move/copy jars into the plugins directory, including a parent directory as expected by the plugin system.
Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #2
Thanks Gordon for the release and the nice release checking guide! It seems the NOTICE file is missing in the `statefun-ridesharing-example-simulator` module while it bundles dependencies like `org.springframework.boot:spring-boot-loader:2.1.6.RELEASE`. Best, Hequn On Fri, Mar 27, 2020 at 3:35 PM Tzu-Li (Gordon) Tai wrote: > Hi everyone, > > Please review and vote on the release candidate #2 for the version 2.0.0 of > Apache Flink Stateful Functions, > as follows: > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > **Testing Guideline** > > You can find here [1] a doc that we can use for collaborating testing > efforts. > The listed testing tasks in the doc also serve as a guideline in what to > test for this release. > If you wish to take ownership of a testing task, simply put your name down > in the "Checked by" field of the task. > > **Release Overview** > > As an overview, the release consists of the following: > a) Stateful Functions canonical source distribution, to be deployed to the > release repository at dist.apache.org > b) Stateful Functions Python SDK distributions to be deployed to PyPI > c) Maven artifacts to be deployed to the Maven Central Repository > > **Staging Areas to Review** > > The staging areas containing the above mentioned artifacts are as follows, > for your review: > * All artifacts for a) and b) can be found in the corresponding dev > repository at dist.apache.org [2] > * All artifacts for c) can be found at the Apache Nexus Repository [3] > > All artifacts are signed with the > key 1C1E2394D3194E1944613488F320986D35C33D6A [4] > > Other links for your review: > * JIRA release notes [5] > * source code tag "release-2.0.0-rc2" [6] [7] > > **Extra Remarks** > > * Part of the release is also official Docker images for Stateful > Functions. This can be a separate process, since the creation of those > relies on the fact that we have distribution jars already deployed to > Maven. 
I will follow-up with this after these artifacts are officially > released. > In the meantime, there is this discussion [8] ongoing about where to host > the StateFun Dockerfiles. > * The Flink Website and blog post is also being worked on (by Marta) as > part of the release, to incorporate the new Stateful Functions project. We > can follow up with a link to those changes afterwards in this vote thread, > but that would not block you to test and cast your votes already. > * Since the Flink website changes are still being worked on, you will not > yet be able to find the Stateful Functions docs from there. Here are the > links [9] [10]. > > **Vote Duration** > > The vote will be open for at least 72 hours *(target end date is next > Tuesday, April 31).* > It is adopted by majority approval, with at least 3 PMC affirmative votes. > > Thanks, > Gordon > > [1] > > https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing > [2] https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.0.0-rc2/ > [3] > https://repository.apache.org/content/repositories/orgapacheflink-1340/ > [4] https://dist.apache.org/repos/dist/release/flink/KEYS > [5] > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878 > [6] > > https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=14ce58048a3dda792f2329cf14d30aa952f6cb24 > [7] https://github.com/apache/flink-statefun/tree/release-2.0.0-rc2 > [8] > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-a-new-repo-to-host-Stateful-Functions-Dockerfiles-td39342.html > [9] https://ci.apache.org/projects/flink/flink-statefun-docs-master/ > [10] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/
>
> TIP: You can create a `settings.xml` file with these contents:
>
> """
> <settings>
>   <activeProfiles>
>     <activeProfile>flink-statefun-2.0.0</activeProfile>
>   </activeProfiles>
>   <profiles>
>     <profile>
>       <id>flink-statefun-2.0.0</id>
>       <repositories>
>         <repository>
>           <id>flink-statefun-2.0.0</id>
>           <url>https://repository.apache.org/content/repositories/orgapacheflink-1340/</url>
>         </repository>
>         <repository>
>           <id>archetype</id>
>           <url>https://repository.apache.org/content/repositories/orgapacheflink-1340/</url>
>         </repository>
>       </repositories>
>     </profile>
>   </profiles>
> </settings>
> """
>
> And reference that in your maven commands via `--settings path/to/settings.xml`.
> This is useful for creating a quickstart based on the staged release and for building against the staged jars.
Re: [DISCUSS] FLIP-119: Pipelined Region Scheduling
Thanks for creating this FLIP Zhu Zhu and Gary! +1 for adding pipelined region scheduling. Concerning the extended SlotProvider interface I have an idea how we could further improve it. If I am not mistaken, then you have proposed to introduce the two timeouts in order to distinguish between batch and streaming jobs and to encode that batch job requests can wait if there are enough resources in the SlotPool (not necessarily being available right now). I think what we actually need to tell the SlotProvider is whether a request will use the slot only for a limited time or not. This is exactly the difference between processing bounded and unbounded streams. If the SlotProvider knows this difference, then it can tell which slots will eventually be reusable and which not. Based on this it can tell whether a slot request can be fulfilled eventually or whether we fail after the specified timeout. Another benefit of this approach would be that we can easily support mixed bounded/unbounded workloads. What we would need to know for this approach is whether a pipelined region is processing a bounded or unbounded stream. To give an example let's assume we request the following sets of slots where each pipelined region requires the same slots: slotProvider.allocateSlots(pr1_bounded, timeout); slotProvider.allocateSlots(pr2_unbounded, timeout); slotProvider.allocateSlots(pr3_bounded, timeout); Let's assume we receive slots for pr1_bounded in < timeout and can then fulfill the request. Then we request pr2_unbounded. Since we know that pr1_bounded will complete eventually, we don't fail this request after timeout. Next we request pr3_bounded after pr2_unbounded has been completed. In this case, we see that we need to request new resources because pr2_unbounded won't release its slots. Hence, if we cannot allocate new resources within timeout, we fail this request. 
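The bounded/unbounded example above can be condensed into a small decision function. This is a toy model of the "will the needed slots eventually be released?" check, not the proposed SlotProvider interface:

```java
import java.util.List;

// Toy model: a slot request may wait beyond the timeout only if the slots
// it needs will eventually be released, i.e. every current holder is a
// bounded (finite) pipelined region. Names are illustrative only.
public class SlotWaitSketch {

    /**
     * @param requested      number of slots the new region needs
     * @param totalSlots     slots in the cluster
     * @param holdersBounded one entry per currently occupied slot; true if the
     *                       occupying region processes a bounded stream
     */
    static boolean fulfillableEventually(int requested, int totalSlots,
                                         List<Boolean> holdersBounded) {
        int free = totalSlots - holdersBounded.size();
        long eventuallyReleased =
            holdersBounded.stream().filter(bounded -> bounded).count();
        return requested <= free + eventuallyReleased;
    }

    public static void main(String[] args) {
        // pr2_unbounded holds both slots: pr3_bounded can never be satisfied
        // without new resources, so it should fail after the timeout.
        System.out.println(
            fulfillableEventually(2, 2, List.of(false, false)));
        // If the holders were bounded (e.g. pr1_bounded), waiting is fine.
        System.out.println(
            fulfillableEventually(2, 2, List.of(true, true)));
    }
}
```

The single boolean "bounded or not" per region is the only extra information the SlotProvider would need, which is what makes mixed bounded/unbounded workloads fall out naturally.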
A small comment concerning "Resource deadlocks when slot allocation competition happens between multiple jobs in a session cluster": Another idea to solve this situation would be to give the ResourceManager the right to revoke slot assignments in order to change the mapping between requests and available slots. Cheers, Till On Fri, Mar 27, 2020 at 12:44 PM Xintong Song wrote: > Gary & Zhu Zhu, > > Thanks for preparing this FLIP, and a BIG +1 from my side. The trade-off > between resource utilization and potential deadlock problems has always > been a pain. Despite not solving all the deadlock cases, this FLIP is > definitely a big improvement. IIUC, it has already covered all the existing > single job cases, and all the mentioned non-covered cases are either in > multi-job session clusters or with diverse slot resources in future. > > I've read through the FLIP, and it looks really good to me. Good job! All > the concerns and limitations that I can think of have already been clearly > stated, with reasonable potential future solutions. From the perspective of > fine-grained resource management, I do not see any serious/irresolvable > conflict at this time. > > nit: The in-page links are not working. I guess those are copied from > google docs directly? > > > Thank you~ > > Xintong Song > > > > On Fri, Mar 27, 2020 at 6:26 PM Zhu Zhu wrote: > > > To Yangze, > > > > >> the blocking edge will not be consumable before the upstream is > > finished. > > Yes. This is how we define a BLOCKING result partition, "Blocking > > partitions represent blocking data exchanges, where the data stream is > > first fully produced and then consumed". > > > > >> I'm also wondering could we execute the upstream and downstream > regions > > at the same time if we have enough resources > > It may lead to resource waste since the tasks in downstream regions > cannot > > read any data before the upstream region finishes. 
It saves a bit time on > > schedule, but usually it does not make much difference for large jobs, > > since data processing takes much more time. For small jobs, one can make > > all edges PIPELINED so that all the tasks can be scheduled at the same > > time. > > > > >> is it possible to change the data exchange mode of two regions > > dynamically? > > This is not in the scope of the FLIP. But we are moving forward to a more > > extensible scheduler (FLINK-10429) and resource aware scheduling > > (FLINK-10407). > > So I think it's possible we can have a scheduler in the future which > > dynamically changes the shuffle type wisely regarding available > resources. > > > > Thanks, > > Zhu Zhu > > > > Yangze Guo 于2020年3月27日周五 下午4:49写道: > > > > > Thanks for updating! > > > > > > +1 for supporting the pipelined region scheduling. Although we could > > > not prevent resource deadlock in all scenarios, it is really a big > > > step. > > > > > > The design generally LGTM. > > > > > > One minor thing I want to make sure. If I understand correctly, the > > > blocking edge will not be consumable before the upstream is finished. > > > Without it, when the failure occurs in
[jira] [Created] (FLINK-16832) Refactor ReporterSetup
Chesnay Schepler created FLINK-16832: Summary: Refactor ReporterSetup Key: FLINK-16832 URL: https://issues.apache.org/jira/browse/FLINK-16832 Project: Flink Issue Type: Improvement Components: Runtime / Metrics Reporter: Chesnay Schepler Assignee: Alexander Fedulov Fix For: 1.11.0 To improve readability we can split up {{ReporterSetup#fromConfiguration}}.
[jira] [Created] (FLINK-16833) Make JDBCDialect pluggable for JDBC SQL connector
Jark Wu created FLINK-16833: --- Summary: Make JDBCDialect pluggable for JDBC SQL connector Key: FLINK-16833 URL: https://issues.apache.org/jira/browse/FLINK-16833 Project: Flink Issue Type: New Feature Components: Connectors / JDBC, Table SQL / Ecosystem Reporter: Jark Wu Fix For: 1.11.0 Currently, we only natively support very limited JDBC dialects in flink-jdbc. However, there are many JDBC drivers in the world. We should make this pluggable so that users can provide their own dialects. Some initial ideas: - expose a connector configuration option to accept a JDBCDialect class name. - find supported JDBCDialects via SPI. A more detailed design should be proposed for discussion.
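The SPI idea floated in the ticket could look roughly like the following. `JDBCDialectSketch` and its methods are hypothetical stand-ins, not the flink-jdbc API; only `java.util.ServiceLoader` is the real mechanism:

```java
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.stream.StreamSupport;

// Hypothetical dialect contract: claim a JDBC URL, provide dialect-specific
// SQL rendering (identifier quoting shown as a minimal example).
interface JDBCDialectSketch {
    boolean canHandle(String jdbcUrl);
    String quoteIdentifier(String identifier);
}

// An example third-party dialect that would be registered via a
// META-INF/services/JDBCDialectSketch file on the user's classpath.
final class PostgresDialectSketch implements JDBCDialectSketch {
    @Override
    public boolean canHandle(String jdbcUrl) {
        return jdbcUrl.startsWith("jdbc:postgresql:");
    }

    @Override
    public String quoteIdentifier(String identifier) {
        return "\"" + identifier + "\"";
    }
}

public class DialectLookup {
    // Pick the first discovered dialect that claims the URL.
    static Optional<JDBCDialectSketch> find(Iterable<JDBCDialectSketch> dialects,
                                            String jdbcUrl) {
        return StreamSupport.stream(dialects.spliterator(), false)
            .filter(d -> d.canHandle(jdbcUrl))
            .findFirst();
    }

    public static void main(String[] args) {
        // In the real mechanism the Iterable would come from the ServiceLoader:
        Iterable<JDBCDialectSketch> discovered =
            ServiceLoader.load(JDBCDialectSketch.class);
        // For illustration, use an explicit list instead of SPI registration.
        List<JDBCDialectSketch> dialects = List.of(new PostgresDialectSketch());
        System.out.println(
            find(dialects, "jdbc:postgresql://localhost/db").isPresent());
    }
}
```

Matching on the URL prefix keeps the connector configuration minimal, while the class-name option mentioned in the ticket would let users force a specific dialect when prefixes are ambiguous.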
[jira] [Created] (FLINK-16834) Examples cannot be run from IDE
Till Rohrmann created FLINK-16834: - Summary: Examples cannot be run from IDE Key: FLINK-16834 URL: https://issues.apache.org/jira/browse/FLINK-16834 Project: Flink Issue Type: Bug Components: Client / Job Submission, Examples Affects Versions: 1.11.0 Reporter: Till Rohrmann Fix For: 1.11.0 Due to removing the dependency {{flink-clients}} from {{flink-streaming-java}}, the examples can no longer be executed from the IDE. The problem is that the {{flink-clients}} dependency is missing. In order to solve this problem, we need to add the {{flink-clients}} dependency to all modules which need it and previously obtained it transitively from {{flink-streaming-java}}.
[jira] [Created] (FLINK-16835) Replace TableConfig with Configuration
Timo Walther created FLINK-16835: Summary: Replace TableConfig with Configuration Key: FLINK-16835 URL: https://issues.apache.org/jira/browse/FLINK-16835 Project: Flink Issue Type: Improvement Components: Table SQL / API Reporter: Timo Walther In order to allow reading and writing of configuration from a file or string-based properties, we should consider removing {{TableConfig}} and fully relying on a Configuration-based object with {{ConfigOption}}s. This effort was already partially started, which is why {{TableConfig.getConfiguration}} exists. However, we should clarify whether we would like to have control and traceability over layered configurations such as {{flink-conf.yaml < StreamExecutionEnvironment < TableEnvironment}}. Maybe the {{Configuration}} class is not the right abstraction for this.
[jira] [Created] (FLINK-16836) Losing leadership does not clear rpc connection in JobManagerLeaderListener
Till Rohrmann created FLINK-16836: - Summary: Losing leadership does not clear rpc connection in JobManagerLeaderListener Key: FLINK-16836 URL: https://issues.apache.org/jira/browse/FLINK-16836 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.11.0 Reporter: Till Rohrmann Assignee: Till Rohrmann Fix For: 1.11.0 When losing the leadership, the {{JobManagerLeaderListener}} closes the current {{rpcConnection}} but does not clear the field. This can lead to a failure of {{JobManagerLeaderListener#reconnect}} if this method is called after the {{JobMaster}} has lost its leadership. I propose to clear the field so that {{RegisteredRpcConnection#tryReconnect}} won't be called on a closed rpc connection.
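The proposed fix can be illustrated with a stripped-down model of the listener. Names are illustrative; the real `JobManagerLeaderListener` and `RegisteredRpcConnection` are considerably more involved:

```java
// Toy model of the fix idea: null out the connection field when leadership
// is revoked, so a later reconnect() builds a fresh connection instead of
// calling tryReconnect() on an already-closed one.
public class LeaderListenerSketch {

    static final class RpcConnection {
        boolean closed;

        void close() {
            closed = true;
        }
    }

    private RpcConnection rpcConnection = new RpcConnection();

    void notifyLeaderLoss() {
        if (rpcConnection != null) {
            rpcConnection.close();
            rpcConnection = null; // the proposed fix: clear the field
        }
    }

    void reconnect() {
        if (rpcConnection == null) {
            rpcConnection = new RpcConnection(); // fresh connection
        }
        // Without the null-ing above, this path would try to revive the
        // closed connection, which is the failure described in the ticket.
    }

    RpcConnection current() {
        return rpcConnection;
    }
}
```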
Re: [VOTE] Apache Flink Stateful Functions Release 2.0.0, release candidate #2
Hi Hequn, That's a good catch. Unfortunately, the spring boot dependency there, while itself being ASLv2 licensed, pulls in other dependencies that are not ASLv2. That would indeed make this problem a blocker. I'll do a thorough check again on the Maven artifacts that do bundle dependencies, before creating a new RC. AFAIK, there should be no more other than: - statefun-flink-distribution - statefun-ridesharing-example-simulator BR, Gordon On Fri, Mar 27, 2020 at 10:41 PM Hequn Cheng wrote: > Thanks Gordon for the release and the nice release checking guide! > > It seems the NOTICE file is missing in the > `statefun-ridesharing-example-simulator` module while it bundles > dependencies like > `org.springframework.boot:spring-boot-loader:2.1.6.RELEASE`. > > Best, > Hequn > > On Fri, Mar 27, 2020 at 3:35 PM Tzu-Li (Gordon) Tai > wrote: > > > Hi everyone, > > > > Please review and vote on the release candidate #2 for the version 2.0.0 > of > > Apache Flink Stateful Functions, > > as follows: > > [ ] +1, Approve the release > > [ ] -1, Do not approve the release (please provide specific comments) > > > > **Testing Guideline** > > > > You can find here [1] a doc that we can use for collaborating testing > > efforts. > > The listed testing tasks in the doc also serve as a guideline in what to > > test for this release. > > If you wish to take ownership of a testing task, simply put your name > down > > in the "Checked by" field of the task. 
> > > > **Release Overview** > > > > As an overview, the release consists of the following: > > a) Stateful Functions canonical source distribution, to be deployed to > the > > release repository at dist.apache.org > > b) Stateful Functions Python SDK distributions to be deployed to PyPI > > c) Maven artifacts to be deployed to the Maven Central Repository > > > > **Staging Areas to Review** > > > > The staging areas containing the above mentioned artifacts are as > follows, > > for your review: > > * All artifacts for a) and b) can be found in the corresponding dev > > repository at dist.apache.org [2] > > * All artifacts for c) can be found at the Apache Nexus Repository [3] > > > > All artifacts are singed with the > > key 1C1E2394D3194E1944613488F320986D35C33D6A [4] > > > > Other links for your review: > > * JIRA release notes [5] > > * source code tag "release-2.0.0-rc2" [6] [7] > > > > **Extra Remarks** > > > > * Part of the release is also official Docker images for Stateful > > Functions. This can be a separate process, since the creation of those > > relies on the fact that we have distribution jars already deployed to > > Maven. I will follow-up with this after these artifacts are officially > > released. > > In the meantime, there is this discussion [8] ongoing about where to host > > the StateFun Dockerfiles. > > * The Flink Website and blog post is also being worked on (by Marta) as > > part of the release, to incorporate the new Stateful Functions project. > We > > can follow up with a link to those changes afterwards in this vote > thread, > > but that would not block you to test and cast your votes already. > > * Since the Flink website changes are still being worked on, you will not > > yet be able to find the Stateful Functions docs from there. Here are the > > links [9] [10]. 
> > > > **Vote Duration** > > > > The vote will be open for at least 72 hours *(target end date is next > > Tuesday, March 31).* > > It is adopted by majority approval, with at least 3 PMC affirmative > votes. > > > > Thanks, > > Gordon > > > > [1] > > > > > https://docs.google.com/document/d/1P9yjwSbPQtul0z2AXMnVolWQbzhxs68suJvzR6xMjcs/edit?usp=sharing > > [2] > https://dist.apache.org/repos/dist/dev/flink/flink-statefun-2.0.0-rc2/ > > [3] > > https://repository.apache.org/content/repositories/orgapacheflink-1340/ > > [4] https://dist.apache.org/repos/dist/release/flink/KEYS > > [5] > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346878 > > [6] > > > > > https://gitbox.apache.org/repos/asf?p=flink-statefun.git;a=commit;h=14ce58048a3dda792f2329cf14d30aa952f6cb24 > > [7] https://github.com/apache/flink-statefun/tree/release-2.0.0-rc2 > > [8] > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-a-new-repo-to-host-Stateful-Functions-Dockerfiles-td39342.html > > [9] https://ci.apache.org/projects/flink/flink-statefun-docs-master/ > > [10] > https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/ > > > > TIP: You can create a `settings.xml` file with these contents:
> >
> > """
> > <settings>
> >   <activeProfiles>
> >     <activeProfile>flink-statefun-2.0.0</activeProfile>
> >   </activeProfiles>
> >   <profiles>
> >     <profile>
> >       <id>flink-statefun-2.0.0</id>
> >       <repositories>
> >         <repository>
> >           <id>flink-statefun-2.0.0</id>
> >           <url>https://repository.apache.org/content/repositories/orgapacheflink-1340/</url>
> >         </repository>
> >         <repository>
> >           <id>archetype</id>
> >           <url>https://repository.apache.org/content/repositories/orgapacheflink-1340/</url>
> >         </repository>
> >       </repositories>
> >     </profile>
> >   </profiles>
> > </settings>
> > """
[jira] [Created] (FLINK-16837) Disable trimStackTrace in surefire plugin
Chesnay Schepler created FLINK-16837: Summary: Disable trimStackTrace in surefire plugin Key: FLINK-16837 URL: https://issues.apache.org/jira/browse/FLINK-16837 Project: Flink Issue Type: Improvement Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.11.0 Surefire has a {{trimStackTrace}} option (enabled by default) which is supposed to cut off the JUnit parts of stack traces. However, this has various unfortunate side effects: it is often overzealous, hiding important parts of the stack trace, and it also hides suppressed exceptions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
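The change the ticket asks for amounts to a one-line surefire setting; a minimal `pom.xml` fragment (plugin version inherited from the parent pom, which is an assumption about the build) would look like:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Keep full stack traces, including suppressed exceptions -->
    <trimStackTrace>false</trimStackTrace>
  </configuration>
</plugin>
```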
contribute to Apache Flink
Hi Guys, I want to contribute to Apache Flink. Would you please give me the permission as a contributor? My JIRA ID is soundhearer.
Re: contribute to Apache Flink
Hi Leping, Welcome to the community! You no longer need contributor permissions. You can simply create a JIRA ticket and ask to be assigned to work on it. For the reasons explained in [1], only committers can assign JIRA tickets now. You can also take a look at Flink's contribution guidelines [2] for more information. Best, Hequn [1] https://flink.apache.org/contributing/contribute-code.html#create-jira-ticket-and-reach-consensus [2] https://flink.apache.org/contributing/how-to-contribute.html On Sat, Mar 28, 2020 at 7:09 AM Leping Huang wrote: > Hi Guys, > > I want to contribute to Apache Flink. > Would you please give me the permission as a contributor? > My JIRA ID is soundhearer. >
[jira] [Created] (FLINK-16838) Stateful Functions Quickstart archetype Dockerfile should reference a specific version tag
Tzu-Li (Gordon) Tai created FLINK-16838: --- Summary: Stateful Functions Quickstart archetype Dockerfile should reference a specific version tag Key: FLINK-16838 URL: https://issues.apache.org/jira/browse/FLINK-16838 Project: Flink Issue Type: Bug Components: Stateful Functions Reporter: Tzu-Li (Gordon) Tai Assignee: Tzu-Li (Gordon) Tai Currently, the quickstart archetype provides a skeleton Dockerfile that always builds on top of the latest image: {code} FROM statefun {code} While it happens to work for the first release since the `latest` tag will (coincidentally) point to the correct version, once we have multiple releases this will no longer be correct. -- This message was sent by Atlassian Jira (v8.3.4#803005)
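A pinned variant of that Dockerfile line, using the 2.0.0 version this release targets (the exact image tag naming is still subject to the Docker image hosting discussion referenced in the vote thread, so treat the tag below as an assumption), would be:

```dockerfile
# Pin the base image to the release version instead of the implicit :latest
FROM statefun:2.0.0
```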
[jira] [Created] (FLINK-16839) Connectors Kinesis metrics can be disabled
Muchl created FLINK-16839: - Summary: Connectors Kinesis metrics can be disabled Key: FLINK-16839 URL: https://issues.apache.org/jira/browse/FLINK-16839 Project: Flink Issue Type: Improvement Components: Connectors / Kinesis Affects Versions: 1.10.0 Reporter: Muchl Currently, the Kinesis connector reports 9 metrics, each recorded per Kinesis shard. If a stream has enough shards, the TaskManager metrics become unusable. In our production environment, a job using the DynamoDB streams Kinesis adapter reads a stream with more than 10,000 shards; multiplied by 9 metrics per shard, this yields more than 100,000 Kinesis metrics, and the metrics output grows to tens of MB, so Prometheus cannot scrape it. We should add a configuration option that can disable the Kinesis metrics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
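One possible shape for such an option is a boolean consumer property that gates shard-metric registration, defaulting to the current behavior. The sketch below is entirely hypothetical: the property name `flink.shard.metrics.enabled` and the helper class are illustrative and do not reflect the actual Flink Kinesis connector API.

```java
import java.util.Properties;

public class ShardMetricsConfigSketch {
    // Hypothetical property key -- not an existing Flink option.
    static final String SHARD_METRICS_ENABLED = "flink.shard.metrics.enabled";

    // Returns whether per-shard metrics should be registered.
    // Defaults to true so existing jobs keep their current behavior.
    static boolean shardMetricsEnabled(Properties consumerConfig) {
        return Boolean.parseBoolean(
                consumerConfig.getProperty(SHARD_METRICS_ENABLED, "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(SHARD_METRICS_ENABLED, "false");
        // A connector could consult this flag before registering
        // the 9 per-shard metric gauges.
        System.out.println(shardMetricsEnabled(props)); // prints "false"
    }
}
```

A job with very many shards would then set the property to `false` in the consumer configuration it already passes to the Kinesis source.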