Re: Flink native k8s integration vs. operator
Thanks for volunteering to drive this effort, Marton, Thomas and Gyula. Looking forward to the public discussion. Please feel free to reach out if there's anything you need from us. Thank you~ Xintong Song On Fri, Jan 14, 2022 at 8:27 AM Chenya Zhang wrote: > Thanks Thomas, Gyula, and Marton for driving this effort! It would greatly > ease the adoption of Apache Flink on Kubernetes and help to address the > current operational pain points as mentioned. Look forward to the proposal > and more discussions! > > Best, > Chenya > > On Thu, Jan 13, 2022 at 12:15 PM Márton Balassi > wrote: > >> Hi All, >> >> I am pleased to see the level of enthusiasm and technical consideration >> already emerging in this thread. I wholeheartedly support building an >> operator and endorsing it via placing it under the Apache Flink umbrella >> (as a separate repository) as the current lack of it is clearly becoming >> an >> adoption bottleneck for large scale Flink users. The next logical step is >> to write a FLIP to agree on the technical details, so that we can put >> forward the proposal to the Flink PMC for creating a new repository with a >> clear purpose in mind. I volunteer to work with Thomas and Gyula on the >> initial wording on the proposal which we will put up for public discussion >> in the coming weeks. >> >> Best, >> Marton >> >> On Thu, Jan 13, 2022 at 9:22 AM Konstantin Knauf >> wrote: >> >> > Hi Thomas, >> > >> > Yes, I was referring to a separate repository under Apache Flink. >> > >> > Cheers, >> > >> > Konstantin >> > >> > On Thu, Jan 13, 2022 at 6:19 AM Thomas Weise wrote: >> > >> >> Hi everyone, >> >> >> >> Thanks for the feedback and discussion. A few additional thoughts: >> >> >> >> [Konstantin] > With respect to common lifecycle management operations: >> >> these features are >> >> > not available (within Apache Flink) for any of the other resource >> >> providers >> >> > (YARN, Standalone) either. 
From this perspective, I wouldn't consider >> >> this >> > a shortcoming of the Kubernetes integration. >> >> >> >> I think time and evolution of the ecosystem are factors to consider as >> >> well. The state and usage of Flink was much different when YARN >> >> integration was novel. Expectations are different today and the >> >> lifecycle functionality provided by an operator may as well be >> >> considered essential to support the concept of a Flink application on >> >> k8s. After a few years of learning from operator experience outside of >> >> Flink, it might be a good time to fill the gap. >> >> >> >> [Konstantin] > I still believe that we should keep this focus on low >> >> > level composable building blocks (like Jobs and Snapshots) in Apache >> >> Flink >> >> > to make it easy for everyone to build fitting higher level >> abstractions >> >> > like a FlinkApplication Custom Resource on top of it. >> >> >> >> I completely agree that it is important that the basic functions of >> >> Flink are solid and continued focus is necessary. Thanks for sharing >> >> the pointers, these are great improvements. At the same time, >> >> ecosystem, contributor base and user spectrum are growing. There have >> >> been significant additions in many areas of Flink including connectors >> >> and higher level abstractions like statefun, SQL and Python. This is also >> >> evident from the additional repositories/subprojects that we have in Flink >> >> today. >> >> >> >> [Konstantin] > Having said this, if others in the community have the >> >> capacity to push and >> >> > *maintain* a somewhat minimal "reference" Kubernetes Operator for >> Apache >> >> > Flink, I don't see any blockers. If or when this happens, I'd see >> some >> >> > clear benefits of using a separate repository (easier independent >> >> > versioning and releases, different build system & tooling (go, I >> >> assume)). >> >> >> >> Naturally different contributors to the project have different focus. 
>> >> Let's find out if there is strong enough interest to take this on and >> >> strong enough commitment to maintain. As I see it, there is a >> >> tremendous amount of internal investment going into operationalizing >> >> Flink within many companies. Improvements to the operational side of >> >> Flink like the operator would complement Flink nicely. I assume that >> >> you are referring to a separate repository within Apache Flink, which >> >> would give it the chance to achieve better sustainability than the >> >> existing external operator efforts. There is also the fact that some >> >> organizations which are heavily invested in operationalizing Flink >> >> allow contributions to Apache Flink itself but less so to arbitrary >> >> GitHub projects. Regarding the tooling, it could well turn out that >> >> Java is a good alternative given the ecosystem focus and that there is >> >> an opportunity for reuse in certain aspects (metrics, logging etc.). >> >> >> >> [Yang] > I think Xintong has given a strong point why we introduced >> >> the native K8s integration, which is active resource management.
Re: [FEEDBACK] Metadata Platforms / Catalogs / Lineage integration
Hi, I am a software engineer from Xiaomi. Last year we used Metacat (https://github.com/Netflix/metacat) to manage all metadata, including Hive, Kudu, Doris, Iceberg, Elasticsearch, Talos (Xiaomi's self-developed message queue), MySQL, TiDB, etc. Metacat is compatible with the Hive Metastore protocol, so we can directly use Flink's HiveCatalog to connect to Metacat and create different tables, including Hive tables and other generic types of tables. All systems are abstracted into a catalog.database.table structure, so in Flink SQL we can access any registered table through catalog.database.table. In addition, Metacat uniformly manages all table creation, deletion, and partitioning operations. By analyzing Metacat's audit log, we can easily obtain the DDL lineage of different tables. At the same time, using Apache Ranger (https://github.com/apache/ranger), we have added permission control to the Flink framework, and all permission information is saved in the form of catalog.database.table. We also modified the logic around Flink's JobListener: by exposing the JobGraph, we can obtain the lineage information of a job by parsing the JobGraph. To sum up, unified metadata management makes it convenient to manage different systems and connect them to Flink, and at the same time it simplifies unified permission management and obtaining table-related lineage information. On Fri, Jan 14, 2022 at 3:14 AM Maciej Obuchowski < obuchowski.mac...@gmail.com> wrote: > Hello, > > I'm an OpenLineage committer - and previously, a minor Flink contributor. > The OpenLineage community is very interested in a conversation about Flink > metadata, and we'll be happy to cooperate with the Flink community. > > Best, > Maciej Obuchowski > > > > On Thu, 13 Jan 2022 at 18:12, Martijn Visser > wrote: > > > > Hi all, > > > > @Andrew thanks for sharing that! > > > > @Tero good point, I should have clarified the purpose. 
I want to > understand > > what "metadata platforms" tools are used or evaluated by the Flink > > community, what's their purpose for using such a tool (is it as a generic > > catalogue, as a data discovery tool, is lineage the important part etc) > and > > what problems are people trying to solve with them. This space is > > developing rapidly and there are many open source and commercial tools > > popping up/growing, which is also why I'm trying to keep an open vision > on > > how this space is evolving. > > > > If the Flink community wants to integrate with metadata tools, I fully > > agree that ideally we do that via standards. My perception is at this > > moment that no clear standard has yet been established. You mentioned > > open-metadata.org, but I believe https://openlineage.io/ is also an > > alternative standard. > > > > Best regards, > > > > Martijn > > > > On Thu, 13 Jan 2022 at 17:00, Tero Paananen > wrote: > > > > > > I'm currently checking out different metadata platforms, such as > > > Amundsen [1] and Datahub [2]. In short, these types of tools try to > address > > > problems related to topics such as data discovery, data lineage and an > > > overall data catalogue. > > > > > > > > I'm reaching out to the Dev and User mailing lists to get some > feedback. > > > It would really help if you could spend a couple of minutes to let me > know > > > if you already use either one of the two mentioned metadata platforms > or > > > another one, or are you evaluating such tools? If so, is that for the > > > purpose as a catalogue, for lineage or anything else? Any type of > feedback > > > on these types of tools is appreciated. > > > > > > I hope you don't mind answers off-list. > > > > > > You didn't say what purpose you're evaluating these tools for, but if > > > you're evaluating platforms for integration with Flink, I wouldn't > > > approach it with a particular product in mind. 
Rather I'd create some > > sort of facility to propagate metadata and/or lineage information in a > > generic way and allow Flink users to plug in their favorite metadata > > tool. Using standards like OpenLineage, for example. I believe Egeria > > is also trying to create an open standard for metadata. > > > > If you're evaluating data catalogs for personal use or use in a > > particular project, Andrew's answer about the Wikimedia evaluation is > > a good start. It's missing OpenMetadata (https://open-metadata.org/). > > That one is showing a LOT of promise. Wikimedia's evaluation is also > > missing industry-leading commercial products (understandably, given > > their mission). Collibra and Alation are probably the ones that pop up > > most often. > > > > I have personally looked into both DataHub and Amundsen. My high-level > > feedback is that DataHub is overengineered, and using proprietary > > LinkedIn technology platform(s), which aren't widely used anywhere. > > Amundsen is much less flexible than DataHub and quite basic in its > > functionality. If you need anything beyond what it already offers, > > good luck. > > > > We dumped Amundsen in favor of OpenMetadata a few months back. We > > don't have enough data points to fully evaluate OpenMetadata yet. > > > > -TPP
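To make the catalog part of the Xiaomi setup above concrete: because Metacat speaks the Hive Metastore protocol, registering it in Flink SQL can be sketched as a plain Hive catalog definition. This is an illustrative sketch only — the catalog name, configuration directory, and table names below are made up, not the actual Xiaomi configuration:

```sql
-- Register a catalog that talks the Hive Metastore protocol
-- (which a Metacat endpoint can serve); names and paths are hypothetical.
CREATE CATALOG metacat WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/etc/flink/metacat-conf'
);

-- Every registered system is then addressable through the same
-- catalog.database.table path:
SELECT * FROM metacat.warehouse.orders;
```

The point of the abstraction is the last line: once all systems resolve through one catalog, permission checks and lineage can key off the same catalog.database.table identifier.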
FlinkKafkaConsumer and FlinkKafkaProducer and Kafka Cluster Migration
Hello, Currently we are using FlinkKafkaConsumer and FlinkKafkaProducer and are planning to migrate to a different Kafka cluster. Are the bootstrap servers, username, and password part of the FlinkKafkaConsumer and FlinkKafkaProducer state? So if we take a savepoint, change the bootstrap servers and credentials, and start the job from the savepoint, will it use the new connection properties, or the old ones from the savepoint? Assuming that we connect to the new Kafka cluster, I think the FlinkKafkaConsumer offsets will be reset, because the new Kafka cluster will be empty and the FlinkKafkaConsumer will not be able to seek to the stored offsets - am I right? Thanks, Alexey
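A minimal sketch of the relevant point (broker names, group id, and credentials below are made-up placeholders, and this shows only where the connection settings live, not the full connector API): the bootstrap servers and SASL credentials are plain client properties supplied in the job configuration at startup, not data stored in the savepoint, so changing them and restarting from a savepoint points the job at the new cluster. The partition offsets that are stored in the savepoint, however, will generally not match the new, empty cluster, so the consumer's configured start-position behavior matters after the switch.

```java
import java.util.Properties;

// Sketch: connection settings for the Kafka connector are ordinary
// properties read from the job configuration at startup. They are not
// part of the checkpointed/savepointed state, so editing them between a
// savepoint and a restore is how you repoint the job at a new cluster.
public class KafkaMigrationProps {

    public static Properties newClusterProps() {
        Properties props = new Properties();
        // New cluster endpoints and credentials (hypothetical values):
        props.setProperty("bootstrap.servers", "kafka-new-1:9092,kafka-new-2:9092");
        props.setProperty("group.id", "my-flink-job");
        props.setProperty("security.protocol", "SASL_SSL");
        props.setProperty("sasl.mechanism", "SCRAM-SHA-512");
        return props;
    }

    public static void main(String[] args) {
        // In a real job these properties would be handed to the consumer,
        // e.g. new FlinkKafkaConsumer<>("topic", schema, newClusterProps());
        System.out.println(newClusterProps().getProperty("bootstrap.servers"));
    }
}
```

For the offset question, the consumer exposes explicit start-position methods (e.g. setStartFromEarliest, setStartFromLatest, setStartFromTimestamp) that are worth reviewing before restoring against a cluster where the savepointed offsets are invalid.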
Re: Flink native k8s integration vs. operator
Thanks Thomas, Gyula, and Marton for driving this effort! It would greatly ease the adoption of Apache Flink on Kubernetes and help to address the current operational pain points as mentioned. Look forward to the proposal and more discussions! Best, Chenya On Thu, Jan 13, 2022 at 12:15 PM Márton Balassi wrote: > Hi All, > > I am pleased to see the level of enthusiasm and technical consideration > already emerging in this thread. I wholeheartedly support building an > operator and endorsing it via placing it under the Apache Flink umbrella > (as a separate repository) as the current lack of it is clearly becoming an > adoption bottleneck for large scale Flink users. The next logical step is > to write a FLIP to agree on the technical details, so that we can put > forward the proposal to the Flink PMC for creating a new repository with a > clear purpose in mind. I volunteer to work with Thomas and Gyula on the > initial wording on the proposal which we will put up for public discussion > in the coming weeks. > > Best, > Marton > > On Thu, Jan 13, 2022 at 9:22 AM Konstantin Knauf > wrote: > > > Hi Thomas, > > > > Yes, I was referring to a separate repository under Apache Flink. > > > > Cheers, > > > > Konstantin > > > > On Thu, Jan 13, 2022 at 6:19 AM Thomas Weise wrote: > > > >> Hi everyone, > >> > >> Thanks for the feedback and discussion. A few additional thoughts: > >> > >> [Konstantin] > With respect to common lifecycle management operations: > >> these features are > >> > not available (within Apache Flink) for any of the other resource > >> providers > >> > (YARN, Standalone) either. From this perspective, I wouldn't consider > >> this > >> > a shortcoming of the Kubernetes integration. > >> > >> I think time and evolution of the ecosystem are factors to consider as > >> well. The state and usage of Flink was much different when YARN > >> integration was novel. 
Expectations are different today and the > >> lifecycle functionality provided by an operator may as well be > >> considered essential to support the concept of a Flink application on > >> k8s. After a few years of learning from operator experience outside of > >> Flink, it might be a good time to fill the gap. > >> > >> [Konstantin] > I still believe that we should keep this focus on low > >> > level composable building blocks (like Jobs and Snapshots) in Apache > >> Flink > >> > to make it easy for everyone to build fitting higher level > abstractions > >> > like a FlinkApplication Custom Resource on top of it. > >> > >> I completely agree that it is important that the basic functions of > >> Flink are solid and continued focus is necessary. Thanks for sharing > >> the pointers, these are great improvements. At the same time, > >> ecosystem, contributor base and user spectrum are growing. There have > >> been significant additions in many areas of Flink including connectors > >> and higher level abstractions like statefun, SQL and Python. It's also > >> evident from additional repositories/subprojects that we have in Flink > >> today. > >> > >> [Konstantin] > Having said this, if others in the community have the > >> capacity to push and > >> > *maintain* a somewhat minimal "reference" Kubernetes Operator for > Apache > >> > Flink, I don't see any blockers. If or when this happens, I'd see some > >> > clear benefits of using a separate repository (easier independent > >> > versioning and releases, different build system & tooling (go, I > >> assume)). > >> > >> Naturally different contributors to the project have different focus. > >> Let's find out if there is strong enough interest to take this on and > >> strong enough commitment to maintain. As I see it, there is a > >> tremendous amount of internal investment going into operationalizing > >> Flink within many companies. 
Improvements to the operational side of > >> Flink like the operator would complement Flink nicely. I assume that > >> you are referring to a separate repository within Apache Flink, which > >> would give it the chance to achieve better sustainability than the > >> existing external operator efforts. There is also the fact that some > >> organizations which are heavily invested in operationalizing Flink > >> allow contributions to Apache Flink itself but less so to arbitrary > >> GitHub projects. Regarding the tooling, it could well turn out that > >> Java is a good alternative given the ecosystem focus and that there is > >> an opportunity for reuse in certain aspects (metrics, logging etc.). > >> > >> [Yang] > I think Xintong has given a strong point why we introduced > >> the native K8s integration, which is active resource management. > >> > I have a concrete example for this in the production. When a K8s node > >> is down, the standalone K8s deployment will take longer > >> > recovery time based on the K8s eviction time (IIRC, default is 5 > >> minutes). For the native K8s integration, Flink RM could be aware of the > >> > TM heartbeat lost and allocate a new one timely.
Re: Flink native k8s integration vs. operator
Hi All, I am pleased to see the level of enthusiasm and technical consideration already emerging in this thread. I wholeheartedly support building an operator and endorsing it via placing it under the Apache Flink umbrella (as a separate repository) as the current lack of it is clearly becoming an adoption bottleneck for large scale Flink users. The next logical step is to write a FLIP to agree on the technical details, so that we can put forward the proposal to the Flink PMC for creating a new repository with a clear purpose in mind. I volunteer to work with Thomas and Gyula on the initial wording on the proposal which we will put up for public discussion in the coming weeks. Best, Marton On Thu, Jan 13, 2022 at 9:22 AM Konstantin Knauf wrote: > Hi Thomas, > > Yes, I was referring to a separate repository under Apache Flink. > > Cheers, > > Konstantin > > On Thu, Jan 13, 2022 at 6:19 AM Thomas Weise wrote: > >> Hi everyone, >> >> Thanks for the feedback and discussion. A few additional thoughts: >> >> [Konstantin] > With respect to common lifecycle management operations: >> these features are >> > not available (within Apache Flink) for any of the other resource >> providers >> > (YARN, Standalone) either. From this perspective, I wouldn't consider >> this >> > a shortcoming of the Kubernetes integration. >> >> I think time and evolution of the ecosystem are factors to consider as >> well. The state and usage of Flink was much different when YARN >> integration was novel. Expectations are different today and the >> lifecycle functionality provided by an operator may as well be >> considered essential to support the concept of a Flink application on >> k8s. After a few years of learning from operator experience outside of >> Flink, it might be a good time to fill the gap. 
>> >> [Konstantin] > I still believe that we should keep this focus on low >> > level composable building blocks (like Jobs and Snapshots) in Apache >> Flink >> > to make it easy for everyone to build fitting higher level abstractions >> > like a FlinkApplication Custom Resource on top of it. >> >> I completely agree that it is important that the basic functions of >> Flink are solid and continued focus is necessary. Thanks for sharing >> the pointers, these are great improvements. At the same time, >> ecosystem, contributor base and user spectrum are growing. There have >> been significant additions in many areas of Flink including connectors >> and higher level abstractions like statefun, SQL and Python. It's also >> evident from additional repositories/subprojects that we have in Flink >> today. >> >> [Konstantin] > Having said this, if others in the community have the >> capacity to push and >> > *maintain* a somewhat minimal "reference" Kubernetes Operator for Apache >> > Flink, I don't see any blockers. If or when this happens, I'd see some >> > clear benefits of using a separate repository (easier independent >> > versioning and releases, different build system & tooling (go, I >> assume)). >> >> Naturally different contributors to the project have different focus. >> Let's find out if there is strong enough interest to take this on and >> strong enough commitment to maintain. As I see it, there is a >> tremendous amount of internal investment going into operationalizing >> Flink within many companies. Improvements to the operational side of >> Flink like the operator would complement Flink nicely. I assume that >> you are referring to a separate repository within Apache Flink, which >> would give it the chance to achieve better sustainability than the >> existing external operator efforts. 
There is also the fact that some >> organizations which are heavily invested in operationalizing Flink >> allow contributions to Apache Flink itself but less so to arbitrary >> GitHub projects. Regarding the tooling, it could well turn out that >> Java is a good alternative given the ecosystem focus and that there is >> an opportunity for reuse in certain aspects (metrics, logging etc.). >> >> [Yang] > I think Xintong has given a strong point why we introduced >> the native K8s integration, which is active resource management. >> > I have a concrete example for this in the production. When a K8s node >> is down, the standalone K8s deployment will take longer >> > recovery time based on the K8s eviction time (IIRC, default is 5 >> minutes). For the native K8s integration, Flink RM could be aware of the >> > TM heartbeat lost and allocate a new one timely. >> >> Thanks for sharing this, we should evaluate it as part of a proposal. >> If we can optimize recovery or scaling with active resource management >> then perhaps it is worth supporting it through the operator. >> Previously mentioned operators all rely on the standalone model. >> >> Cheers, >> Thomas >> >> On Wed, Jan 12, 2022 at 3:21 AM Konstantin Knauf >> wrote: >> > >> > cc dev@ >> > >> > Hi Thomas, Hi everyone, >> > >> > Thank you for starting this discussion and sorry for chiming in
Re: [FEEDBACK] Metadata Platforms / Catalogs / Lineage integration
Hello, I'm an OpenLineage committer - and previously, a minor Flink contributor. The OpenLineage community is very interested in a conversation about Flink metadata, and we'll be happy to cooperate with the Flink community. Best, Maciej Obuchowski On Thu, 13 Jan 2022 at 18:12, Martijn Visser wrote: > > Hi all, > > @Andrew thanks for sharing that! > > @Tero good point, I should have clarified the purpose. I want to understand > what "metadata platforms" tools are used or evaluated by the Flink > community, what's their purpose for using such a tool (is it as a generic > catalogue, as a data discovery tool, is lineage the important part etc) and > what problems are people trying to solve with them. This space is > developing rapidly and there are many open source and commercial tools > popping up/growing, which is also why I'm trying to keep an open vision on > how this space is evolving. > > If the Flink community wants to integrate with metadata tools, I fully > agree that ideally we do that via standards. My perception is at this > moment that no clear standard has yet been established. You mentioned > open-metadata.org, but I believe https://openlineage.io/ is also an > alternative standard. > > Best regards, > > Martijn > > On Thu, 13 Jan 2022 at 17:00, Tero Paananen wrote: > > > > I'm currently checking out different metadata platforms, such as > > Amundsen [1] and Datahub [2]. In short, these types of tools try to address > > problems related to topics such as data discovery, data lineage and an > > overall data catalogue. > > > > > > I'm reaching out to the Dev and User mailing lists to get some feedback. > > It would really help if you could spend a couple of minutes to let me know > > if you already use either one of the two mentioned metadata platforms or > > another one, or are you evaluating such tools? If so, is that for the > > purpose as a catalogue, for lineage or anything else? Any type of feedback > > on these types of tools is appreciated. 
> > > > I hope you don't mind answers off-list. > > > > You didn't say what purpose you're evaluating these tools for, but if > > you're evaluating platforms for integration with Flink, I wouldn't > > approach it with a particular product in mind. Rather I'd create some > > sort of facility to propagate metadata and/or lineage information in a > > generic way and allow Flink users to plug in their favorite metadata > > tool. Using standards like OpenLineage, for example. I believe Egeria > > is also trying to create an open standard for metadata. > > > > If you're evaluating data catalogs for personal use or use in a > > particular project, Andrew's answer about the Wikimedia evaluation is > > a good start. It's missing OpenMetadata (https://open-metadata.org/). > > That one is showing a LOT of promise. Wikimedia's evaluation is also > > missing industry-leading commercial products (understandably, given > > their mission). Collibra and Alation are probably the ones that pop up > > most often. > > > > I have personally looked into both DataHub and Amundsen. My high-level > > feedback is that DataHub is overengineered, and using proprietary > > LinkedIn technology platform(s), which aren't widely used anywhere. > > Amundsen is much less flexible than DataHub and quite basic in its > > functionality. If you need anything beyond what it already offers, > > good luck. > > > > We dumped Amundsen in favor of OpenMetadata a few months back. We > > don't have enough data points to fully evaluate OpenMetadata yet. > > > > -TPP
Re: [FEEDBACK] Metadata Platforms / Catalogs / Lineage integration
Hi all, @Andrew thanks for sharing that! @Tero good point, I should have clarified the purpose. I want to understand what "metadata platforms" tools are used or evaluated by the Flink community, what's their purpose for using such a tool (is it as a generic catalogue, as a data discovery tool, is lineage the important part etc) and what problems are people trying to solve with them. This space is developing rapidly and there are many open source and commercial tools popping up/growing, which is also why I'm trying to keep an open vision on how this space is evolving. If the Flink community wants to integrate with metadata tools, I fully agree that ideally we do that via standards. My perception is at this moment that no clear standard has yet been established. You mentioned open-metadata.org, but I believe https://openlineage.io/ is also an alternative standard. Best regards, Martijn On Thu, 13 Jan 2022 at 17:00, Tero Paananen wrote: > > I'm currently checking out different metadata platforms, such as > Amundsen [1] and Datahub [2]. In short, these types of tools try to address > problems related to topics such as data discovery, data lineage and an > overall data catalogue. > > > > I'm reaching out to the Dev and User mailing lists to get some feedback. > It would really help if you could spend a couple of minutes to let me know > if you already use either one of the two mentioned metadata platforms or > another one, or are you evaluating such tools? If so, is that for the > purpose as a catalogue, for lineage or anything else? Any type of feedback > on these types of tools is appreciated. > > I hope you don't mind answers off-list. > > You didn't say what purpose you're evaluating these tools for, but if > you're evaluating platforms for integration with Flink, I wouldn't > approach it with a particular product in mind. 
Rather I'd create some > sort of facility to propagate metadata and/or lineage information in a > generic way and allow Flink users to plug in their favorite metadata > tool. Using standards like OpenLineage, for example. I believe Egeria > is also trying to create an open standard for metadata. > > If you're evaluating data catalogs for personal use or use in a > particular project, Andrew's answer about the Wikimedia evaluation is > a good start. It's missing OpenMetadata (https://open-metadata.org/). > That one is showing a LOT of promise. Wikimedia's evaluation is also > missing industry-leading commercial products (understandably, given > their mission). Collibra and Alation are probably the ones that pop up > most often. > > I have personally looked into both DataHub and Amundsen. My high-level > feedback is that DataHub is overengineered, and using proprietary > LinkedIn technology platform(s), which aren't widely used anywhere. > Amundsen is much less flexible than DataHub and quite basic in its > functionality. If you need anything beyond what it already offers, > good luck. > > We dumped Amundsen in favor of OpenMetadata a few months back. We > don't have enough data points to fully evaluate OpenMetadata yet. > > -TPP
Re: [DISCUSS] Future of Per-Job Mode
Regarding session mode:

## Session Mode
* main() method executed in client

Session mode also supports execution of the main() method on the Jobmanager, with submission through the REST API. That's how Flink k8s operators like [1] work. It's actually an important capability, because it allows for allocation of the cluster resources prior to taking down the previous job during an upgrade, when the goal is optimizing for availability.

Thanks,
Thomas

[1] https://github.com/lyft/flinkk8soperator

On Thu, Jan 13, 2022 at 12:32 AM Konstantin Knauf wrote:
>
> Hi everyone,
>
> I would like to discuss and understand if the benefits of having Per-Job Mode in Apache Flink outweigh its drawbacks.
>
> *# Background: Flink's Deployment Modes*
> Flink currently has three deployment modes. They differ in the following dimensions:
> * main() method executed on Jobmanager or Client
> * dependencies shipped by client or bundled with all nodes
> * number of jobs per cluster & relationship between job and cluster lifecycle
> * (supported resource providers)
>
> ## Application Mode
> * main() method executed on Jobmanager
> * dependencies already need to be available on all nodes
> * dedicated cluster for all jobs executed from the same main() method (Note: applications with more than one job currently still have significant limitations, like missing high availability). Technically, a session cluster dedicated to all jobs submitted from the same main() method.
> * supported by standalone, native Kubernetes, YARN
>
> ## Session Mode
> * main() method executed in client
> * dependencies are distributed from and by the client to all nodes
> * cluster is shared by multiple jobs submitted from different clients, independent lifecycle
> * supported by standalone, native Kubernetes, YARN
>
> ## Per-Job Mode
> * main() method executed in client
> * dependencies are distributed from and by the client to all nodes
> * dedicated cluster for a single job
> * supported by YARN only
>
> *# Reasons to Keep*
> * There are use cases where you might need the combination of a single job per cluster, but main() method execution in the client. This combination is only supported by per-job mode.
> * It currently exists. Existing users will need to migrate to either session or application mode.
>
> *# Reasons to Drop*
> * With Per-Job Mode and Application Mode we have two modes that for most users probably do the same thing. Specifically, for those users that don't care where the main() method is executed and want to submit a single job per cluster. Having two ways to do the same thing is confusing.
> * Per-Job Mode is only supported by YARN anyway. If we keep it, we should work towards support in Kubernetes and Standalone, too, to reduce special casing.
> * Dropping per-job mode would reduce complexity in the code and allow us to dedicate more resources to the other two deployment modes.
> * I believe with session mode and application mode we have two easily distinguishable and understandable deployment modes that cover Flink's use cases:
>   * session mode: OLAP-style, interactive jobs/queries, short-lived batch jobs, very small jobs, traditional cluster-centric deployment mode (fits the "Hadoop world")
>   * application mode: long-running streaming jobs, large-scale & heterogeneous jobs (resource isolation!), application-centric deployment mode (fits the "Kubernetes world")
>
> *# Call to Action*
> * Do you use per-job mode? If so, why & would you be able to migrate to one of the other methods?
> * Am I missing any pros/cons?
> * Are you in favor of dropping per-job mode midterm?
>
> Cheers and thank you,
>
> Konstantin
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
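For reference, the three modes discussed above map onto distinct CLI invocations when YARN is the resource provider. This is a sketch only: the jar name and application id are placeholders, and the `-t` targets reflect the unified CLI of recent Flink releases, so check the docs for your version:

```shell
# Session mode: attach to an existing cluster; main() runs in the client
flink run -t yarn-session -Dyarn.application.id=<appId> my-job.jar

# Per-job mode (YARN only): dedicated cluster per job; main() runs in the client
flink run -t yarn-per-job my-job.jar

# Application mode: dedicated cluster; main() runs on the JobManager
flink run-application -t yarn-application my-job.jar
```

Note how the middle two differ only in where main() runs, which is exactly the overlap the proposal questions.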
Re: [FEEDBACK] Metadata Platforms / Catalogs / Lineage integration
Hello, I'm part of the DataHub community, working in collaboration with the company behind it: http://acryldata.io. Happy to have a conversation or clarify any questions you may have on DataHub :) Have a nice day! On Thu, Jan 13, 2022 at 3:33 PM, Andrew Otto wrote: > Hello! The Wikimedia Foundation is currently doing a similar evaluation > (although we are not currently including any Flink considerations). > > > https://wikitech.wikimedia.org/wiki/Data_Catalog_Application_Evaluation_Rubric > > More details will be published there as folks keep working on this. > Hope that helps a little bit! :) > > -Andrew Otto > > On Thu, Jan 13, 2022 at 10:27 AM Martijn Visser > wrote: > >> Hi everyone, >> >> I'm currently checking out different metadata platforms, such as Amundsen >> [1] and Datahub [2]. In short, these types of tools try to address problems >> related to topics such as data discovery, data lineage and an overall data >> catalogue. >> >> I'm reaching out to the Dev and User mailing lists to get some feedback. >> It would really help if you could spend a couple of minutes to let me know >> if you already use either one of the two mentioned metadata platforms or >> another one, or are you evaluating such tools? If so, is that for >> the purpose as a catalogue, for lineage or anything else? Any type of >> feedback on these types of tools is appreciated. >> >> Best regards, >> >> Martijn >> >> [1] https://github.com/amundsen-io/amundsen/ >> [2] https://github.com/linkedin/datahub
Re: [FEEDBACK] Metadata Platforms / Catalogs / Lineage integration
Hello! The Wikimedia Foundation is currently doing a similar evaluation (although we are not currently including any Flink considerations). https://wikitech.wikimedia.org/wiki/Data_Catalog_Application_Evaluation_Rubric More details will be published there as folks keep working on this. Hope that helps a little bit! :) -Andrew Otto On Thu, Jan 13, 2022 at 10:27 AM Martijn Visser wrote: > Hi everyone, > > I'm currently checking out different metadata platforms, such as Amundsen > [1] and Datahub [2]. In short, these types of tools try to address problems > related to topics such as data discovery, data lineage and an overall data > catalogue. > > I'm reaching out to the Dev and User mailing lists to get some feedback. > It would really help if you could spend a couple of minutes to let me know > if you already use either one of the two mentioned metadata platforms or > another one, or are you evaluating such tools? If so, is that for > the purpose as a catalogue, for lineage or anything else? Any type of > feedback on these types of tools is appreciated. > > Best regards, > > Martijn > > [1] https://github.com/amundsen-io/amundsen/ > [2] https://github.com/linkedin/datahub > > >
[FEEDBACK] Metadata Platforms / Catalogs / Lineage integration
Hi everyone, I'm currently checking out different metadata platforms, such as Amundsen [1] and Datahub [2]. In short, these types of tools try to address problems related to topics such as data discovery, data lineage and an overall data catalogue. I'm reaching out to the Dev and User mailing lists to get some feedback. It would really help if you could spend a couple of minutes to let me know if you already use either one of the two mentioned metadata platforms or another one, or are you evaluating such tools? If so, is that for the purpose as a catalogue, for lineage or anything else? Any type of feedback on these types of tools is appreciated. Best regards, Martijn [1] https://github.com/amundsen-io/amundsen/ [2] https://github.com/linkedin/datahub
Re: Upgrade to flink 1.14.2 and using new Data Source and Sink API
Hi Daniel, These logs look pretty normal. As for the -1 epochs, depending on which version you're using, I think that this might apply: "For a producer which is being initialized for the first time, the producerId and epoch will be set to -1. For a producer which is reinitializing, a positive valued producerId and epoch must be provided." (From: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89068820#:~:text=For%20a%20producer%20which%20is,and%20epoch%20must%20be%20provided) I think that these logs are being created when a producer is being initialized, likely because the logging mode is INFO, which is quite verbose. Kind regards, Mika On 13.01.2022 13:40, Daniel Peled wrote: Hi everyone, We have upgraded our Flink version from 1.13.5 to 1.14.2. We are using the new KafkaSource and KafkaSink (instead of FlinkKafkaConsumer and FlinkKafkaProducer). After the upgrade, we *keep seeing* these log messages in the TM logs. Is this OK? Are we doing something wrong? BR, Danny [image: image.png] Mika Naylor https://autophagy.io
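The rule Mika quotes can be sketched as a toy model. This is only an illustration of the quoted wiki text, not Kafka's actual implementation; the function name and the id values are made up:

```python
# Toy model of the quoted InitProducerId rule -- an illustration only,
# not Kafka's code. A first-time producer announces itself with
# producerId = -1 / epoch = -1 (which is why -1 shows up in the logs);
# a reinitializing producer must present its current positive values.
def broker_init_producer_id(producer_id, epoch, fresh_id):
    """Return the (producerId, epoch) a broker would hand back."""
    if producer_id == -1 and epoch == -1:
        # first initialization: allocate a fresh id, epoch starts at 0
        return fresh_id, 0
    if producer_id >= 0 and epoch >= 0:
        # reinitialization: keep the id, bump the epoch to fence old instances
        return producer_id, epoch + 1
    raise ValueError("invalid producerId/epoch combination")

print(broker_init_producer_id(-1, -1, 4001))   # -> (4001, 0)
print(broker_init_producer_id(4001, 0, 9999))  # -> (4001, 1)
```

So the -1 values in the INFO logs are simply the "I am new, please assign me an id" sentinel, not an error state.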
Flink 1.14.2 - Log4j2 -Dlog4j.configurationFile is ignored and falls back to default /opt/flink/conf/log4j-console.properties
Hey All, I'm running Flink 1.14.2 and it seems like it ignores the system property -Dlog4j.configurationFile and falls back to /opt/flink/conf/log4j-console.properties. I enabled debug logging for log4j2 (-Dlog4j2.debug): DEBUG StatusLogger Catching java.io.FileNotFoundException: file:/opt/flink/conf/log4j-console.properties (No such file or directory) at java.base/java.io.FileInputStream.open0(Native Method) at java.base/java.io.FileInputStream.open(Unknown Source) at java.base/java.io.FileInputStream.<init>(Unknown Source) at org.apache.logging.log4j.core.config.ConfigurationFactory.getInputFromString(ConfigurationFactory.java:370) at org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:513) at org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:499) at org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:422) at org.apache.logging.log4j.core.config.ConfigurationFactory.getConfiguration(ConfigurationFactory.java:322) at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:695) at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:716) at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:270) at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:155) at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:47) at org.apache.logging.log4j.LogManager.getContext(LogManager.java:196) at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:137) at org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:55) at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:47) at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:33) at 
org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:329) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:349) at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.<clinit>(AkkaRpcServiceUtils.java:55) at org.apache.flink.runtime.rpc.akka.AkkaRpcSystem.remoteServiceBuilder(AkkaRpcSystem.java:42) at org.apache.flink.runtime.rpc.akka.CleanupOnCloseRpcSystem.remoteServiceBuilder(CleanupOnCloseRpcSystem.java:77) at org.apache.flink.runtime.rpc.RpcUtils.createRemoteRpcService(RpcUtils.java:184) at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:300) at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:243) at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:193) at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:190) at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:617) Whereas I can see the property being loaded while deploying the cluster: source:{ class:org.apache.flink.configuration.GlobalConfiguration method:loadYAMLResource file:GlobalConfiguration.java line:213 } message:Loading configuration property: env.java.opts, -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps -Dlog4j.configurationFile=/opt/log4j2/log4j2.xml -Dlog4j2.debug=true In addition, following the documentation [1], it seems like Flink comes with default log4j properties files located in /opt/flink/conf. Looking into that dir once the cluster is deployed, only flink-conf.yaml is there. Dockerfile content:
FROM flink:1.14.2-scala_2.12-java11
ARG JAR_FILE
COPY target/${JAR_FILE} $FLINK_HOME/usrlib/flink-job.jar
ADD log4j2.xml /opt/log4j2/log4j2.xml
It works perfectly in 1.12.2 with the same log4j2.xml file and system property. 
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/advanced/logging/#configuring-log4j-2 Best, Tamir
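One possible explanation for the observed fallback (an assumption on my part, not something the logs above confirm) is that the container entrypoint appends its own -Dlog4j.configurationFile after the options from env.java.opts, and when the same -D property appears twice on a JVM command line the last occurrence wins. A toy model of that "last definition wins" behavior:

```python
# Toy model of JVM -Dkey=value handling: when the same system property
# appears twice on the command line, the later occurrence overwrites the
# earlier one. This is only a hypothesis for the report above -- an
# entrypoint appending its own -Dlog4j.configurationFile would silently
# shadow the user-supplied one from env.java.opts.
def effective_system_properties(jvm_args):
    props = {}
    for arg in jvm_args:
        if arg.startswith("-D") and "=" in arg:
            key, _, value = arg[2:].partition("=")
            props[key] = value  # later definitions win
    return props

args = [
    "-Dlog4j.configurationFile=/opt/log4j2/log4j2.xml",  # from env.java.opts
    "-Dlog4j.configurationFile=/opt/flink/conf/log4j-console.properties",
]
print(effective_system_properties(args)["log4j.configurationFile"])
# -> /opt/flink/conf/log4j-console.properties
```

If that hypothesis holds, placing the custom configuration at the path the startup scripts already reference would avoid the ordering problem entirely.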
Re: [DISCUSS] Future of Per-Job Mode
Hi Konstantin, Thanks a lot for starting this discussion! I hope my thoughts and experiences on why users use Per-Job Mode, especially on YARN, can help: #1. Per-job mode makes managing dependencies easier: I have met some customers who used Per-Job Mode to submit jobs with a lot of local user-defined jars by using the '-C' option directly. They do not need to upload these jars to some remote file system (e.g. HDFS) first, which makes their life easier. #2. In YARN mode, currently, there are some limitations of Application Mode: in this JIRA (https://issues.apache.org/jira/browse/FLINK-24897) that I am working on, we find that YARN Application Mode does not support `usrlib` very well, which makes it hard to use FlinkUserCodeClassLoader to load classes in user-defined jars. I believe the above 2 points, especially #2, can be resolved as we enhance the YARN Application Mode later, but I think it is worthwhile to consider dependency management more carefully before we make decisions. Best, Biao Geng Konstantin Knauf wrote on Thu, Jan 13, 2022 at 16:32: > Hi everyone, > > I would like to discuss and understand if the benefits of having Per-Job > Mode in Apache Flink outweigh its drawbacks. > > > *# Background: Flink's Deployment Modes* > Flink currently has three deployment modes. They differ in the following > dimensions: > * main() method executed on Jobmanager or Client > * dependencies shipped by client or bundled with all nodes > * number of jobs per cluster & relationship between job and cluster > lifecycle > * (supported resource providers) > > ## Application Mode > * main() method executed on Jobmanager > * dependencies already need to be available on all nodes > * dedicated cluster for all jobs executed from the same main()-method > (Note: applications with more than one job, currently still significant > limitations like missing high-availability). Technically, a session cluster > dedicated to all jobs submitted from the same main() method. 
> * supported by standalone, native kubernetes, YARN > > ## Session Mode > * main() method executed in client > * dependencies are distributed from and by the client to all nodes > * cluster is shared by multiple jobs submitted from different clients, > independent lifecycle > * supported by standalone, Native Kubernetes, YARN > > ## Per-Job Mode > * main() method executed in client > * dependencies are distributed from and by the client to all nodes > * dedicated cluster for a single job > * supported by YARN only > > > *# Reasons to Keep* > * There are use cases where you might need the > combination of a single job per cluster, but main() method execution in the > client. This combination is only supported by per-job mode. > * It currently exists. Existing users will need to migrate to either > session or application mode. > > > *# Reasons to Drop* > * With Per-Job Mode and Application Mode we have two > modes that for most users probably do the same thing. Specifically, for > those users that don't care where the main() method is executed and want to > submit a single job per cluster. Having two ways to do the same thing is > confusing. > * Per-Job Mode is only supported by YARN anyway. If we keep it, we should > work towards support in Kubernetes and Standalone, too, to reduce special > casing. > * Dropping per-job mode would reduce complexity in the code and allow us > to dedicate more resources to the other two deployment modes. 
> * I believe with session mode and application mode we have two easily > distinguishable and understandable deployment modes that cover Flink's use > cases: >* session mode: olap-style, interactive jobs/queries, short lived batch > jobs, very small jobs, traditional cluster-centric deployment mode (fits > the "Hadoop world") >* application mode: long-running streaming jobs, large scale & > heterogeneous jobs (resource isolation!), application-centric deployment > mode (fits the "Kubernetes world") > > > *# Call to Action* > * Do you use per-job mode? If so, why & would you be able to migrate to > one of the other methods? > * Am I missing any pros/cons? > * Are you in favor of dropping per-job mode midterm? > > Cheers and thank you, > > Konstantin > > -- > > Konstantin Knauf > > https://twitter.com/snntrable > > https://github.com/knaufk >
[DISCUSS] Future of Per-Job Mode
Hi everyone, I would like to discuss and understand if the benefits of having Per-Job Mode in Apache Flink outweigh its drawbacks. *# Background: Flink's Deployment Modes* Flink currently has three deployment modes. They differ in the following dimensions: * main() method executed on Jobmanager or Client * dependencies shipped by client or bundled with all nodes * number of jobs per cluster & relationship between job and cluster lifecycle * (supported resource providers) ## Application Mode * main() method executed on Jobmanager * dependencies already need to be available on all nodes * dedicated cluster for all jobs executed from the same main()-method (Note: applications with more than one job, currently still significant limitations like missing high-availability). Technically, a session cluster dedicated to all jobs submitted from the same main() method. * supported by standalone, native kubernetes, YARN ## Session Mode * main() method executed in client * dependencies are distributed from and by the client to all nodes * cluster is shared by multiple jobs submitted from different clients, independent lifecycle * supported by standalone, Native Kubernetes, YARN ## Per-Job Mode * main() method executed in client * dependencies are distributed from and by the client to all nodes * dedicated cluster for a single job * supported by YARN only *# Reasons to Keep* * There are use cases where you might need the combination of a single job per cluster, but main() method execution in the client. This combination is only supported by per-job mode. * It currently exists. Existing users will need to migrate to either session or application mode. *# Reasons to Drop* * With Per-Job Mode and Application Mode we have two modes that for most users probably do the same thing. Specifically, for those users that don't care where the main() method is executed and want to submit a single job per cluster. Having two ways to do the same thing is confusing. 
* Per-Job Mode is only supported by YARN anyway. If we keep it, we should work towards support in Kubernetes and Standalone, too, to reduce special casing. * Dropping per-job mode would reduce complexity in the code and allow us to dedicate more resources to the other two deployment modes. * I believe with session mode and application mode we have two easily distinguishable and understandable deployment modes that cover Flink's use cases: * session mode: olap-style, interactive jobs/queries, short lived batch jobs, very small jobs, traditional cluster-centric deployment mode (fits the "Hadoop world") * application mode: long-running streaming jobs, large scale & heterogeneous jobs (resource isolation!), application-centric deployment mode (fits the "Kubernetes world") *# Call to Action* * Do you use per-job mode? If so, why & would you be able to migrate to one of the other methods? * Am I missing any pros/cons? * Are you in favor of dropping per-job mode midterm? Cheers and thank you, Konstantin -- Konstantin Knauf https://twitter.com/snntrable https://github.com/knaufk
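The comparison in the email above can be restated compactly as data. This is a paraphrase of Konstantin's text only, not an official feature matrix:

```python
# Compact restatement of the deployment-mode comparison above
# (paraphrased from the email text; not an authoritative matrix).
modes = {
    "application": {
        "main_runs_on": "jobmanager",
        "deps_shipped_by_client": False,  # must already be on all nodes
        "cluster_per": "application (all jobs from one main())",
        "resource_providers": {"standalone", "native-kubernetes", "yarn"},
    },
    "session": {
        "main_runs_on": "client",
        "deps_shipped_by_client": True,
        "cluster_per": "shared by many jobs, independent lifecycle",
        "resource_providers": {"standalone", "native-kubernetes", "yarn"},
    },
    "per-job": {
        "main_runs_on": "client",
        "deps_shipped_by_client": True,
        "cluster_per": "single job",
        "resource_providers": {"yarn"},
    },
}

# The niche per-job mode uniquely covers, per the "Reasons to Keep":
# a dedicated cluster per job combined with client-side main() execution.
unique = [
    name for name, m in modes.items()
    if m["main_runs_on"] == "client" and m["cluster_per"] == "single job"
]
print(unique)  # -> ['per-job']
```

Seen this way, the drop proposal amounts to asking whether anyone actually depends on that last combination.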
Re: OutOfMemoryError: Java heap space while implmentating flink sql api
Hi Ronak, As mentioned in the Flink Community & Project information [1], the primary places for support are the mailing lists, and user support should go to the User mailing list. Keep in mind that this is still done by the community and set up for asynchronous handling. If you want to have quick acknowledgment or SLAs, there are vendors that can offer commercial support on Flink. You can't compare the two statements, because in your first join you're also applying a TUMBLE. That means that you're not only maintaining state for your join, but also for your window. You're also using the old Group Window Aggregation function; it's recommended to use Window TVFs due to better performance optimizations [2]. Best regards, Martijn [1] https://flink.apache.org/community.html#how-do-i-get-help-from-apache-flink [2] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/#group-window-aggregation On Thu, 13 Jan 2022 at 06:33, Ronak Beejawat (rbeejawa) wrote: > HI Martijn, > > > > I posted the below query in both places (Flink mailing list and Stack > Overflow) to get a quick response on it. > > Please let me know the exact PoC / mailing list to post my queries if it is > causing trouble, so at least we can get quick acknowledgement on the issues > reported. > > > > Ok, let me ask the below question in a simpler way. > > > > *Join 1 * > > > > select * from cdrTable left join ccmversionsumapTable cvsm ON > (cdrTable.version = ccmversionsumapTable.ccmversion) group by > TUMBLE(PROCTIME(), INTERVAL '1' MINUTE), … > > (2.5 million records left joined with 23 records; it is failing to compute and > throwing a heap error) > > Note: This is a small join example as compared to the Join 2 condition shown > below. 
Here we are using different connectors for reading: cdrTable -> kafka > connector and ccmversionsumapTable -> jdbc connector > > > > *Join 2* > > > > select * from cdrTable left join cmrTable cmr on (cdr.org_id = > cmr.org_id AND cdr.cluster_id = cmr.cluster_id AND > cdr.globalcallid_callmanagerid = cmr.globalcallid_callmanagerid AND > cdr.globalcallid_callid = cmr.globalcallid_callid AND > (cdr.origlegcallidentifier = cmr.callidentifier OR > cdr.destlegcallidentifier = cmr.callidentifier)), … (2.5 million records left joined > with 5 million; it is computing properly without any heap error) > > Note: This is a bigger join example as compared to the Join 1 condition shown > above. Here we are using the same connector for reading cdrTable and cmrTable -> > kafka connector > > > > So the question is: with the small join condition it is throwing a heap error, and > with the bigger set of joins it is working properly. Please help us on this. > > > > Thanks > > Ronak Beejawat > > > > *From: *Martijn Visser > *Date: *Wednesday, 12 January 2022 at 7:43 PM > *To: *dev > *Cc: *commun...@flink.apache.org , > user@flink.apache.org , Hang Ruan < > ruanhang1...@gmail.com>, Shrinath Shenoy K (sshenoyk) , > Jayaprakash Kuravatti (jkuravat) , Krishna Singitam > (ksingita) , Nabhonil Sinha (nasinha) < > nasi...@cisco.com>, Vibhor Jain (vibhjain) , > Raghavendra Jsv (rjsv) , Arun Yadav (aruny) < > ar...@cisco.com>, Avi Sanwal (asanwal) > *Subject: *Re: OutOfMemoryError: Java heap space while implmentating > flink sql api > > Hi Ronak, > > > > I would like to ask you to stop cross-posting to all the Flink mailing > lists and then also posting the same question to Stack Overflow. Both the > mailing lists and Stack Overflow are designed for asynchronous communication, > and you should allow the community some days to address your question. > > > > Joins are state heavy. 
As mentioned in the documentation [1] "Thus, the > required state for computing the query result might grow infinitely > depending on the number of distinct input rows of all input tables and > intermediate join results." > > > > Best regards, > > > > Martijn > > > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/ > > > > > > On Wed, 12 Jan 2022 at 11:06, Ronak Beejawat (rbeejawa) < > rbeej...@cisco.com.invalid> wrote: > > Hi Team, > > I was trying to implement flink sql api join with 2 tables it is throwing > error OutOfMemoryError: Java heap space . PFB screenshot for flink cluster > memory details. > [Flink Memory Model][1] > > > [1]: https://i.stack.imgur.com/AOnQI.png > > **PFB below code snippet which I was trying:** > ``` > EnvironmentSettings settings = > EnvironmentSettings.newInstance().inStreamingMode().build(); > TableEnvironment tableEnv = TableEnvironment.create(settings); > > > tableEnv.getConfig().getConfiguration().setString("table.optimizer.agg-phase-strategy", > "TWO_PHASE"); > tableEnv.getConfig().getConfiguration().setString("table.optimizer.join-reorder-enabled", > "true"); > tableEnv.getConfig().getConfiguration().setString("table.exec.resource.default-parallelism", > "16"); > > tableEnv.executeSql("CREATE TEMPORARY TABLE ccmversionsumapTable (\r\n" > + " suna
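The documentation caveat Martijn quotes ("the required state ... might grow infinitely") can be illustrated with a toy model. This is not Flink's implementation; the class and method names are made up for the sketch:

```python
# Toy illustration (not Flink's code) of why the state of a regular
# (unwindowed) streaming join grows without bound: any retained row may
# still match a future row from the other side, so nothing can ever be
# evicted. A time-windowed join can evict rows whose window has passed.
class StreamingJoinState:
    def __init__(self, window=None):
        self.window = window  # None = regular join (retain forever)
        self.rows = []        # retained (timestamp, row) pairs for one side

    def on_row(self, ts, row):
        self.rows.append((ts, row))
        if self.window is not None:
            # rows older than the window can never match again: evict them
            self.rows = [(t, r) for (t, r) in self.rows if ts - t < self.window]

    def state_size(self):
        return len(self.rows)

regular = StreamingJoinState()
windowed = StreamingJoinState(window=10)
for ts in range(100):
    regular.on_row(ts, {"id": ts})
    windowed.on_row(ts, {"id": ts})

print(regular.state_size(), windowed.state_size())  # -> 100 10
```

The model also hints at why heap pressure depends on more than input size: what matters is how many distinct rows and intermediate results must be retained, not how many rows flow through.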
Re: Flink native k8s integration vs. operator
Hi Thomas, Yes, I was referring to a separate repository under Apache Flink. Cheers, Konstantin On Thu, Jan 13, 2022 at 6:19 AM Thomas Weise wrote: > Hi everyone, > > Thanks for the feedback and discussion. A few additional thoughts: > > [Konstantin] > With respect to common lifecycle management operations: > these features are > > not available (within Apache Flink) for any of the other resource > providers > > (YARN, Standalone) either. From this perspective, I wouldn't consider > this > > a shortcoming of the Kubernetes integration. > > I think time and evolution of the ecosystem are factors to consider as > well. The state and usage of Flink were much different when YARN > integration was novel. Expectations are different today and the > lifecycle functionality provided by an operator may as well be > considered essential to support the concept of a Flink application on > k8s. After a few years of learning from operator experience outside of > Flink, it might be a good time to fill the gap. > > [Konstantin] > I still believe that we should keep this focus on low > > level composable building blocks (like Jobs and Snapshots) in Apache > Flink > > to make it easy for everyone to build fitting higher level abstractions > > like a FlinkApplication Custom Resource on top of it. > > I completely agree that it is important that the basic functions of > Flink are solid and continued focus is necessary. Thanks for sharing > the pointers, these are great improvements. At the same time, > ecosystem, contributor base and user spectrum are growing. There have > been significant additions in many areas of Flink including connectors > and higher level abstractions like statefun, SQL and Python. It's also > evident from additional repositories/subprojects that we have in Flink > today. 
> > [Konstantin] > Having said this, if others in the community have the > capacity to push and > > *maintain* a somewhat minimal "reference" Kubernetes Operator for Apache > > Flink, I don't see any blockers. If or when this happens, I'd see some > > clear benefits of using a separate repository (easier independent > > versioning and releases, different build system & tooling (go, I > assume)). > > Naturally different contributors to the project have different focus. > Let's find out if there is strong enough interest to take this on and > strong enough commitment to maintain. As I see it, there is a > tremendous amount of internal investment going into operationalizing > Flink within many companies. Improvements to the operational side of > Flink like the operator would complement Flink nicely. I assume that > you are referring to a separate repository within Apache Flink, which > would give it the chance to achieve better sustainability than the > existing external operator efforts. There is also the fact that some > organizations which are heavily invested in operationalizing Flink are > allowing contributions to Apache Flink itself but less so to arbitrary > GitHub projects. Regarding the tooling, it could well turn out that > Java is a good alternative given the ecosystem focus and that there is > an opportunity for reuse in certain aspects (metrics, logging etc.). > > [Yang] > I think Xintong has given a strong point on why we introduced > the native K8s integration, which is active resource management. > > I have a concrete example for this in production. When a K8s node is > down, the standalone K8s deployment will take a longer > recovery time based on the K8s eviction time (IIRC, the default is 5 > minutes). For the native K8s integration, Flink RM could be aware of the > > lost TM heartbeat and allocate a new one in a timely manner. > > Thanks for sharing this, we should evaluate it as part of a proposal. 
> If we can optimize recovery or scaling with active resource management > then perhaps it is worth supporting it through the operator. > Previously mentioned operators all rely on the standalone model. > > Cheers, > Thomas > > On Wed, Jan 12, 2022 at 3:21 AM Konstantin Knauf > wrote: > > > > cc dev@ > > > > Hi Thomas, Hi everyone, > > > > Thank you for starting this discussion and sorry for chiming in late. > > > > I agree with Thomas' and David's assessment of Flink's "Native Kubernetes > > Integration", in particular, it actually does not integrate well with the > > Kubernetes ecosystem despite being called "native" (tooling, security > > concerns). > > > > With respect to common lifecycle management operations: these features > are > > not available (within Apache Flink) for any of the other resource > providers > > (YARN, Standalone) either. From this perspective, I wouldn't consider > this > > a shortcoming of the Kubernetes integration. Instead, we have been > focusing > > our efforts in Apache Flink on the operations of a single Job, and left > > orchestration and lifecycle management that spans multiple Jobs to > > ecosystem projects. I still believe that we should keep this focus on low > > level composable building blocks (like J