Thank you for your comments, Zhu!

1. I would treat the refactoring as paying down technical debt, and as
a side effect of this FLIP. The idea is inspired by Matthias' comments
in [1], which suggest having a single implementation of
HighAvailabilityServices that takes factory methods for the
persistence services and leader services. With this, we will achieve a
clearer class hierarchy for HAServices and eliminate code duplication.
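To illustrate the idea, here is a minimal sketch of that factory-based design. All names below (LeaderServices, PersistenceServices, the concrete classes) are illustrative placeholders, not the actual Flink API:

```java
import java.util.function.Supplier;

// Hypothetical sketch of a single HighAvailabilityServices implementation
// assembled from two factories, one for leader services and one for
// persistence services. Names are illustrative, not the real Flink classes.
interface LeaderServices { String describe(); }
interface PersistenceServices { String describe(); }

final class ZooKeeperLeaderServices implements LeaderServices {
    public String describe() { return "zookeeper-leader"; }
}

final class NoOpPersistenceServices implements PersistenceServices {
    public String describe() { return "no-op-persistence"; }
}

final class HighAvailabilityServices {
    private final LeaderServices leaderServices;
    private final PersistenceServices persistenceServices;

    // The factories decide which concrete services are wired in, so an
    // OLAP setup can combine real leader election with no-op persistence.
    HighAvailabilityServices(Supplier<LeaderServices> leaderFactory,
                             Supplier<PersistenceServices> persistenceFactory) {
        this.leaderServices = leaderFactory.get();
        this.persistenceServices = persistenceFactory.get();
    }

    LeaderServices getLeaderServices() { return leaderServices; }
    PersistenceServices getPersistenceServices() { return persistenceServices; }
}

public class HaSketch {
    public static void main(String[] args) {
        HighAvailabilityServices ha = new HighAvailabilityServices(
                ZooKeeperLeaderServices::new, NoOpPersistenceServices::new);
        System.out.println(ha.getLeaderServices().describe());
        System.out.println(ha.getPersistenceServices().describe());
    }
}
```

With one implementation class, only the factory arguments differ per deployment, which is where the duplicated per-backend subclasses would collapse into shared code.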

2. While FLINK-24038 does eliminate the leader election time cost for
each job, it still involves creating a znode or writing to the
ConfigMap for each job, which can negatively impact performance under
higher workloads. This also applies to all the other persistence
services, such as the checkpoint and blob storage, with the exception
of the job graph store.
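As a rough illustration of the overhead being avoided, here is a hypothetical in-memory metadata store of the kind an OLAP-mode deployment could use instead of a per-job remote write. This is a sketch under the assumption that job recovery is disabled (the state dies with the process); it is not a Flink class:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory "persistence": job metadata is kept in a local
// map instead of creating a znode or writing a ConfigMap per job. Only
// valid when high-availability job recovery is disabled, since nothing
// survives a process failure.
final class InMemoryJobMetadataStore {
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    void put(String jobId, byte[] metadata) { store.put(jobId, metadata); }
    byte[] get(String jobId) { return store.get(jobId); }
    int size() { return store.size(); }
}

public class OlapStoreSketch {
    public static void main(String[] args) {
        InMemoryJobMetadataStore store = new InMemoryJobMetadataStore();
        store.put("job-1", new byte[]{1, 2, 3});
        System.out.println(store.size());
    }
}
```

For short query jobs, replacing a round trip to ZK/K8s per submission with a local map operation is exactly the kind of saving this FLIP is after.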

WDYT?

[1] 
https://issues.apache.org/jira/browse/FLINK-31816?focusedCommentId=17741054&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17741054

Best,
Yangze Guo

On Mon, Jan 8, 2024 at 7:37 PM Zhu Zhu <reed...@gmail.com> wrote:
>
> Thanks for creating the FLIP and starting the discussion, Yangze. It makes
> sense to me to improve the job submission performance in OLAP scenarios.
>
> I have a few questions regarding the proposed changes:
>
> 1. How about skipping the job graph persistence if the proposed config
> 'high-availability.enable-job-recovery' is set to false? In this way,
> we do not need to do the refactoring work.
>
> 2. Instead of using different HA services for the Dispatcher and JobMaster,
> can we leverage the work of FLINK-24038 to eliminate the leader election
> time cost of each job? Honestly, I had thought this was already the case,
> but it seems it is not. This improvement can also benefit non-OLAP jobs.
>
> Thanks,
> Zhu
>
> On Mon, Jan 8, 2024 at 5:11 PM Yangze Guo <karma...@gmail.com> wrote:
>
> > Thanks for the pointer, Rui!
> >
> > I have reviewed FLIP-383, and based on my understanding, this feature
> > should be enabled by default for batch jobs in the future. Therefore,
> > +1 for checking the parameters and issuing log warnings when the user
> > explicitly configures execution.batch.job-recovery.enabled to true.
> >
> > +1 for high-availability.job-recovery.enabled, which would be more
> > suitable with YAML hierarchy.
> >
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Jan 8, 2024 at 3:43 PM Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > > Thanks to Yangze driving this proposal!
> > >
> > > Overall looks good to me! This proposal is useful for improving
> > > performance when the job doesn't need failover.
> > >
> > > I have some minor questions:
> > >
> > > 1. How does it work with FLIP-383[1]?
> > >
> > > This FLIP introduces high-availability.enable-job-recovery,
> > > and FLIP-383 introduces execution.batch.job-recovery.enabled.
> > >
> > > IIUC, when high-availability.enable-job-recovery is false, the job
> > > cannot recover even if execution.batch.job-recovery.enabled = true,
> > > right?
> > >
> > > If so, could we check these parameters and log a warning? Or
> > > disable execution.batch.job-recovery.enabled directly when
> > > high-availability.enable-job-recovery = false.
> > >
> > > 2. Could we rename it to high-availability.job-recovery.enabled to unify
> > > the naming?
> > >
> > > WDYT?
> > >
> > > [1] https://cwiki.apache.org/confluence/x/QwqZE
> > >
> > > Best,
> > > Rui
> > >
> > > On Mon, Jan 8, 2024 at 2:04 PM Yangze Guo <karma...@gmail.com> wrote:
> > >
> > > > Thanks for your comment, Yong.
> > > >
> > > > Here are my thoughts on the splitting of HighAvailabilityServices:
> > > > Firstly, I would treat this separation as a result of technical debt
> > > > and a side effect of the FLIP. In order to achieve a cleaner interface
> > > > hierarchy for High Availability before Flink 2.0, the design decision
> > > > should not be limited to OLAP scenarios.
> > > > I agree that the current HAServices can be divided based on either the
> > > > actual target (cluster & job) or the type of functionality (leader
> > > > election & persistence). From a conceptual perspective, I do not see
> > > > one approach being better than the other. However, I have chosen the
> > > > current separation for a clear separation of concerns. After FLIP-285,
> > > > each process has a dedicated LeaderElectionService responsible for
> > > > leader election of all the components within it. This
> > > > LeaderElectionService has its own lifecycle management. If we were to
> > > > split the HAServices into 'ClusterHighAvailabilityService' and
> > > > 'JobHighAvailabilityService', we would need to couple the lifecycle
> > > > management of these two interfaces, as they both rely on the
> > > > LeaderElectionService and other relevant classes. This coupling and
> > > > implicit design assumption will increase the complexity and testing
> > > > difficulty of the system. WDYT?
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Mon, Jan 8, 2024 at 12:08 PM Yong Fang <zjur...@gmail.com> wrote:
> > > > >
> > > > > Thanks Yangze for starting this discussion. I have one comment: why
> > > > > do we need to abstract two services as `LeaderServices` and
> > > > > `PersistenceServices`?
> > > > >
> > > > > From the content, the purpose of this FLIP is to make job failover
> > > > > more lightweight, so it would be more appropriate to abstract the
> > > > > two services as `ClusterHighAvailabilityService` and
> > > > > `JobHighAvailabilityService` instead of `LeaderServices` and
> > > > > `PersistenceServices` based on leader and store. In this way, we can
> > > > > create a `JobHighAvailabilityService` that has a leader service and
> > > > > store for the job that meets the requirements based on the
> > > > > configuration in the zk/k8s high availability service.
> > > > >
> > > > > WDYT?
> > > > >
> > > > > Best,
> > > > > Fang Yong
> > > > >
> > > > > On Fri, Dec 29, 2023 at 8:10 PM xiangyu feng <xiangyu...@gmail.com> wrote:
> > > > >
> > > > > > Thanks Yangze for restarting this discussion.
> > > > > >
> > > > > > +1 for the overall idea. By splitting the HighAvailabilityServices
> > > > > > into LeaderServices and PersistenceServices, we may support
> > > > > > configuring different storage behind them in the future.
> > > > > >
> > > > > > We did run into real problems in production where too much job
> > > > > > metadata was being stored on ZK, causing system instability.
> > > > > >
> > > > > >
> > > > > > On Fri, Dec 29, 2023 at 10:21 AM Yangze Guo <karma...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks for the response, Zhanghao.
> > > > > > >
> > > > > > > PersistenceServices sounds good to me.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Wed, Dec 27, 2023 at 11:30 AM Zhanghao Chen
> > > > > > > <zhanghao.c...@outlook.com> wrote:
> > > > > > > >
> > > > > > > > Thanks for driving this effort, Yangze! The proposal overall
> > > > > > > > LGTM. Apart from the throughput enhancement in the OLAP
> > > > > > > > scenario, the separation of leader election/discovery services
> > > > > > > > and the metadata persistence services will also make the HA
> > > > > > > > implementation clearer and easier to maintain. Just a minor
> > > > > > > > comment on naming: would it be better to rename
> > > > > > > > PersistentServices to PersistenceServices, as we usually put a
> > > > > > > > noun before Services?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Zhanghao Chen
> > > > > > > > ________________________________
> > > > > > > > From: Yangze Guo <karma...@gmail.com>
> > > > > > > > Sent: Tuesday, December 19, 2023 17:33
> > > > > > > > To: dev <dev@flink.apache.org>
> > > > > > > > Subject: [DISCUSS] FLIP-403: High Availability Services for
> > OLAP
> > > > > > > Scenarios
> > > > > > > >
> > > > > > > > Hi, there,
> > > > > > > >
> > > > > > > > We would like to start a discussion thread on "FLIP-403: High
> > > > > > > > Availability Services for OLAP Scenarios"[1].
> > > > > > > >
> > > > > > > > Currently, Flink's high availability service consists of two
> > > > > > > > mechanisms: leader election/retrieval services for JobManager
> > > > > > > > and persistent services for job metadata. However, these
> > > > > > > > mechanisms are set up in an "all or nothing" manner. In OLAP
> > > > > > > > scenarios, we typically only require leader election/retrieval
> > > > > > > > services for JobManager components since jobs usually do not
> > > > > > > > have a restart strategy. Additionally, the persistence of job
> > > > > > > > states can negatively impact the cluster's throughput,
> > > > > > > > especially for short query jobs.
> > > > > > > >
> > > > > > > > To address these issues, this FLIP proposes splitting the
> > > > > > > > HighAvailabilityServices into LeaderServices and
> > > > > > > > PersistentServices, and enabling users to independently
> > > > > > > > configure the high availability strategies specifically
> > > > > > > > related to jobs.
> > > > > > > >
> > > > > > > > Please find more details in the FLIP wiki document [1]. Looking
> > > > > > > > forward to your feedback.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-403+High+Availability+Services+for+OLAP+Scenarios
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > >
> > > > > >
> > > >
> >
