Re: Pull #9224 - Druid Coordinator Pause Feature

Maytas Monsereenusorn Mon, 20 Jan 2020 17:06:04 -0800

I'm still pretty new to Druid and might be wrong, but I noticed the following
points in the documentation for the Coordinator (
https://druid.apache.org/docs/latest/design/coordinator.html):

*"The Druid Coordinator runs periodically and the time between each run is
a configurable parameter. Each time the Druid Coordinator runs, it assesses
the current state of the cluster before deciding on the appropriate actions
to take."*
Would it be possible to set this parameter to a really large value to
achieve what you want?

*"If the Druid Coordinator is not started up, no new segments will be loaded
in the cluster and outdated segments will not be dropped. However, the
Coordinator process can be started up at any time, and after a configurable
delay, will start running Coordinator tasks. This also means that if you
have a working cluster and all of your Coordinators die, the cluster will
continue to function, it just won’t experience any changes to its data
topology."*

From this, it seems that the Coordinator does not need to be running, either
while the other processes are starting up or once they are already up.
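
The large-period suggestion can be reasoned about with a small sketch. It
assumes (not confirmed in this thread) that the Coordinator schedules its runs
with a fixed delay: a first run after the `druid.coordinator.startDelay`
runtime property elapses, then one run every `druid.coordinator.period`. Under
that assumption, even a huge period still leaves one run shortly after each
Coordinator startup. The function name and the window values are illustrative
only.

```python
def coordinator_runs(start_delay_s, period_s, window_s):
    """Return the times (seconds after startup) at which a fixed-delay
    scheduler would fire within a window of length window_s.

    Assumption: the first run happens after start_delay_s and each later run
    follows period_s after the previous one (run duration is ignored).
    """
    runs = []
    t = start_delay_s
    while t <= window_s:
        runs.append(t)
        t += period_s
    return runs


# Illustrative values: 5-minute start delay, 4-hour maintenance window.
ONE_YEAR = 365 * 24 * 3600
FOUR_HOURS = 4 * 3600

print(coordinator_runs(300, 60, FOUR_HOURS)[:3])    # [300, 360, 420]
print(coordinator_runs(300, ONE_YEAR, FOUR_HOURS))  # [300]
```

So, under these assumptions, a very large period reduces coordination to
roughly one run per Coordinator restart rather than none. Since both values are
runtime properties, changing them would presumably also mean restarting the
Coordinator.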

Best Regards,
Maytas

On Mon, Jan 20, 2020 at 2:10 PM Will Lauer wrote:

> I have no idea about the implementation, but the concept is certainly one
> we have been looking for for quite a while in the several clusters I
> manage. I'm excited to see this capability added to the system.
>
> Will
>
> On Mon, Jan 20, 2020, 1:55 PM Lucas Capistrant wrote:
>
> > Hi all,
> >
> > Looking for some feedback on the idea of creating a new dynamic config for
> > the coordinator that allows cluster admins to pause coordination by setting
> > the new config to true (the default is false). By "pause coordination", I
> > mean skipping all coordinator helpers every time the coordinator runs. Some
> > more details are included below, as well as a link to a PR with the initial
> > implementation that I came up with. Any feedback helps; we want to make sure
> > we are not overlooking any negative side effects!
> >
> > My organization is preparing to undergo some heavy maintenance on our HDFS
> > cluster that backs our production Druid clusters, which involves HDFS
> > downtime. Our plan was to stop the coordinators and overlords and do a
> > rolling restart of the Historical nodes during the outage, to lay down the
> > new site files and retain a static picture of the world for client queries
> > to run against. During our tests in stage, we realized that the Historicals
> > check in with the coordinator when starting up. Therefore, we wanted to find
> > a way to leave the coordinator up without actually coordinating segments on
> > the cluster, trying to run kill tasks, etc. (because HDFS is offline and we
> > don't want to talk to it until we know it is back up and healthy). Thus,
> > Pull 9224 <https://github.com/apache/druid/pull/9224/files> was born. This
> > seemed like an easy and effective way to halt coordination and keep the API
> > up.
> >
> > We've done some small-scale testing in a dev environment, and I am
> > currently looking into writing some type of integration test that exercises
> > this code path. Despite the change's perceived simplicity, it would be nice
> > to have something there.
> >
> > Thanks!
> > Lucas Capistrant
> >
>
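
The proposal in Lucas's message can be sketched as follows. This is a minimal
model, not the code in PR 9224. It assumes the flag is a boolean field named
`pauseCoordination` in the Coordinator dynamic config (the thread does not name
it) and that the dynamic config is read and written with `GET`/`POST` on
`/druid/coordinator/v1/config`. `run_once`, `with_pause`, and `set_paused` are
hypothetical helpers. While paused, each scheduled run still fires but skips
every helper, so the process and its API stay up.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Assumed (hypothetical) name of the new dynamic-config flag; the thread only
# says it defaults to false.
PAUSE_KEY = "pauseCoordination"


def run_once(dynamic_config: Dict, helpers: List[Callable[[], str]]) -> List[str]:
    """Model one Coordinator run: execute each helper in order unless paused."""
    if dynamic_config.get(PAUSE_KEY, False):
        return []  # paused: the run happens, but every helper is skipped
    return [helper() for helper in helpers]


def with_pause(dynamic_config: Dict, paused: bool) -> Dict:
    """Return a copy of the dynamic config with the pause flag set."""
    updated = dict(dynamic_config)
    updated[PAUSE_KEY] = paused
    return updated


def set_paused(coordinator_url: str, paused: bool) -> None:
    """Read the current dynamic config, set the flag, and post it back.

    Posting the full config (not just the one field) avoids relying on how
    the server treats omitted fields.
    """
    endpoint = coordinator_url.rstrip("/") + "/druid/coordinator/v1/config"
    with urllib.request.urlopen(endpoint) as resp:
        current = json.load(resp)
    body = json.dumps(with_pause(current, paused)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        resp.read()


# Stand-in helpers for illustration; real Coordinator duties are not modeled.
helpers = [lambda: "balance segments", lambda: "run kill tasks"]
print(run_once({}, helpers))                    # ['balance segments', 'run kill tasks']
print(run_once(with_pause({}, True), helpers))  # []
```

Against a live cluster, one would call something like
`set_paused("http://coordinator-host:8081", True)` before the maintenance and
`set_paused(..., False)` afterwards; the host name is a placeholder. Because
the flag lives in the dynamic config, it can be flipped at runtime without
restarting the Coordinator.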
