Hello!

I have a job that has many instances (>100) that I'd like to spread
across the hosts in a cluster. Using a constraint such as "limit=host:1"
doesn't work well, because I have more instances than nodes.

As a workaround I increased the limit value to something like
ceil(instances/nodes). But now a problem shows up if a bunch of nodes go
down (think a whole rack dying): those instances will not run until the
nodes are back, even though we may have spare capacity on the rest of the
hosts that we'd like to use. In that scenario, job availability may
suffer because the job is running with fewer instances than expected. On a
smaller scale, the same approach applies if you want to spread tasks
across racks or availability zones: I'd like to have one instance of a
job per rack (failure domain), but if a rack goes down, the instance
should be able to spawn on a different rack.
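
To put rough numbers on the failure scenario (hypothetical job and cluster
sizes, just to illustrate the arithmetic):

    import math

    instances = 120   # hypothetical number of task instances
    nodes = 50        # hypothetical number of hosts in the cluster

    # Workaround: cap instances per host at ceil(instances / nodes).
    limit = math.ceil(instances / nodes)        # -> 3

    # If a rack holding 20 of those nodes dies, the capacity still
    # allowed by the per-host limit is:
    remaining_capacity = limit * (nodes - 20)   # -> 90, but we need 120

    # ~30 instances stay pending even though the surviving hosts may
    # have plenty of spare resources.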

I thought we could have a scheduling constraint that "spreads" instances
across a particular host attribute: instead of vetoing an offer right away,
we check where the other instances of the task are running, looking at a
particular attribute of the offered host, and try to maximize the number of
distinct values of that attribute (rack, hostname, etc.) across the task's
instance assignments.
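
To make the idea concrete, here is a minimal sketch of the scoring I have in
mind. It isn't tied to any existing scheduler API; the offer representation
(a hostname -> attribute-value map) and the function names are made up for
illustration:

    from collections import Counter
    from typing import Dict, List

    def spread_score(offer_attr_value: str,
                     running_attr_values: List[str]) -> int:
        """Lower is better: how many instances of this task already run
        on hosts sharing the offered host's attribute value (e.g. rack)."""
        return Counter(running_attr_values)[offer_attr_value]

    def pick_offer(offers: Dict[str, str],
                   running_attr_values: List[str]) -> str:
        """Pick the offer whose attribute value currently hosts the fewest
        instances, i.e. maximize the distinct attribute values used."""
        return min(offers,
                   key=lambda host: spread_score(offers[host],
                                                 running_attr_values))

    # Hypothetical example: instances already run in rack-a and rack-b;
    # offers from rack-a and rack-c are pending. The unused rack wins,
    # but rack-a would still be accepted if it were the only offer left.
    offers = {'host-17': 'rack-a', 'host-42': 'rack-c'}
    running = ['rack-a', 'rack-b']
    assert pick_offer(offers, running) == 'host-42'

The key difference from the limit constraint is that nothing is ever vetoed
outright; a less-diverse placement is simply ranked lower, so instances can
still land on the surviving failure domains when a rack is gone.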

What do you think? Has something like this come up in the past? Is it
feasible?


Mauricio
