Hi All,

I created PIP 212: https://github.com/apache/pulsar/issues/18012. It
already has a PR https://github.com/apache/pulsar/pull/15640. It is a
minor change that I view as more of a bug fix, but because it changes
a default value, it needs a PIP.

The main point for discussion is whether we can cherry pick this to
active release branches. My view is that it should be cherry picked
because the feature doesn't work out of the box.

If there isn't much discussion, I'll start a vote later this week.

Thanks,
Michael


### Motivation

The current Bookkeeper configuration defaults to using
`org.apache.bookkeeper.net.ScriptBasedMapping` for the
`DNSToSwitchMapping` implementation. However, this default
configuration does not align with the Broker's default configuration,
which is `org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping`. As
such, the default configuration for a Pulsar cluster does not lead to
ideal rack awareness when ledgers need to be recovered. The result is
that a user can configure a cluster for rack awareness and the brokers
will honor that configuration, but the autorecovery process will not
because it does not have the correct bookkeeper cluster topology view.

I propose we configure bookkeeper to use the broker's
`ZkBookieRackAffinityMapping` class. That way, autorecovery will honor
the operator's configured rack awareness policies out of the box.

### Goal

Ensure consistent rack awareness policies.

I propose this is a bug fix that requires patching all active versions
of Pulsar.

### API Changes

None

### Implementation

See https://github.com/apache/pulsar/pull/15640.

Add default value for `reppDnsResolverClass` to the
`conf/bookkeeper.conf` configuration. This change effectively switches
the default from `org.apache.bookkeeper.net.ScriptBasedMapping` to
`org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping`.

### Alternatives

The tradeoff is that a user relying on the `ScriptBasedMapping`
default might accidentally get switched to using the
`ZkBookieRackAffinityMapping` implementation. Given that
`ScriptBasedMapping` doesn't work out of the box, and that the
broker's default to `ZkBookieRackAffinityMapping`, I think this is an
acceptable tradeoff.

### Anything else?

I manually verified that the `ZkBookieRackAffinityMapping` works by
running some tests in a minikube cluster deployed with the DataStax
helm chart for Apache Pulsar. I set up 3 racks, 4 bookies, and a topic
with a E=2, Qw=2, and Qa=2. I then verified that the autorecovery pod
correctly discovered the racks and then identified when an ensemble
was not following the rack placement policy after two bookies were
removed. I documented my testing a bit more here:
https://github.com/datastax/pulsar-helm-chart/pull/214.

Reply via email to