Hi,

On 18/10/2021 23:34, Gregory Farnum wrote:
> On Fri, Oct 15, 2021 at 8:22 AM Matthew Vernon <mver...@wikimedia.org> wrote:
>
>> Also, if I'm using RGWs, will they do the right thing location-wise?
>> i.e. DC A RGWs will talk to DC A OSDs wherever possible?
>
> Stretch clusters are entirely a feature of the RADOS layer at this
> point; setting up RGW/RBD/CephFS to use them efficiently is left as an
> exercise to the user. Sorry. :/
>
> That said, I don't think it's too complicated — you want your CRUSH
> rule to specify a single site as the primary and to run your active
> RGWs on that side, or else to configure read-from-replica and local
> reads if your workloads support them. But so far the expectation is
> definitely that anybody deploying this will have their own
> orchestration systems around it (you can't really do HA from just the
> storage layer), whether it's home-brewed or Rook in Kubernetes, so we
> haven't discussed pushing it out more within Ceph itself.
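
If I've understood the first of those options correctly, it's a CRUSH rule along these lines (only a sketch; the datacenter bucket names dc-a and dc-b are placeholders for whatever our real CRUSH tree uses), with DC A's bucket taken first so that the primaries come from DC A:

rule stretch_rule {
    id 1
    type replicated
    min_size 1
    max_size 10
    # take DC A first, so its OSDs head the acting sets and become primaries
    step take dc-a
    step chooseleaf firstn 2 type host
    step emit
    # then two more replicas from DC B
    step take dc-b
    step chooseleaf firstn 2 type host
    step emit
}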

We do have existing HA infrastructure which can e.g. make sure our S3 clients in DC A talk to our RGWs in DC A.

But I think I understand you to be saying that in a stretch cluster (other than in stretch degraded mode) each PG still has a single primary OSD which serves all reads - so roughly half of the reads from our RGWs in DC B will end up going to primaries in DC A (and vice versa) - and that there's no way round this. Is that correct?
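
(As a sanity check on that ~50% figure, something like the following rough Python sketch should count how many PG acting primaries sit in each DC. It assumes the site buckets are of type "datacenter" and that the JSON field names ("nodes", "children", "acting_primary" and so on) match what recent releases emit; the exact shape of the pg dump output varies a bit between releases, so treat it as illustrative only.)

#!/usr/bin/env python3
# Rough sketch: count PG acting primaries per datacenter bucket.
# Assumes the ceph CLI is available and that the JSON field names
# below match what the running release emits.
import json
import subprocess
from collections import Counter

def ceph(*args):
    return json.loads(subprocess.check_output(["ceph", *args, "--format", "json"]))

# Map each OSD id to the datacenter bucket above it in the CRUSH tree.
tree = ceph("osd", "tree")
nodes = {n["id"]: n for n in tree["nodes"]}
osd_to_dc = {}

def walk(node_id, dc=None):
    node = nodes.get(node_id)
    if node is None:
        return
    if node["type"] == "datacenter":
        dc = node["name"]
    if node["type"] == "osd":
        osd_to_dc[node_id] = dc
    for child in node.get("children", []):
        walk(child, dc)

for root in (n for n in tree["nodes"] if n["type"] == "root"):
    walk(root["id"])

# Count which DC each PG's acting primary lives in.
dump = ceph("pg", "dump", "pgs_brief")
pg_stats = dump["pg_stats"] if isinstance(dump, dict) else dump
print(Counter(osd_to_dc.get(pg["acting_primary"]) for pg in pg_stats))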

Relatedly, I infer this means that the inter-DC link will still be the limiting factor for write latency, just as if I were running a "normal" cluster that happens to span two DCs? [because the primary OSD only ACKs a write once all four replicas have committed it]
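
(Back-of-the-envelope, with made-up numbers: if intra-DC RTT is around 0.2 ms and inter-DC RTT is around 2 ms, each write waits for the slowest of the four replica commits, so per-op write latency is bounded below by roughly the inter-DC RTT plus disk commit time, i.e. a couple of milliseconds, whichever DC the client and the primary happen to be in.)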

Thanks,

Matthew