Re: [openstack-dev] [Cinder] Static Ceph mon connection info prevents VM restart

Josh Durgin Tue, 12 May 2015 17:27:07 -0700

On 05/12/2015 12:06 AM, Arne Wiebalck wrote:

Here’s Dan’s answer for the exact procedure (he replied, but it bounced):



We have two clusters with mons behind two DNS aliases:

  cephmon.cern.ch: production cluster with five mons A, B, C, D, E

  cephmond.cern.ch: testing cluster with three mons X, Y, Z


The procedure was:

  1. Stop mon on host X. Remove from DNS alias cephmond. Remove from mon map.

  2. Stop mon on host A. Remove from DNS alias cephmon. Remove from mon map.

  3. Add mon on host X to cephmon cluster. mkfs the new mon, start the ceph-mon 
process; after quorum add it to the cephmon alias.

  4. Add mon on host A to cephmond cluster. mkfs the new mon, start the 
ceph-mon process; after quorum add it to the cephmond alias.

  5. Repeat for B/Y and C/Z.



In the end, three of the hosts which were previously running cephmon mons were
then running cephmond mons. Hence when a client comes with a config pointing
to an old mon, it gets authentication denied and stops there; it doesn’t try
the next IP in the list of mons. As a workaround we moved all the cephmond
mons to port 6790. This way the Cinder clients fail over to one of the two
cephmon mons which have not changed.
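
To make the failure mode above concrete, here is a toy model in Python. It is
only a sketch and does not use the real Ceph client: the exception classes, the
make_cluster and connect helpers, and the "host:port" strings are all made up
for illustration. It assumes, as described above, that a client moves on to the
next mon after a connection failure but gives up on an authentication failure.

```python
class ConnectionRefused(Exception):
    """Nothing is listening at the monitor address."""


class AuthDenied(Exception):
    """A monitor answered but rejected the client's credentials."""


def make_cluster(listening):
    """Build a try_connect function for a toy network.

    `listening` maps "host:port" to the name of the cluster served there.
    """
    def try_connect(addr, cluster):
        if addr not in listening:
            raise ConnectionRefused(addr)
        if listening[addr] != cluster:
            raise AuthDenied(addr)
        return addr
    return try_connect


def connect(mon_list, cluster, try_connect):
    """Try each mon in order; only connection failures trigger failover."""
    for addr in mon_list:
        try:
            return try_connect(addr, cluster)
        except ConnectionRefused:
            continue  # dead address: try the next mon
        # AuthDenied is deliberately not caught: the client stops here.
    raise ConnectionRefused("no monitor reachable")


# Stale client config: host A used to be a cephmon mon, host D still is.
stale = ["A:6789", "D:6789"]

# After the swap, host A serves cephmond on the default port.
before = make_cluster({"A:6789": "cephmond", "D:6789": "cephmon"})

# Workaround: the cephmond mons are moved to port 6790.
after = make_cluster({"A:6790": "cephmond", "D:6789": "cephmon"})
```

With `before`, `connect(stale, "cephmon", before)` raises AuthDenied at the
first entry. With `after`, the first entry fails to connect and the client
falls through to "D:6789".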


Thanks, it all makes sense now.

Josh


On 12 May 2015, at 01:46, Josh Durgin <jdur...@redhat.com> wrote:

On 05/08/2015 12:41 AM, Arne Wiebalck wrote:

Hi Josh,

In our case adding the monitor hostnames (alias) would have made only a
slight difference: as we moved the servers to another cluster, the client
received an authorisation failure rather than a connection failure and did
not try to fail over to the next IP in the list. So, adding the alias to the
list would have improved the chances of hitting a good monitor, but it would
not have eliminated the problem.


Could you provide more details on the procedure you followed to move
between clusters? I missed the separate clusters part initially, and
thought you were simply replacing the monitor nodes.

I’m not sure storing IPs in the nova database is a good idea in general.
Replacing (not adding) these by the hostnames is probably better. Another
approach may be to generate this part of connection_info (and hence the XML)
dynamically from the local ceph.conf when the connection is created. I think
a mechanism like this is, for instance, used to select a free port for the
vnc console when the instance is started.
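
As a rough illustration of the "generate it from the local ceph.conf" idea,
the sketch below reads the monitor list out of ceph.conf-style text with
Python's standard configparser. This is not how nova builds connection_info;
the function name mon_hosts_from_conf and the sample text are assumptions made
for the example. It only assumes the monitors are listed under a "mon host" (or
"mon_host") key in the [global] section.

```python
import configparser


def mon_hosts_from_conf(text):
    """Return the monitor addresses listed in the [global] section."""
    # Only "=" is a delimiter, so "host:port" values are left intact.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    if not parser.has_section("global"):
        return []
    for key, value in parser.items("global"):
        # ceph.conf accepts both "mon host" and "mon_host" spellings.
        if key.replace(" ", "_") == "mon_host":
            return [h.strip() for h in value.split(",") if h.strip()]
    return []


# Hypothetical config text that points at the DNS alias, not fixed IPs.
SAMPLE = "[global]\nmon host = cephmon.cern.ch\n"
hosts = mon_hosts_from_conf(SAMPLE)
```

Because the value is read at connection time, a change to the alias or to
ceph.conf would be picked up the next time the instance is started.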


Yes, with different clusters only using the hostnames is definitely
the way to go. I agree that keeping the information in nova's db may
not be the best idea. It is handy to allow nova to use different
clusters from cinder, so I'd prefer not generating the connection info
locally. The qos_specs are also part of connection_info, and if changed
they would have a similar problem of not applying the new value to
existing instances, even after reboot. Maybe nova should simply refresh
the connection info each time it uses a volume.
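
The difference between reusing stored connection info and refreshing it on
each use can be sketched as follows. Everything here is hypothetical: the
dictionaries are a simplified stand-in for connection_info, and make_backend,
info_cached and info_refreshed are not nova or cinder functions.

```python
def make_backend(state):
    """Return a lookup function standing in for the volume service."""
    def get_connection_info(volume_id):
        # Copy so callers hold a snapshot, not the live state.
        return {"hosts": list(state["hosts"]),
                "qos_specs": dict(state["qos_specs"])}
    return get_connection_info


def info_cached(instance, get_connection_info):
    """Reuse whatever was stored when the volume was first attached."""
    return instance["connection_info"]


def info_refreshed(instance, get_connection_info):
    """Ask the volume service again and overwrite the stored copy."""
    instance["connection_info"] = get_connection_info(instance["volume_id"])
    return instance["connection_info"]


state = {"hosts": ["A", "B"], "qos_specs": {}}
backend = make_backend(state)
instance = {"volume_id": "vol-1", "connection_info": backend("vol-1")}

# The operator later switches to the DNS alias.
state["hosts"] = ["cephmon.cern.ch"]

stale = info_cached(instance, backend)["hosts"]
fresh = info_refreshed(instance, backend)["hosts"]
```

Here `stale` still lists the old mons while `fresh` reflects the change, and
the same would hold for a changed qos_specs value.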

Josh



__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
