Frank,
On 05.01.21 20:24, Frank Thommen wrote:
Hi Uwe,
Did you look into the logs of the MON and the OSDs?
I can't see any specific MON or OSD logs. However, the log available in the UI (Ceph -> Log) has lots of messages
regarding scrubbing, but none regarding issues with starting the monitor.
On each host the logs should be in /var/log/ceph. These should be rotated (see
/etc/logrotate.d/ceph-common for details).
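For example (a sketch; the mon ID is usually the hostname, and <id> stands for an OSD number):

    # monitor log of the affected node
    less /var/log/ceph/ceph-mon.$(hostname).log
    # log of a single OSD
    less /var/log/ceph/ceph-osd.<id>.log
    # or follow the journal of the corresponding systemd unit
    journalctl -u ceph-mon@$(hostname) -f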
Regards,
Uwe
Can you provide the list of installed packages on the affected host and on the
rest of the cluster?
Let me compile the lists and post them somewhere; they are quite long.
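A sketch of how I would collect them (assuming a dpkg-based query is what you're after):

    # installed Ceph/PVE packages with versions, per host
    dpkg -l | grep -E 'ceph|pve' > packages.$(hostname).txt
    # plus PVE's own version summary
    pveversion -v >> packages.$(hostname).txt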
Is the output of "ceph status" the same for all hosts?
Yes.
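(Compared roughly like this, with the node names as placeholders; the output changes over time, so only a rough comparison is possible:

    for h in node1 node2 node3; do ssh $h ceph status > status.$h.txt; done
    diff status.node1.txt status.node2.txt
)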
Frank
Regards,
Uwe
On 05.01.21 20:01, Frank Thommen wrote:
On 04.01.21 12:44, Frank Thommen wrote:
Dear all,
One of our three PVE hypervisors in the cluster crashed (it was fenced successfully) and rebooted automatically. I
took the chance to do a complete dist-upgrade and rebooted again.
The PVE Ceph dashboard now reports that
* the monitor on the host is down (out of quorum), and
* "A newer version was installed but old version still running, please
restart"
The Ceph UI reports monitor version 14.2.11 while in fact 14.2.16 is installed. The hypervisor has been rebooted
twice since the upgrade, so it should be basically impossible that the old version is still running.
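In case it helps, this is roughly how the running and the installed version could be compared (the mon ID is assumed to be the hostname; `ceph tell` may not respond while the monitor is out of quorum):

    # version reported by the running monitor daemon
    ceph tell mon.$(hostname) version
    # versions of all running daemons in the cluster
    ceph versions
    # version of the installed package on this node
    dpkg -l ceph-mon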
`systemctl restart ceph.target` and restarting the monitor through the PVE Ceph UI didn't help. The hypervisor is
running PVE 6.3-3 (the other two are running 6.3-2 with monitor 14.2.15).
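Would restarting just the monitor unit and checking its journal be worth a try? Something like (a sketch; the mon ID is assumed to match the hostname):

    systemctl restart ceph-mon@$(hostname)
    systemctl status ceph-mon@$(hostname)
    journalctl -u ceph-mon@$(hostname) -e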
What to do in this situation?
I am happy with either UI or command-line instructions, but I have no Ceph experience besides setting it up
following the PVE instructions.
Any help or hint is appreciated.
Cheers, Frank
In an attempt to fix the issue I destroyed the monitor through the UI and recreated it. Unfortunately it still
cannot be started. A popup tells me that the monitor has been started, but the overview still shows "stopped" and there
is no version number anymore.
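(I used the UI; on the CLI this should correspond to something like the following, with the mon ID assumed to be the hostname:

    pveceph mon destroy $(hostname)
    pveceph mon create
)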
Then I stopped and started Ceph on the node (`pveceph stop; pveceph start`), which resulted in a degraded cluster (1
host down, 7 of 21 OSDs down). OSDs cannot be started through the UI either.
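(In case the details matter, a sketch of what I would look at next, with <id> as a placeholder for an OSD number:

    ceph osd tree                    # which OSDs are reported down
    systemctl start ceph-osd@<id>    # try to start a single OSD
    journalctl -u ceph-osd@<id> -e   # and inspect its journal
)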
I feel extremely uncomfortable with this situation and would appreciate any hint as to how I should proceed with the
problem.
Cheers, Frank
_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user