I have SELinux disabled, yet the RPM post-upgrade scripts run restorecon on 
/var/lib/ceph regardless.

In my case I chose to kill the restorecon processes to save outage time; it 
didn't affect completion of the package upgrade.
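
For reference, a rough sketch of that workaround (pgrep/pkill are standard 
procps tools; matching on the process name is my assumption about how to 
find the relabel jobs):

  # list relabel processes started by the RPM post scripts
  pgrep -af restorecon
  # kill them; the package transaction completes regardless
  pkill restorecon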


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mykola 
Dvornik
Sent: Friday, 15 July 2016 6:54 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Lessons learned upgrading Hammer -> Jewel

I would also advise people to mind SELinux if it is enabled on the OSD 
nodes. The re-labeling is done as part of the upgrade, and it is a rather 
time-consuming process.
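
A quick check before scheduling the upgrade (a sketch; getenforce ships with 
the standard SELinux userland, and as far as I can tell the relabel is 
triggered by the ceph package scripts):

  # "Enforcing" or "Permissive" means a relabel of /var/lib/ceph
  # will run during the package upgrade
  getenforce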


-----Original Message-----
From: Mart van Santen <m...@greenhost.nl>
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Lessons learned upgrading Hammer -> Jewel
Date: Fri, 15 Jul 2016 10:48:40 +0200


Hi Wido,

Thank you, we are currently in the same process, so this information is very 
useful. Can you share why you upgraded from Hammer directly to Jewel? Is 
there a reason to skip Infernalis? A hammer->infernalis->jewel upgrade would 
have seemed the logical path to me.

(We did indeed see the same "Failed to encode map eXXX with expected crc" 
errors when upgrading to the latest Hammer.)


Regards,

Mart

On 07/15/2016 03:08 AM, 席智勇 wrote:
Good job, thank you for sharing, Wido~
It's very useful~

2016-07-14 14:33 GMT+08:00 Wido den Hollander <w...@42on.com>:

To add, the RGWs upgraded just fine as well.

No regions in use here (yet!), so that upgraded as it should.

Wido

> On 13 July 2016 at 16:56, Wido den Hollander <w...@42on.com> wrote:
>
>
> Hello,
>
> The last 3 days I worked at a customer with an 1800-OSD cluster which had 
> to be upgraded from Hammer 0.94.5 to Jewel 10.2.2.
>
> The cluster in this case is 99% RGW, but also some RBD.
>
> I wanted to share some of the things we encountered during this upgrade.
>
> All 180 nodes are running CentOS 7.1 on an IPv6-only network.
>
> ** Hammer Upgrade **
> At first we upgraded from 0.94.5 to 0.94.7. This went well, except that 
> the monitors got spammed with messages like:
>
>   "Failed to encode map eXXX with expected crc"
>
> Some searching on the list brought me to:
>
>   ceph tell osd.* injectargs -- --clog_to_monitors=false
>
> This reduced the load on the 5 monitors and let recovery proceed smoothly.
>
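> (One would presumably want to turn logging to the monitors back on once 
> the cluster is healthy again; the same injectargs mechanism should work 
> in reverse. The revert below is my assumption, not from the original mail:)
>
>   ceph tell osd.* injectargs -- --clog_to_monitors=true
>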
> ** Monitors to Jewel **
> The next step was to upgrade the monitors from Hammer to Jewel.
>
> Using Salt we upgraded the packages and afterwards it was simple:
>
>    killall ceph-mon
>    chown -R ceph:ceph /var/lib/ceph
>    chown -R ceph:ceph /var/log/ceph
>
> Now, a systemd quirk: 'systemctl start ceph.target' did not work, so I had 
> to manually enable the monitor and start it:
>
>   systemctl enable ceph-mon@srv-zmb04-05.service
>   systemctl start ceph-mon@srv-zmb04-05.service
>
> Afterwards the monitors were running just fine.
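>
> Putting the steps above together, the per-monitor sequence looks roughly 
> like this (a sketch assembled from the commands in this mail; using 
> $(hostname -s) for the monitor id is my assumption and may not match 
> every deployment):
>
>   # run on each monitor node after installing the Jewel packages
>   killall ceph-mon
>   chown -R ceph:ceph /var/lib/ceph
>   chown -R ceph:ceph /var/log/ceph
>   systemctl enable ceph-mon@$(hostname -s).service
>   systemctl start ceph-mon@$(hostname -s).service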
>
> ** OSDs to Jewel **
> To upgrade the OSDs to Jewel we first used Salt to update the packages on 
> all systems to 10.2.2; we then ran a shell script on one node at a time.
>
> The failure domain here is 'rack', so we executed this in one rack, then 
> the next one, and so on.
>
> Script can be found on Github: 
> https://gist.github.com/wido/06eac901bd42f01ca2f4f1a1d76c49a6
>
> Be aware that the chown can take a long, long, very long time!
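>
> If the chown dominates the outage window, one way to speed it up is to 
> chown each OSD data directory in parallel (my suggestion, not part of the 
> script above; tune -P to what the disks tolerate):
>
>   find /var/lib/ceph/osd -mindepth 1 -maxdepth 1 -type d \
>     | xargs -P 8 -n 1 chown -R ceph:ceph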
>
> We ran into an issue where some OSDs crashed right after starting; on a 
> second attempt they would come up fine. The crash was in:
>
>   "void FileStore::init_temp_collections()"
>
> I reported this in the tracker as I'm not sure what is happening here: 
> http://tracker.ceph.com/issues/16672
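>
> Since a second attempt reliably worked, a simple retry loop can paper over 
> it until the root cause is known (my sketch, not from the original mail; 
> $OSD_ID is a placeholder):
>
>   for try in 1 2 3; do
>     systemctl start ceph-osd@${OSD_ID}.service && break
>     sleep 5
>   done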
>
> ** New OSDs with Jewel **
> We also had some new nodes which we wanted to add to the Jewel cluster.
>
> Using Salt and ceph-disk to deploy them, we ran into a partprobe issue 
> with ceph-disk. There was already a pull request with the fix, but it was 
> not included in Jewel 10.2.2.
>
> We manually applied the PR and it fixed our issues: 
> https://github.com/ceph/ceph/pull/9330
>
> Hope this helps other people with their upgrades to Jewel!
>
> Wido
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
