Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
cephmailinglist writes:

> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph
> [...]
> Also at that time one of our pools got a lot of extra data; those files were stored with root permissions since we had not restarted the Ceph daemons yet, and the 'find' in step e found so many files that xargs (the shell) could not handle it (too many arguments).

I've always found it disappointing that xargs behaves like this on many GNU/Linux distributions. I always thought xargs's main purpose in life was to know how many arguments can safely be passed to a process... Anyway, you should be able to limit the number of arguments per invocation by adding something like "-n 100" to the xargs command line.

Thanks for sharing your upgrade experiences!

--
Simon.
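For example, something like the following (just a sketch; the batch size of 1000 and the four parallel chown processes are arbitrary values, not numbers from the thread). The -print0/-0 pair keeps unusual file names safe, -n caps how many path arguments each chown invocation receives, and -P lets xargs run several chown processes at once:

# xargs keeps re-invoking chown until the whole list is processed,
# so no single command line ever grows too long
find /var/lib/ceph/ ! -uid 64045 -print0 | xargs -0 -n 1000 -P 4 chown ceph:ceph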
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hi,

We initially upgraded from Hammer to Jewel while keeping the ownership unchanged, by adding "setuser match path = /var/lib/ceph/$type/$cluster-$id" to ceph.conf. Later, we used the following steps to change from running as root to running as ceph.

On the storage nodes, we first ran the following command, which doesn't change any permissions but warms the filesystem cache (based on http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-November/006013.html):

find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1 chown -R root:root

Set noout:

ceph osd set noout

On the storage node, edit "/etc/ceph/ceph.conf" and comment out the match-path line:

#setuser match path = /var/lib/ceph/$type/$cluster-$id

Then:

stop ceph-osd-all
find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1 chown -R ceph:ceph
chown -R ceph:ceph /var/lib/ceph/
start ceph-osd-all

Check that all the Ceph OSD processes are running:

ps aux | grep ceph | egrep -v grep

Unset "noout":

ceph osd unset noout

Wait until Ceph is healthy again and continue with the next storage node.

The OSDs were only down for about 2 minutes, because we had run the find command beforehand and used xargs with 12 parallel processes, so recovery time was quick as well. We have more than 850 OSDs and the entire process went pretty smoothly, doing one storage server at a time.

On Tue, Mar 14, 2017 at 3:27 AM, Richard Arends wrote:
> On 03/13/2017 02:02 PM, Christoph Adomeit wrote:
>
> Christoph,
>
> Thanks for the detailed upgrade report.
>> We have another scenario: we have already upgraded to Jewel 10.2.6, but we are still running all our monitors and OSD daemons as root using the setuser match path directive.
>>
>> What would be the recommended way to have all daemons running as the ceph:ceph user?
>>
>> Could we chown -R the monitor and OSD data directories under /var/lib/ceph one by one while keeping up service?
>
> Yes. To minimize the downtime, you can do the chown twice. Once before restarting the daemons, while they are running with root user permissions. Then stop the daemons, do the chown again, but then only on the changed files (find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph), and start the Ceph daemons with setuser and setgroup set to ceph.
>
> --
> With regards,
>
> Richard Arends.
> Snow BV / http://snow.nl
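Pulled together, the per-node procedure above looks roughly like the sketch below. This is only a consolidation of the steps from this mail (Ubuntu 14.04 with upstart and a cluster named "ceph" are assumed); commenting out the match-path option in ceph.conf stays a manual step:

# Pass 1: walk every OSD directory while the daemons still run as root.
# chown to root:root changes nothing, but pulls the metadata into the page cache.
find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1 chown -R root:root

ceph osd set noout

# Pass 2: with the OSDs stopped, the real chown is fast because the
# metadata is already cached from pass 1.
stop ceph-osd-all
find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1 chown -R ceph:ceph
chown -R ceph:ceph /var/lib/ceph/
start ceph-osd-all

# Verify the OSD processes are back before allowing rebalancing again
# (the [c] trick keeps grep from matching itself).
ps aux | grep '[c]eph-osd'
ceph osd unset noout
# Wait for HEALTH_OK before moving on to the next storage node.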
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On 03/13/2017 02:02 PM, Christoph Adomeit wrote:

Christoph,

Thanks for the detailed upgrade report.

We have another scenario: we have already upgraded to Jewel 10.2.6, but we are still running all our monitors and OSD daemons as root using the setuser match path directive.

What would be the recommended way to have all daemons running as the ceph:ceph user?

Could we chown -R the monitor and OSD data directories under /var/lib/ceph one by one while keeping up service?

Yes. To minimize the downtime, you can do the chown twice. Once before restarting the daemons, while they are still running with root user permissions. Then stop the daemons, do the chown again, but this time only on the changed files (find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph), and start the Ceph daemons with setuser and setgroup set to ceph.

--
With regards,

Richard Arends.
Snow BV / http://snow.nl
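In shell form, that two-pass approach looks roughly like this (a sketch only; 64045 is the fixed uid of the ceph user in the Debian/Ubuntu packages, as used elsewhere in this thread, and the upstart job names assume Ubuntu 14.04):

# Pass 1: daemons keep running as root while everything is chowned (slow, but no downtime).
chown -R ceph:ceph /var/lib/ceph

# Pass 2: stop the daemons, fix only the files written as root since pass 1,
# then start the daemons with "setuser ceph" / "setgroup ceph".
stop ceph-all
find /var/lib/ceph/ ! -uid 64045 -print0 | xargs -0 chown ceph:ceph
start ceph-all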
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On 03/12/2017 07:54 PM, Florian Haas wrote:

Florian,

For others following this thread who still have the hammer→jewel upgrade ahead: there is a ceph.conf option you can use here; no need to fiddle with the upstart scripts.

setuser match path = /var/lib/ceph/$type/$cluster-$id

Ah, I did not know this option. Good tip!

--
With regards,

Richard Arends.
Snow BV / http://snow.nl
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Thanks for the detailed upgrade report.

We have another scenario: we have already upgraded to Jewel 10.2.6, but we are still running all our monitors and OSD daemons as root using the setuser match path directive.

What would be the recommended way to have all daemons running as the ceph:ceph user?

Could we chown -R the monitor and OSD data directories under /var/lib/ceph one by one while keeping up service?

Thanks
Christoph

On Sat, Mar 11, 2017 at 12:21:38PM +0100, cephmailingl...@mosibi.nl wrote:
> Hello list,
>
> A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this email we want to share our experiences.

--
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer: Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de     Internetloesungen vom Feinsten
Fon. +49 2166 9149-32             Fax. +49 2166 9149-10
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On 03/13/2017 11:07 AM, Dan van der Ster wrote:

On Sat, Mar 11, 2017 at 12:21 PM, wrote:

The next and biggest problem we encountered had to do with the CRC errors on the OSD map. On every map update, the OSDs that were not upgraded yet got that CRC error and asked the monitor for a full OSD map instead of just a delta update. At first we did not understand what exactly happened: we ran the upgrade per node using a script, and in that script we watch the state of the cluster and when the cluster is healthy again, we upgrade the next host. Every time we started the script (skipping the already upgraded hosts), the first host(s) upgraded without issues and then we got blocked I/O on the cluster. The blocked I/O went away within a minute or two (not measured). After investigation we found out that the blocked I/O happened when nodes were asking the monitor for a (full) OSD map, which briefly resulted in a fully saturated network link on our monitor.

Thanks for the detailed upgrade report. I wanted to zoom in on this CRC/fullmap issue because it could be quite disruptive for us when we upgrade from hammer to jewel. I've read various reports that the foolproof way to avoid the full map DoS would be to upgrade all OSDs to jewel before the mons. Did anyone have success with that workaround? I'm cc'ing Bryan because he knows this issue very well.

With https://github.com/ceph/ceph/pull/13131 merged into 10.2.6, this issue shouldn't be a problem (at least we don't see it anymore).

--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
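Before relying on that fix, it may be worth confirming what is actually running; a hedged sketch (output formats differ between releases, and the monitor id is usually, but not necessarily, the short hostname):

ceph tell osd.* version                  # ask every running OSD for its version
ceph daemon mon.$(hostname -s) version   # run locally on each monitor host, via the admin socket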
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On Sat, Mar 11, 2017 at 12:21 PM, wrote:
>
> The next and biggest problem we encountered had to do with the CRC errors on the OSD map. On every map update, the OSDs that were not upgraded yet got that CRC error and asked the monitor for a full OSD map instead of just a delta update. At first we did not understand what exactly happened: we ran the upgrade per node using a script, and in that script we watch the state of the cluster and when the cluster is healthy again, we upgrade the next host. Every time we started the script (skipping the already upgraded hosts), the first host(s) upgraded without issues and then we got blocked I/O on the cluster. The blocked I/O went away within a minute or two (not measured). After investigation we found out that the blocked I/O happened when nodes were asking the monitor for a (full) OSD map, which briefly resulted in a fully saturated network link on our monitor.

Thanks for the detailed upgrade report. I wanted to zoom in on this CRC/fullmap issue because it could be quite disruptive for us when we upgrade from hammer to jewel. I've read various reports that the foolproof way to avoid the full map DoS would be to upgrade all OSDs to jewel before the mons. Did anyone have success with that workaround?

I'm cc'ing Bryan because he knows this issue very well.

Cheers, Dan
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hello,

On Sun, 12 Mar 2017 19:54:10 +0100 Florian Haas wrote:

> On Sat, Mar 11, 2017 at 12:21 PM, wrote:
>> The upgrade of our biggest cluster, nr 4, did not go without problems. Since we were expecting a lot of "failed to encode map e with expected crc" messages, we disabled clog to monitors with 'ceph tell osd.* injectargs -- --clog_to_monitors=false' so our monitors would not choke on those messages. The upgrade of the monitors went as expected, without any problem; the problems started when we started the upgrade of the OSDs. In the upgrade procedure, we had to change the ownership of the files from root to the user ceph, and that process was taking so long on our cluster that completing the upgrade would take more than a week. We decided to keep the permissions as they were for now, so in the upstart init script /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup ceph' to '--setuser root --setgroup root', and will fix that OSD by OSD once the upgrade is completely done.
>
> For others following this thread who still have the hammer→jewel upgrade ahead: there is a ceph.conf option you can use here; no need to fiddle with the upstart scripts.
>
> setuser match path = /var/lib/ceph/$type/$cluster-$id

Yes, I was thinking about mentioning this, too. Alas, in my experience with a wonky test cluster this failed with MDS, maybe because of an odd name, maybe because nobody ever tested it. MONs and OSDs were fine.

> What this will do is check which user owns the files in the respective directories, and then start your Ceph daemons under the appropriate user and group IDs. In other words, if you enable this and you upgrade from Hammer to Jewel, and your files are still owned by root, your daemons will also continue to run as root:root (as they did in hammer). Then, you can stop your OSDs, run the recursive chown, and restart the OSDs one-by-one. When they come back up, they will just automatically switch to running as ceph:ceph.

Though if you have external journals and didn't use ceph-deploy, you're boned with the whole ceph:ceph approach.

Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
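One sketch of what that means in practice (not from the thread; it assumes the default journal symlink layout in the OSD data directory, and note that a plain chown on the device does not survive a reboot, which is why ceph-disk normally relies on GPT partition type codes plus udev rules to set ownership persistently):

for j in /var/lib/ceph/osd/*/journal; do
    dev=$(readlink -f "$j")                  # resolve the journal symlink to the real block device
    [ -b "$dev" ] && chown ceph:ceph "$dev"  # the OSD needs write access to it once it runs as ceph
done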
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hello,

On Sun, 12 Mar 2017 19:52:12 +1000 Brad Hubbard wrote:

> On Sun, Mar 12, 2017 at 6:36 AM, Christian Theune wrote:
>> Hi,
>>
>> thanks for that report! Glad to hear a mostly happy report. I’m still on the fence … ;)
>>
>> I have had reports that Qemu (librbd connections) will require updates/restarts before upgrading. What was your experience on that side? Did you upgrade the clients? Did you start using any of the new RBD features, like fast diff?
>
> You don't need to restart qemu-kvm instances *before* upgrading but you do need to restart or migrate them *after* updating. The updated binaries are only loaded into the qemu process address space at start-up, so to load the newly installed binaries (libraries) you need to restart or do a migration to an upgraded host.

Well, the OP wrote about live migration problems, but those were not in the qemu part of things but libvirt/openstack related. To wit, I did upgrade a test cluster from hammer to Jewel and live migration under ganeti worked fine. I've also not seen any problems on other instances that since have not been restarted, nor would I hope that an upgrade from one stable version to the next should EVER require such a step (at least immediately).

Christian

>> What’s your experience with load/performance after the upgrade? Found any new issues that indicate shifted hotspots?
>>
>> Cheers and thanks again,
>> Christian
>>
>> On Mar 11, 2017, at 12:21 PM, cephmailingl...@mosibi.nl wrote:
>> [...]
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On Sat, Mar 11, 2017 at 12:21 PM, wrote:
> The upgrade of our biggest cluster, nr 4, did not go without problems. Since we were expecting a lot of "failed to encode map e with expected crc" messages, we disabled clog to monitors with 'ceph tell osd.* injectargs -- --clog_to_monitors=false' so our monitors would not choke on those messages. The upgrade of the monitors went as expected, without any problem; the problems started when we started the upgrade of the OSDs. In the upgrade procedure, we had to change the ownership of the files from root to the user ceph, and that process was taking so long on our cluster that completing the upgrade would take more than a week. We decided to keep the permissions as they were for now, so in the upstart init script /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup ceph' to '--setuser root --setgroup root', and will fix that OSD by OSD once the upgrade is completely done.

For others following this thread who still have the hammer→jewel upgrade ahead: there is a ceph.conf option you can use here; no need to fiddle with the upstart scripts.

setuser match path = /var/lib/ceph/$type/$cluster-$id

What this will do is check which user owns the files in the respective directories, and then start your Ceph daemons under the appropriate user and group IDs. In other words, if you enable this and you upgrade from Hammer to Jewel, and your files are still owned by root, your daemons will also continue to run as root:root (as they did in hammer). Then, you can stop your OSDs, run the recursive chown, and restart the OSDs one-by-one. When they come back up, they will just automatically switch to running as ceph:ceph.

Cheers,
Florian
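Roughly, per OSD, that sequence looks like this (a sketch only; the OSD id 42, the cluster name "ceph" and the upstart job syntax of Ubuntu 14.04 are assumptions, not details from this mail):

# in ceph.conf, e.g. under [global]:
#   setuser match path = /var/lib/ceph/$type/$cluster-$id

stop ceph-osd id=42
chown -R ceph:ceph /var/lib/ceph/osd/ceph-42
start ceph-osd id=42    # data dir now belongs to ceph, so the OSD starts as ceph:ceph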
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On Sat, 11 Mar 2017, Udo Lembke wrote:

> On 11.03.2017 12:21, cephmailingl...@mosibi.nl wrote:
>> ...
>> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph
>> ... the 'find' in step e found so many files that xargs (the shell) could not handle it (too many arguments). At that time we decided to keep the permissions on root in the upgrade phase.
>
> Perhaps a "find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} \;" would do a better job?!

Spawning a new chown process for every single file would be extremely inefficient, and xargs was designed to handle this scenario (see the -n option). What I did when I faced the same problem was something like this:

cd /var/lib/ceph/osd
for i in *; do chown -R ceph:ceph $i & done

This will utilize most of the IO bandwidth available while not wasting too much CPU. I assumed every file should be owned by ceph. (Of course care needs to be taken if there are other types of ceph files on the node to chown them as well.)

Matyas
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On Sun, Mar 12, 2017 at 6:36 AM, Christian Theune wrote:
> Hi,
>
> thanks for that report! Glad to hear a mostly happy report. I’m still on the fence … ;)
>
> I have had reports that Qemu (librbd connections) will require updates/restarts before upgrading. What was your experience on that side? Did you upgrade the clients? Did you start using any of the new RBD features, like fast diff?

You don't need to restart qemu-kvm instances *before* upgrading but you do need to restart or migrate them *after* updating. The updated binaries are only loaded into the qemu process address space at start-up, so to load the newly installed binaries (libraries) you need to restart or do a migration to an upgraded host.

> What’s your experience with load/performance after the upgrade? Found any new issues that indicate shifted hotspots?
>
> Cheers and thanks again,
> Christian
>
> On Mar 11, 2017, at 12:21 PM, cephmailingl...@mosibi.nl wrote:
> [...]
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On 03/11/2017 09:49 PM, Udo Lembke wrote:

Hi Udo,

> Perhaps a "find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} \;" would do a better job?!

We did exactly that (and also tried other combinations), and that is a workaround for the 'argument too long' problem, but it calls an exec for every file it finds. All those forks took forever... :)

--
With regards,

Richard Arends.
Snow BV / http://snow.nl
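For completeness (a sketch, not something tried in the thread): with GNU find, terminating -exec with "+" instead of ";" batches arguments the same way xargs does, so it avoids both the per-file fork storm and the "argument list too long" problem:

# one chown is invoked per batch of paths, not per file
find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} +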
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On 03/11/2017 09:36 PM, Christian Theune wrote:

Hello,

> I have had reports that Qemu (librbd connections) will require updates/restarts before upgrading. What was your experience on that side? Did you upgrade the clients? Did you start using any of the new RBD features, like fast diff?

We have two types of clients: 1) Openstack hosts and components like Cinder, and 2) clients that use librbd (from Java and C). We combine Ceph and Openstack on the same host, meaning that when we upgraded Ceph for the OSDs, the libraries for Openstack were updated at the same time. The other type of clients were already using the Jewel libraries and binaries for some time. We did not change anything on the clients, so we are not using the newly introduced features (yet).

> What’s your experience with load/performance after the upgrade? Found any new issues that indicate shifted hotspots?

We did not see any difference.

--
With regards,

Richard Arends.
Snow BV / http://snow.nl
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hi,

thanks for the useful info.

On 11.03.2017 12:21, cephmailingl...@mosibi.nl wrote:
>
> Hello list,
>
> A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this email we want to share our experiences.
>
> ...
>
> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph
> ... the 'find' in step e found so many files that xargs (the shell) could not handle it (too many arguments). At that time we decided to keep the permissions on root in the upgrade phase.

Perhaps a "find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} \;" would do a better job?!

Udo
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hi,

thanks for that report! Glad to hear a mostly happy report. I’m still on the fence … ;)

I have had reports that Qemu (librbd connections) will require updates/restarts before upgrading. What was your experience on that side? Did you upgrade the clients? Did you start using any of the new RBD features, like fast diff?

What’s your experience with load/performance after the upgrade? Found any new issues that indicate shifted hotspots?

Cheers and thanks again,
Christian

> On Mar 11, 2017, at 12:21 PM, cephmailingl...@mosibi.nl wrote:
>
> [...]
[ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hello list,

A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this email we want to share our experiences.

We have four clusters:

1) Test cluster for all the fun things, completely virtual.

2) Test cluster for Openstack: 3 monitors and 9 OSDs, all baremetal

3) Cluster where we store backups: 3 monitors and 153 OSDs. 554 TB storage

4) Main cluster (used for our custom software stack and openstack): 5 monitors and 1917 OSDs. 8 PB storage

All the clusters are running on Ubuntu 14.04 LTS and we use the Ceph packages from ceph.com. On every cluster we upgraded the monitors first and after that, the OSDs. Our backup cluster is the only cluster that also serves S3 via the RadosGW, and that service was upgraded at the same time as the OSDs in that cluster. The upgrade of clusters 1, 2 and 3 went without any problem, just an apt-get upgrade on every component. We did see the message "failed to encode map e with expected crc", but that message disappeared when all the OSDs were upgraded.

The upgrade of our biggest cluster, nr 4, did not go without problems. Since we were expecting a lot of "failed to encode map e with expected crc" messages, we disabled clog to monitors with 'ceph tell osd.* injectargs -- --clog_to_monitors=false' so our monitors would not choke on those messages. The upgrade of the monitors went as expected, without any problem; the problems started when we started the upgrade of the OSDs. In the upgrade procedure, we had to change the ownership of the files from root to the user ceph, and that process was taking so long on our cluster that completing the upgrade would take more than a week. We decided to keep the permissions as they were for now, so in the upstart init script /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup ceph' to '--setuser root --setgroup root', and will fix that OSD by OSD once the upgrade is completely done.

On cluster 3 (backup) we could change the permissions in a shorter time with the following procedure:

a) apt-get -y install ceph-common
b) mount|egrep 'on \/var.*ceph.*osd'|awk '{print $3}'|while read P; do echo chown -R ceph:ceph $P \&;done > t ; bash t ; rm t
c) (wait for all the chown's to complete)
d) stop ceph-all
e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph
f) start ceph-all

This procedure did not work on our main (4) cluster because the load on the OSDs became 100% in step b, and that resulted in blocked I/O on some virtual instances in the Openstack cluster. Also at that time one of our pools got a lot of extra data; those files were stored with root permissions since we had not restarted the Ceph daemons yet, and the 'find' in step e found so many files that xargs (the shell) could not handle it (too many arguments). At that time we decided to keep the permissions on root during the upgrade phase.

The next and biggest problem we encountered had to do with the CRC errors on the OSD map. On every map update, the OSDs that were not upgraded yet got that CRC error and asked the monitor for a full OSD map instead of just a delta update. At first we did not understand what exactly happened: we ran the upgrade per node using a script, and in that script we watch the state of the cluster and when the cluster is healthy again, we upgrade the next host. Every time we started the script (skipping the already upgraded hosts), the first host(s) upgraded without issues and then we got blocked I/O on the cluster. The blocked I/O went away within a minute or two (not measured). After investigation we found out that the blocked I/O happened when nodes were asking the monitor for a (full) OSD map, which briefly resulted in a fully saturated network link on our monitor.

In the next graph the statistics for one of our Ceph monitors are shown. Our hosts are equipped with 10 Gbit/s NICs, and every time at the highest peaks the problems occurred. We could work around this problem by waiting four minutes between every host, and after that time (14:20) we did not have any issues any more. Of course the number of not-yet-upgraded OSDs decreased, so the number of full OSD map requests also got smaller over time.

The day after the upgrade we had issues with live migrations of Openstack instances. We got this message: "OSError: /usr/lib/librbd.so.1: undefined symbol: _ZN8librados5Rados15aio_watch_flushEPNS_13AioCompletionE". This was resolved by restarting libvirt-bin and nova-compute on every compute node.

Please note that the upgrade of our biggest cluster was not a 100% success, but the problems were relatively small, the cluster stayed online, and there were only a few virtual Openstack instances that did not like the blocked I/O and had to be restarted.

--
With regards,

Richard Arends.
Snow BV / http://snow.nl
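For reference, the per-host rolling pattern described above boils down to something like the sketch below. This is not the author's actual script; the host list file, passwordless ssh and the four-minute pause are assumptions based on the mail:

ceph tell osd.* injectargs -- --clog_to_monitors=false

for host in $(cat upgrade-hosts.txt); do
    # upgrade the packages on one storage node and bounce its OSDs
    ssh "$host" 'apt-get -y install ceph ceph-common && stop ceph-osd-all ; start ceph-osd-all'
    # wait until the cluster reports healthy before touching the next host
    until ceph health | grep -q HEALTH_OK; do sleep 30; done
    sleep 240   # extra spacing so full-osdmap fetches don't pile up on the monitor link
done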