Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
cephmailinglist writes:

> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph
> [...]
> Also at that time one of our pools got a lot of extra data; those files were stored with root permissions since we had not restarted the Ceph daemons yet, and the 'find' in step e found so many files that xargs (the shell) could not handle it (too many arguments).

I've always found it disappointing that xargs behaves like this on many GNU/Linux distributions. I always thought xargs's main purpose in life was to know how many arguments can safely be passed to a process... Anyway, you should be able to limit the number of arguments per invocation by adding something like "-n 100" to the xargs command line.

Thanks for sharing your upgrade experiences!

--
Simon.
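For example, something like the following (just a sketch; the batch size of 1000 and the four parallel chown processes are arbitrary values, not numbers from the thread). The -print0/-0 pair keeps unusual file names safe, -n caps how many path arguments each chown invocation receives, and -P lets xargs run several chown processes at once:

# xargs keeps re-invoking chown until the whole list is processed,
# so no single command line ever grows too long
find /var/lib/ceph/ ! -uid 64045 -print0 | xargs -0 -n 1000 -P 4 chown ceph:ceph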
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hi,

We initially upgraded from Hammer to Jewel while keeping the ownership unchanged, by adding "setuser match path = /var/lib/ceph/$type/$cluster-$id" to ceph.conf. Later, we used the following steps to change from running as root to running as ceph.

On the storage nodes, we first ran the following command, which doesn't change any permissions but warms the filesystem cache (based on http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-November/006013.html):

find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1 chown -R root:root

Set noout:

ceph osd set noout

On the storage node, edit "/etc/ceph/ceph.conf" and comment out the match-path line:

#setuser match path = /var/lib/ceph/$type/$cluster-$id

Then:

stop ceph-osd-all
find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1 chown -R ceph:ceph
chown -R ceph:ceph /var/lib/ceph/
start ceph-osd-all

Check that all the Ceph OSD processes are running:

ps aux | grep ceph | egrep -v grep

Unset "noout":

ceph osd unset noout

Wait until Ceph is healthy again and continue with the next storage node.

The OSDs were only down for about 2 minutes, because we had run the find command beforehand and used xargs with 12 parallel processes, so recovery time was quick as well. We have more than 850 OSDs and the entire process went pretty smoothly, doing one storage server at a time.

On Tue, Mar 14, 2017 at 3:27 AM, Richard Arends wrote:
> On 03/13/2017 02:02 PM, Christoph Adomeit wrote:
>
> Christoph,
>
> Thanks for the detailed upgrade report.
>> We have another scenario: we have already upgraded to Jewel 10.2.6, but we are still running all our monitors and OSD daemons as root using the setuser match path directive.
>>
>> What would be the recommended way to have all daemons running as the ceph:ceph user?
>>
>> Could we chown -R the monitor and OSD data directories under /var/lib/ceph one by one while keeping up service?
>
> Yes. To minimize the downtime, you can do the chown twice. Once before restarting the daemons, while they are running with root user permissions. Then stop the daemons, do the chown again, but then only on the changed files (find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph), and start the Ceph daemons with setuser and setgroup set to ceph.
>
> --
> With regards,
>
> Richard Arends.
> Snow BV / http://snow.nl
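Pulled together, the per-node procedure above looks roughly like the sketch below. This is only a consolidation of the steps from this mail (Ubuntu 14.04 with upstart and a cluster named "ceph" are assumed); commenting out the match-path option in ceph.conf stays a manual step:

# Pass 1: walk every OSD directory while the daemons still run as root.
# chown to root:root changes nothing, but pulls the metadata into the page cache.
find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1 chown -R root:root

ceph osd set noout

# Pass 2: with the OSDs stopped, the real chown is fast because the
# metadata is already cached from pass 1.
stop ceph-osd-all
find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1 chown -R ceph:ceph
chown -R ceph:ceph /var/lib/ceph/
start ceph-osd-all

# Verify the OSD processes are back before allowing rebalancing again
# (the [c] trick keeps grep from matching itself).
ps aux | grep '[c]eph-osd'
ceph osd unset noout
# Wait for HEALTH_OK before moving on to the next storage node.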
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On 03/13/2017 02:02 PM, Christoph Adomeit wrote:

Christoph,

Thanks for the detailed upgrade report.

We have another scenario: we have already upgraded to Jewel 10.2.6, but we are still running all our monitors and OSD daemons as root using the setuser match path directive.

What would be the recommended way to have all daemons running as the ceph:ceph user?

Could we chown -R the monitor and OSD data directories under /var/lib/ceph one by one while keeping up service?

Yes. To minimize the downtime, you can do the chown twice. Once before restarting the daemons, while they are still running with root user permissions. Then stop the daemons, do the chown again, but this time only on the changed files (find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph), and start the Ceph daemons with setuser and setgroup set to ceph.

--
With regards,

Richard Arends.
Snow BV / http://snow.nl
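In shell form, that two-pass approach looks roughly like this (a sketch only; 64045 is the fixed uid of the ceph user in the Debian/Ubuntu packages, as used elsewhere in this thread, and the upstart job names assume Ubuntu 14.04):

# Pass 1: daemons keep running as root while everything is chowned (slow, but no downtime).
chown -R ceph:ceph /var/lib/ceph

# Pass 2: stop the daemons, fix only the files written as root since pass 1,
# then start the daemons with "setuser ceph" / "setgroup ceph".
stop ceph-all
find /var/lib/ceph/ ! -uid 64045 -print0 | xargs -0 chown ceph:ceph
start ceph-all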
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On 03/12/2017 07:54 PM, Florian Haas wrote:

Florian,

For others following this thread who still have the hammer→jewel upgrade ahead: there is a ceph.conf option you can use here; no need to fiddle with the upstart scripts.

setuser match path = /var/lib/ceph/$type/$cluster-$id

Ah, I did not know this option. Good tip!

--
With regards,

Richard Arends.
Snow BV / http://snow.nl
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Thanks for the detailed upgrade report.

We have another scenario: we have already upgraded to Jewel 10.2.6, but we are still running all our monitors and OSD daemons as root using the setuser match path directive.

What would be the recommended way to have all daemons running as the ceph:ceph user?

Could we chown -R the monitor and OSD data directories under /var/lib/ceph one by one while keeping up service?

Thanks
Christoph

On Sat, Mar 11, 2017 at 12:21:38PM +0100, cephmailingl...@mosibi.nl wrote:
> Hello list,
>
> A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this email we want to share our experiences.

--
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer: Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de     Internetloesungen vom Feinsten
Fon. +49 2166 9149-32             Fax. +49 2166 9149-10
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On 03/13/2017 11:07 AM, Dan van der Ster wrote:

On Sat, Mar 11, 2017 at 12:21 PM, wrote:

The next and biggest problem we encountered had to do with the CRC errors on the OSD map. On every map update, the OSDs that were not upgraded yet got that CRC error and asked the monitor for a full OSD map instead of just a delta update. At first we did not understand what exactly happened: we ran the upgrade per node using a script, and in that script we watch the state of the cluster and when the cluster is healthy again, we upgrade the next host. Every time we started the script (skipping the already upgraded hosts), the first host(s) upgraded without issues and then we got blocked I/O on the cluster. The blocked I/O went away within a minute or two (not measured). After investigation we found out that the blocked I/O happened when nodes were asking the monitor for a (full) OSD map, which briefly resulted in a fully saturated network link on our monitor.

Thanks for the detailed upgrade report. I wanted to zoom in on this CRC/fullmap issue because it could be quite disruptive for us when we upgrade from hammer to jewel. I've read various reports that the foolproof way to avoid the full map DoS would be to upgrade all OSDs to jewel before the mons. Did anyone have success with that workaround? I'm cc'ing Bryan because he knows this issue very well.

With https://github.com/ceph/ceph/pull/13131 merged into 10.2.6, this issue shouldn't be a problem (at least we don't see it anymore).

--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
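Before relying on that fix, it may be worth confirming what is actually running; a hedged sketch (output formats differ between releases, and the monitor id is usually, but not necessarily, the short hostname):

ceph tell osd.* version                  # ask every running OSD for its version
ceph daemon mon.$(hostname -s) version   # run locally on each monitor host, via the admin socket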
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On Sat, Mar 11, 2017 at 12:21 PM, wrote:
>
> The next and biggest problem we encountered had to do with the CRC errors on the OSD map. On every map update, the OSDs that were not upgraded yet got that CRC error and asked the monitor for a full OSD map instead of just a delta update. At first we did not understand what exactly happened: we ran the upgrade per node using a script, and in that script we watch the state of the cluster and when the cluster is healthy again, we upgrade the next host. Every time we started the script (skipping the already upgraded hosts), the first host(s) upgraded without issues and then we got blocked I/O on the cluster. The blocked I/O went away within a minute or two (not measured). After investigation we found out that the blocked I/O happened when nodes were asking the monitor for a (full) OSD map, which briefly resulted in a fully saturated network link on our monitor.

Thanks for the detailed upgrade report. I wanted to zoom in on this CRC/fullmap issue because it could be quite disruptive for us when we upgrade from hammer to jewel. I've read various reports that the foolproof way to avoid the full map DoS would be to upgrade all OSDs to jewel before the mons. Did anyone have success with that workaround?

I'm cc'ing Bryan because he knows this issue very well.

Cheers, Dan
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hello,

On Sun, 12 Mar 2017 19:54:10 +0100 Florian Haas wrote:

> On Sat, Mar 11, 2017 at 12:21 PM, wrote:
>> The upgrade of our biggest cluster, nr 4, did not go without problems. Since we were expecting a lot of "failed to encode map e with expected crc" messages, we disabled clog to monitors with 'ceph tell osd.* injectargs -- --clog_to_monitors=false' so our monitors would not choke on those messages. The upgrade of the monitors went as expected, without any problem; the problems started when we started the upgrade of the OSDs. In the upgrade procedure, we had to change the ownership of the files from root to the user ceph, and that process was taking so long on our cluster that completing the upgrade would take more than a week. We decided to keep the permissions as they were for now, so in the upstart init script /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup ceph' to '--setuser root --setgroup root', and will fix that OSD by OSD once the upgrade is completely done.
>
> For others following this thread who still have the hammer→jewel upgrade ahead: there is a ceph.conf option you can use here; no need to fiddle with the upstart scripts.
>
> setuser match path = /var/lib/ceph/$type/$cluster-$id

Yes, I was thinking about mentioning this, too. Alas, in my experience with a wonky test cluster this failed with MDS, maybe because of an odd name, maybe because nobody ever tested it. MONs and OSDs were fine.

> What this will do is check which user owns the files in the respective directories, and then start your Ceph daemons under the appropriate user and group IDs. In other words, if you enable this and you upgrade from Hammer to Jewel, and your files are still owned by root, your daemons will also continue to run as root:root (as they did in hammer). Then, you can stop your OSDs, run the recursive chown, and restart the OSDs one-by-one. When they come back up, they will just automatically switch to running as ceph:ceph.

Though if you have external journals and didn't use ceph-deploy, you're boned with the whole ceph:ceph approach.

Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
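One sketch of what that means in practice (not from the thread; it assumes the default journal symlink layout in the OSD data directory, and note that a plain chown on the device does not survive a reboot, which is why ceph-disk normally relies on GPT partition type codes plus udev rules to set ownership persistently):

for j in /var/lib/ceph/osd/*/journal; do
    dev=$(readlink -f "$j")                  # resolve the journal symlink to the real block device
    [ -b "$dev" ] && chown ceph:ceph "$dev"  # the OSD needs write access to it once it runs as ceph
done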
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hello,

On Sun, 12 Mar 2017 19:52:12 +1000 Brad Hubbard wrote:

> On Sun, Mar 12, 2017 at 6:36 AM, Christian Theune wrote:
>> Hi,
>>
>> thanks for that report! Glad to hear a mostly happy report. I’m still on the fence … ;)
>>
>> I have had reports that Qemu (librbd connections) will require updates/restarts before upgrading. What was your experience on that side? Did you upgrade the clients? Did you start using any of the new RBD features, like fast diff?
>
> You don't need to restart qemu-kvm instances *before* upgrading but you do need to restart or migrate them *after* updating. The updated binaries are only loaded into the qemu process address space at start-up, so to load the newly installed binaries (libraries) you need to restart or do a migration to an upgraded host.

Well, the OP wrote about live migration problems, but those were not in the qemu part of things but libvirt/openstack related. To wit, I did upgrade a test cluster from hammer to Jewel and live migration under ganeti worked fine. I've also not seen any problems on other instances that since have not been restarted, nor would I hope that an upgrade from one stable version to the next should EVER require such a step (at least immediately).

Christian

>> What’s your experience with load/performance after the upgrade? Found any new issues that indicate shifted hotspots?
>>
>> Cheers and thanks again,
>> Christian
>>
>> On Mar 11, 2017, at 12:21 PM, cephmailingl...@mosibi.nl wrote:
>> [...]
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On Sat, Mar 11, 2017 at 12:21 PM, wrote:
> The upgrade of our biggest cluster, nr 4, did not go without problems. Since we were expecting a lot of "failed to encode map e with expected crc" messages, we disabled clog to monitors with 'ceph tell osd.* injectargs -- --clog_to_monitors=false' so our monitors would not choke on those messages. The upgrade of the monitors went as expected, without any problem; the problems started when we started the upgrade of the OSDs. In the upgrade procedure, we had to change the ownership of the files from root to the user ceph, and that process was taking so long on our cluster that completing the upgrade would take more than a week. We decided to keep the permissions as they were for now, so in the upstart init script /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup ceph' to '--setuser root --setgroup root', and will fix that OSD by OSD once the upgrade is completely done.

For others following this thread who still have the hammer→jewel upgrade ahead: there is a ceph.conf option you can use here; no need to fiddle with the upstart scripts.

setuser match path = /var/lib/ceph/$type/$cluster-$id

What this will do is check which user owns the files in the respective directories, and then start your Ceph daemons under the appropriate user and group IDs. In other words, if you enable this and you upgrade from Hammer to Jewel, and your files are still owned by root, your daemons will also continue to run as root:root (as they did in hammer). Then, you can stop your OSDs, run the recursive chown, and restart the OSDs one-by-one. When they come back up, they will just automatically switch to running as ceph:ceph.

Cheers,
Florian
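Roughly, per OSD, that sequence looks like this (a sketch only; the OSD id 42, the cluster name "ceph" and the upstart job syntax of Ubuntu 14.04 are assumptions, not details from this mail):

# in ceph.conf, e.g. under [global]:
#   setuser match path = /var/lib/ceph/$type/$cluster-$id

stop ceph-osd id=42
chown -R ceph:ceph /var/lib/ceph/osd/ceph-42
start ceph-osd id=42    # data dir now belongs to ceph, so the OSD starts as ceph:ceph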
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On Sat, 11 Mar 2017, Udo Lembke wrote:

> On 11.03.2017 12:21, cephmailingl...@mosibi.nl wrote:
>> ...
>> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph
>> ... the 'find' in step e found so many files that xargs (the shell) could not handle it (too many arguments). At that time we decided to keep the permissions on root in the upgrade phase.
>
> Perhaps a "find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} \;" would do a better job?!

Spawning a new chown process for every single file would be extremely inefficient, and xargs was designed to handle this scenario (see the -n option). What I did when I faced the same problem was something like this:

cd /var/lib/ceph/osd
for i in *; do chown -R ceph:ceph $i & done

This will utilize most of the IO bandwidth available while not wasting too much CPU. I assumed every file should be owned by ceph. (Of course care needs to be taken if there are other types of ceph files on the node to chown them as well.)

Matyas
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On Sun, Mar 12, 2017 at 6:36 AM, Christian Theune wrote:
> Hi,
>
> thanks for that report! Glad to hear a mostly happy report. I’m still on the fence … ;)
>
> I have had reports that Qemu (librbd connections) will require updates/restarts before upgrading. What was your experience on that side? Did you upgrade the clients? Did you start using any of the new RBD features, like fast diff?

You don't need to restart qemu-kvm instances *before* upgrading but you do need to restart or migrate them *after* updating. The updated binaries are only loaded into the qemu process address space at start-up, so to load the newly installed binaries (libraries) you need to restart or do a migration to an upgraded host.

> What’s your experience with load/performance after the upgrade? Found any new issues that indicate shifted hotspots?
>
> Cheers and thanks again,
> Christian
>
> On Mar 11, 2017, at 12:21 PM, cephmailingl...@mosibi.nl wrote:
> [...]
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On 03/11/2017 09:49 PM, Udo Lembke wrote:

Hi Udo,

> Perhaps a "find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} \;" would do a better job?!

We did exactly that (and also tried other combinations), and that is a workaround for the 'argument too long' problem, but it calls an exec for every file it finds. All those forks took forever... :)

--
With regards,

Richard Arends.
Snow BV / http://snow.nl
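For completeness (a sketch, not something tried in the thread): with GNU find, terminating -exec with "+" instead of ";" batches arguments the same way xargs does, so it avoids both the per-file fork storm and the "argument list too long" problem:

# one chown is invoked per batch of paths, not per file
find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} +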
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
On 03/11/2017 09:36 PM, Christian Theune wrote:

Hello,

> I have had reports that Qemu (librbd connections) will require updates/restarts before upgrading. What was your experience on that side? Did you upgrade the clients? Did you start using any of the new RBD features, like fast diff?

We have two types of clients: 1) Openstack hosts and components like Cinder, and 2) clients that use librbd (from Java and C). We combine Ceph and Openstack on the same host, meaning that when we upgraded Ceph for the OSDs, the libraries for Openstack were updated at the same time. The other type of clients were already using the Jewel libraries and binaries for some time. We did not change anything on the clients, so we are not using the newly introduced features (yet).

> What’s your experience with load/performance after the upgrade? Found any new issues that indicate shifted hotspots?

We did not see any difference.

--
With regards,

Richard Arends.
Snow BV / http://snow.nl
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hi,

thanks for the useful info.

On 11.03.2017 12:21, cephmailingl...@mosibi.nl wrote:
>
> Hello list,
>
> A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this email we want to share our experiences.
>
> ...
>
> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph
> ... the 'find' in step e found so many files that xargs (the shell) could not handle it (too many arguments). At that time we decided to keep the permissions on root in the upgrade phase.

Perhaps a "find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} \;" would do a better job?!

Udo
Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hi,

thanks for that report! Glad to hear a mostly happy report. I’m still on the fence … ;)

I have had reports that Qemu (librbd connections) will require updates/restarts before upgrading. What was your experience on that side? Did you upgrade the clients? Did you start using any of the new RBD features, like fast diff?

What’s your experience with load/performance after the upgrade? Found any new issues that indicate shifted hotspots?

Cheers and thanks again,
Christian

> On Mar 11, 2017, at 12:21 PM, cephmailingl...@mosibi.nl wrote:
>
> [...]
[ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience
Hello list,

A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this email we want to share our experiences.

We have four clusters:

1) Test cluster for all the fun things, completely virtual.

2) Test cluster for Openstack: 3 monitors and 9 OSDs, all baremetal

3) Cluster where we store backups: 3 monitors and 153 OSDs. 554 TB storage

4) Main cluster (used for our custom software stack and openstack): 5 monitors and 1917 OSDs. 8 PB storage

All the clusters are running on Ubuntu 14.04 LTS and we use the Ceph packages from ceph.com. On every cluster we upgraded the monitors first and after that, the OSDs. Our backup cluster is the only cluster that also serves S3 via the RadosGW, and that service was upgraded at the same time as the OSDs in that cluster. The upgrade of clusters 1, 2 and 3 went without any problem, just an apt-get upgrade on every component. We did see the message "failed to encode map e with expected crc", but that message disappeared when all the OSDs were upgraded.

The upgrade of our biggest cluster, nr 4, did not go without problems. Since we were expecting a lot of "failed to encode map e with expected crc" messages, we disabled clog to monitors with 'ceph tell osd.* injectargs -- --clog_to_monitors=false' so our monitors would not choke on those messages. The upgrade of the monitors went as expected, without any problem; the problems started when we started the upgrade of the OSDs. In the upgrade procedure, we had to change the ownership of the files from root to the user ceph, and that process was taking so long on our cluster that completing the upgrade would take more than a week. We decided to keep the permissions as they were for now, so in the upstart init script /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup ceph' to '--setuser root --setgroup root', and will fix that OSD by OSD once the upgrade is completely done.

On cluster 3 (backup) we could change the permissions in a shorter time with the following procedure:

a) apt-get -y install ceph-common
b) mount|egrep 'on \/var.*ceph.*osd'|awk '{print $3}'|while read P; do echo chown -R ceph:ceph $P \&;done > t ; bash t ; rm t
c) (wait for all the chown's to complete)
d) stop ceph-all
e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph
f) start ceph-all

This procedure did not work on our main (4) cluster because the load on the OSDs became 100% in step b, and that resulted in blocked I/O on some virtual instances in the Openstack cluster. Also at that time one of our pools got a lot of extra data; those files were stored with root permissions since we had not restarted the Ceph daemons yet, and the 'find' in step e found so many files that xargs (the shell) could not handle it (too many arguments). At that time we decided to keep the permissions on root during the upgrade phase.

The next and biggest problem we encountered had to do with the CRC errors on the OSD map. On every map update, the OSDs that were not upgraded yet got that CRC error and asked the monitor for a full OSD map instead of just a delta update. At first we did not understand what exactly happened: we ran the upgrade per node using a script, and in that script we watch the state of the cluster and when the cluster is healthy again, we upgrade the next host. Every time we started the script (skipping the already upgraded hosts), the first host(s) upgraded without issues and then we got blocked I/O on the cluster. The blocked I/O went away within a minute or two (not measured). After investigation we found out that the blocked I/O happened when nodes were asking the monitor for a (full) OSD map, which briefly resulted in a fully saturated network link on our monitor.

In the next graph the statistics for one of our Ceph monitors are shown. Our hosts are equipped with 10 Gbit/s NICs, and every time at the highest peaks the problems occurred. We could work around this problem by waiting four minutes between every host, and after that time (14:20) we did not have any issues any more. Of course the number of not-yet-upgraded OSDs decreased, so the number of full OSD map requests also got smaller over time.

The day after the upgrade we had issues with live migrations of Openstack instances. We got this message: "OSError: /usr/lib/librbd.so.1: undefined symbol: _ZN8librados5Rados15aio_watch_flushEPNS_13AioCompletionE". This was resolved by restarting libvirt-bin and nova-compute on every compute node.

Please note that the upgrade of our biggest cluster was not a 100% success, but the problems were relatively small, the cluster stayed online, and there were only a few virtual Openstack instances that did not like the blocked I/O and had to be restarted.

--
With regards,

Richard Arends.
Snow BV / http://snow.nl
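For reference, the per-host rolling pattern described above boils down to something like the sketch below. This is not the author's actual script; the host list file, passwordless ssh and the four-minute pause are assumptions based on the mail:

ceph tell osd.* injectargs -- --clog_to_monitors=false

for host in $(cat upgrade-hosts.txt); do
    # upgrade the packages on one storage node and bounce its OSDs
    ssh "$host" 'apt-get -y install ceph ceph-common && stop ceph-osd-all ; start ceph-osd-all'
    # wait until the cluster reports healthy before touching the next host
    until ceph health | grep -q HEALTH_OK; do sleep 30; done
    sleep 240   # extra spacing so full-osdmap fetches don't pile up on the monitor link
done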