date:20170311

Re: [ceph-users] Ceph with RDMA

2017-03-11 Thread PR PR

Thanks Haomai. I am also getting below error when I use ceph-disk. Any
pointers?

  File "/usr/bin/ceph-disk", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2927, in <module>
    @_call_aside
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2913, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2940, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 635, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 943, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 829, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'ceph-disk==1.0.0' distribution was not found and is required by the application

On Fri, Mar 10, 2017 at 7:10 PM, Haomai Wang wrote:

> On Sat, Mar 11, 2017 at 10:29 AM, PR PR wrote:
> > Thanks for the quick reply. I tried it with master as well. Followed
> > instructions on this link - https://community.mellanox.com/docs/DOC-2721
> >
> > Ceph mon fails to start with error "unrecognized ms_type 'async+rdma'"
>
> It must be that ceph-mon was not compiled with RDMA support.
>
> >
> > Appreciate any pointers.
> >
> > On Thu, Mar 9, 2017 at 5:56 PM, Haomai Wang wrote:
> >>
> >> On Fri, Mar 10, 2017 at 4:28 AM, PR PR wrote:
> >> > Hi,
> >> >
> >> > I am trying to use ceph with RDMA. I have a few questions.
> >> >
> >> > 1. Is there a prebuilt package that has RDMA support, or is the only
> >> > way to try ceph+rdma to check out from GitHub and compile from scratch?
> >> >
> >> > 2. Looks like there are two ways of using RDMA - xio and async+rdma.
> >> > Which is the recommended approach? Also, any insights on the
> >> > differences will be useful as well.
> >> >
> >> > 3. async+rdma seems to have a lot of recent changes. Is 11.2.0 expected
> >> > to work for async+rdma? When I compiled 11.2.0, it failed with the
> >> > following error:
> >> >
> >>
> >> I suggest checking out master.
> >>
> >> > [ 81%] Built target rbd
> >> > /mnt/ceph_compile/ceph/build/lib/libcephfs.so: undefined reference to
> >> > `ibv_free_device_list'
> >> > /mnt/ceph_compile/ceph/build/lib/libcephfs.so: undefined reference to
> >> > `ibv_get_cq_event'
> >> > /mnt/ceph_compile/ceph/build/lib/libcephfs.so: undefined reference to
> >> > `ibv_alloc_pd'
> >> > /mnt/ceph_compile/ceph/build/lib/libcephfs.so: undefined reference to
> >> > `ibv_close_device'
> >> > /mnt/ceph_compile/ceph/build/lib/libcephfs.so: undefined reference to
> >> > `ibv_destroy_qp'
> >> > /mnt/ceph_compile/ceph/build/lib/libcephfs.so: undefined reference to
> >> > `ibv_modify_qp'
> >> > /mnt/ceph_compile/ceph/build/lib/libcephfs.so: undefined reference to
> >> > `ibv_get_async_event'
> >> > ***snipped***
> >> > Link Error: Ceph FS library not found
> >> > src/pybind/cephfs/CMakeFiles/cython_cephfs.dir/build.make:57: recipe for
> >> > target 'src/pybind/cephfs/CMakeFiles/cython_cephfs' failed
> >> > make[2]: *** [src/pybind/cephfs/CMakeFiles/cython_cephfs] Error 1
> >> > CMakeFiles/Makefile2:4015: recipe for target
> >> > 'src/pybind/cephfs/CMakeFiles/cython_cephfs.dir/all' failed
> >> > make[1]: *** [src/pybind/cephfs/CMakeFiles/cython_cephfs.dir/all] Error 2
> >> > make[1]: *** Waiting for unfinished jobs
> >> > [ 85%] Built target rgw_a
> >> >
> >> > Thanks,
> >> > PR
> >> >
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >
> >
>

[ceph-users] apologies for the erroneous subject - should have been Re: Unable to boot OS on cluster node

2017-03-11 Thread Anthony D'Atri

A certain someone bumped my elbow as I typed, think in terms of this week’s 
family-bombed video going the rounds on FB.  My ignominy is boundless and my 
door now locked when replying.

— aad





Re: [ceph-users] pgs stuck inactive

2017-03-11 Thread Brad Hubbard

On Sat, Mar 11, 2017 at 7:43 PM, Laszlo Budai wrote:
> Hello,
>
> Thank you for your answer.
>
> indeed the min_size is 1:
>
> # ceph osd pool get volumes size
> size: 3
> # ceph osd pool get volumes min_size
> min_size: 1
> #
> I'm gonna try to find the mentioned discussions on the mailing lists, and
> read them. If you have a link at hand, that would be nice if you would send
> it to me.

This thread is one example, there are lots more.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html
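
As a rough illustration of the min_size point discussed in this thread, the check can be scripted. This is only a sketch: the function names are made up for this example, the input lines are the "ceph osd pool get volumes ..." output quoted above, and the floor of 2 is the rule of thumb from the list threads, not something Ceph enforces.

```python
def parse_pool_setting(line):
    """Parse one line of `ceph osd pool get <pool> <key>` output, e.g. 'min_size: 1'."""
    key, _, value = line.partition(":")
    return key.strip(), int(value.strip())


def risky_min_size(size_line, min_size_line, floor=2):
    """Return True when a pool accepts I/O with fewer than `floor` copies available."""
    _, size = parse_pool_setting(size_line)
    _, min_size = parse_pool_setting(min_size_line)
    # A pool that holds fewer than `floor` copies in total cannot meet the
    # floor anyway, so only pools with size >= floor are flagged here.
    return size >= floor and min_size < floor


if __name__ == "__main__":
    # Values copied from the quoted output for pool 'volumes'.
    print(risky_min_size("size: 3", "min_size: 1"))
```

With the values from this thread (size 3, min_size 1) the pool is flagged; with min_size 2 it would not be.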

>
> In the attached file you can see the contents of the directory containing PG
> data on the different OSDs (all that have appeared in the pg query).
> According to the md5sums the files are identical. What bothers me is the
> directory structure (you can see the ls -R in each dir that contains files).

So I mixed up 63 and 68, my list should have read 2, 28, 35 and 63
since 68 is listed as empty in the pg query.

>
> Where can I read about how/why those DIR# subdirectories have appeared?
>
> Given that the files themselves are identical on the "current" OSDs
> belonging to the PG, and as the osd.63 (currently not belonging to the PG)
> has the same files, is it safe to stop the OSD.2, remove the 3.367_head dir,
> and then restart the OSD? (all these with the noout flag set of course)

*You* need to decide which is the "good" copy and then follow the
instructions in the links I provided to try and recover the pg. Back
those known copies on 2, 28, 35 and 63 up with the
ceph_objectstore_tool before proceeding. They may well be identical
but the peering process still needs to "see" the relevant logs and
currently something is stopping it doing so.

>
> Kind regards,
> Laszlo
>
>
> On 11.03.2017 00:32, Brad Hubbard wrote:
>>
>> So this is why it happened I guess.
>>
>> pool 3 'volumes' replicated size 3 min_size 1
>>
>> min_size = 1 is a recipe for disasters like this and there are plenty
>> of ML threads about not setting it below 2.
>>
>> The past intervals in the pg query show several intervals where a
>> single OSD may have gone rw.
>>
>> How important is this data?
>>
>> I would suggest checking which of these OSDs actually have the data
>> for this pg. From the pg query it looks like 2, 35 and 68 and possibly
>> 28 since it's the primary. Check all OSDs in the pg query output. I
>> would then back up all copies and work out which copy, if any, you
>> want to keep and then attempt something like the following.
>>
>> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg17820.html
>>
>> If you want to abandon the pg see
>>
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012778.html
>> for a possible solution.
>>
>> http://ceph.com/community/incomplete-pgs-oh-my/ may also give some ideas.
>>
>>
>> On Fri, Mar 10, 2017 at 9:44 PM, Laszlo Budai wrote:
>>>
>>> The OSDs are all there.
>>>
>>> $ sudo ceph osd stat
>>>  osdmap e60609: 72 osds: 72 up, 72 in
>>>
>>> and I have attached the result of the ceph osd tree and ceph osd dump
>>> commands.
>>> I got some extra info about the network problem. A faulty network device
>>> flooded the network, eating up all the bandwidth, so the OSDs were not
>>> able to properly communicate with each other. This lasted for almost 1 day.
>>>
>>> Thank you,
>>> Laszlo
>>>
>>>
>>>
>>> On 10.03.2017 12:19, Brad Hubbard wrote:


>>>> To me it looks like someone may have done an "rm" on these OSDs but
>>>> not removed them from the crushmap. This does not happen
>>>> automatically.
>>>>
>>>> Do these OSDs show up in "ceph osd tree" and "ceph osd dump"? If so,
>>>> paste the output.
>>>>
>>>> Without knowing what exactly happened here it may be difficult to work
>>>> out how to proceed.
>>>>
>>>> In order to go clean the primary needs to communicate with multiple
>>>> OSDs, some of which are marked DNE and seem to be uncontactable.
>>>>
>>>> This seems to be more than a network issue (unless the outage is still
>>>> happening).
>>>>
>>>> http://docs.ceph.com/docs/master/rados/operations/pg-states/?highlight=incomplete
>>>>
>>>> On Fri, Mar 10, 2017 at 6:09 PM, Laszlo Budai wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I was informed that due to a networking issue the ceph cluster network
>>>>> was affected. There was a huge packet loss, and network interfaces were
>>>>> flipping. That's all I got.
>>>>> This outage has lasted a longer period of time. So I assume that some
>>>>> OSD may have been considered dead and the data from them has been moved
>>>>> away to other PGs (this is what ceph is supposed to do if I'm correct).
>>>>> Probably that was the point when the listed PGs have appeared into the
>>>>> picture.
>>>>> From the query we can see this for one of those OSDs:
>>>>> {
>>>>> "peer": "14",
>>>>> "pgid": "3.367",
>>>>> "last_update":

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-11 Thread Udo Lembke

Hi,

thanks for the useful info.


On 11.03.2017 12:21, cephmailingl...@mosibi.nl wrote:
>
> Hello list,
>
> A week ago we upgraded our Ceph clusters from Hammer to Jewel and with
> this email we want to share our experiences.
>
> ...
>
>
> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0  chown ceph:ceph
> ... the 'find' in step e found so many files that xargs (the shell)
> could not handle it (too many arguments). At that time we decided to
> keep the permissions on root in the upgrade phase.
>
>
Perhaps "find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} +"
would do a better job?

Udo

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-11 Thread Christian Theune

Hi,

thanks for that report! Glad to hear a mostly happy report. I’m still on the 
fence … ;)

I have had reports that Qemu (librbd connections) will require updates/restarts 
before upgrading. What was your experience on that side? Did you upgrade the 
clients? Did you start using any of the new RBD features, like fast diff?

What’s your experience with load/performance after the upgrade? Found any new 
issues that indicate shifted hotspots?

Cheers and thanks again,
Christian

> On Mar 11, 2017, at 12:21 PM, cephmailingl...@mosibi.nl wrote:
> 
> Hello list,
> 
> A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this 
> email we want to share our experiences.
> 
> We have four clusters:
> 
> 1) Test cluster for all the fun things, completely virtual.
> 
> 2) Test cluster for Openstack: 3 monitors and 9 OSDs, all baremetal
> 
> 3) Cluster where we store backups: 3 monitors and 153 OSDs. 554 TB storage
> 
> 4) Main cluster (used for our custom software stack and openstack): 5 
> monitors and 1917 OSDs. 8 PB storage
> 
> 
> All the clusters are running on Ubuntu 14.04 LTS and we use the Ceph packages
> from ceph.com. On every cluster we upgraded the monitors first and after
> that, the OSDs. Our backup cluster is the only cluster that also serves S3
> via the RadosGW and that service was upgraded at the same time as the OSDs in
> that cluster. The upgrade of clusters 1, 2 and 3 went without any problem,
> just an apt-get upgrade on every component. We did see the message "failed
> to encode map e with expected crc", but that message disappeared
> when all the OSDs were upgraded.
> The upgrade of our biggest cluster, nr 4, did not go without problems. Since
> we were expecting a lot of "failed to encode map e with expected
> crc" messages, we disabled clog to monitors with 'ceph tell osd.* injectargs
> -- --clog_to_monitors=false' so our monitors would not choke on those
> messages. The upgrade of the monitors went as expected, without any
> problem. The problems started when we began upgrading the OSDs. In the
> upgrade procedure, we had to change the ownership of the files from root to
> the user ceph, and that process was taking so long on our cluster that
> completing the upgrade would have taken more than a week. We decided to keep
> the permissions as they were for now, so in the upstart init script
> /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup ceph' to
> '--setuser root --setgroup root', planning to fix that OSD by OSD after the
> upgrade was completely done.
> 
> On cluster 3 (backup) we could change the permissions in a shorter time with 
> the following procedure:
> 
> a) apt-get -y install ceph-common
> b) mount|egrep 'on \/var.*ceph.*osd'|awk '{print $3}'|while read P; do 
> echo chown -R ceph:ceph $P \&;done > t ; bash t ; rm t
> c) (wait for all the chown's to complete)
> d) stop ceph-all
> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0  chown ceph:ceph
> f) start ceph-all
> 
> This procedure did not work on our main (4) cluster because the load on the
> OSDs became 100% in step b and that resulted in blocked I/O on some virtual
> instances in the Openstack cluster. Also, at that time one of our pools got a
> lot of extra data. Those files were stored with root permissions since we
> had not restarted the Ceph daemons yet, and the 'find' in step e found so many
> files that xargs (the shell) could not handle it (too many arguments). At
> that time we decided to keep the permissions on root in the upgrade phase.
> 
> The next and biggest problem we encountered had to do with the CRC errors on
> the OSD map. On every map update, the OSDs that were not upgraded yet got
> that CRC error and asked the monitor for a full OSD map instead of just a
> delta update. At first we did not understand what exactly happened. We ran
> the upgrade per node using a script; that script watches the state of
> the cluster and, when the cluster is healthy again, upgrades the next host.
> Every time we started the script (skipping the already upgraded hosts) the
> first host(s) upgraded without issues and then we got blocked I/O on the
> cluster. The blocked I/O went away within a minute or 2 (not measured). After
> investigation we found out that the blocked I/O happened when nodes were
> asking the monitor for a (full) OSD map and that briefly resulted in a fully
> saturated network link on our monitor.
> 
> In the next graph the statistics for one of our Ceph monitors are shown. Our
> hosts are equipped with 10 Gbit/s NICs and the problems occurred every time
> at the highest peaks. We could work around this problem by waiting four
> minutes between every host and after that time (14:20) we did not have any
> issues any more. Of course the number of not upgraded OSDs decreased, so the
> number of full OSD map requests also got smaller over time.
> 
> 
> 
> 
> The day after the upgrade we had issues with

Re: [ceph-users] osd_disk_thread_ioprio_priority help

2017-03-11 Thread Laszlo Budai

On 11.03.2017 16:25, Nick Fisk wrote:

>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Laszlo Budai
>> Sent: 11 March 2017 13:51
>> To: ceph-users
>> Subject: [ceph-users] osd_disk_thread_ioprio_priority help
>>
>> Hello,
>>
>> Can someone explain the meaning of osd_disk_thread_ioprio_priority. I'm
>> reading the definition from this page:
>> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/configuration_guide/osd_configuration_reference
>> it says: "It sets the ioprio_set(2) I/O scheduling priority of the disk
>> thread ranging from 0 (highest) to 7 (lowest)." What is the so called disk
>> thread?
>>
>> Then I found documents using both extreme values (0 and 7) for achieving
>> the same stuff: make the scrubbing low priority.
>> https://ceph.com/geen-categorie/ceph-reduce-osd-scrub-priority/ and
>> http://dachary.org/?p=3268 are using the value 7 for reducing the priority
>> of scrub, while here:
>> http://ceph-users.ceph.narkive.com/AMMP3r5s/osd-scrub-sleep-osd-scrub-chunk-min-max
>> and
>> https://indico.cern.ch/event/588794/contributions/2374222/attachments/1383112/2103509/Configuring_Ceph.pdf
>> the value 0 is used.
>>
>> Now I am confused  :(
>>
>> Can somebody bring some light here?
>
> Only to confuse you some more. If you are running Jewel or above then
> scrubbing is now done in the main operation thread and so setting this value
> will have no effect.

This is the Hammer version of Ceph.

Thank you,
Laszlo


Re: [ceph-users] osd_disk_thread_ioprio_priority help

2017-03-11 Thread Nick Fisk

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Laszlo Budai
> Sent: 11 March 2017 13:51
> To: ceph-users 
> Subject: [ceph-users] osd_disk_thread_ioprio_priority help
> 
> Hello,
> 
> 
> Can someone explain the meaning of osd_disk_thread_ioprio_priority. I'm
> reading the definition from this page:
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/configuration_guide/osd_configuration_reference
> it says: "It sets the ioprio_set(2) I/O scheduling priority of the disk
> thread ranging from 0 (highest) to 7 (lowest)." What is the so called disk
> thread?
>
> Then I found documents using both extreme values (0 and 7) for achieving
> the same stuff: make the scrubbing low priority.
> https://ceph.com/geen-categorie/ceph-reduce-osd-scrub-priority/ and
> http://dachary.org/?p=3268 are using the value 7 for reducing the priority
> of scrub, while here:
> http://ceph-users.ceph.narkive.com/AMMP3r5s/osd-scrub-sleep-osd-scrub-chunk-min-max
> and
> https://indico.cern.ch/event/588794/contributions/2374222/attachments/1383112/2103509/Configuring_Ceph.pdf
> the value 0 is used.
> 
> Now I am confused  :(
> 
> Can somebody bring some light here?

Only to confuse you some more. If you are running Jewel or above then
scrubbing is now done in the main operation thread and so setting this value
will have no effect.
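
For readers on releases where the disk thread setting still applies, the numbering quoted from the documentation follows ioprio_set(2): within a scheduling class, 0 is the most urgent level and 7 the least. The sketch below only illustrates how the Linux interface packs a class and a level into one value. The helper names are made up for this example and nothing here is taken from Ceph's source; the constants are the ones described in the ioprio_set(2) man page.

```python
# Constants as described for the Linux ioprio interface (see ioprio_set(2)).
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASSES = {"rt": 1, "be": 2, "idle": 3}


def ioprio_value(io_class, level=0):
    """Pack a scheduling class and a 0..7 level the way IOPRIO_PRIO_VALUE does."""
    if io_class not in IOPRIO_CLASSES:
        raise ValueError("unknown I/O class: %r" % io_class)
    if not 0 <= level <= 7:
        raise ValueError("level must be between 0 (highest) and 7 (lowest)")
    return (IOPRIO_CLASSES[io_class] << IOPRIO_CLASS_SHIFT) | level


def more_urgent(level_a, level_b):
    """Within one class, the numerically smaller level is the more urgent one."""
    return level_a if level_a < level_b else level_b


if __name__ == "__main__":
    print(ioprio_value("be", 7))   # best-effort class, least urgent level
    print(more_urgent(0, 7))
```

Per the man page, the level is only meaningful for the real-time and best-effort classes; the idle class carries no level, so the number has no effect there.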

> 
> Thank you,
> Laszlo
> 


[ceph-users] osd_disk_thread_ioprio_priority help

2017-03-11 Thread Laszlo Budai


Hello,


Can someone explain the meaning of osd_disk_thread_ioprio_priority. I'm reading 
the definition from this page: 
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/configuration_guide/osd_configuration_reference
it says: "It sets the ioprio_set(2) I/O scheduling priority of the disk thread 
ranging from 0 (highest) to 7 (lowest)." What is the so called disk thread?

Then I found documents using both extreme values (0 and 7) for achieving the
same stuff: make the scrubbing low priority.
https://ceph.com/geen-categorie/ceph-reduce-osd-scrub-priority/ and
http://dachary.org/?p=3268 are using the value 7 for reducing the priority of
scrub, while here:
http://ceph-users.ceph.narkive.com/AMMP3r5s/osd-scrub-sleep-osd-scrub-chunk-min-max
and
https://indico.cern.ch/event/588794/contributions/2374222/attachments/1383112/2103509/Configuring_Ceph.pdf
the value 0 is used.

Now I am confused  :(

Can somebody bring some light here?

Thank you,
Laszlo


[ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-11 Thread cephmailinglist


Hello list,

A week ago we upgraded our Ceph clusters from Hammer to Jewel and with 
this email we want to share our experiences.



We have four clusters:

1) Test cluster for all the fun things, completely virtual.

2) Test cluster for Openstack: 3 monitors and 9 OSDs, all baremetal

3) Cluster where we store backups: 3 monitors and 153 OSDs. 554 TB storage

4) Main cluster (used for our custom software stack and openstack): 5 
monitors and 1917 OSDs. 8 PB storage



All the clusters are running on Ubuntu 14.04 LTS and we use the Ceph
packages from ceph.com. On every cluster we upgraded the monitors first
and after that, the OSDs. Our backup cluster is the only cluster that
also serves S3 via the RadosGW and that service was upgraded at the same
time as the OSDs in that cluster. The upgrade of clusters 1, 2 and 3
went without any problem, just an apt-get upgrade on every component. We
did see the message "failed to encode map e with expected
crc", but that message disappeared when all the OSDs were upgraded.


The upgrade of our biggest cluster, nr 4, did not go without problems.
Since we were expecting a lot of "failed to encode map e with
expected crc" messages, we disabled clog to monitors with 'ceph tell
osd.* injectargs -- --clog_to_monitors=false' so our monitors would not
choke on those messages. The upgrade of the monitors went as expected,
without any problem. The problems started when we began upgrading
the OSDs. In the upgrade procedure, we had to change the ownership of
the files from root to the user ceph, and that process was taking so long
on our cluster that completing the upgrade would have taken more than a
week. We decided to keep the permissions as they were for now, so in the
upstart init script /etc/init/ceph-osd.conf, we changed '--setuser ceph
--setgroup ceph' to '--setuser root --setgroup root', planning to fix
that OSD by OSD after the upgrade was completely done.


On cluster 3 (backup) we could change the permissions in a shorter time 
with the following procedure:


a) apt-get -y install ceph-common
b) mount|egrep 'on \/var.*ceph.*osd'|awk '{print $3}'|while read P; 
do echo chown -R ceph:ceph $P \&;done > t ; bash t ; rm t

c) (wait for all the chown's to complete)
d) stop ceph-all
e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0  chown ceph:ceph
f) start ceph-all

This procedure did not work on our main (4) cluster because the load on
the OSDs became 100% in step b and that resulted in blocked I/O on some
virtual instances in the Openstack cluster. Also, at that time one of our
pools got a lot of extra data. Those files were stored with root
permissions since we had not restarted the Ceph daemons yet, and the 'find'
in step e found so many files that xargs (the shell) could not handle it
(too many arguments). At that time we decided to keep the permissions on
root in the upgrade phase.
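
One way to sidestep argument-list limits for the ownership fix in step e is to change ownership one path at a time. The following is a minimal sketch under stated assumptions, not the procedure used on these clusters: the function names are made up for this example, the uid 64045 comes from the 'find' command above, and using the same number as the gid is an assumption that should be checked on the host.

```python
import os


def paths_not_owned_by(root, uid):
    """Yield every directory, file and symlink under `root` not owned by `uid`."""
    for dirpath, dirnames, filenames in os.walk(root):
        candidates = [dirpath]
        candidates += [os.path.join(dirpath, name) for name in filenames]
        # Symlinks to directories are listed in dirnames but never walked into,
        # so they are checked here as plain entries.
        candidates += [os.path.join(dirpath, name) for name in dirnames
                       if os.path.islink(os.path.join(dirpath, name))]
        for path in candidates:
            # lstat looks at the link itself rather than its target.
            if os.lstat(path).st_uid != uid:
                yield path


def chown_tree(root, uid, gid, chown=os.lchown):
    """Change ownership path by path; returns the number of paths changed."""
    changed = 0
    for path in paths_not_owned_by(root, uid):
        chown(path, uid, gid)
        changed += 1
    return changed


# Hypothetical usage, run as root with the daemons stopped:
#   chown_tree("/var/lib/ceph", 64045, 64045)  # gid value is an assumption
```

Because each path is handled individually, the number of files never matters, at the cost of one system call per path. This does not reduce the disk load that step b caused; it only avoids the "too many arguments" failure.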


The next and biggest problem we encountered had to do with the CRC
errors on the OSD map. On every map update, the OSDs that were not
upgraded yet got that CRC error and asked the monitor for a full OSD
map instead of just a delta update. At first we did not understand what
exactly happened. We ran the upgrade per node using a script; that
script watches the state of the cluster and, when the cluster is healthy
again, upgrades the next host. Every time we started the script
(skipping the already upgraded hosts) the first host(s) upgraded without
issues and then we got blocked I/O on the cluster. The blocked I/O went
away within a minute or 2 (not measured). After investigation we found
out that the blocked I/O happened when nodes were asking the monitor
for a (full) OSD map and that briefly resulted in a fully saturated
network link on our monitor.


In the next graph the statistics for one of our Ceph monitors are shown.
Our hosts are equipped with 10 Gbit/s NICs and the problems occurred
every time at the highest peaks. We could work around this problem
by waiting four minutes between every host and after that time (14:20)
we did not have any issues any more. Of course the number of not
upgraded OSDs decreased, so the number of full OSD map requests also got
smaller over time.
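
The per-host loop with a fixed pause can be sketched as follows. This is an illustration only and not the script that was used: rolling_upgrade and its callbacks are hypothetical names, the caller supplies the functions that upgrade a host and check cluster health, and the 240 second default mirrors the four minutes mentioned above.

```python
import time


def rolling_upgrade(hosts, upgrade_host, cluster_healthy,
                    pause_seconds=240, poll_seconds=10, sleep=time.sleep):
    """Upgrade hosts one at a time, waiting for health and then a fixed pause."""
    done = []
    for index, host in enumerate(hosts):
        upgrade_host(host)
        # Wait until the cluster reports healthy again before moving on.
        while not cluster_healthy():
            sleep(poll_seconds)
        done.append(host)
        # Extra fixed pause so pending full OSD map requests can drain;
        # skipped after the last host.
        if index < len(hosts) - 1:
            sleep(pause_seconds)
    return done
```

Passing sleep as a parameter keeps the loop easy to dry-run with a fake before pointing it at real hosts.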




The day after the upgrade we had issues with live migrations of
Openstack instances. We got this message: "OSError:
/usr/lib/librbd.so.1: undefined symbol:
_ZN8librados5Rados15aio_watch_flushEPNS_13AioCompletionE". This was
resolved by restarting libvirt-bin and nova-compute on every compute node.


Please note that the upgrade of our biggest cluster was not a 100%
success, but the problems were relatively small, the cluster stayed
on-line, and there were only a few virtual Openstack instances that did
not like the blocked I/O and had to be restarted.



--

With regards,

Richard Arends.
Snow BV / http://snow.nl
