Re: [ceph-users] Ceph on Solaris / Illumos
On 04/15/2015 08:16 AM, Jake Young wrote:

Has anyone compiled ceph (either osd or client) on a Solaris based OS? The thread on ZFS support for osd got me thinking about using solaris as an osd server. It would have much better ZFS performance and I wonder if the osd performance without a journal would be 2x better.

Doubt it. You may be able to do a little better, but you have to pay the piper somehow. If you clone from journal you will introduce fragmentation. If you throw the journal away you'll suffer for everything but very large writes unless you throw safety away. I think if we are going to generally beat filestore (not just for optimal benchmarking tests!) it's going to take some very careful cleverness. Thankfully Sage is very clever and is working on it in newstore. Even there, filestore has been proving difficult to beat for writes.

A second thought I had was using the Comstar iscsi / fcoe target software that is part of Solaris. Has anyone done anything with a ceph rbd client for Solaris based OSs?

No idea!

Jake

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph on Solaris / Illumos
Has anyone compiled ceph (either osd or client) on a Solaris based OS? The thread on ZFS support for osd got me thinking about using solaris as an osd server. It would have much better ZFS performance and I wonder if the osd performance without a journal would be 2x better. A second thought I had was using the Comstar iscsi / fcoe target software that is part of Solaris. Has anyone done anything with a ceph rbd client for Solaris based OSs? Jake ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Do I have enough pgs?
On 04/15/2015 08:10 AM, Tony Harris wrote:

Hi all, I have a cluster of 3 nodes, 18 OSDs. I used the pgcalc to give a suggested number of PGs - here was my list:

Group1 3 rep 18 OSDs 30% data 512PGs
Group2 3 rep 18 OSDs 30% data 512PGs
Group3 3 rep 18 OSDs 30% data 512PGs
Group4 2 rep 18 OSDs 5% data 256PGs
Group5 2 rep 18 OSDs 5% data 256PGs

My estimated growth is to 27-36 OSDs within the next 18 months, after that probably pretty stagnant for the next several years.

I would use more, but I tend to err on the high side for small clusters. The tool I mentioned in the other data distribution thread shows you the most and least subscribed OSDs in each pool. You can use that to determine if you think the distribution looks reasonable. Script is here: https://github.com/ceph/cbt/blob/master/tools/readpgdump.py

You can run it by doing: ceph pg dump | readpgdump.py

Mark

Thoughts? -Tony

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Do I have enough pgs?
Hi all, I have a cluster of 3 nodes, 18 OSDs. I used the pgcalc to give a suggested number of PGs - here was my list:

Group1 3 rep 18 OSDs 30% data 512PGs
Group2 3 rep 18 OSDs 30% data 512PGs
Group3 3 rep 18 OSDs 30% data 512PGs
Group4 2 rep 18 OSDs 5% data 256PGs
Group5 2 rep 18 OSDs 5% data 256PGs

My estimated growth is to 27-36 OSDs within the next 18 months, after that probably pretty stagnant for the next several years. Thoughts? -Tony ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
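For reference, the arithmetic behind pgcalc-style suggestions can be sketched in a few lines. The ~100-200 PGs-per-OSD target and the round-up-to-a-power-of-two step are the commonly cited heuristics, not pgcalc's exact implementation, so the results won't necessarily match every number in the list above:

```python
# Common PG-count heuristic: PGs = (target PGs per OSD * OSD count *
# fraction of total data) / replica count, rounded up to a power of two.
def suggested_pg_count(num_osds, data_fraction, replicas, target_per_osd=200):
    raw = target_per_osd * num_osds * data_fraction / replicas
    pgs = 1
    while pgs < raw:  # round up to the next power of two
        pgs *= 2
    return pgs

# Group1 from the post: 3 replicas, 18 OSDs, 30% of the data
print(suggested_pg_count(18, 0.30, 3))  # -> 512 with a 200 PG/OSD target
```

With a more conservative 100 PG/OSD target the same pool would come out at 256, which is why the "target PGs per OSD" knob dominates the result.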
Re: [ceph-users] Ceph on Solaris / Illumos
On Wednesday, April 15, 2015, Mark Nelson mnel...@redhat.com wrote: On 04/15/2015 08:16 AM, Jake Young wrote: Has anyone compiled ceph (either osd or client) on a Solaris based OS? The thread on ZFS support for osd got me thinking about using solaris as an osd server. It would have much better ZFS performance and I wonder if the osd performance without a journal would be 2x better. Doubt it. You may be able to do a little better, but you have to pay the piper somehow. If you clone from journal you will introduce fragmentation. If you throw the journal away you'll suffer for everything but very large writes unless you throw safety away. I think if we are going to generally beat filestore (not just for optimal benchmarking tests!) it's going to take some very careful cleverness. Thankfully Sage is very clever and is working on it in newstore. Even there, filestore has been proving difficult to beat for writes.

That's interesting. I've been under the impression that the ideal osd config was using a stable and fast BTRFS (which doesn't exist yet) with no journal. In my specific case, I don't want to use an external journal. I've gone down the path of using RAID controllers with write-back cache and BBUs with each disk in its own RAID0 group, instead of SSD journals. (Thanks for your performance articles BTW, they were very helpful!)

My take on your results indicates that IO throughput performance on XFS with same disk journal and WB cache on the RAID card was basically the same or better than BTRFS with no journal. In addition, BTRFS typically used much more CPU. Has BTRFS performance gotten any better since you wrote the performance articles? Have you compared ZFS (ZoL) performance to BTRFS?

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph site is very slow
People are working on it but I understand there was/is a DoS attack going on. :/ -Greg

On Wed, Apr 15, 2015 at 1:50 AM Ignazio Cassano ignaziocass...@gmail.com wrote: Many thanks

2015-04-15 10:44 GMT+02:00 Wido den Hollander w...@42on.com: On 04/15/2015 10:20 AM, Ignazio Cassano wrote: Hi all, why is ceph.com very slow? Not known right now. But you can try eu.ceph.com for your packages and downloads. It is impossible to download files for installing ceph. Regards Ignazio ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Binding a pool to certain OSDs
So it was a PG problem. I added a couple of OSDs per host, reconfigured the CRUSH map and the cluster began to work properly. Thanks Giuseppe

2015-04-14 19:02 GMT+02:00 Saverio Proto ziopr...@gmail.com: No error message. You just exhaust the RAM and you blow up the cluster because of too many PGs. Saverio

2015-04-14 18:52 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com: Hi Saverio, I first made a test on my staging lab where I have only 4 OSDs. On my mon servers (which run other services) I have 16GB RAM, 15GB used but 5 cached. On the OSD servers I have 3GB RAM, 3GB used but 2 cached. ceph -s tells me nothing about PGs, shouldn't I get an error message from its output? Thanks Giuseppe

2015-04-14 18:20 GMT+02:00 Saverio Proto ziopr...@gmail.com: You only have 4 OSDs? How much RAM per server? I think you already have too many PGs. Check your RAM usage. Check the Ceph wiki guidelines to dimension the correct number of PGs. Remember that every time you create a new pool you add PGs into the system. Saverio

2015-04-14 17:58 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com: Hi all, I've been following this tutorial to realize my setup: http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ I got this CRUSH map from my test lab: http://paste.openstack.org/show/203887/ then I modified the map and uploaded it.
This is the final version: http://paste.openstack.org/show/203888/ When I applied the new CRUSH map, after some rebalancing, I got this health status:

[- avalon1 root@controller001 Ceph -] # ceph -s
    cluster af09420b-4032-415e-93fc-6b60e9db064e
     health HEALTH_WARN crush map has legacy tunables; mon.controller001 low disk space; clock skew detected on mon.controller002
     monmap e1: 3 mons at {controller001=10.235.24.127:6789/0,controller002=10.235.24.128:6789/0,controller003=10.235.24.129:6789/0}, election epoch 314, quorum 0,1,2 controller001,controller002,controller003
     osdmap e3092: 4 osds: 4 up, 4 in
      pgmap v785873: 576 pgs, 6 pools, 71548 MB data, 18095 objects
            8842 MB used, 271 GB / 279 GB avail
                 576 active+clean

and this osd tree:

[- avalon1 root@controller001 Ceph -] # ceph osd tree
# id    weight  type name       up/down reweight
-8      2       root sed
-5      1               host ceph001-sed
2       1                       osd.2   up      1
-7      1               host ceph002-sed
3       1                       osd.3   up      1
-1      2       root default
-4      1               host ceph001-sata
0       1                       osd.0   up      1
-6      1               host ceph002-sata
1       1                       osd.1   up      1

which does not seem like a bad situation. The problem arises when I try to create a new pool: the command "ceph osd pool create sed 128 128" gets stuck. It never ends. And I noticed that my Cinder installation is not able to create volumes anymore. I've been looking in the logs for errors and found nothing. Any hint about how to proceed to restore my ceph cluster? Is there something wrong with the steps I take to update the CRUSH map? Is the problem related to Emperor? Regards, Giuseppe

2015-04-13 18:26 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com: Hi all, I've got a Ceph cluster which serves volumes to a Cinder installation. It runs Emperor. I'd like to be able to replace some of the disks with OPAL disks and create a new pool which uses exclusively the latter kind of disk. I'd like to have a traditional pool and a secure one coexisting on the same ceph host. I'd then use Cinder multi backend feature to serve them.
My question is: how is it possible to realize such a setup? How can I bind a pool to certain OSDs? Thanks Giuseppe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
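For anyone following the same tutorial: the mechanism that binds a pool to certain OSDs is a CRUSH rule whose `step take` starts at a dedicated root rather than at `default`. A minimal sketch, assuming a root named `sed` as in the osd tree above (the rule name and ruleset number are illustrative):

```
# Decompile, edit, recompile and inject the CRUSH map:
#   ceph osd getcrushmap -o crushmap.bin
#   crushtool -d crushmap.bin -o crushmap.txt
#   ... edit crushmap.txt ...
#   crushtool -c crushmap.txt -o crushmap.new
#   ceph osd setcrushmap -i crushmap.new

rule sed {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take sed          # start placement at the "sed" root only
        step chooseleaf firstn 0 type host
        step emit
}

# Then point the pool at the rule:
#   ceph osd pool set sed crush_ruleset 3
```

Any pool using ruleset 3 will then only place replicas on OSDs under the `sed` root.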
Re: [ceph-users] Ceph on Solaris / Illumos
On Wednesday, April 15, 2015, Alexandre Marangone amara...@redhat.com wrote: The LX branded zones might be a way to run OSDs on Illumos: https://wiki.smartos.org/display/DOC/LX+Branded+Zones For fun, I tried a month or so ago, managed to have a quorum. OSDs wouldn't start, I didn't look further as far as debugging. I'll give it a go when I have more time. Hmm. That is a great idea. I'll give LX branded zones a shot for both server and client use cases. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph on Solaris / Illumos
The LX branded zones might be a way to run OSDs on Illumos: https://wiki.smartos.org/display/DOC/LX+Branded+Zones For fun, I tried a month or so ago and managed to get a quorum. OSDs wouldn't start; I didn't look further as far as debugging. I'll give it a go when I have more time.

On Wed, Apr 15, 2015 at 7:04 AM, Mark Nelson mnel...@redhat.com wrote: On 04/15/2015 08:16 AM, Jake Young wrote: Has anyone compiled ceph (either osd or client) on a Solaris based OS? The thread on ZFS support for osd got me thinking about using solaris as an osd server. It would have much better ZFS performance and I wonder if the osd performance without a journal would be 2x better. Doubt it. You may be able to do a little better, but you have to pay the piper somehow. If you clone from journal you will introduce fragmentation. If you throw the journal away you'll suffer for everything but very large writes unless you throw safety away. I think if we are going to generally beat filestore (not just for optimal benchmarking tests!) it's going to take some very careful cleverness. Thankfully Sage is very clever and is working on it in newstore. Even there, filestore has been proving difficult to beat for writes. A second thought I had was using the Comstar iscsi / fcoe target software that is part of Solaris. Has anyone done anything with a ceph rbd client for Solaris based OSs? No idea! Jake

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Rados Gateway and keystone
Hi, Despite the creation of ec2 credentials which provide an access key and a secret key for a user, it's always impossible to connect using S3 (Forbidden/Access denied). All is right using swift (create container, list container, get object, put object, delete object); I use the cloudberry client to do so. Does someone know how I can check if the interoperability between keystone and the rgw is correctly set up? In the rgw pools? In the radosgw metadata? Best regards

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of ghislain.cheval...@orange.com Sent: Wednesday, April 15, 2015 13:16 To: Erik McCormick Cc: ceph-users Subject: Re: [ceph-users] Rados Gateway and keystone

Thanks a lot. That helps.

From: Erik McCormick [mailto:emccorm...@cirrusseven.com] Sent: Monday, April 13, 2015 18:32 To: CHEVALIER Ghislain IMT/OLPS Cc: ceph-users Subject: Re: [ceph-users] Rados Gateway and keystone

I haven't really used the S3 stuff much, but the credentials should be in keystone already. If you're in horizon, you can download them under Access and Security -> API Access. Using the CLI you can use the openstack client, like "openstack credential list | show | create | delete | set", or the keystone client, like "keystone ec2-credentials-list", etc. Then you should be able to feed those credentials to the rgw like a normal S3 API call. Cheers, Erik

On Mon, Apr 13, 2015 at 10:16 AM, ghislain.cheval...@orange.com wrote: Hi all, Coming back to that issue. I successfully used keystone users for the rados gateway and the swift API but I still don't understand how it can work with the S3 API, i.e. S3 users (AccessKey/SecretKey). I found a swift3 initiative but I think it's only compliant in a pure OpenStack swift environment by setting up a specific plug-in.
https://github.com/stackforge/swift3

A rgw can be, at the same time, under keystone control and standard radosgw-admin if - for swift, you use the right authentication service (keystone or internal) - for S3, you use the internal authentication service. So, my questions are still valid. How can a rgw work for S3 users if they are stored in keystone? Which is the access key and which the secret key? What is the purpose of the "rgw s3 auth use keystone" parameter? Best regards

--

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of ghislain.cheval...@orange.com Sent: Monday, March 23, 2015 14:03 To: ceph-users Subject: [ceph-users] Rados Gateway and keystone

Hi All, I just would like to be sure about the keystone configuration for the Rados Gateway. I read the documentation http://ceph.com/docs/master/radosgw/keystone/ and http://ceph.com/docs/master/radosgw/config-ref/?highlight=keystone but I didn't catch whether, after having configured the rados gateway (ceph.conf) to use keystone, it becomes mandatory to create all the users in it. In other words, can a rgw be, at the same time, under keystone control and standard radosgw-admin? How does it work for S3 users? What is the purpose of the "rgw s3 auth use keystone" parameter? Best regards

- - - - - - - - - - - - - - - - - Ghislain Chevalier +33299124432 +33788624370 ghislain.cheval...@orange.com
This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you.
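As a starting point for checking the integration itself, these are the ceph.conf options that tie radosgw to keystone; the values shown are placeholders, and `rgw s3 auth use keystone` is the switch that is meant to make radosgw validate S3 access keys against keystone's EC2 credentials instead of its own internal users:

```
[client.radosgw.gateway]
rgw keystone url = http://keystone-host:35357
rgw keystone admin token = {admin-token}
rgw keystone accepted roles = Member, admin
rgw keystone token cache size = 500
rgw keystone revocation interval = 600
rgw s3 auth use keystone = true
```

If that last option is unset, S3 requests fall back to radosgw's internal users, which would produce exactly the "swift works, S3 is Forbidden" behaviour described above.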
Re: [ceph-users] Ceph on Solaris / Illumos
On 04/15/2015 10:36 AM, Jake Young wrote: On Wednesday, April 15, 2015, Mark Nelson mnel...@redhat.com wrote: On 04/15/2015 08:16 AM, Jake Young wrote: Has anyone compiled ceph (either osd or client) on a Solaris based OS? The thread on ZFS support for osd got me thinking about using solaris as an osd server. It would have much better ZFS performance and I wonder if the osd performance without a journal would be 2x better.

Doubt it. You may be able to do a little better, but you have to pay the piper somehow. If you clone from journal you will introduce fragmentation. If you throw the journal away you'll suffer for everything but very large writes unless you throw safety away. I think if we are going to generally beat filestore (not just for optimal benchmarking tests!) it's going to take some very careful cleverness. Thankfully Sage is very clever and is working on it in newstore. Even there, filestore has been proving difficult to beat for writes.

That's interesting. I've been under the impression that the ideal osd config was using a stable and fast BTRFS (which doesn't exist yet) with no journal.

This is sort of unrelated to the journal specifically, but BTRFS with RBD will start fragmenting terribly due to how COW works (and how it relates to snapshots too). More related to the journal: at one point we were thinking about cloning from the journal on BTRFS, but that also potentially leads to nasty fragmentation even if the initial behavior would look very good. I haven't done any testing that I can remember of BTRFS with no journal. I'm not sure if it even still works...

In my specific case, I don't want to use an external journal. I've gone down the path of using RAID controllers with write-back cache and BBUs with each disk in its own RAID0 group, instead of SSD journals. (Thanks for your performance articles BTW, they were very helpful!)
My take on your results indicates that IO throughput performance on XFS with same disk journal and WB cache on the RAID card was basically the same or better than BTRFS with no journal. In addition, BTRFS typically used much more CPU. Has BTRFS performance gotten any better since you wrote the performance articles?

So the trick with those articles is that the systems are fresh, and most of the initial articles were using rados bench, which is always writing out new objects vs something like RBD where you are (usually) doing writes to existing objects that represent the blocks. If you were to do a bunch of random 4k writes and then later try to do sequential reads, you'd see BTRFS sequential read performance tank. We actually did tests like that with emperor during the firefly development cycle. I've included the results. Basically the first iteration of the test cycle looks great on BTRFS, then you see read performance drop way down. Eventually write performance is also likely to drop as the disks become extremely fragmented (we may even see a little of that in those tests).

Have you compared ZFS (ZoL) performance to BTRFS?

I did way back in 2013 when we were working with Brian Behlendorf to fix xattr bugs in ZOL. It was quite a bit slower if you didn't enable SA xattrs. With SA xattrs, it was much closer, but not as fast as btrfs or xfs. I didn't do a lot of tuning though and Ceph wasn't making good use of ZFS features, so it's very possible things have changed.

[Attachment: Emeror Raw Performance Data.ods]

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph repo - RSYNC?
Sorry for starting a new thread, I've only just subscribed to the list and the archive on the mail listserv is far from complete at the moment. on 8th March David Moreau Simard said http://www.spinics.net/lists/ceph-users/msg16334.html that there was a rsync'able mirror of the ceph repo at http://ceph.mirror.iweb.ca/ My problem is that the repo doesn't include Hammer. Is there someone who can get that added to the mirror? thanks very much Paul ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph on Debian Jessie stopped working
Hi All, Earlier ceph on Debian Jessie was working. Jessie is running 3.16.7. Now when I modprobe rbd, no /dev/rbd appears.

# dmesg | grep -e rbd -e ceph
[ 15.814423] Key type ceph registered
[ 15.814461] libceph: loaded (mon/osd proto 15/24)
[ 15.831092] rbd: loaded
[ 22.084573] rbd: no image name provided
[ 22.230176] rbd: no image name provided

Some files appear under /sys: ls /sys/devices/rbd shows power and uevent. ceph-fuse /mnt/cephfs just hangs. I haven't changed the ceph config, but possibly there were package updates. I did install an earlier Jessie kernel from a machine which is still working and rebooted. No luck. Any ideas of what to check next? Thanks, Chad.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
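In case it helps narrow this down: /dev/rbd* nodes only appear once an image is actually mapped, and the "rbd: no image name provided" dmesg lines look like a map attempt with an empty image name, e.g. a stray or malformed line in /etc/ceph/rbdmap being replayed at boot. A minimal sketch of a manual map plus a well-formed rbdmap entry (pool, image and keyring names here are placeholders):

```
# Map one image by hand, then check that a device node appears:
#   rbd map rbd/myimage --id admin
#   ls /dev/rbd*

# /etc/ceph/rbdmap -- one "pool/image  options" entry per line;
# empty or truncated entries can produce "rbd: no image name provided":
rbd/myimage     id=admin,keyring=/etc/ceph/ceph.client.admin.keyring
```

If the manual map works, the problem is in the boot-time rbdmap configuration rather than the kernel module.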
Re: [ceph-users] Ceph repo - RSYNC?
http://eu.ceph.com/ has rsync and Hammer. On Wed, Apr 15, 2015 at 10:17 AM, Paul Mansfield paul.mansfi...@alcatel-lucent.com wrote: Sorry for starting a new thread, I've only just subscribed to the list and the archive on the mail listserv is far from complete at the moment. on 8th March David Moreau Simard said http://www.spinics.net/lists/ceph-users/msg16334.html that there was a rsync'able mirror of the ceph repo at http://ceph.mirror.iweb.ca/ My problem is that the repo doesn't include Hammer. Is there someone who can get that added to the mirror? thanks very much Paul ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] mds crashing
I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going pretty well. Then, about noon today, we had an mds crash. And then the failover mds crashed. And this cascaded through all 4 mds servers we have. If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while. ceph -w goes through 'replay' 'reconnect' 'rejoin' 'clientreplay' and 'active' but nearly immediately after getting to 'active', it crashes again. I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log For the possibly, but not necessarily, useful background info. - Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. We still have several objects misplaced (~17%), but those seem to be continuing to clean themselves up. - We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem. - Before we realized the mds crashes, we had just changed the size of our metadata pool from 2 to 4. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph repo - RSYNC?
Hey, you're right. Thanks for bringing that to my attention, it's syncing now :) Should be available soon. David Moreau Simard On 2015-04-15 12:17 PM, Paul Mansfield wrote: Sorry for starting a new thread, I've only just subscribed to the list and the archive on the mail listserv is far from complete at the moment. on 8th March David Moreau Simard said http://www.spinics.net/lists/ceph-users/msg16334.html that there was a rsync'able mirror of the ceph repo at http://ceph.mirror.iweb.ca/ My problem is that the repo doesn't include Hammer. Is there someone who can get that added to the mirror? thanks very much Paul ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1
Also our calamari web UI won't authenticate anymore, can't see any issues in any log under /var/log/calamari, any hints on what to look for are appreciated, TIA!

# dpkg -l | egrep -i calamari\|ceph
ii calamari-clients       1.2.3.1-2-gc1f14b2     all    Inktank Calamari user interface
ii calamari-server        1.3-rc-16-g321cd58     amd64  Inktank package containing the Calamari management server
ii ceph                   0.94.1-1~bpo70+1       amd64  distributed storage and file system
ii ceph-common            0.94.1-1~bpo70+1       amd64  common utilities to mount and interact with a ceph storage cluster
ii ceph-deploy            1.5.23~bpo70+1         all    Ceph-deploy is an easy to use configuration tool
ii ceph-fs-common         0.94.1-1~bpo70+1       amd64  common utilities to mount and interact with a ceph file system
ii ceph-fuse              0.94.1-1~bpo70+1       amd64  FUSE-based client for the Ceph distributed file system
ii ceph-mds               0.94.1-1~bpo70+1       amd64  metadata server for the ceph distributed file system
ii curl                   7.29.0-1~bpo70+1.ceph  amd64  command line tool for transferring data with URL syntax
ii libcephfs1             0.94.1-1~bpo70+1       amd64  Ceph distributed file system client library
ii libcurl3:amd64         7.29.0-1~bpo70+1.ceph  amd64  easy-to-use client-side URL transfer library (OpenSSL flavour)
ii libcurl3-gnutls:amd64  7.29.0-1~bpo70+1.ceph  amd64  easy-to-use client-side URL transfer library (GnuTLS flavour)
ii libleveldb1:amd64      1.12.0-1~bpo70+1.ceph  amd64  fast key-value storage library
ii python-ceph            0.94.1-1~bpo70+1       amd64  Meta-package for python libraries for the Ceph libraries
ii python-cephfs          0.94.1-1~bpo70+1       amd64  Python libraries for the Ceph libcephfs library
ii python-rados           0.94.1-1~bpo70+1       amd64  Python libraries for the Ceph librados library
ii python-rbd             0.94.1-1~bpo70+1       amd64  Python libraries for the Ceph librbd library

On 16/04/2015, at 00.41, Steffen W Sørensen ste...@me.com wrote: Hi, Successfully upgraded a small development 4x node Giant 0.87-1 cluster to Hammer 0.94-1, each node with 6x OSD - 146GB, 19 pools, mainly 2 in usage.
Only minor thing: now ceph -s complains over too many PGs; previously Giant had complained of too few, so various pools were bumped up till health status was okay as before upgrading. Admittedly, after bumping PGs up in Giant we had changed pool sizes from 3 to 2 min 1 in fear of perf. when backfilling/recovering PGs.

# ceph -s
    cluster 16fe2dcf-2629-422f-a649-871deba78bcd
     health HEALTH_WARN too many PGs per OSD (1237 > max 300)
     monmap e29: 3 mons at {0=10.0.3.4:6789/0,1=10.0.3.2:6789/0,2=10.0.3.1:6789/0} election epoch 1370, quorum 0,1,2 2,1,0
     mdsmap e142: 1/1/1 up {0=2=up:active}, 1 up:standby
     osdmap e3483: 24 osds: 24 up, 24 in
      pgmap v3719606: 14848 pgs, 19 pools, 530 GB data, 133 kobjects
            1055 GB used, 2103 GB / 3159 GB avail
                14848 active+clean

Can we just reduce PGs again, and should we decrement in minor steps, one pool at a time… Any thoughts, TIA! /Steffen

1. restart the monitor daemons on each node
2. then, restart the osd daemons on each node
3. then, restart the mds daemons on each node
4. then, restart the radosgw daemon on each node

Regards. -- François Lafont

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] many slow requests on different osds (scrubbing disabled)
Hi, For a few days we have noticed many slow requests on our cluster.

Cluster: ceph version 0.67.11, 3 x mon, 36 hosts - 10 osd ( 4T ) + 2 SSD (journals)

Scrubbing and deep scrubbing are disabled but the count of slow requests is still increasing. Disk utilisation is very small after we disabled scrubbing. Log from one slow write with debug osd = 20/20: osd.284 - master: http://pastebin.com/xPtpNU6n osd.186 - replica: http://pastebin.com/NS1gmhB0 osd.177 - replica: http://pastebin.com/Ln9L2Z5Z Can you help me find the reason for it? -- Regards Dominik

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
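One more data point that may help beyond the pastebin logs: each OSD exposes an admin socket that can dump slow operations with per-stage timestamps, which usually shows whether the time is being lost in the journal, waiting on sub-op replies, or in the op workqueue. A sketch, assuming the default socket path and the osd id from this post:

```
# Ops currently in flight on the primary, with how long each has waited:
ceph --admin-daemon /var/run/ceph/ceph-osd.284.asok dump_ops_in_flight

# The slowest recently completed ops, each with a timestamped event list
# (received, reached_pg, waiting for subops, commit_sent, done, ...):
ceph --admin-daemon /var/run/ceph/ceph-osd.284.asok dump_historic_ops
```

Comparing the gap between consecutive events on the primary against the two replicas should point at which of the three OSDs (or its journal device) is the slow one.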
[ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1
Hi, Successfully upgraded a small development 4x node Giant 0.87-1 cluster to Hammer 0.94-1, each node with 6x OSD - 146GB, 19 pools, mainly 2 in usage. Only minor thing: now ceph -s complains over too many PGs; previously Giant had complained of too few, so various pools were bumped up till health status was okay as before upgrading. Admittedly, after bumping PGs up in Giant we had changed pool sizes from 3 to 2 min 1 in fear of perf. when backfilling/recovering PGs.

# ceph -s
    cluster 16fe2dcf-2629-422f-a649-871deba78bcd
     health HEALTH_WARN too many PGs per OSD (1237 > max 300)
     monmap e29: 3 mons at {0=10.0.3.4:6789/0,1=10.0.3.2:6789/0,2=10.0.3.1:6789/0} election epoch 1370, quorum 0,1,2 2,1,0
     mdsmap e142: 1/1/1 up {0=2=up:active}, 1 up:standby
     osdmap e3483: 24 osds: 24 up, 24 in
      pgmap v3719606: 14848 pgs, 19 pools, 530 GB data, 133 kobjects
            1055 GB used, 2103 GB / 3159 GB avail
                14848 active+clean

Can we just reduce PGs again, and should we decrement in minor steps, one pool at a time… Any thoughts, TIA! /Steffen

1. restart the monitor daemons on each node
2. then, restart the osd daemons on each node
3. then, restart the mds daemons on each node
4. then, restart the radosgw daemon on each node

Regards. -- François Lafont

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
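For what it's worth, the 1237 figure in that warning is just the total number of PG replicas divided by the OSD count, so the arithmetic can be checked locally. A minimal sketch, assuming all 14848 PGs sit in size-2 pools as described above:

```python
# PGs per OSD = sum over pools of (pg_num * replica size) / number of OSDs.
def pgs_per_osd(pools, num_osds):
    """pools: iterable of (pg_num, size) tuples."""
    return sum(pg_num * size for pg_num, size in pools) / num_osds

# With 14848 PGs at size 2 across 24 OSDs this reproduces the warning:
print(round(pgs_per_osd([(14848, 2)], 24)))  # -> 1237
```

The same arithmetic also shows why raising pool sizes back to 3 would make the warning worse, and why the only ways down are fewer PGs or more OSDs.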
Re: [ceph-users] mds crashing
Thank you, John! That was exactly the bug we were hitting. My Google-fu didn't lead me to this one.

On Wed, Apr 15, 2015 at 4:16 PM, John Spray john.sp...@redhat.com wrote: On 15/04/2015 20:02, Kyle Hutson wrote: I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going pretty well. Then, about noon today, we had an mds crash. And then the failover mds crashed. And this cascaded through all 4 mds servers we have. If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while. ceph -w goes through 'replay' 'reconnect' 'rejoin' 'clientreplay' and 'active' but nearly immediately after getting to 'active', it crashes again. I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log

For the possibly, but not necessarily, useful background info. - Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. We still have several objects misplaced (~17%), but those seem to be continuing to clean themselves up. - We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem. - Before we realized the mds crashes, we had just changed the size of our metadata pool from 2 to 4.

It looks like you're seeing http://tracker.ceph.com/issues/10449, which is a situation where the SessionMap object becomes too big for the MDS to save. The cause of it in that case was stuck requests from a misbehaving client running a slightly older kernel. Assuming you're using the kernel client and having a similar problem, you could try to work around this situation by forcibly unmounting the clients while the MDS is offline, such that during clientreplay the MDS will remove them from the SessionMap after timing out, and then next time it tries to save the map it won't be oversized.
If that works, you could then look into getting newer kernels on the clients to avoid hitting the issue again -- the #10449 ticket has some pointers about which kernel changes were relevant. Cheers, John ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mds crashing
What is significantly smaller? We have 67 requests in the 16,400,000 range and 250 in the 18,900,000 range. Thanks, Adam On Wed, Apr 15, 2015 at 8:38 PM, Yan, Zheng uker...@gmail.com wrote: On Thu, Apr 16, 2015 at 9:07 AM, Adam Tygart mo...@ksu.edu wrote: We are using 3.18.6-gentoo. Based on that, I was hoping that the kernel bug referred to in the bug report would have been fixed. The bug was supposed to be fixed, but you hit the bug again. could you check if the kernel client has any hang mds request. (check /sys/kernel/debug/ceph/*/mdsc on the machine that contain cephfs mount. If there is any request whose ID is significant smaller than other requests' IDs) Regards Yan, Zheng -- Adam On Wed, Apr 15, 2015 at 8:02 PM, Yan, Zheng uker...@gmail.com wrote: On Thu, Apr 16, 2015 at 5:29 AM, Kyle Hutson kylehut...@ksu.edu wrote: Thank you, John! That was exactly the bug we were hitting. My Google-fu didn't lead me to this one. here is the bug report http://tracker.ceph.com/issues/10449. It's a kernel client bug which causes the session map size increase infinitely. which version of linux kernel are using? Regards Yan, Zheng On Wed, Apr 15, 2015 at 4:16 PM, John Spray john.sp...@redhat.com wrote: On 15/04/2015 20:02, Kyle Hutson wrote: I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going pretty well. Then, about noon today, we had an mds crash. And then the failover mds crashed. And this cascaded through all 4 mds servers we have. If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while. ceph -w goes through 'replay' 'reconnect' 'rejoin' 'clientreplay' and 'active' but nearly immediately after getting to 'active', it crashes again. I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log For the possibly, but not necessarily, useful background info. - Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. We still have several objects misplaced (~17%), but those seem to be continuing to clean themselves up. - We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem. - Before we realized the mds crashes, we had just changed the size of our metadata pool from 2 to 4. It looks like you're seeing http://tracker.ceph.com/issues/10449, which is a situation where the SessionMap object becomes too big for the MDS to save. The cause of it in that case was stuck requests from a misbehaving client running a slightly older kernel. Assuming you're using the kernel client and having a similar problem, you could try to work around this situation by forcibly unmounting the clients while the MDS is offline, such that during clientreplay the MDS will remove them from the SessionMap after timing out, and then next time it tries to save the map it won't be oversized. If that works, you could then look into getting newer kernels on the clients to avoid hitting the issue again -- the #10449 ticket has some pointers about which kernel changes were relevant. Cheers, John ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mds crashing
We are using 3.18.6-gentoo. Based on that, I was hoping that the kernel bug referred to in the bug report would have been fixed. -- Adam On Wed, Apr 15, 2015 at 8:02 PM, Yan, Zheng uker...@gmail.com wrote: On Thu, Apr 16, 2015 at 5:29 AM, Kyle Hutson kylehut...@ksu.edu wrote: Thank you, John! That was exactly the bug we were hitting. My Google-fu didn't lead me to this one. here is the bug report http://tracker.ceph.com/issues/10449. It's a kernel client bug which causes the session map size increase infinitely. which version of linux kernel are using? Regards Yan, Zheng On Wed, Apr 15, 2015 at 4:16 PM, John Spray john.sp...@redhat.com wrote: On 15/04/2015 20:02, Kyle Hutson wrote: I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going pretty well. Then, about noon today, we had an mds crash. And then the failover mds crashed. And this cascaded through all 4 mds servers we have. If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while. ceph -w goes through 'replay' 'reconnect' 'rejoin' 'clientreplay' and 'active' but nearly immediately after getting to 'active', it crashes again. I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log For the possibly, but not necessarily, useful background info. - Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. We still have several objects misplaced (~17%), but those seem to be continuing to clean themselves up. - We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem. - Before we realized the mds crashes, we had just changed the size of our metadata pool from 2 to 4. It looks like you're seeing http://tracker.ceph.com/issues/10449, which is a situation where the SessionMap object becomes too big for the MDS to save. The cause of it in that case was stuck requests from a misbehaving client running a slightly older kernel. Assuming you're using the kernel client and having a similar problem, you could try to work around this situation by forcibly unmounting the clients while the MDS is offline, such that during clientreplay the MDS will remove them from the SessionMap after timing out, and then next time it tries to save the map it won't be oversized. If that works, you could then look into getting newer kernels on the clients to avoid hitting the issue again -- the #10449 ticket has some pointers about which kernel changes were relevant. Cheers, John ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mds crashing
On Thu, Apr 16, 2015 at 9:48 AM, Adam Tygart mo...@ksu.edu wrote: What is significantly smaller? We have 67 requests in the 16,400,000 range and 250 in the 18,900,000 range. That explains the crash. Could you help me to debug this issue:

1. send /sys/kernel/debug/ceph/*/mdsc to me
2. run "echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control" on the cephfs mount machine
3. restart the mds and wait until it crashes again
4. run "echo 'module ceph -p' > /sys/kernel/debug/dynamic_debug/control" on the cephfs mount machine
5. send the kernel messages of the cephfs mount machine to me (should be in /var/log/kernel.log or /var/log/messages)

To recover from the crash, you can either force reset the machine containing the cephfs mount, or add "mds wipe sessions = 1" to the mds section of ceph.conf. Regards Yan, Zheng Thanks, Adam On Wed, Apr 15, 2015 at 8:38 PM, Yan, Zheng uker...@gmail.com wrote: On Thu, Apr 16, 2015 at 9:07 AM, Adam Tygart mo...@ksu.edu wrote: We are using 3.18.6-gentoo. Based on that, I was hoping that the kernel bug referred to in the bug report would have been fixed. The bug was supposed to be fixed, but you hit the bug again. could you check if the kernel client has any hang mds request. (check /sys/kernel/debug/ceph/*/mdsc on the machine that contain cephfs mount. If there is any request whose ID is significant smaller than other requests' IDs) Regards Yan, Zheng -- Adam On Wed, Apr 15, 2015 at 8:02 PM, Yan, Zheng uker...@gmail.com wrote: On Thu, Apr 16, 2015 at 5:29 AM, Kyle Hutson kylehut...@ksu.edu wrote: Thank you, John! That was exactly the bug we were hitting. My Google-fu didn't lead me to this one. here is the bug report http://tracker.ceph.com/issues/10449. It's a kernel client bug which causes the session map size increase infinitely. which version of linux kernel are using? 
Regards Yan, Zheng On Wed, Apr 15, 2015 at 4:16 PM, John Spray john.sp...@redhat.com wrote: On 15/04/2015 20:02, Kyle Hutson wrote: I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going pretty well. Then, about noon today, we had an mds crash. And then the failover mds crashed. And this cascaded through all 4 mds servers we have. If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while. ceph -w goes through 'replay' 'reconnect' 'rejoin' 'clientreplay' and 'active' but nearly immediately after getting to 'active', it crashes again. I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log For the possibly, but not necessarily, useful background info. - Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. We still have several objects misplaced (~17%), but those seem to be continuing to clean themselves up. - We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem. - Before we realized the mds crashes, we had just changed the size of our metadata pool from 2 to 4. It looks like you're seeing http://tracker.ceph.com/issues/10449, which is a situation where the SessionMap object becomes too big for the MDS to save. The cause of it in that case was stuck requests from a misbehaving client running a slightly older kernel. Assuming you're using the kernel client and having a similar problem, you could try to work around this situation by forcibly unmounting the clients while the MDS is offline, such that during clientreplay the MDS will remove them from the SessionMap after timing out, and then next time it tries to save the map it won't be oversized. 
If that works, you could then look into getting newer kernels on the clients to avoid hitting the issue again -- the #10449 ticket has some pointers about which kernel changes were relevant. Cheers, John ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
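Yan's check for stuck requests — IDs far below the newest entries in the mdsc debugfs file — is easy to script. A rough sketch, assuming each line of /sys/kernel/debug/ceph/*/mdsc begins with the request ID (the 0.9 cutoff is an arbitrary heuristic chosen so that Adam's 16.4M-vs-18.9M case is flagged):

```python
import glob
import re

def stuck_ids(lines, ratio=0.9):
    """Return request IDs well below the newest ID in an mdsc listing."""
    ids = [int(m.group(1))
           for m in (re.match(r"\s*(\d+)", line) for line in lines) if m]
    if not ids:
        return []
    newest = max(ids)
    return [i for i in ids if i < newest * ratio]

def scan_mounts():
    """Scan every cephfs mount's mdsc file for possibly-stuck requests."""
    report = {}
    for path in glob.glob("/sys/kernel/debug/ceph/*/mdsc"):
        with open(path) as f:
            stuck = stuck_ids(f.readlines())
        if stuck:
            report[path] = stuck
    return report

# Adam's case: requests around 16.4M while the newest are around 18.9M
print(stuck_ids(["16400000 mds0 getattr", "18900000 mds0 lookup"]))  # [16400000]
```

Running scan_mounts() requires root and a mounted debugfs; the field layout of mdsc output is an assumption here, so adjust the regex to match your kernel version.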
Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1
Hello, On Thu, 16 Apr 2015 00:41:29 +0200 Steffen W Sørensen wrote: Hi, Successfully upgrade a small development 4x node Giant 0.87-1 cluster to Hammer 0.94-1, each node with 6x OSD - 146GB, 19 pools, mainly 2 in usage. Only minor thing now ceph -s complaining over too may PGs, previously Giant had complain of too few, so various pools were bumped up till health status was okay as before upgrading. Admit, that after bumping PGs up in Giant we had changed pool sizes from 3 to 2 min 1 in fear of perf. when backfilling/recovering PGs. That later change would have _increased_ the number of recommended PG, not decreased it. With your cluster 2048 PGs total (all pools combined!) would be the sweet spot, see: http://ceph.com/pgcalc/ It seems to me that you increased PG counts assuming that the formula is per pool. # ceph -s cluster 16fe2dcf-2629-422f-a649-871deba78bcd health HEALTH_WARN too many PGs per OSD (1237 max 300) monmap e29: 3 mons at {0=10.0.3.4:6789/0,1=10.0.3.2:6789/0,2=10.0.3.1:6789/0} election epoch 1370, quorum 0,1,2 2,1,0 mdsmap e142: 1/1/1 up {0=2=up:active}, 1 up:standby osdmap e3483: 24 osds: 24 up, 24 in pgmap v3719606: 14848 pgs, 19 pools, 530 GB data, 133 kobjects 1055 GB used, 2103 GB / 3159 GB avail 14848 active+clean This is an insanely high PG count for this cluster and is certain to impact performance and resource requirements (all these PGs need to peer after all). Can we just reduce PGs again and should we decrement in minor steps one pool at a time… No, as per the documentation you can only increase PGs and PGPs. So your options are to totally flatten this cluster or if pools with important data exist to copy them to new, correctly sized, pools and delete all the oversized ones after that. Christian -- Christian BalzerNetwork/Systems Engineer ch...@gol.com Global OnLine Japan/Fusion Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
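Christian's 2048-PG figure follows from the usual pgcalc rule of thumb: roughly 100 PGs per OSD across all pools combined, divided by the replica size and rounded up to a power of two. A minimal sketch (the 100-per-OSD target is the common default, not a hard rule, and pgcalc additionally weights per-pool data percentages):

```python
def suggested_total_pgs(num_osds, pool_size, pgs_per_osd=100):
    """Rule-of-thumb total PG count across all pools, rounded up to a power of two."""
    raw = num_osds * pgs_per_osd / pool_size
    power = 1
    while power < raw:
        power *= 2  # round up to the next power of two
    return power

# Steffen's cluster: 24 OSDs, size-2 pools -> 2048, versus the 14848 PGs it has
print(suggested_total_pgs(24, 2))  # 2048
```

The key point from the thread: the formula is for the whole cluster, not per pool, which is how counts like 14848 happen.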
Re: [ceph-users] mds crashing
On Thu, Apr 16, 2015 at 9:07 AM, Adam Tygart mo...@ksu.edu wrote: We are using 3.18.6-gentoo. Based on that, I was hoping that the kernel bug referred to in the bug report would have been fixed. The bug was supposed to be fixed, but you hit it again. Could you check whether the kernel client has any hung mds requests? (Check /sys/kernel/debug/ceph/*/mdsc on the machine that contains the cephfs mount, and look for any request whose ID is significantly smaller than the other requests' IDs.) Regards Yan, Zheng -- Adam On Wed, Apr 15, 2015 at 8:02 PM, Yan, Zheng uker...@gmail.com wrote: On Thu, Apr 16, 2015 at 5:29 AM, Kyle Hutson kylehut...@ksu.edu wrote: Thank you, John! That was exactly the bug we were hitting. My Google-fu didn't lead me to this one. here is the bug report http://tracker.ceph.com/issues/10449. It's a kernel client bug which causes the session map size increase infinitely. which version of linux kernel are using? Regards Yan, Zheng On Wed, Apr 15, 2015 at 4:16 PM, John Spray john.sp...@redhat.com wrote: On 15/04/2015 20:02, Kyle Hutson wrote: I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going pretty well. Then, about noon today, we had an mds crash. And then the failover mds crashed. And this cascaded through all 4 mds servers we have. If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while. ceph -w goes through 'replay' 'reconnect' 'rejoin' 'clientreplay' and 'active' but nearly immediately after getting to 'active', it crashes again. I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log For the possibly, but not necessarily, useful background info. - Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. We still have several objects misplaced (~17%), but those seem to be continuing to clean themselves up. - We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem. - Before we realized the mds crashes, we had just changed the size of our metadata pool from 2 to 4. It looks like you're seeing http://tracker.ceph.com/issues/10449, which is a situation where the SessionMap object becomes too big for the MDS to save. The cause of it in that case was stuck requests from a misbehaving client running a slightly older kernel. Assuming you're using the kernel client and having a similar problem, you could try to work around this situation by forcibly unmounting the clients while the MDS is offline, such that during clientreplay the MDS will remove them from the SessionMap after timing out, and then next time it tries to save the map it won't be oversized. If that works, you could then look into getting newer kernels on the clients to avoid hitting the issue again -- the #10449 ticket has some pointers about which kernel changes were relevant. Cheers, John ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] live migration fails with image on ceph
The issue is reproducible in svl-3 with rbd cache set to false. On the 5th ping-pong, the instance experienced ping drops and did not recover for 20+ minutes:

(os-clients)[root@fedora21 nimbus-env]# nova live-migration lmtest1
(os-clients)[root@fedora21 nimbus-env]# nova show lmtest1 | grep -E 'hypervisor_hostname|task_state|vm_state'
| OS-EXT-SRV-ATTR:hypervisor_hostname | svl-3-cc-nova1-002.cisco.com |
| OS-EXT-STS:task_state | migrating |
| OS-EXT-STS:vm_state | active |
(os-clients)[root@fedora21 nimbus-env]# nova show lmtest1 | grep -E 'hypervisor_hostname|task_state|vm_state'
| OS-EXT-SRV-ATTR:hypervisor_hostname | svl-3-cc-nova1-001.cisco.com |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | active |
(os-clients)[root@fedora21 nimbus-env]# ping -c3 -S60 10.33.143.215
PING 10.33.143.215 (10.33.143.215) 56(84) bytes of data.
--- 10.33.143.215 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2001ms
(os-clients)[root@fedora21 nimbus-env]# ping -c3 -S60 10.33.143.215
PING 10.33.143.215 (10.33.143.215) 56(84) bytes of data.
--- 10.33.143.215 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms
(os-clients)[root@fedora21 nimbus-env]# ping -c3 -S60 10.33.143.215
PING 10.33.143.215 (10.33.143.215) 56(84) bytes of data.
--- 10.33.143.215 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

-- Yuming

On 4/10/15, 4:51 PM, Josh Durgin jdur...@redhat.com wrote: On 04/08/2015 09:37 PM, Yuming Ma (yumima) wrote: Josh, I think we are using plain live migration and not mirroring block drives as the other test did. Do you have the migration flags or more from the libvirt log? Also which versions of qemu is this? The libvirt log message about qemuMigrationCancelDriveMirror from your first email is suspicious. Being unable to stop it may mean it was not running (fine, but libvirt shouldn't have tried to stop it), or it kept running (bad esp. if it's trying to copy to the same rbd). 
What are the chances or scenario that disk image can be corrupted during the live migration for both source and target are connected to the same volume and RBD caches is turned on: Generally rbd caching with live migration is safe. The way to get corruption is to have drive-mirror try to copy over the rbd on the destination while the source is still using the disk... Did you observe fs corruption after a live migration, or just other odd symptoms? Since a reboot fixed it, it sounds more like memory corruption to me, unless it was fsck'd during reboot. Josh ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mds crashing
On Thu, Apr 16, 2015 at 5:29 AM, Kyle Hutson kylehut...@ksu.edu wrote: Thank you, John! That was exactly the bug we were hitting. My Google-fu didn't lead me to this one. Here is the bug report: http://tracker.ceph.com/issues/10449. It's a kernel client bug which causes the session map size to increase indefinitely. Which version of the Linux kernel are you using? Regards Yan, Zheng On Wed, Apr 15, 2015 at 4:16 PM, John Spray john.sp...@redhat.com wrote: On 15/04/2015 20:02, Kyle Hutson wrote: I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going pretty well. Then, about noon today, we had an mds crash. And then the failover mds crashed. And this cascaded through all 4 mds servers we have. If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while. ceph -w goes through 'replay' 'reconnect' 'rejoin' 'clientreplay' and 'active' but nearly immediately after getting to 'active', it crashes again. I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log For the possibly, but not necessarily, useful background info. - Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. We still have several objects misplaced (~17%), but those seem to be continuing to clean themselves up. - We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem. - Before we realized the mds crashes, we had just changed the size of our metadata pool from 2 to 4. It looks like you're seeing http://tracker.ceph.com/issues/10449, which is a situation where the SessionMap object becomes too big for the MDS to save. The cause of it in that case was stuck requests from a misbehaving client running a slightly older kernel. Assuming you're using the kernel client and having a similar problem, you could try to work around this situation by forcibly unmounting the clients while the MDS is offline, such that during clientreplay the MDS will remove them from the SessionMap after timing out, and then next time it tries to save the map it won't be oversized. If that works, you could then look into getting newer kernels on the clients to avoid hitting the issue again -- the #10449 ticket has some pointers about which kernel changes were relevant. Cheers, John ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Managing larger ceph clusters
I'm curious what people managing larger ceph clusters are doing with configuration management and orchestration to simplify their lives? We've been using ceph-deploy to manage our ceph clusters so far, but feel that moving the management of our clusters to standard tools would provide a little more consistency and help prevent some mistakes that have happened while using ceph-deploy. We're looking at using the same tools we use in our OpenStack environment (puppet/ansible), but I'm interested in hearing from people using chef/salt/juju as well. Some of the cluster operation tasks that I can think of, along with ideas/concerns I have, are:

Keyring management
Seems like hiera-eyaml is a natural fit for storing the keyrings.

ceph.conf
I believe the puppet ceph module can be used to manage this file, but I'm wondering if using a template (erb?) might be a better method of keeping it organized and properly documented.

Pool configuration
The puppet module seems to be able to handle managing replicas and the number of placement groups, but I don't see support for erasure coded pools yet. This is probably something we would want the initial configuration to be set up by puppet, but not something we would want puppet changing on a production cluster.

CRUSH maps
Describing the infrastructure in yaml makes sense. Things like which servers are in which rows/racks/chassis. Also describing the type of server (model, number of HDDs, number of SSDs) makes sense.

CRUSH rules
I could see puppet managing the various rules based on the backend storage (HDD, SSD, primary affinity, erasure coding, etc).

Replacing a failed HDD disk
Do you automatically identify the new drive and start using it right away? I've seen people talk about using a combination of udev and special GPT partition IDs to automate this. If you have a cluster with thousands of drives I think automating the replacement makes sense. How do you handle the journal partition on the SSD? Does removing the old journal partition and creating a new one create a hole in the partition map (because the old partition is removed and the new one is created at the end of the drive)?

Replacing a failed SSD journal
Has anyone automated recreating the journal drive using Sebastien Han's instructions, or do you have to rebuild all the OSDs as well? http://www.sebastien-han.fr/blog/2014/11/27/ceph-recover-osds-after-ssd-journal-failure/

Adding new OSD servers
How are you adding multiple new OSD servers to the cluster? I could see an ansible playbook which sets nobackfill, noscrub, and nodeep-scrub, followed by adding all the OSDs to the cluster, being useful.

Upgrading releases
I've found an ansible playbook for doing a rolling upgrade which looks like it would work well, but are there other methods people are using? http://www.sebastien-han.fr/blog/2015/03/30/ceph-rolling-upgrades-with-ansible/

Decommissioning hardware
Seems like another ansible playbook for reducing the OSDs' weights to zero, marking the OSDs out, stopping the service, removing the OSD ID, removing the CRUSH entry, unmounting the drives, and finally removing the server would be the best method here. Any other ideas on how to approach this?

That's all I can think of right now. Are there any other tasks that people have run into that are missing from this list? Thanks, Bryan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
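The decommissioning sequence Bryan describes maps directly onto well-known ceph CLI calls. A minimal sketch that just assembles the commands in order (dry-run by default; the OSD number, the single-step reweight to 0, and the omission of the service stop/unmount steps are simplifications — in practice you would drain gradually and stop the ceph-osd daemon between "out" and "crush remove"):

```python
import subprocess

def decommission_osd_cmds(osd_id):
    """Commands to drain and remove one OSD, in the order Bryan lists."""
    name = "osd.%d" % osd_id
    return [
        ["ceph", "osd", "crush", "reweight", name, "0"],  # drain data off it
        ["ceph", "osd", "out", str(osd_id)],
        # (stop the ceph-osd service on the host and unmount the drive here)
        ["ceph", "osd", "crush", "remove", name],
        ["ceph", "auth", "del", name],
        ["ceph", "osd", "rm", str(osd_id)],
    ]

def decommission_osd(osd_id, dry_run=True):
    for cmd in decommission_osd_cmds(osd_id):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)

decommission_osd(12)  # prints the commands without running them
```

Wrapping this in an ansible playbook, as suggested above, amounts to running the same commands with a pause for "ceph health" to settle between the reweight and the removal steps.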
Re: [ceph-users] mds crashing
On 15/04/2015 20:02, Kyle Hutson wrote: I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going pretty well. Then, about noon today, we had an mds crash. And then the failover mds crashed. And this cascaded through all 4 mds servers we have. If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while. ceph -w goes through 'replay' 'reconnect' 'rejoin' 'clientreplay' and 'active' but nearly immediately after getting to 'active', it crashes again. I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log For the possibly, but not necessarily, useful background info. - Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. We still have several objects misplaced (~17%), but those seem to be continuing to clean themselves up. - We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem. - Before we realized the mds crashes, we had just changed the size of our metadata pool from 2 to 4. It looks like you're seeing http://tracker.ceph.com/issues/10449, which is a situation where the SessionMap object becomes too big for the MDS to save. The cause of it in that case was stuck requests from a misbehaving client running a slightly older kernel. Assuming you're using the kernel client and having a similar problem, you could try to work around this situation by forcibly unmounting the clients while the MDS is offline, such that during clientreplay the MDS will remove them from the SessionMap after timing out, and then next time it tries to save the map it won't be oversized. 
Cheers, John ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Is ceph.com down?
Can't open at the moment, neither the website nor apt. Trying from Brisbane, Australia. -- Lindsay ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v0.80.8 and librbd performance
On 04/14/2015 08:01 PM, shiva rkreddy wrote: The clusters are in test environment, so its a new deployment of 0.80.9. OS on the cluster nodes is reinstalled as well, so there shouldn't be any fs aging unless the disks are slowing down. The perf measurement is done initiating multiple cinder create/delete commands and tracking the volume to be in available or completely gone from cinder list output. Even running rbd rm command from cinder node results in similar behaviour. I'll try with increasing rbd_concurrent_management in ceph.conf. Is the param name rbd_concurrent_management or rbd-concurrent-management ? 'rbd concurrent management ops' - spaces, hyphens, and underscores are equivalent in ceph configuration. A log with 'debug ms = 1' and 'debug rbd = 20' from 'rbd rm' on both versions might give clues about what's going slower. Josh On Tue, Apr 14, 2015 at 12:36 PM, Josh Durgin jdur...@redhat.com mailto:jdur...@redhat.com wrote: I don't see any commits that would be likely to affect that between 0.80.7 and 0.80.9. Is this after upgrading an existing cluster? Could this be due to fs aging beneath your osds? How are you measuring create/delete performance? You can try increasing rbd concurrent management ops in ceph.conf on the cinder node. This affects delete speed, since rbd tries to delete each object in a volume. Josh *From:* shiva rkreddy shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com *Sent:* Apr 14, 2015 5:53 AM *To:* Josh Durgin *Cc:* Ken Dreyer; Sage Weil; Ceph Development; ceph-us...@ceph.com mailto:ceph-us...@ceph.com *Subject:* Re: v0.80.8 and librbd performance Hi Josh, We are using firefly 0.80.9 and see both cinder create/delete numbers slow down compared 0.80.7. I don't see any specific tuning requirements and our cluster is run pretty much on default configuration. Do you recommend any tuning or can you please suggest some log signatures we need to be looking at? 
Thanks shiva On Wed, Mar 4, 2015 at 1:53 PM, Josh Durgin jdur...@redhat.com mailto:jdur...@redhat.com wrote: On 03/03/2015 03:28 PM, Ken Dreyer wrote: On 03/03/2015 04:19 PM, Sage Weil wrote: Hi, This is just a heads up that we've identified a performance regression in v0.80.8 from previous firefly releases. A v0.80.9 is working it's way through QA and should be out in a few days. If you haven't upgraded yet you may want to wait. Thanks! sage Hi Sage, I've seen a couple Redmine tickets on this (eg http://tracker.ceph.com/__issues/9854 http://tracker.ceph.com/issues/9854 , http://tracker.ceph.com/__issues/10956 http://tracker.ceph.com/issues/10956). It's not totally clear to me which of the 70+ unreleased commits on the firefly branch fix this librbd issue. Is it only the three commits in https://github.com/ceph/ceph/__pull/3410 https://github.com/ceph/ceph/pull/3410 , or are there more? Those are the only ones needed to fix the librbd performance regression, yes. Josh -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org mailto:majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/__majordomo-info.html http://vger.kernel.org/majordomo-info.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
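Per Josh's note that spaces, hyphens, and underscores are interchangeable in ceph configuration, the option could be raised on the cinder node with a fragment like this (20 is an arbitrary example value, not a recommendation — it only changes how many object deletions rbd issues in parallel):

```ini
# /etc/ceph/ceph.conf on the cinder node
[client]
rbd concurrent management ops = 20
```

The same key can equivalently be written rbd_concurrent_management_ops.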
Re: [ceph-users] Is ceph.com down?
On 04/15/2015 09:30 AM, Lindsay Mathieson wrote: Can't open at the moment, neither the website nor apt. Yes, it's down here as well. You can try eu.ceph.com if you need the packages. Or this one: http://ceph.mirror.digitalpacific.com.au/ (working on au.ceph.com) Trying from Brisbane, Australia. -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph site is very slow
On 04/15/2015 10:20 AM, Ignazio Cassano wrote:
> Hi all,
> why is ceph.com very slow? It is impossible to download the files for
> installing ceph.
> Regards
> Ignazio

Not known right now. But you can try eu.ceph.com for your packages and
downloads.
Re: [ceph-users] Ceph site is very slow
Many thanks

2015-04-15 10:44 GMT+02:00 Wido den Hollander <w...@42on.com>:
> Not known right now. But you can try eu.ceph.com for your packages and
> downloads.
[ceph-users] Ceph site is very slow
Hi all,
why is ceph.com very slow? It is impossible to download the files for
installing ceph.

Regards
Ignazio
Re: [ceph-users] how to compute Ceph durability?
Thanks Mark,

Loic also gave me this link. It would be a good start for sure.

Best regards

-----Original message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Mark Nelson
Sent: Tuesday, April 14, 2015 14:11
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] how to compute Ceph durability?

Hi Ghislain,

Mark Kampe was working on durability models a couple of years ago, but
I'm not sure if they were ever completed or if anyone has reviewed them.
The source code is available here:

https://github.com/ceph/ceph-tools/tree/master/models/reliability

This was before EC was in Ceph, so I'm guessing new models would need to
be created for that, but this may at least be a good place to start.

Mark

On 04/14/2015 07:04 AM, ghislain.cheval...@orange.com wrote:
> Hi All,
>
> Am I alone to have this need?
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of ghislain.cheval...@orange.com
> Sent: Friday, March 20, 2015 11:47
> To: ceph-users
> Subject: [ceph-users] how to compute Ceph durability?
>
> Hi all,
>
> I would like to compute the durability of data stored in a ceph
> environment according to the cluster topology (failure domains) and the
> data resiliency (replication/erasure coding).
>
> Does a tool exist?
>
> Best regards
>
> Ghislain Chevalier
> ORANGE
> +33299124432 +33788624370
> ghislain.cheval...@orange.com

This message and its attachments may contain confidential or privileged
information that may be protected by law; they should not be
distributed, used or copied without authorisation. If you have received
this email in error, please notify the sender and delete this message
and its attachments. As emails may be altered, Orange is not liable for
messages that have been modified, changed or falsified. Thank you.
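While waiting for a proper model, a back-of-the-envelope estimate is easy to sketch. The toy model below is my own illustration (not from the ceph-tools repository above): it assumes independent disk failures and ignores failure domains and correlated failures, so it only shows why each extra replica buys orders of magnitude, not a real durability figure:

```python
def naive_loss_probability(afr, replicas, recovery_hours):
    """Rough annual probability of losing a piece of data:
    one disk fails (annualized failure rate `afr`), and every
    other replica of its data also fails before re-replication
    completes.  Independent-failure assumption only; real
    clusters have correlated failures this model ignores."""
    # Chance a given disk dies during the recovery window.
    p_window = afr * recovery_hours / (365 * 24)
    # First failure, then (replicas - 1) more inside the window.
    return afr * p_window ** (replicas - 1)

# Example: 4% AFR disks, 24h to re-replicate a failed OSD.
for size in (2, 3):
    print(f"size={size}: ~{naive_loss_probability(0.04, size, 24):.1e}/year")
```

Under these assumptions, going from size 2 to size 3 shrinks the loss probability by roughly the per-window failure probability, which is why size 3 is the usual recommendation.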
Re: [ceph-users] Rados Gateway and keystone
Thanks a lot. That helps.

From: Erik McCormick [mailto:emccorm...@cirrusseven.com]
Sent: Monday, April 13, 2015 18:32
To: CHEVALIER Ghislain IMT/OLPS
Cc: ceph-users
Subject: Re: [ceph-users] Rados Gateway and keystone

I haven't really used the S3 stuff much, but the credentials should be
in keystone already. If you're in Horizon, you can download them under
Access and Security -> API Access. Using the CLI you can use the
openstack client, like:

openstack credential list | show | create | delete | set

or the keystone client, like:

keystone ec2-credentials-list

etc. Then you should be able to feed those credentials to the rgw like
a normal S3 API call.

Cheers,
Erik

On Mon, Apr 13, 2015 at 10:16 AM, ghislain.cheval...@orange.com wrote:
> Hi all,
>
> Coming back to that issue.
>
> I successfully used keystone users with the rados gateway and the
> Swift API, but I still don't understand how it can work with the S3
> API, i.e. with S3 users (AccessKey/SecretKey).
>
> I found the swift3 initiative, but I think it's only usable in a pure
> OpenStack Swift environment by setting up a specific plug-in:
> https://github.com/stackforge/swift3
>
> A rgw can be, at the same time, under keystone control and standard
> radosgw-admin if:
> - for Swift, you use the right authentication service (keystone or
>   internal)
> - for S3, you use the internal authentication service
>
> So, my questions are still valid:
> How can a rgw work for S3 users if they are stored in keystone?
> What are the access key and secret key?
> What is the purpose of the "rgw s3 auth use keystone" parameter?
>
> Best regards
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of ghislain.cheval...@orange.com
> Sent: Monday, March 23, 2015 14:03
> To: ceph-users
> Subject: [ceph-users] Rados Gateway and keystone
>
> Hi All,
>
> I just want to be sure about the keystone configuration for the Rados
> Gateway. I read the documentation
> http://ceph.com/docs/master/radosgw/keystone/ and
> http://ceph.com/docs/master/radosgw/config-ref/?highlight=keystone
> but I didn't catch whether, after having configured the rados gateway
> (ceph.conf) to use keystone, it becomes mandatory to create all the
> users in it. In other words, can a rgw be, at the same time, under
> keystone control and standard radosgw-admin? How does it work for S3
> users? What is the purpose of the "rgw s3 auth use keystone"
> parameter?
>
> Best regards
>
> Ghislain Chevalier
> +33299124432 +33788624370
> ghislain.cheval...@orange.com
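Following Erik's pointers, the flow can be sketched on the CLI. This is an untested sketch: the `openstack ec2 credentials` subcommands require an OpenStack environment with keystone, `rgw.example.com` is a placeholder endpoint, and the s3cmd options shown are assumptions that may differ by s3cmd version:

```shell
# Create and list EC2-style (access/secret key) credentials in keystone
# for the current project/user.
openstack ec2 credentials create
openstack ec2 credentials list

# Feed the resulting keys to a plain S3 client pointed at the rgw.
# Endpoint and option spellings are assumptions; check `s3cmd --help`.
s3cmd --access_key=<ACCESS> --secret_key=<SECRET> \
      --host=rgw.example.com --host-bucket=rgw.example.com ls
```

Whether the rgw will actually validate those keys is exactly the open question in this thread: without S3-keystone integration on the rgw side, only keys created via radosgw-admin are recognized for the S3 API.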