Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5
I will start now to push a lot of data into the cluster to see if the metadata grows a lot or stays constant. Is there a way to clean up old metadata? I pushed a lot more data into the cluster, then let the cluster sit for the night. This morning I found these values: 6841 MB data, 25814 MB used, which is a bit more than 1 to 3. It looks like the extra space is in these folders (for N from 1 to 36): /var/lib/ceph/osd/ceph-N/current/meta/ These meta folders have a lot of data in them. I would really be happy to have pointers to understand what is in there and how to clean it up eventually. The problem is that googling for ceph meta or ceph metadata produces results for the Ceph MDS, which is completely unrelated :( thanks Saverio
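A quick way to see what those directories actually hold (assuming the default FileStore layout used above; the OSD id in the second command is only an example):

  # size of the meta collection per OSD
  for d in /var/lib/ceph/osd/ceph-*/current/meta; do du -sh "$d"; done
  # the entries are mostly old osdmap epochs plus OSD bookkeeping objects (superblock, PG metadata)
  ls /var/lib/ceph/osd/ceph-0/current/meta | head

Old osdmap epochs are normally trimmed automatically as the cluster moves on, so this space usually comes back without any manual cleanup.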
Re: [ceph-users] Hammer release data and a Design question
Hi, On 26.03.2015 11:18, 10 minus wrote: Hi, I'm just starting on a small Ceph implementation and wanted to know the release date for Hammer. Will it coincide with the release of OpenStack? My conf (using 10G and jumbo frames on CentOS 7 / RHEL 7): 3x Mons (VMs): CPU - 2, Memory - 4G, Storage - 20 GB; 4x OSDs: CPU - Haswell Xeon, Memory - 8 GB, SATA - 3x 2TB (3 OSDs per node), SSD - 2x 480 GB (journaling and, if possible, tiering). This is a test environment to see how all the components play. If all goes well then we plan to increase the OSDs to 24 per node and RAM to 32 GB, and dual-socket Haswell Xeons. 32 GB for 24 OSDs is much too little! I have 32 GB for 12 OSDs - that's OK, but 64 GB would be better. CPU depends on your model (cores, dual socket?). Udo
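As a rough sanity check using the hardware guidance of that era (about 1 GB of RAM per 1 TB of OSD storage, plus headroom for recovery): 24 OSDs x 2 TB = 48 TB of raw storage per node, which already suggests something on the order of 48 GB of RAM, so 32 GB is tight and 64 GB leaves room for recovery and page cache.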
[ceph-users] Fwd: ceph-deploy : Certificate Error using wget on Debian
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hello, I'm trying to create a 4-node Ceph Storage Cluster using ceph-deploy, following the official guide: http://docs.ceph.com/docs/master/start/quick-ceph-deploy/ I'm using debian wheezy 7 (x86_64) on all nodes and on each node, `uname -a` produces: Linux nodeX 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u2 x86_64 GNU/Linux. I'm having trouble getting ceph-deploy to run. The ceph.log reads: [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf [ceph_deploy.cli][INFO ] Invoked (1.5.22): /usr/bin/ceph-deploy install node0 node1 node2 node3 [ceph_deploy.install][DEBUG ] Installing stable version giant on cluster ceph hosts node0 node1 node2 node3 [ceph_deploy.install][DEBUG ] Detecting platform for host node0 ... [node0][DEBUG ] connection detected need for sudo [node0][DEBUG ] connected to host: node0 [node0][DEBUG ] detect platform information from remote host [node0][DEBUG ] detect machine type [ceph_deploy.install][INFO ] Distro info: debian 7.8 wheezy [node0][INFO ] installing ceph on node0 [node0][INFO ] Running command: sudo env DEBIAN_FRONTEND=noninteractive apt-get -q install --assume-yes ca-certificates [node0][DEBUG ] Reading package lists... [node0][DEBUG ] Building dependency tree... [node0][DEBUG ] Reading state information... [node0][DEBUG ] ca-certificates is already the newest version. [node0][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded. [node0][INFO ] Running command: sudo wget -O release.asc https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc [node0][WARNING] --2015-03-27 13:24:50-- https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc [node0][WARNING] Resolving ceph.com (ceph.com)... 208.113.241.137, 2607:f298:4:147::b05:fe2a [node0][WARNING] Connecting to ceph.com (ceph.com)|208.113.241.137|:443... connected. [node0][WARNING] ERROR: The certificate of `ceph.com' is not trusted. [node0][WARNING] ERROR: The certificate of `ceph.com' hasn't got a known issuer. [node0][WARNING] command returned non-zero exit status: 5 [node0][INFO ] Running command: sudo apt-key add release.asc [node0][WARNING] gpg: no valid OpenPGP data found. [node0][ERROR ] RuntimeError: command returned non-zero exit status: 2 [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: apt-key add release.asc So the problem lies with 'wget'. According to this thread ( http://www.linuxquestions.org/questions/debian-26/wget-certificate-error-4175495817/ ), there is a difference between the ubuntu and the debian versions of 'wget', they seem to be compiled and linked against different libraries. So on Debian the problem occurs, while on Ubuntu it does not. ( I did try to run `wget -O release.asc https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc` on Ubuntu 14.04 and it did finish successfully, while it fails on Debian ). So, my question is, what is the proper way of dealing with this error? 
As a workaround, I managed to get it to work (again, according to the same thread linked before): 1) visit 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' using a browser, and export the certificate as ceph.pem 2) on every node run `# apt-get install openssl ca-certificates` 3) copy (scp) ceph.pem to /usr/share/ca-certificates/ceph.pem on every node 4) on every node run `# echo ceph.pem >> /etc/ca-certificates.conf` 5) on every node run `# update-ca-certificates` After these, I was able to run `wget -O release.asc https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc` on every node, successfully. Additionally, ceph-deploy seems to work fine now. Thank you, Vasilis
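If exporting the certificate from a browser is inconvenient, a non-interactive variant of the same workaround could look roughly like this (untested sketch; on Debian, update-ca-certificates also picks up .crt files dropped into /usr/local/share/ca-certificates/ without editing the conf file):

  # fetch the server certificate and install it as a locally trusted CA
  echo | openssl s_client -connect ceph.com:443 -servername ceph.com 2>/dev/null \
    | openssl x509 > /usr/local/share/ca-certificates/ceph.crt
  update-ca-certificates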
Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?
Hi, On 03/18/2015 03:01 PM, Gregory Farnum wrote: I think it tended to crash rather than hang like this so I'm a bit surprised, but if this op is touching a broken file or something that could explain it. FWIW, the last time I had the issue (on a 3.10.9 kernel), btrfs was freezing, waiting forever and even starting threads, raising the host's load to numbers like 400. I can't say much more for now as I had to move to XFS. I will set up a lab someday to try btrfs again, as I heard it has been more stable since 3.16.1, but I think I won't enable compression. BTW, it's really a filesystem I would like to use in production someday (as many others would, I think ;)).
Re: [ceph-users] Migrating objects from one pool to another?
Hi Jean, You would probably need this: ceph osd pool create glance-images-bkp 128 128 rados cppool glance-images glance-images-bkp ceph osd pool rename glance-images glance-images-old ceph osd pool rename glance-images-bkp glance-images ceph osd pool delete glance-images-old glance-images-old --yes-i-really-really-mean-it (once you are sure the data is 100% moved) I would suggest stopping the OpenStack services that are using the original pool, then copying the data, renaming the pools, and finally starting the OpenStack services again and checking that everything is there. I have done this once with success. Karan Singh Systems Specialist, Storage Platforms CSC - IT Center for Science, Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland mobile: +358 503 812758 tel. +358 9 4572001 fax +358 9 4572302 http://www.csc.fi/ On 27 Mar 2015, at 00:01, Gregory Farnum g...@gregs42.com wrote: On Thu, Mar 26, 2015 at 2:53 PM, Steffen W Sørensen ste...@me.com wrote: On 26/03/2015, at 21.07, J-P Methot jpmet...@gtcomm.net wrote: That's a great idea. I know I can set up cinder (the openstack volume manager) as a multi-backend manager and migrate from one backend to the other, each backend linking to different pools of the same ceph cluster. What bugs me though is that I'm pretty sure the image store, glance, wouldn't let me do that. Additionally, since the compute component also has its own ceph pool, I'm pretty sure it won't let me migrate the data through openstack. Hmm, wouldn't it be possible to do something similar, à la: # list objects from src pool rados ls objects loop | filter-obj-id | while read obj; do # export $obj to local disk rados -p pool-with-too-many-pgs get $obj # import $obj from local disk to new pool rados -p better-sized-pool put $obj done You would also have issues with snapshots if you do this on an RBD pool. That's unfortunately not feasible. -Greg Possibly split/partition the list of objects into multiple concurrent loops, possibly from multiple boxes, as seems fit for the resources at hand: CPU, memory, network, Ceph performance. /Steffen On 3/26/2015 3:54 PM, Steffen W Sørensen wrote: On 26/03/2015, at 20.38, J-P Methot jpmet...@gtcomm.net wrote: Lately I've been going back to work on one of my first ceph setups and now I see that I have created way too many placement groups for the pools on that setup (about 10 000 too many). I believe this may impact performance negatively, as the performance on this ceph cluster is abysmal. Since it is not possible to reduce the number of PGs in a pool, I was thinking of creating new pools with a smaller number of PGs, moving the data from the old pools to the new pools and then deleting the old pools. I haven't seen any command to copy objects from one pool to another. Would that be possible? I'm using ceph for block storage with openstack, so surely there must be a way to move block devices from one pool to another, right? What I did at one point was go one layer higher in my storage abstraction, and created new Ceph pools and used those for new storage resources/pools in my VM env. (ProxMox) on top of Ceph RBD and then did a live migration of virtual disks there, assume you could do the same in OpenStack.
My 0.02$ /Steffen
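For what it's worth, a more concrete version of Steffen's per-object loop might look roughly like this (pool names are the placeholders from his sketch; Greg's caveat still applies - this copies only object data, so snapshots and any omap/xattr metadata are lost, which is why it isn't feasible for RBD pools):

  rados -p pool-with-too-many-pgs ls > objects.txt
  while read -r obj; do
      # export each object to a temp file, then import it into the new pool
      rados -p pool-with-too-many-pgs get "$obj" /tmp/obj.tmp
      rados -p better-sized-pool put "$obj" /tmp/obj.tmp
  done < objects.txt
  rm -f /tmp/obj.tmp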
Re: [ceph-users] Fwd: ceph-deploy : Certificate Error using wget on Debian
It looks like ceph.com is having some major issues with their git repository right now.. https://ceph.com/git/ gives a 500 error
Re: [ceph-users] Snapshots and fstrim with cache tiers ?
Hello, The snapshot with a cache tier part was answered by Greg Farnum (https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18329.html). What about fstrim with a cache tier? It doesn't seem to work. Also, is there a background task that recovers freed blocks? Best regards, Frédéric. On 25/03/2015 11:14, Frédéric Nass wrote: Hello, I have a few questions regarding snapshots and fstrim with cache tiers. In the cache tier and erasure coding FAQ related to ICE 1.2 (based on Firefly), Inktank says "Snapshots are not supported in conjunction with cache tiers." What are the risks of using snapshots with cache tiers? Would this "better not use it" recommendation still be true with Giant or Hammer? Regarding the fstrim command, it doesn't seem to work with cache tiers. The freed-up blocks don't come back to the ceph cluster. Can someone confirm this? Is there something we can do to get those freed-up blocks back into the cluster? Also, can we run an fstrim task from the cluster side? That is, without having to map and mount each rbd image or rely on the client to operate this task? Best regards, -- Frédéric Nass Sous-direction Infrastructures Direction du Numérique Université de Lorraine email : frederic.n...@univ-lorraine.fr Tél : +33 3 83 68 53 83
Re: [ceph-users] ceph -s slow return result
What's the current health of the cluster? It may help to compact the monitors' LevelDB store if they have grown in size: http://www.sebastien-han.fr/blog/2014/10/27/ceph-mon-store-taking-up-a-lot-of-space/ Depending on the size of the mon's store, it may take some time to compact; make sure to do only one at a time. Kobi Laredo, Cloud Systems Engineer | (408) 409-KOBI On Fri, Mar 27, 2015 at 10:31 AM, Chu Duc Minh chu.ducm...@gmail.com wrote: All my monitors running. But i deleting pool .rgw.buckets, now having 13 million objects (just test data). The reason that i must delete this pool is my cluster become unstable, and sometimes an OSD down, PG peering, incomplete,... Therefore i must delete this pool to re-stablize my cluster. (radosgw is too slow for delete objects when one of my bucket reachs few million objects). Regards, On Sat, Mar 28, 2015 at 12:23 AM, Gregory Farnum g...@gregs42.com wrote: Are all your monitors running? Usually a temporary hang means that the Ceph client tries to reach a monitor that isn't up, then times out and contacts a different one. I have also seen it just be slow if the monitors are processing so many updates that they're behind, but that's usually on a very unhappy cluster. -Greg On Fri, Mar 27, 2015 at 8:50 AM Chu Duc Minh chu.ducm...@gmail.com wrote: On my CEPH cluster, ceph -s return result quite slow. Sometimes it return result immediately, sometimes i hang few seconds before return result. Do you think this problem (ceph -s slow return) only relate to ceph-mon(s) process? or maybe it relate to ceph-osd(s) too? (i deleting a big bucket, .rgw.buckets, and ceph-osd(s) disk util quite high) Regards,
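Concretely, the check-and-compact cycle from that blog post looks something like the following (mon id 'a' is just an example; run it against one monitor at a time and wait for it to rejoin quorum before moving to the next):

  du -sh /var/lib/ceph/mon/     # how big is the store right now?
  ceph tell mon.a compact       # online compaction of this monitor's leveldb store

An alternative is setting 'mon compact on start = true' in ceph.conf so the store is compacted whenever a monitor restarts.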
[ceph-users] monitor 0.87.1 crashes
Hi all, In a fully functional ceph installation, today we hit a problem with the ceph monitors, which started crashing with the following error: include/interval_set.h: 340: FAILED assert(0) Is there any related bug? Thanks a lot in advance, Samuel
Re: [ceph-users] CephFS Slow writes with 1MB files
Specifically related to BTRFS, if you have random IO to existing objects it will cause terrible fragmentation due to COW. BTRFS is often faster than XFS initially but after it starts fragmenting can become much slower for sequential reads. You may want to try XFS again and see if you can improve the read performance (increasing read ahead both on the cephfs client and on the underlying OSD block devices to something like 4MB might help). Mark On 03/27/2015 11:47 AM, Barclay Jameson wrote: Opps I should have said that I am not just writing the data but copying it : time cp Small1/* Small2/* Thanks, BJ On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson almightybe...@gmail.com wrote: I did a Ceph cluster install 2 weeks ago where I was getting great performance (~= PanFS) where I could write 100,000 1MB files in 61 Mins (Took PanFS 59 Mins). I thought I could increase the performance by adding a better MDS server so I redid the entire build. Now it takes 4 times as long to write the same data as it did before. The only thing that changed was the MDS server. (I even tried moving the MDS back on the old slower node and the performance was the same.) The first install was on CentOS 7. I tried going down to CentOS 6.6 and it's the same results. I use the same scripts to install the OSDs (which I created because I can never get ceph-deploy to behave correctly. Although, I did use ceph-deploy to create the MDS and MON and initial cluster creation.) I use btrfs on the OSDS as I can get 734 MB/s write and 1100 MB/s read with rados bench -p cephfs_data 500 write --no-cleanup rados bench -p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read) Could anybody think of a reason as to why I am now getting a huge regression. Hardware Setup: [OSDs] 64 GB 2133 MHz Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores) 40Gb Mellanox NIC [MDS/MON new] 128 GB 2133 MHz Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores) 40Gb Mellanox NIC [MDS/MON old] 32 GB 800 MHz Dual Proc E5472 @ 3.00GHz (8 Cores) 10Gb Intel NIC -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
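To make the read-ahead suggestion concrete, the knobs would be something like this (device name and mount details are placeholders, and 4 MB is just the ballpark Mark mentions):

  # on the OSD hosts: bump read-ahead on the block devices backing the OSDs (value in KB)
  echo 4096 > /sys/block/sdb/queue/read_ahead_kb
  # on the kernel CephFS client: rasize is the readahead window in bytes
  mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,rasize=4194304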
Re: [ceph-users] ceph -s slow return result
Are all your monitors running? Usually a temporary hang means that the Ceph client tries to reach a monitor that isn't up, then times out and contacts a different one. I have also seen it just be slow if the monitors are processing so many updates that they're behind, but that's usually on a very unhappy cluster. -Greg On Fri, Mar 27, 2015 at 8:50 AM Chu Duc Minh chu.ducm...@gmail.com wrote: On my CEPH cluster, ceph -s return result quite slow. Sometimes it return result immediately, sometimes i hang few seconds before return result. Do you think this problem (ceph -s slow return) only relate to ceph-mon(s) process? or maybe it relate to ceph-osd(s) too? (i deleting a big bucket, .rgw.buckets, and ceph-osd(s) disk util quite high) Regards, ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph -s slow return result
All my monitors running. But i deleting pool .rgw.buckets, now having 13 million objects (just test data). The reason that i must delete this pool is my cluster become unstable, and sometimes an OSD down, PG peering, incomplete,... Therefore i must delete this pool to re-stablize my cluster. (radosgw is too slow for delete objects when one of my bucket reachs few million objects). Regards, On Sat, Mar 28, 2015 at 12:23 AM, Gregory Farnum g...@gregs42.com wrote: Are all your monitors running? Usually a temporary hang means that the Ceph client tries to reach a monitor that isn't up, then times out and contacts a different one. I have also seen it just be slow if the monitors are processing so many updates that they're behind, but that's usually on a very unhappy cluster. -Greg On Fri, Mar 27, 2015 at 8:50 AM Chu Duc Minh chu.ducm...@gmail.com wrote: On my CEPH cluster, ceph -s return result quite slow. Sometimes it return result immediately, sometimes i hang few seconds before return result. Do you think this problem (ceph -s slow return) only relate to ceph-mon(s) process? or maybe it relate to ceph-osd(s) too? (i deleting a big bucket, .rgw.buckets, and ceph-osd(s) disk util quite high) Regards, ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ERROR: missing keyring, cannot use cephx for authentication
the thing is that the devices are not mounting after reboot… Any ideas? Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 On Mar 23, 2015, at 3:37 PM, Thomas Foster thomas.foste...@gmail.com wrote: check your server where that osd is located and see if you have created the directory correctly. If you didn't create it correctly you would get that error message.
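If the OSD partitions were prepared with ceph-disk/ceph-deploy, they are normally mounted at boot by the udev rules; when that does not happen, something along these lines is worth trying on the affected host (device name is just an example, and the exact cause can vary):

  ceph-disk activate-all          # (re)mount and start all prepared OSDs on this host
  # or for a single OSD partition:
  ceph-disk activate /dev/sdb1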
[ceph-users] ceph -s slow return result
On my CEPH cluster, ceph -s returns its result quite slowly. Sometimes it returns the result immediately, sometimes it hangs a few seconds before returning. Do you think this problem (ceph -s slow return) relates only to the ceph-mon(s) process, or maybe it relates to the ceph-osd(s) too? (I am deleting a big bucket, .rgw.buckets, and the ceph-osd(s) disk util is quite high) Regards,
[ceph-users] CephFS Slow writes with 1MB files
I did a Ceph cluster install 2 weeks ago where I was getting great performance (~= PanFS) where I could write 100,000 1MB files in 61 Mins (Took PanFS 59 Mins). I thought I could increase the performance by adding a better MDS server so I redid the entire build. Now it takes 4 times as long to write the same data as it did before. The only thing that changed was the MDS server. (I even tried moving the MDS back on the old slower node and the performance was the same.) The first install was on CentOS 7. I tried going down to CentOS 6.6 and it's the same results. I use the same scripts to install the OSDs (which I created because I can never get ceph-deploy to behave correctly. Although, I did use ceph-deploy to create the MDS and MON and initial cluster creation.) I use btrfs on the OSDS as I can get 734 MB/s write and 1100 MB/s read with rados bench -p cephfs_data 500 write --no-cleanup rados bench -p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read) Could anybody think of a reason as to why I am now getting a huge regression. Hardware Setup: [OSDs] 64 GB 2133 MHz Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores) 40Gb Mellanox NIC [MDS/MON new] 128 GB 2133 MHz Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores) 40Gb Mellanox NIC [MDS/MON old] 32 GB 800 MHz Dual Proc E5472 @ 3.00GHz (8 Cores) 10Gb Intel NIC ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS Slow writes with 1MB files
So this is exactly the same test you ran previously, but now it's on faster hardware and the test is slower? Do you have more data in the test cluster? One obvious possibility is that previously you were working entirely in the MDS' cache, but now you've got more dentries and so it's kicking data out to RADOS and then reading it back in. If you've got the memory (you appear to) you can pump up the mds cache size config option quite dramatically from its default of 100000. Other things to check are that you've got an appropriately-sized metadata pool, that you've not got clients competing against each other inappropriately, etc. -Greg On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson almightybe...@gmail.com wrote: Opps I should have said that I am not just writing the data but copying it : time cp Small1/* Small2/* Thanks, BJ On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson almightybe...@gmail.com wrote: I did a Ceph cluster install 2 weeks ago where I was getting great performance (~= PanFS) where I could write 100,000 1MB files in 61 Mins (Took PanFS 59 Mins). I thought I could increase the performance by adding a better MDS server so I redid the entire build. Now it takes 4 times as long to write the same data as it did before. The only thing that changed was the MDS server. (I even tried moving the MDS back on the old slower node and the performance was the same.) The first install was on CentOS 7. I tried going down to CentOS 6.6 and it's the same results. I use the same scripts to install the OSDs (which I created because I can never get ceph-deploy to behave correctly. Although, I did use ceph-deploy to create the MDS and MON and initial cluster creation.) I use btrfs on the OSDS as I can get 734 MB/s write and 1100 MB/s read with rados bench -p cephfs_data 500 write --no-cleanup rados bench -p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read) Could anybody think of a reason as to why I am now getting a huge regression. Hardware Setup: [OSDs] 64 GB 2133 MHz Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores) 40Gb Mellanox NIC [MDS/MON new] 128 GB 2133 MHz Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores) 40Gb Mellanox NIC [MDS/MON old] 32 GB 800 MHz Dual Proc E5472 @ 3.00GHz (8 Cores) 10Gb Intel NIC
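For reference, bumping that option is just a ceph.conf change on the MDS host (the number below is only an illustration of "quite dramatically"; it is counted in inodes/dentries, so keep an eye on MDS memory use):

  [mds]
      mds cache size = 1000000

It should also be possible to change it at runtime via the admin socket, e.g. `ceph daemon mds.<name> config set mds_cache_size 1000000`, and then re-run the copy test.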
Re: [ceph-users] Snapshots and fstrim with cache tiers ?
On Wed, Mar 25, 2015 at 3:14 AM, Frédéric Nass frederic.n...@univ-lorraine.fr wrote: Hello, I have a few questions regarding snapshots and fstrim with cache tiers. In the cache tier and erasure coding FAQ related to ICE 1.2 (based on Firefly), Inktank says Snapshots are not supported in conjunction with cache tiers. What are the risks of using snapshots with cache tiers ? Would this better not use it recommandation still be true with Giant or Hammer ? Regarding the fstrim command, it doesn't seem to work with cache tiers. The freed up blocks don't get back in the ceph cluster. Can someone confirm this ? Is there something we can do to get those freed up blocks back in the cluster ? It does work, but there are two effects you're missing here: 1) The object can be deleted in the cache tier, but it won't get deleted from the backing pool until it gets flushed out of the cache pool. Depending on your workload this can take a while. 2) On erasure-coded pool, the OSD makes sure it can roll back a certain number of operations per PG. In the case of deletions, this means keeping the object data around for a while. This can also take a while if you're not doing many operations. This has been discussed on the list before; I think you'll want to look for a thread about rollback and pg log size. -Greg Also, can we run an fstrim task from the cluster side ? That is, without having to map and mount each rbd image or rely on the client to operate this task ? Best regards, -- Frédéric Nass Sous-direction Infrastructures Direction du Numérique Université de Lorraine email : frederic.n...@univ-lorraine.fr Tél : +33 3 83 68 53 83 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
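One way to see effect (1) sooner, rather than waiting for normal flushing, is to force the cache tier to flush and evict everything (the pool name is a placeholder, and this will generate a burst of I/O towards the base tier):

  rados -p <cache-pool> cache-flush-evict-all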
Re: [ceph-users] monitor 0.87.1 crashes
You'll want to at least include the backtrace. -Sam On 03/27/2015 10:55 AM, samuel wrote: Hi all, In a fully functional ceph installation today we suffer a problem with ceph monitors, that started crashing with following error: include/interval_set.h: 340: FAILED assert(0) Is there any related bug? Thanks a lot in advance, Samuel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] monitor 0.87.1 crashes
Here it it goes (in case further information is needed, just ask and I would gladly offer it): -5 2015-03-27 19:06:01.168361 7f94b4184700 5 mon.mon01@0(leader).osd e37404 send_incremental [37403..37404] to client.1419434 10.10.200.3:0/280 8592243 -4 2015-03-27 19:06:01.168427 7f94b4184700 1 -- 10.10.200.20:6789/0 -- client.1419434 10.10.200.3:0/2808592243 -- osd_map(37403..37404 src has 36883..37404) v3 -- ?+0 0x3c59d40 -3 2015-03-27 19:06:01.168451 7f94b4184700 1 -- 10.10.200.20:6789/0 -- 10.10.200.3:0/2808592243 -- mon_subscribe_ack(300s) v1 -- ?+0 0x3fa4d00 con 0x3c7e460 -2 2015-03-27 19:06:01.168465 7f94b4184700 1 -- 10.10.200.20:6789/0 == client.1419434 10.10.200.3:0/2808592243 4 pool_op(delete unmanaged snap pool 6 auid 0 tid 2617 name v0) v4 65+0+0 (423335705 0 0) 0x3c33600 con 0x3c7e460 -1 2015-03-27 19:06:01.168475 7f94b4184700 5 mon.mon01@0(leader).paxos(paxos active c 16805455..16806016) is_readable = 1 - now=2015-03-27 19:0 6:01.168476 lease_expire=0.00 has v0 lc 16806016 0 2015-03-27 19:06:01.170738 7f94b4184700 -1 ./include/interval_set.h: In function 'void interval_setT::insert(T, T) [with T = snapid_t]' thread 7f94b4184700 time 2015-03-27 19:06:01.168499 ./include/interval_set.h: 340: FAILED assert(0) ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7d7825] 2: /usr/bin/ceph-mon() [0x88def5] 3: (pg_pool_t::remove_unmanaged_snap(snapid_t)+0x43) [0x886e53] 4: (OSDMonitor::prepare_pool_op(MPoolOp*)+0xac5) [0x628d65] 5: (OSDMonitor::prepare_update(PaxosServiceMessage*)+0x23b) [0x63b4eb] 6: (PaxosService::dispatch(PaxosServiceMessage*)+0xd0f) [0x5f923f] 7: (Monitor::dispatch(MonSession*, Message*, bool)+0x2a3) [0x5c0cf3] 8: (Monitor::_ms_dispatch(Message*)+0x1cd) [0x5c178d] 9: (Monitor::ms_dispatch(Message*)+0x23) [0x5e2443] 10: (DispatchQueue::entry()+0x62a) [0x9194da] 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x7bc0cd] 12: (()+0x7df3) [0x7f94bcfb0df3] 13: (clone()+0x6d) [0x7f94bba931ad] On 27 March 2015 at 19:04, Samuel Just sj...@redhat.com wrote: You'll want to at least include the backtrace. -Sam On 03/27/2015 10:55 AM, samuel wrote: Hi all, In a fully functional ceph installation today we suffer a problem with ceph monitors, that started crashing with following error: include/interval_set.h: 340: FAILED assert(0) Is there any related bug? Thanks a lot in advance, Samuel ___ ceph-users mailing listceph-us...@lists.ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] monitor 0.87.1 crashes
apologies for the noise. Host 10.10.200.3 had some issues that made the monitors crash. Thanks a lot for your help, Samuel
Re: [ceph-users] adding a new pool causes old pool warning pool x has too few pgs
Weird: after a few hours, the health check comes back OK without changing the number of PGs for any pool! Hi All, To a healthy cluster I recently added two pools to ceph, 1 replicated and 1 ecpool. Then I made the replicated pool into a cache for the ecpool. Afterwards the ceph health check started complaining about a preexisting pool having too few pgs. Previous to adding the new pools there was no warning. Why does adding new pools cause an old pool to have too few pgs? Thanks! Chad.
Re: [ceph-users] CephFS Slow writes with 1MB files
Yes it's the exact same hardware except for the MDS server (although I tried using the MDS on the old node). I have not tried moving the MON back to the old node. My default cache size is mds cache size = 1000 The OSDs (3 of them) have 16 Disks with 4 SSD Journal Disks. I created 2048 for data and metadata: ceph osd pool create cephfs_data 2048 2048 ceph osd pool create cephfs_metadata 2048 2048 To your point on clients competing against each other... how would I check that? Thanks for the input! On Fri, Mar 27, 2015 at 3:04 PM, Gregory Farnum g...@gregs42.com wrote: So this is exactly the same test you ran previously, but now it's on faster hardware and the test is slower? Do you have more data in the test cluster? One obvious possibility is that previously you were working entirely in the MDS' cache, but now you've got more dentries and so it's kicking data out to RADOS and then reading it back in. If you've got the memory (you appear to) you can pump up the mds cache size config option quite dramatically from it's default 10. Other things to check are that you've got an appropriately-sized metadata pool, that you've not got clients competing against each other inappropriately, etc. -Greg On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson almightybe...@gmail.com wrote: Opps I should have said that I am not just writing the data but copying it : time cp Small1/* Small2/* Thanks, BJ On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson almightybe...@gmail.com wrote: I did a Ceph cluster install 2 weeks ago where I was getting great performance (~= PanFS) where I could write 100,000 1MB files in 61 Mins (Took PanFS 59 Mins). I thought I could increase the performance by adding a better MDS server so I redid the entire build. Now it takes 4 times as long to write the same data as it did before. The only thing that changed was the MDS server. (I even tried moving the MDS back on the old slower node and the performance was the same.) The first install was on CentOS 7. I tried going down to CentOS 6.6 and it's the same results. I use the same scripts to install the OSDs (which I created because I can never get ceph-deploy to behave correctly. Although, I did use ceph-deploy to create the MDS and MON and initial cluster creation.) I use btrfs on the OSDS as I can get 734 MB/s write and 1100 MB/s read with rados bench -p cephfs_data 500 write --no-cleanup rados bench -p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read) Could anybody think of a reason as to why I am now getting a huge regression. Hardware Setup: [OSDs] 64 GB 2133 MHz Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores) 40Gb Mellanox NIC [MDS/MON new] 128 GB 2133 MHz Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores) 40Gb Mellanox NIC [MDS/MON old] 32 GB 800 MHz Dual Proc E5472 @ 3.00GHz (8 Cores) 10Gb Intel NIC ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS Slow writes with 1MB files
On Fri, Mar 27, 2015 at 2:46 PM, Barclay Jameson almightybe...@gmail.com wrote: Yes it's the exact same hardware except for the MDS server (although I tried using the MDS on the old node). I have not tried moving the MON back to the old node. My default cache size is mds cache size = 1000 The OSDs (3 of them) have 16 Disks with 4 SSD Journal Disks. I created 2048 for data and metadata: ceph osd pool create cephfs_data 2048 2048 ceph osd pool create cephfs_metadata 2048 2048 To your point on clients competing against each other... how would I check that? Do you have multiple clients mounted? Are they both accessing files in the directory(ies) you're testing? Were they accessing the same pattern of files for the old cluster? If you happen to be running a hammer rc or something pretty new you can use the MDS admin socket to explore a bit what client sessions there are and what they have permissions on and check; otherwise you'll have to figure it out from the client side. -Greg Thanks for the input! On Fri, Mar 27, 2015 at 3:04 PM, Gregory Farnum g...@gregs42.com wrote: So this is exactly the same test you ran previously, but now it's on faster hardware and the test is slower? Do you have more data in the test cluster? One obvious possibility is that previously you were working entirely in the MDS' cache, but now you've got more dentries and so it's kicking data out to RADOS and then reading it back in. If you've got the memory (you appear to) you can pump up the mds cache size config option quite dramatically from it's default 10. Other things to check are that you've got an appropriately-sized metadata pool, that you've not got clients competing against each other inappropriately, etc. -Greg On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson almightybe...@gmail.com wrote: Opps I should have said that I am not just writing the data but copying it : time cp Small1/* Small2/* Thanks, BJ On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson almightybe...@gmail.com wrote: I did a Ceph cluster install 2 weeks ago where I was getting great performance (~= PanFS) where I could write 100,000 1MB files in 61 Mins (Took PanFS 59 Mins). I thought I could increase the performance by adding a better MDS server so I redid the entire build. Now it takes 4 times as long to write the same data as it did before. The only thing that changed was the MDS server. (I even tried moving the MDS back on the old slower node and the performance was the same.) The first install was on CentOS 7. I tried going down to CentOS 6.6 and it's the same results. I use the same scripts to install the OSDs (which I created because I can never get ceph-deploy to behave correctly. Although, I did use ceph-deploy to create the MDS and MON and initial cluster creation.) I use btrfs on the OSDS as I can get 734 MB/s write and 1100 MB/s read with rados bench -p cephfs_data 500 write --no-cleanup rados bench -p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read) Could anybody think of a reason as to why I am now getting a huge regression. Hardware Setup: [OSDs] 64 GB 2133 MHz Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores) 40Gb Mellanox NIC [MDS/MON new] 128 GB 2133 MHz Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores) 40Gb Mellanox NIC [MDS/MON old] 32 GB 800 MHz Dual Proc E5472 @ 3.00GHz (8 Cores) 10Gb Intel NIC ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
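On a recent enough build, the admin-socket check Greg mentions would look roughly like this on the MDS host (the daemon name is whatever your MDS is called):

  ceph daemon mds.<name> session ls    # lists connected clients and what they hold

On older builds the equivalent information mostly has to be pieced together from the client side.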
Re: [ceph-users] 0.93 fresh cluster won't create PGs
On Fri, 27 Mar 2015, Robert LeBlanc wrote: I've built Ceph clusters a few times now and I'm completely baffled about what we are seeing. We had a majority of the nodes on a new cluster go down yesterday and we got PGs stuck peering. We checked logs, firewalls, file descriptors, etc and nothing is pointing to what the problem is. We thought we could work around the problem by deleting all the pools and recreating them, but still most of the PGs were in a creating+peering state. Rebooting OSDs, reformatting them, adjusting the CRUSH, etc all proved fruitless. I took min_size and size to 1, tried scrubbing, deep-scrubbing the PGs and OSDs. Nothing seems to get the cluster to progress. As a last ditch effort, we wiped the whole cluster, regenerated UUID, keys, etc and pushed it all through puppet again. After creating the OSDs there are PGs stuck. Here is some info: [ulhglive-root@mon1 ~]# ceph status cluster fa158fa8-3e5d-47b1-a7bc-98a41f510ac0 health HEALTH_WARN 1214 pgs peering 1216 pgs stuck inactive 1216 pgs stuck unclean monmap e2: 3 mons at {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0} election epoch 6, quorum 0,1,2 mon1,mon2,mon3 osdmap e161: 130 osds: 130 up, 130 in pgmap v468: 2048 pgs, 2 pools, 0 bytes data, 0 objects 5514 MB used, 472 TB / 472 TB avail 965 peering 832 active+clean 249 creating+peering 2 activating Usually when we've seen something like this is has been something annoying with the environment, like a broken network that causes the tcp streams to freeze once they start sending significant traffic (e.g., affecting the connections that transpart data but not the ones that handle heartbeats). As you're rebuilding, perhaps the issues start once you hit a particular rack or host? [ulhglive-root@mon1 ~]# ceph health detail | head -n 15 HEALTH_WARN 1214 pgs peering; 1216 pgs stuck inactive; 1216 pgs stuck unclean pg 2.17f is stuck inactive since forever, current state creating+peering, last acting [39,42,77] pg 2.17e is stuck inactive since forever, current state creating+peering, last acting [125,3,110] pg 2.179 is stuck inactive since forever, current state peering, last acting [0] pg 2.178 is stuck inactive since forever, current state creating+peering, last acting [99,120,54] pg 2.17b is stuck inactive since forever, current state peering, last acting [0] pg 2.17a is stuck inactive since forever, current state creating+peering, last acting [91,96,122] pg 2.175 is stuck inactive since forever, current state creating+peering, last acting [55,127,2] pg 2.174 is stuck inactive since forever, current state peering, last acting [0] pg 2.176 is stuck inactive since forever, current state creating+peering, last acting [13,70,8] pg 2.172 is stuck inactive since forever, current state peering, last acting [0] pg 2.16c is stuck inactive for 1344.369455, current state peering, last acting [99,104,85] pg 2.16e is stuck inactive since forever, current state peering, last acting [0] pg 2.169 is stuck inactive since forever, current state creating+peering, last acting [125,24,65] pg 2.16a is stuck inactive since forever, current state peering, last acting [0] Traceback (most recent call last): File /bin/ceph, line 896, in module retval = main() File /bin/ceph, line 883, in main sys.stdout.write(prefix + outbuf + suffix) IOError: [Errno 32] Broken pipe [ulhglive-root@mon1 ~]# ceph pg dump_stuck | head -n 15 ok pg_stat state up up_primary acting acting_primary 2.17f creating+peering[39,42,77] 39 [39,42,77] 39 2.17e creating+peering[125,3,110] 125 
[125,3,110] 125 2.179 peering [0] 0 [0] 0 2.178 creating+peering[99,120,54] 99 [99,120,54] 99 2.17b peering [0] 0 [0] 0 2.17a creating+peering[91,96,122] 91 [91,96,122] 91 2.175 creating+peering[55,127,2] 55 [55,127,2] 55 2.174 peering [0] 0 [0] 0 2.176 creating+peering[13,70,8] 13 [13,70,8] 13 2.172 peering [0] 0 [0] 0 2.16c peering [99,104,85] 99 [99,104,85] 99 2.16e peering [0] 0 [0] 0 2.169 creating+peering[125,24,65] 125 [125,24,65] 125 2.16a peering [0] 0 [0] 0 Focusing on 2.17f on OSD 39, I set debugging to 20/20 and am attaching the logs. I've looked through the logs with 20/20 before we toasted the cluster and I couldn't find anything standing out. I have another cluster that is also exhibiting this problem which I'd prefer not to lose the data on. If anything stands out, please let me know. We are going to wipe this cluster again and take more manual steps.
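One quick test for that kind of environment problem, if jumbo frames are in use on the cluster network (host name and MTU below are placeholders): check that full-sized packets actually pass between the OSD hosts, e.g.

  ping -M do -s 8972 <other-osd-host>   # 9000-byte MTU minus 28 bytes of headers; fails loudly on an MTU mismatch

A path that silently drops large frames produces exactly the symptom described here - small heartbeat messages keep flowing while the data-carrying connections stall.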
[ceph-users] 0.93 fresh cluster won't create PGs
I've built Ceph clusters a few times now and I'm completely baffled about what we are seeing. We had a majority of the nodes on a new cluster go down yesterday and we got PGs stuck peering. We checked logs, firewalls, file descriptors, etc and nothing is pointing to what the problem is. We thought we could work around the problem by deleting all the pools and recreating them, but still most of the PGs were in a creating+peering state. Rebooting OSDs, reformatting them, adjusting the CRUSH, etc all proved fruitless. I took min_size and size to 1, tried scrubbing, deep-scrubbing the PGs and OSDs. Nothing seems to get the cluster to progress. As a last ditch effort, we wiped the whole cluster, regenerated UUID, keys, etc and pushed it all through puppet again. After creating the OSDs there are PGs stuck. Here is some info: [ulhglive-root@mon1 ~]# ceph status cluster fa158fa8-3e5d-47b1-a7bc-98a41f510ac0 health HEALTH_WARN 1214 pgs peering 1216 pgs stuck inactive 1216 pgs stuck unclean monmap e2: 3 mons at {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0} election epoch 6, quorum 0,1,2 mon1,mon2,mon3 osdmap e161: 130 osds: 130 up, 130 in pgmap v468: 2048 pgs, 2 pools, 0 bytes data, 0 objects 5514 MB used, 472 TB / 472 TB avail 965 peering 832 active+clean 249 creating+peering 2 activating [ulhglive-root@mon1 ~]# ceph health detail | head -n 15 HEALTH_WARN 1214 pgs peering; 1216 pgs stuck inactive; 1216 pgs stuck unclean pg 2.17f is stuck inactive since forever, current state creating+peering, last acting [39,42,77] pg 2.17e is stuck inactive since forever, current state creating+peering, last acting [125,3,110] pg 2.179 is stuck inactive since forever, current state peering, last acting [0] pg 2.178 is stuck inactive since forever, current state creating+peering, last acting [99,120,54] pg 2.17b is stuck inactive since forever, current state peering, last acting [0] pg 2.17a is stuck inactive since forever, current state creating+peering, last acting [91,96,122] pg 2.175 is stuck inactive since forever, current state creating+peering, last acting [55,127,2] pg 2.174 is stuck inactive since forever, current state peering, last acting [0] pg 2.176 is stuck inactive since forever, current state creating+peering, last acting [13,70,8] pg 2.172 is stuck inactive since forever, current state peering, last acting [0] pg 2.16c is stuck inactive for 1344.369455, current state peering, last acting [99,104,85] pg 2.16e is stuck inactive since forever, current state peering, last acting [0] pg 2.169 is stuck inactive since forever, current state creating+peering, last acting [125,24,65] pg 2.16a is stuck inactive since forever, current state peering, last acting [0] Traceback (most recent call last): File /bin/ceph, line 896, in module retval = main() File /bin/ceph, line 883, in main sys.stdout.write(prefix + outbuf + suffix) IOError: [Errno 32] Broken pipe [ulhglive-root@mon1 ~]# ceph pg dump_stuck | head -n 15 ok pg_stat state up up_primary acting acting_primary 2.17f creating+peering[39,42,77] 39 [39,42,77] 39 2.17e creating+peering[125,3,110] 125 [125,3,110] 125 2.179 peering [0] 0 [0] 0 2.178 creating+peering[99,120,54] 99 [99,120,54] 99 2.17b peering [0] 0 [0] 0 2.17a creating+peering[91,96,122] 91 [91,96,122] 91 2.175 creating+peering[55,127,2] 55 [55,127,2] 55 2.174 peering [0] 0 [0] 0 2.176 creating+peering[13,70,8] 13 [13,70,8] 13 2.172 peering [0] 0 [0] 0 2.16c peering [99,104,85] 99 [99,104,85] 99 2.16e peering [0] 0 [0] 0 2.169 creating+peering[125,24,65] 125 
[125,24,65] 125 2.16a peering [0] 0 [0] 0 Focusing on 2.17f on OSD 39, I set debugging to 20/20 and am attaching the logs. I've looked through the logs with 20/20 before we toasted the cluster and I couldn't find anything standing out. I have another cluster that is also exhibiting this problem which I'd prefer not to lose the data on. If anything stands out, please let me know. We are going to wipe this cluster again and take more manual steps. ceph-osd.39.log.xz - https://owncloud.leblancnet.us/owncloud/public.php?service=filest=b120a67cc6111ffcba54d2e4cc8a62b5 map.xz - https://owncloud.leblancnet.us/owncloud/public.php?service=filest=df1eecf7d307225b7d43b5c9474561d0 After redoing the cluster again, we started slow. We added one OSD, dropped the pools to min_size=1 and size=1, and the cluster became healthy. We added a second OSD and changed the CRUSH rule to OSD and it became healthy again. We change size=3 and min_size=2. We had puppet add 10 OSDs on one host, and
Re: [ceph-users] ceph -s slow return result
@Kobi Laredo: thank you! It's exactly my problem. # du -sh /var/lib/ceph/mon/ 2.6G /var/lib/ceph/mon/ # ceph tell mon.a compact compacted leveldb in 10.197506 # du -sh /var/lib/ceph/mon/ 461M /var/lib/ceph/mon/ Now my ceph -s returns its result immediately. Maybe the monitors' LevelDB store grew so big because I pushed 13 million files into a bucket (over radosgw). When there is an extremely large number of files in a bucket, can the state of the ceph cluster become unstable? (I'm running Giant) Regards, On Sat, Mar 28, 2015 at 12:57 AM, Kobi Laredo kobi.lar...@dreamhost.com wrote: What's the current health of the cluster? It may help to compact the monitors' LevelDB store if they have grown in size: http://www.sebastien-han.fr/blog/2014/10/27/ceph-mon-store-taking-up-a-lot-of-space/ Depending on the size of the mon's store, it may take some time to compact; make sure to do only one at a time. Kobi Laredo, Cloud Systems Engineer | (408) 409-KOBI
Re: [ceph-users] 0.93 fresh cluster won't create PGs
Thanks, we'll give the gitbuilder packages a shot and report back. Robert LeBlanc Sent from a mobile device please excuse any typos. On Mar 27, 2015 10:03 PM, Sage Weil s...@newdream.net wrote: On Fri, 27 Mar 2015, Robert LeBlanc wrote: I've built Ceph clusters a few times now and I'm completely baffled about what we are seeing. We had a majority of the nodes on a new cluster go down yesterday and we got PGs stuck peering. We checked logs, firewalls, file descriptors, etc and nothing is pointing to what the problem is. We thought we could work around the problem by deleting all the pools and recreating them, but still most of the PGs were in a creating+peering state. Rebooting OSDs, reformatting them, adjusting the CRUSH, etc all proved fruitless. I took min_size and size to 1, tried scrubbing, deep-scrubbing the PGs and OSDs. Nothing seems to get the cluster to progress. As a last ditch effort, we wiped the whole cluster, regenerated UUID, keys, etc and pushed it all through puppet again. After creating the OSDs there are PGs stuck. Here is some info: [ulhglive-root@mon1 ~]# ceph status cluster fa158fa8-3e5d-47b1-a7bc-98a41f510ac0 health HEALTH_WARN 1214 pgs peering 1216 pgs stuck inactive 1216 pgs stuck unclean monmap e2: 3 mons at {mon1= 10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0} election epoch 6, quorum 0,1,2 mon1,mon2,mon3 osdmap e161: 130 osds: 130 up, 130 in pgmap v468: 2048 pgs, 2 pools, 0 bytes data, 0 objects 5514 MB used, 472 TB / 472 TB avail 965 peering 832 active+clean 249 creating+peering 2 activating Usually when we've seen something like this is has been something annoying with the environment, like a broken network that causes the tcp streams to freeze once they start sending significant traffic (e.g., affecting the connections that transpart data but not the ones that handle heartbeats). As you're rebuilding, perhaps the issues start once you hit a particular rack or host? 
[ulhglive-root@mon1 ~]# ceph health detail | head -n 15 HEALTH_WARN 1214 pgs peering; 1216 pgs stuck inactive; 1216 pgs stuck unclean pg 2.17f is stuck inactive since forever, current state creating+peering, last acting [39,42,77] pg 2.17e is stuck inactive since forever, current state creating+peering, last acting [125,3,110] pg 2.179 is stuck inactive since forever, current state peering, last acting [0] pg 2.178 is stuck inactive since forever, current state creating+peering, last acting [99,120,54] pg 2.17b is stuck inactive since forever, current state peering, last acting [0] pg 2.17a is stuck inactive since forever, current state creating+peering, last acting [91,96,122] pg 2.175 is stuck inactive since forever, current state creating+peering, last acting [55,127,2] pg 2.174 is stuck inactive since forever, current state peering, last acting [0] pg 2.176 is stuck inactive since forever, current state creating+peering, last acting [13,70,8] pg 2.172 is stuck inactive since forever, current state peering, last acting [0] pg 2.16c is stuck inactive for 1344.369455, current state peering, last acting [99,104,85] pg 2.16e is stuck inactive since forever, current state peering, last acting [0] pg 2.169 is stuck inactive since forever, current state creating+peering, last acting [125,24,65] pg 2.16a is stuck inactive since forever, current state peering, last acting [0] Traceback (most recent call last): File /bin/ceph, line 896, in module retval = main() File /bin/ceph, line 883, in main sys.stdout.write(prefix + outbuf + suffix) IOError: [Errno 32] Broken pipe [ulhglive-root@mon1 ~]# ceph pg dump_stuck | head -n 15 ok pg_stat state up up_primary acting acting_primary 2.17f creating+peering[39,42,77] 39 [39,42,77] 39 2.17e creating+peering[125,3,110] 125 [125,3,110] 125 2.179 peering [0] 0 [0] 0 2.178 creating+peering[99,120,54] 99 [99,120,54] 99 2.17b peering [0] 0 [0] 0 2.17a creating+peering[91,96,122] 91 [91,96,122] 91 2.175 creating+peering[55,127,2] 55 [55,127,2] 55 2.174 peering [0] 0 [0] 0 2.176 creating+peering[13,70,8] 13 [13,70,8] 13 2.172 peering [0] 0 [0] 0 2.16c peering [99,104,85] 99 [99,104,85] 99 2.16e peering [0] 0 [0] 0 2.169 creating+peering[125,24,65] 125 [125,24,65] 125 2.16a peering [0] 0 [0] 0 Focusing on 2.17f on OSD 39, I set debugging to 20/20 and am attaching the logs. I've looked through the logs with 20/20 before we toasted the cluster and I