Re: [ceph-users] how to set up disks in the same host
your client writes the file to one osd, and before this osd acknowledges your write request, it ensures that it is copied to other osd(s). I think this behaviour depends on how you configure your pool: osd pool default min size: Description: Sets the minimum number of written replicas for objects in the pool in order to acknowledge a write operation to the client. If minimum is not met, Ceph will not acknowledge the write to the client. This setting ensures a minimum number of replicas when operating in degraded mode. Cheers, Robert van Leeuwen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
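For reference, a minimal sketch of how those options might look in ceph.conf (the values are illustrative placeholders, not recommendations for any particular cluster):

  [global]
  osd pool default size = 3      # replicas to create per object
  osd pool default min size = 2  # replicas that must be available for I/O to proceed (see the clarification later in this thread)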
[ceph-users] Noticing lack of saucyness when installing on Ubuntu (particularly with ceph deploy)
Just noticed that Ubuntu 13.10 (saucy) is still causing failures when attempting to naively install ceph (in particular when using ceph-deploy). Now I know this is pretty easy to work around (e.g. s/saucy/raring/ in ceph.list) but it seems highly undesirable to make installing ceph *harder* than it needs to be! Is there any plan to have a saucy flavour in the repos soon? Cheers Mark ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
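For anyone hitting the same thing, a sketch of the workaround described above, assuming the repo entry lives at /etc/apt/sources.list.d/ceph.list (adjust the path if ceph-deploy wrote it elsewhere):

  sudo sed -i 's/saucy/raring/' /etc/apt/sources.list.d/ceph.list
  sudo apt-get update
  sudo apt-get install ceph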
Re: [ceph-users] ceph reliability in large RBD setups
Hi Felix, I've been running similar calculations recently. I've been using this tool from Inktank to calculate RADOS reliabilities with different assumptions: https://github.com/ceph/ceph-tools/tree/master/models/reliability But I've also had similar questions about RBD (or any multi-part files stored in RADOS) -- naively, a file/device stored in N objects would be N times less reliable than a single object. But I hope there's an error in that logic. Cheers, Dan On Sat, Dec 7, 2013 at 4:10 PM, Felix Schüren felix.schue...@hosteurope.de wrote: Hi, I am trying to wrap my head around large RBD-on-RADOS clusters and their reliability and would love some community feedback. Firstly, for the RADOS-only case, reliability for a single object should be (only looking at node failures, assuming an MTTR of 1 day and a node MTBF of 20,000h (~2.3 years)): MTBF 20,000h == annualized failure rate of ~32%, broken down to a daily rate that means every day there is a ~0,09% chance for a single node to break down (assuming simplistically that daily failure rate = AFR/365) My chance of losing all object-holding nodes at the same time for the single object case is DFR^(number of replicas), so:
# rep | prob. of total system failure
1     | 0,089033220%
2     | 0,000079269%
3     | 0,000000071%
4     | 0,00000000006%
(though I think I need to take the number of nodes into account as well - the more nodes, the less likely it becomes that the single object peer nodes will crash simultaneously) that means even on hardware that has a high chance of failure, my single objects (when using 3 replicas) should be fine - unsurprisingly, seeing as this is one of the design goals for RADOS. Now, let's bring RBD into play. Using sufficiently large disks (assumed 10TB RBD disksize) and the default block size of 4MB, on a 10% filled disk (1TB written) we end up with 1TB/4MB = 250,000 objects. That means that every ceph OSD node participating in that disk's RBD pool has parts of this disk, so every OSD node failure means that this disk (and actually, all RBD disks since pretty much all of the RBD disks will have objects on every node) is now at risk of having blocks lost - my gut tells me there is a much higher risk of data loss for the RBD case vs the single object case, but maybe I am mistaken? Can one of you enlighten me with some probability calculation magic? Probably best to start with plain RADOS, then move into RBD territory. My fear is that really large (3000+ nodes) RBD clusters will become too risky to run, and I would love for someone to dispel my fear with math ;) Kind regards, Felix -- Felix Schüren Senior Infrastructure Architect Host Europe Group - http://www.hosteuropegroup.com/ Mail: felix.schue...@hosteuropegroup.com Tel: +49 2203 1045 7350 Mobile: +49 162 2323 988 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
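A quick back-of-envelope following the same simplifications as above (independent node failures, daily failure rate DFR = AFR/365, and ignoring how CRUSH actually spreads placement groups across nodes -- so treat this as a rough first-order estimate rather than a real model):

  DFR ~ 0.32 / 365 ~ 0.00089 (i.e. 0,089% per day)
  P(single object lost)            ~ DFR^r, with r = number of replicas
  P(RBD image of N objects loses at least one object) ~ 1 - (1 - DFR^r)^N ~ N * DFR^r  (for small DFR^r)

  With r = 3 and N = 250,000: P ~ 250,000 * (0.00089)^3 ~ 1.8e-4 per day

So to first order the image really is about N times less likely to survive than a single object, which is why the Inktank reliability model linked above (and realistic MTTR/placement assumptions) matter so much for the RBD case.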
Re: [ceph-users] centos6.4 + libvirt + qemu + rbd/ceph
On 06/12/13 22:56, Dimitri Maziuk wrote: Most servers nowadays are re-provisioned even more often, Not where I work they aren't. Fedora release comes with more and more KVM/Libvirt features and resolved issues, so the net effect is positive anyway. Yes, that is the main argument for tracking ubuntu. ;) Just to back this up with a bit more detail - this has all been enabled as part of the core Ubuntu distribution since 12.04; you can get up-to-date versions of all components either by using the latest interim release (13.10 for example) or using Ubuntu 12.04 LTS with the Cloud Archive (http://wiki.ubuntu.com/ServerTeam/CloudArchive). OpenStack + Ceph (which relies on all of those features) is something we test with for every upstream commit of OpenStack and for stable updates as well, so it gets exercised *a lot* *DISTRO PITCH OVER* Cheers James -- James Page Ubuntu and Debian Developer james.p...@ubuntu.com jamesp...@debian.org ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph-deploy best practices
I'm having a play with ceph-deploy after some time away from it (mainly relying on the puppet modules). With a test setup of only two debian testing servers, I do the following: ceph-deploy new host1 host2 ceph-deploy install host1 host2 (installs emperor) ceph-deploy mon create host1 host2 ceph-deploy osd prepare host1:/dev/sda4 host2:/dev/sda4 ceph-deploy osd activate host1:/dev/sda4 host2:/dev/sda4 ceph-deploy mds create host1 host2 Everything is running fine -- copy some files into CephFS, everything is looking great. host1: /etc/init.d/ceph stop osd Still fine. host1: /etc/init.d/ceph stop mds Fails over to the standby mds after a few seconds. Little outage, but to be expected. Everything fine. host1: /etc/init.d/ceph start osd host1: /etc/init.d/ceph start mds Everything recovers, everything is fine. Now, let's do something drastic: host1: reboot host2: reboot Both hosts come back up, but the mds never recovers -- it always says it is replaying. On closer inspection, host2's osd never came back into action. Doing: ceph-deploy osd activate host2:/dev/sda4 fixed the issue, and the mds recovered, as well as the osd now reporting both up and in. Is there something obvious I'm missing? The ceph.conf seemed remarkably empty -- do I have to re-deploy the configuration file to the monitors or similar? I've never noticed a problem with puppet deployed hosts, but that manually writes out the ceph.conf as part of the puppet run. Many thanks in advance, Matthew Walster ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] many blocked requests when recovery
Hi, You didn't state what version of ceph or kvm/qemu you're using. I think it wasn't until qemu 1.5.0 (1.4.2+?) that an async patch from inktank was accepted into mainstream, which significantly helps in situations like this. If you're not using that, on top of not limiting recovery threads, you'll probably see issues like you describe. Also, more nodes make recovery easier on the entire cluster, so it might make sense to add smaller ones if/when you expand it. Cheers, Martin On Tue, Dec 3, 2013 at 7:09 AM, 飞 duron...@qq.com wrote: hello, I'm testing Ceph as storage for KVM virtual machine images. My cluster has 3 mons and 3 data nodes; every data node has 8x2T SATA HDDs and 1 SSD for journal. When I shut down one data node to imitate a server fault, the cluster begins to recover. During recovery I can see many blocked requests, and the KVM VMs will crash (crash as they think their disk is offline). How can I solve this issue? Any idea? thank you ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
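On the "limiting recovery threads" point: a sketch of the recovery-throttling knobs people usually start with (the values below are conservative starting points, not tuned recommendations -- test against your own workload):

  [osd]
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery op priority = 1

Or applied at runtime without a restart:

  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

Lowering these slows recovery down but leaves more I/O capacity for client (VM) traffic while the cluster rebalances.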
Re: [ceph-users] Failed to execute command: ceph-disk list
On Sat, Dec 7, 2013 at 7:17 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 08/12/13 12:14, Mark Kirkwood wrote: I wonder if it might be worth adding a check at the start of either ceph-deploy to look for binaries we are gonna need. ...growl: either ceph-deploy *or ceph-disk* was what I was thinking! Still, this doesn't look quite right. Are you able to reproduce this from scratch? It would be interesting to see the ceph-deploy logs while trying to replicate. We assume that `gdisk` (which provides `sgdisk`) is installed, but if that is not the case we need to add a check for it and install it. Hopefully the logs, and maybe some terminal output to see if the package is actually installed, would also be useful ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
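A quick way to capture that "is the package actually installed" output, assuming a Debian/Ubuntu or RHEL/CentOS host:

  # Debian/Ubuntu
  dpkg -l gdisk; which sgdisk
  # RHEL/CentOS
  rpm -q gdisk
  # install it if missing
  sudo apt-get install gdisk   # or: sudo yum install gdisk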
Re: [ceph-users] Ceph User Committee Formal Announcement Format
It looks like someone else may have made that change, but of course, that is fine :-) I ran it through a spell checker, and found two mistakes (now corrected) in the pad. There are several people on the pad currently. Best, Nate On 12/9/13 10:36 AM, Loic Dachary l...@dachary.org wrote: On 09/12/2013 01:54, Loic Dachary wrote: On 09/12/2013 00:13, Regola, Nathan (Contractor) wrote: Hi Loic, I made a few changes to the text. Feel free to comment/change it. Better indeed :-) Do you see a way to avoid the repetition of future ? I saw you updated the sentence. It looks like we're ready to advertise :-) http://pad.ceph.com/p/user-committee-announce has been stripped of all comments and is ready for a last review. Cheers Cheers Best, Nate On 12/7/13 11:19 AM, Loic Dachary l...@dachary.org wrote: Hi Nathan, I worked some more on the announcement. I feel the structure and the content are good enough. This is my humble opinion though, feel free to change, substantially even. Since I'm not a native english speaker and not much of a writer, the quality of the content is not great ;-) http://pad.ceph.com/p/user-committee-announce Cheers On 05/12/2013 16:21, Loic Dachary wrote: Hi Nathan, Here is a very rough draft of the announcement which is going to be released next monday. It is more a discussion starter than a draft. Feel free to modify at will :-) It includes the names and affiliations of all founding members. There may be more in the days to come and I'll add to it when I receive new applications: http://pad.ceph.com/p/user-committee-announce It basically is a re-iteration of what has been said during the past few weeks. I added two sentences about the scope, in an attempt to say that it is not just about companies but also academics, individuals and non profit ( there are no governmental agencies yet). And that it's not just technical and that the legal environment in which Ceph can prosper is something we should also care about (not just software patents but also the endless amendments to copyright law that may be detrimental to Free Software in general ). Not being a native english speaker it's difficult to get it right ;-) As for the personalized version of the announcement for each founding member, I would love to have one to remember this date. The graphics used http://www.slideshare.net/Inktank_Ceph/erasure-codeceph are under a Free Software license and you're welcome to use them if you want. I can send you high resolution versions. Cheers On 02/12/2013 15:52, Regola, Nathan (Contractor) wrote: I'm looking forward to working with everyone involved with the Ceph User Committee (http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph_User_Committee#Detailed_Description). I believe that all of the members of the Ceph User Committee should have received an email from Loic asking them to confirm their organization's interest in being named a founding member. The formal announcement is currently being planned for 10 December and we are working on drafting it. Would members prefer a single general announcement or a personalized announcement? A personalized announcement would probably be something like an automatically generated PDF file containing a letter (with the member's name/affiliation) so that members could distribute it. We are open to suggestions. If you have a preference for a general announcement listing all of the members or a personalized announcement welcoming the user (which obviously could include a list of all members), please reply.
Best Regards, Nate Regola -- Loïc Dachary, Artisan Logiciel Libre ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 1MB/s throughput to 33-ssd test cluster
On Sun, Dec 8, 2013 at 8:33 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: I'd suggest testing the components separately - try to rule out NIC (and switch) issues and SSD performance issues, then when you are sure the bits all go fast individually test how ceph performs again. What make and model of SSD? I'd check that the firmware is up to date (sometimes makes a huge difference). I'm also wondering if you might get better performance by having (say) 7 osds and using 4 of the SSD for journals for them. Thanks, Mark. In my haste, I left out part of a paragraph... probably really a whole paragraph... that contains a pretty crucial detail. I had previously run rados bench on this hardware with some success (24-26MBps throughput w/ 4k blocks). ceph osd bench looks great. iperf on the network looks great. After my last round of testing (with a few aborted rados bench tests), I deleted the pool and recreated it (same name, crush ruleset, pg num, size, etc). That is when I started to notice the degraded performance. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 1MB/s throughput to 33-ssd test cluster
On 12/09/2013 10:06 AM, Greg Poirier wrote: On Sun, Dec 8, 2013 at 8:33 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz wrote: I'd suggest testing the components separately - try to rule out NIC (and switch) issues and SSD performance issues, then when you are sure the bits all go fast individually test how ceph performs again. What make and model of SSD? I'd check that the firmware is up to date (sometimes makes a huge difference). I'm also wondering if you might get better performance by having (say) 7 osds and using 4 of the SSD for journals for them. Thanks, Mark. In my haste, I left out part of a paragraph... probably really a whole paragraph... that contains a pretty crucial detail. I had previously run rados bench on this hardware with some success (24-26MBps throughput w/ 4k blocks). ceph osd bench looks great. iperf on the network looks great. After my last round of testing (with a few aborted rados bench tests), I deleted the pool and recreated it (same name, crush ruleset, pg num, size, etc). That is when I started to notice the degraded performance. Definitely sounds like something is mucked up! With 32 concurrent threads you aren't going to be saturating 33 SSDs, but you should be doing far better than 1MB/s! Basically what you should expect to see is something like 30-80MB/s of throughput (maybe higher with reads) all of the CPU cores consumed, and CPU being the limiting factor (at least for now! This is an area we are actively working on right now). Usually completely disabling logging helps, but it sounds like you've got something else going on for sure. Certainly fixing the clock skew mentioned in your original email wouldn't hurt. Also, with 33 SSDs I'd try to shoot for something like 4096 or maybe 8192 PGs. I'd suggest testing a pool with no replication to start out. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
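Following the suggestion above, a sketch of a throwaway benchmark pool (pool name and numbers are only examples -- size 1 means no replication, so don't put real data in it):

  ceph osd pool create benchtest 4096 4096
  ceph osd pool set benchtest size 1
  rados -p benchtest bench 60 write -b 4096 -t 32
  ceph osd pool delete benchtest benchtest --yes-i-really-really-mean-it

Comparing that run against the same test on a replicated pool helps separate raw OSD/journal performance from replication overhead.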
Re: [ceph-users] 1MB/s throughput to 33-ssd test cluster
What SSDs are you using, and is there any under-provisioning on them? On 2013-12-09 16:06, Greg Poirier wrote: On Sun, Dec 8, 2013 at 8:33 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: I'd suggest testing the components separately - try to rule out NIC (and switch) issues and SSD performance issues, then when you are sure the bits all go fast individually test how ceph performs again. What make and model of SSD? I'd check that the firmware is up to date (sometimes makes a huge difference). I'm also wondering if you might get better performance by having (say) 7 osds and using 4 of the SSD for journals for them. Thanks, Mark. In my haste, I left out part of a paragraph... probably really a whole paragraph... that contains a pretty crucial detail. I had previously run rados bench on this hardware with some success (24-26MBps throughput w/ 4k blocks). ceph osd bench looks great. iperf on the network looks great. After my last round of testing (with a few aborted rados bench tests), I deleted the pool and recreated it (same name, crush ruleset, pg num, size, etc). That is when I started to notice the degraded performance. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy best practices
This is a similar issue that we ran into, the root cause was that ceph-deploy doesn't set the partition type guid (that is used to auto activate the volume) on an existing partition. Setting this beforehand while pre-creating the partition is a must or you have to put entries in fstab. On Mon, Dec 9, 2013 at 8:11 AM, Alfredo Deza alfredo.d...@inktank.com wrote: On Mon, Dec 9, 2013 at 6:49 AM, Matthew Walster matt...@walster.org wrote: I'm having a play with ceph-deploy after some time away from it (mainly relying on the puppet modules). With a test setup of only two debian testing servers, I do the following: ceph-deploy new host1 host2 ceph-deploy install host1 host2 (installs emperor) ceph-deploy mon create host1 host2 ceph-deploy osd prepare host1:/dev/sda4 host2:/dev/sda4 ceph-deploy osd activate host1:/dev/sda4 host2:/dev/sda4 ceph-deploy mds create host1 host2 Everything is running fine -- copy some files into CephFS, everything is looking great. host1: /etc/init.d/ceph stop osd Still fine. host1: /etc/init.d/ceph stop mds Fails over to the standby mds after a few seconds. Little outage, but to be expected. Everything fine. host1: /etc/init.d/ceph start osd host1: /etc/init.d/ceph start mds Everything recovers, everything is fine. Now, let's do something drastic: host1: reboot host2: reboot Both hosts come back up, but the mds never recovers -- it always says it is replaying. That is something I would not expect having deployed with ceph-deploy. On closer inspection, host2's osd never came back into action. Doing: ceph-deploy osd activate host2:/dev/sda4 fixed the issue, and the mds recovered, as well as the osd now reporting both up and in. Is there something obvious I'm missing? The ceph.conf seemed remarkably empty -- do I have to re-deploy the configuration file to the monitors or similar? ceph-deploy doesn't create specific entries for mon/mds/osd's. I think it barely adds something in the global section for the mon initial members. So that is actually normal ceph-deploy behavior. I've never noticed a problem with puppet deployed hosts, but that manually writes out the ceph.conf as part of the puppet run. Are you able to reproduce this in a different host from scratch? I just tried on a CentOS 6.4 box and everything came back after a reboot. It would also be very helpful to have all the output from ceph-deploy as you try to reproduce this behavior. Many thanks in advance, Matthew Walster -- If google has done it, Google did it right! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy best practices
On 9 December 2013 16:26, Andrew Woodward xar...@gmail.com wrote: This is a similar issue that we ran into, the root cause was that ceph-deploy doesn't set the partition type guid (that is used to auto activate the volume) on an existing partition. Setting this beforehand while pre-creating the partition is a must or you have to put entries in fstab. What should it be? root@host1:~# parted /dev/sda GNU Parted 2.3 Using /dev/sda Welcome to GNU Parted! Type 'help' to view a list of commands. (parted) p Model: ATA ST2000DM001-9YN1 (scsi) Disk /dev/sda: 2000GB Sector size (logical/physical): 512B/4096B Partition Table: gpt
Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  1049kB  1029kB                  primary  bios_grub
 2      2097kB  21.0GB  21.0GB  ext4            primary
 3      21.0GB  21.5GB  536MB   linux-swap(v1)  primary
 4      21.5GB  2000GB  1979GB  xfs             primary
Matthew ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Washington DC area: Ceph users meetup, 12/18
Hi folks, I know it's short notice, but we have recently formed a Ceph users meetup group in the DC area. We have our first meetup on 12/18. We should have more notice before the next one, so please join the meetup group, even if you can't make this one! http://www.meetup.com/Ceph-DC/events/154304092/ -- Warren Wang Comcast PE Operations, Platform Infrastructure Office:703-939-8445 Mobile: 703-598-1643 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] how to set up disks in the same host
On Mon, Dec 9, 2013 at 1:17 AM, Robert van Leeuwen robert.vanleeu...@spilgames.com wrote: your client writes the file to one osd, and before this osd acknowledges your write request, it ensures that it is copied to other osd(s). I think this behaviour depends on how you configure your pool: osd pool default min size: Description: Sets the minimum number of written replicas for objects in the pool in order to acknowledge a write operation to the client. If minimum is not met, Ceph will not acknowledge the write to the client. This setting ensures a minimum number of replicas when operating in degraded mode. Oh dear; that description is worded a bit unfortunately. What it actually means is that a PG will not go to active in the first place if it does not have the min size number of copies. We should change that! -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
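For completeness, min_size can also be inspected and adjusted per pool after creation; a sketch against a hypothetical pool named "rbd":

  ceph osd pool get rbd min_size
  ceph osd pool set rbd min_size 1

With Greg's clarification in mind, this controls how many copies a PG must have before it is allowed to go active and serve I/O, not a per-write acknowledgement threshold.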
Re: [ceph-users] Mount error 12 = Cannot allocate memory
On Wed, Dec 4, 2013 at 7:15 AM, Mr.Salvatore Rapisarda salvor...@yahoo.it wrote: Hi, i have a ceph cluster with 3 nodes on Ubuntu 12.04.3 LTS and ceph version 0.72.1 My configuration is the follow: * 3 MON - XRVCLNOSTK001=10.170.0.110 - XRVCLNOSTK002=10.170.0.111 - XRVOSTKMNG001=10.170.0.112 * 3 OSD - XRVCLNOSTK001=10.170.0.110 - XRVCLNOSTK002=10.170.0.111 - XRVOSTKMNG001=10.170.0.112 * 1 MDS - XRVCLNOSTK001=10.170.0.110 All it's ok... -#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-# root@XRVOSTKMNG001:/mnt# ceph -s cluster b53078ff-2cd3-4c8f-ad23-16476658e4a0 health HEALTH_OK monmap e2: 3 mons at {XRVCLNOSTK001=10.170.0.110:6789/0,XRVCLNOSTK002=10.170.0.111:6789/0,XRVOSTKMNG001=10.170.0.112:6789/0}, election epoch 54, quorum 0,1,2 XRVCLNOSTK001,XRVCLNOSTK002,XRVOSTKMNG001 mdsmap e10: 1/1/1 up {0=XRVCLNOSTK001=up:active} osdmap e62: 3 osds: 3 up, 3 in pgmap v8375: 448 pgs, 5 pools, 716 MB data, 353 objects 6033 MB used, 166 GB / 172 GB avail 448 active+clean -#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-# If i try to mount cephfs on first node 10.170.0.112, used for cluster deploy process, there is no problem. But if i try to mount cephfs on second node 10.170.0.110 or third node 10.170.0.111 i have a mount error 12 = cannot allocate memory -#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-# root@XRVCLNOSTK002:/mnt# mount -t ceph XRVCLNOSTK001:6789:/ /mnt/nova -o name=admin,secret=my_secret_key mount error 12 = Cannot allocate memory -#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-# Any idea? :) I think everywhere the kernel client uses ENOMEM it means exactly that — it failed to allocate memory for something. I'd check your memory situation on that host, and see if you can reproduce it elsewhere. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
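A few quick things worth checking on the host that fails to mount, since ENOMEM from the kernel client usually means exactly that (generic commands, adjust as needed):

  free -m
  dmesg | tail -n 50           # look for page allocation failures around the mount attempt
  grep -i commit /proc/meminfo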
Re: [ceph-users] ceph-deploy best practices
https://github.com/ceph/ceph/blob/master/udev/95-ceph-osd.rules Lists the 4 variants; in your case it sounds like a normal ceph volume, so the guid you want is probably 4fbd7e29-9d25-41b8-afd0-062c0ceff05d. You will need sgdisk (part of gdisk) to set the guid correctly. From the man page: -t, --typecode=partnum:{hexcode|GUID} Change a single partition's type code. You enter the type code using either a two-byte hexadecimal number, as described earlier, or a fully-specified GUID value, such as EBD0A0A2-B9E5-4433-87C0-68B6B72699C7. Your exec should look like: sgdisk --typecode=4:4fbd7e29-9d25-41b8-afd0-062c0ceff05d /dev/sda On Mon, Dec 9, 2013 at 8:32 AM, Matthew Walster matt...@walster.org wrote: On 9 December 2013 16:26, Andrew Woodward xar...@gmail.com wrote: This is a similar issue that we ran into, the root cause was that ceph-deploy doesn't set the partition type guid (that is used to auto activate the volume) on an existing partition. Setting this beforehand while pre-creating the partition is a must or you have to put entries in fstab. What should it be? root@host1:~# parted /dev/sda GNU Parted 2.3 Using /dev/sda Welcome to GNU Parted! Type 'help' to view a list of commands. (parted) p Model: ATA ST2000DM001-9YN1 (scsi) Disk /dev/sda: 2000GB Sector size (logical/physical): 512B/4096B Partition Table: gpt Number Start End Size File system Name Flags 1 20.5kB 1049kB 1029kB primary bios_grub 2 2097kB 21.0GB 21.0GB ext4 primary 3 21.0GB 21.5GB 536MB linux-swap(v1) primary 4 21.5GB 2000GB 1979GB xfs primary Matthew -- If google has done it, Google did it right! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
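After setting the typecode, it can be worth confirming it took and nudging the kernel/udev to re-read the partition table; a sketch (partition number and device taken from the example above):

  sgdisk --info=4 /dev/sda     # "Partition GUID code" should now show 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D
  partprobe /dev/sda           # or reboot, so the udev rules get a chance to auto-activate the OSD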
Re: [ceph-users] Blocked requests during and after CephFS delete
[ Re-added the list since I don't have log files. ;) ] On Mon, Dec 9, 2013 at 5:52 AM, Oliver Schulz osch...@mpp.mpg.de wrote: Hi Greg, I'll send this privately, maybe better not to post log-files, etc. to the list. :-) Nobody's reported it before, but I think the CephFS MDS is sending out too many delete requests. [...] That's all speculation on my part though; can you go sample the slow requests and see what their makeup looked like? Do you have logs from the MDS or OSDs during that time period? Uh - how do I sample the requests? I believe the slow requests should have been logged in the monitor's central log. That's a file sitting in the mon directory, and is probably accessible via other means I can't think of off-hand. Go see if it describes what the slow OSD requests are (eg, are they a bunch of MDS deletes with some other stuff sprinkled in, or all other stuff, or whatever). Concerning logs - you mean the regular ceph daemon log files? Sure - I'm attaching a tarball of all daemon logs from the relevant time interval (please don't publish them ;-) ). It's 13.2 MB, I hope it goes through by email. I also dumped ceph health every minute during the test. * 15:34:34 to 15:48:37 is the effect from my first mass delete. I aborted that one before it could finished, to see if emperor would to better By abort, you mean you stopped deleting all the things you intended to? snip -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
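If it helps, a rough way to sample the slow requests from the cluster log Greg mentions (the path below is the usual default on a monitor host, but it may differ on your install):

  grep 'slow request' /var/log/ceph/ceph.log | less
  # or watch them arrive live while reproducing the delete:
  ceph -w | grep -i slow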
Re: [ceph-users] ceph-deploy best practices
On 9 December 2013 17:35, Andrew Woodward xar...@gmail.com wrote: https://github.com/ceph/ceph/blob/master/udev/95-ceph-osd.rules Lists the 4 variants; in your case it sounds like a normal ceph volume, so the guid you want is probably 4fbd7e29-9d25-41b8-afd0-062c0ceff05d. You will need sgdisk (part of gdisk) to set the guid correctly. From the man page: -t, --typecode=partnum:{hexcode|GUID} Change a single partition's type code. You enter the type code using either a two-byte hexadecimal number, as described earlier, or a fully-specified GUID value, such as EBD0A0A2-B9E5-4433-87C0-68B6B72699C7. Your exec should look like: sgdisk --typecode=4:4fbd7e29-9d25-41b8-afd0-062c0ceff05d /dev/sda Wow! That's well hidden! I've only had a cursory look at ceph-deploy's underlying code -- is this a feature missing from there, or is there a reason it's left out of the osd prepare phase? It would be good to either document this on the ceph-deploy quickstart page or incorporate it into ceph-deploy. I can confirm sgdisk is in the path on login. Matthew ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy best practices
@Alfredo - Is this something that ceph-deploy should do // or warn about? or should we fix ceph-disk so that it sets the part guid on existing partitions? On Mon, Dec 9, 2013 at 9:44 AM, Matthew Walster matt...@walster.org wrote: On 9 December 2013 17:35, Andrew Woodward xar...@gmail.com wrote: https://github.com/ceph/ceph/blob/master/udev/95-ceph-osd.rules Lists the 4 variants; in your case it sounds like a normal ceph volume, so the guid you want is probably 4fbd7e29-9d25-41b8-afd0-062c0ceff05d. You will need sgdisk (part of gdisk) to set the guid correctly. From the man page: -t, --typecode=partnum:{hexcode|GUID} Change a single partition's type code. You enter the type code using either a two-byte hexadecimal number, as described earlier, or a fully-specified GUID value, such as EBD0A0A2-B9E5-4433-87C0-68B6B72699C7. Your exec should look like: sgdisk --typecode=4:4fbd7e29-9d25-41b8-afd0-062c0ceff05d /dev/sda Wow! That's well hidden! I've only had a cursory look at ceph-deploy's underlying code -- is this a feature missing from there, or is there a reason it's left out of the osd prepare phase? It would be good to either document this on the ceph-deploy quickstart page or incorporate it into ceph-deploy. I can confirm sgdisk is in the path on login. Matthew -- If google has done it, Google did it right! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy best practices
Matthew, I'll flag this for future doc changes. I noticed that you didn't run ceph-deploy gatherkeys after creating your monitor(s). Any reason for that omission? On Mon, Dec 9, 2013 at 3:49 AM, Matthew Walster matt...@walster.org wrote: I'm having a play with ceph-deploy after some time away from it (mainly relying on the puppet modules). With a test setup of only two debian testing servers, I do the following: ceph-deploy new host1 host2 ceph-deploy install host1 host2 (installs emperor) ceph-deploy mon create host1 host2 ceph-deploy osd prepare host1:/dev/sda4 host2:/dev/sda4 ceph-deploy osd activate host1:/dev/sda4 host2:/dev/sda4 ceph-deploy mds create host1 host2 Everything is running fine -- copy some files into CephFS, everything it looking great. host1: /etc/init.d/ceph stop osd Still fine. host1: /etc/init.d/ceph stop mds Fails over to the standby mds after a few seconds. Little outage, but to be expected. Everything fine. host1: /etc/init.d/ceph start osd host1: /etc/init.d/ceph start mds Everything recovers, everything is fine. Now, let's do something drastic: host1: reboot host2: reboot Both hosts come back up, but the mds never recovers -- it always says it is replaying. On closer inspection, host2's osd never came back into action. Doing: ceph-deploy osd activate host2:/dev/sda4 fixed the issue, and the mds recovered, as well as the osd now reporting both up and in. Is there something obvious I'm missing? The ceph.conf seemed remarkably empty, do I have to re-deploy the configuration file to the monitors or similar? I've never noticed a problem with puppet deployed hosts, but that manually writes out the ceph.conf as part of the puppet run. Many thanks in advance, Matthew Walster ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- John Wilkins Senior Technical Writer Intank john.wilk...@inktank.com (415) 425-9599 http://inktank.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
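For reference, in the quick-start flow the keyring collection step normally slots in right after the monitors form a quorum, roughly (hostnames as in Matthew's example):

  ceph-deploy new host1 host2
  ceph-deploy install host1 host2
  ceph-deploy mon create host1 host2
  ceph-deploy gatherkeys host1
  ceph-deploy osd prepare host1:/dev/sda4 host2:/dev/sda4
  ...

Without the gathered bootstrap keys, the later osd/mds steps typically cannot authenticate against the cluster.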
Re: [ceph-users] ceph-deploy best practices
John, Good catch. I did run it, but missed it when reviewing my actions for this post. Matthew On 9 Dec 2013 18:24, John Wilkins john.wilk...@inktank.com wrote: Matthew, I'll flag this for future doc changes. I noticed that you didn't run ceph-deploy gatherkeys after creating your monitor(s). Any reason for that omission? On Mon, Dec 9, 2013 at 3:49 AM, Matthew Walster matt...@walster.orgwrote: I'm having a play with ceph-deploy after some time away from it (mainly relying on the puppet modules). With a test setup of only two debian testing servers, I do the following: ceph-deploy new host1 host2 ceph-deploy install host1 host2 (installs emperor) ceph-deploy mon create host1 host2 ceph-deploy osd prepare host1:/dev/sda4 host2:/dev/sda4 ceph-deploy osd activate host1:/dev/sda4 host2:/dev/sda4 ceph-deploy mds create host1 host2 Everything is running fine -- copy some files into CephFS, everything it looking great. host1: /etc/init.d/ceph stop osd Still fine. host1: /etc/init.d/ceph stop mds Fails over to the standby mds after a few seconds. Little outage, but to be expected. Everything fine. host1: /etc/init.d/ceph start osd host1: /etc/init.d/ceph start mds Everything recovers, everything is fine. Now, let's do something drastic: host1: reboot host2: reboot Both hosts come back up, but the mds never recovers -- it always says it is replaying. On closer inspection, host2's osd never came back into action. Doing: ceph-deploy osd activate host2:/dev/sda4 fixed the issue, and the mds recovered, as well as the osd now reporting both up and in. Is there something obvious I'm missing? The ceph.conf seemed remarkably empty, do I have to re-deploy the configuration file to the monitors or similar? I've never noticed a problem with puppet deployed hosts, but that manually writes out the ceph.conf as part of the puppet run. Many thanks in advance, Matthew Walster ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- John Wilkins Senior Technical Writer Intank john.wilk...@inktank.com (415) 425-9599 http://inktank.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy best practices
On Mon, Dec 9, 2013 at 12:54 PM, Andrew Woodward xar...@gmail.com wrote: @Alfredo - Is this something that ceph-deploy should do // or warn about? or should we fix ceph-disk so that it sets the part guid on existing partitions? This looks like an omission on our end. I've created http://tracker.ceph.com/issues/6955 to track this and make sure it's fixed. On Mon, Dec 9, 2013 at 9:44 AM, Matthew Walster matt...@walster.org wrote: On 9 December 2013 17:35, Andrew Woodward xar...@gmail.com wrote: https://github.com/ceph/ceph/blob/master/udev/95-ceph-osd.rules Lists the 4 variants; in your case it sounds like a normal ceph volume, so the guid you want is probably 4fbd7e29-9d25-41b8-afd0-062c0ceff05d. You will need sgdisk (part of gdisk) to set the guid correctly. From the man page: -t, --typecode=partnum:{hexcode|GUID} Change a single partition's type code. You enter the type code using either a two-byte hexadecimal number, as described earlier, or a fully-specified GUID value, such as EBD0A0A2-B9E5-4433-87C0-68B6B72699C7. Your exec should look like: sgdisk --typecode=4:4fbd7e29-9d25-41b8-afd0-062c0ceff05d /dev/sda Wow! That's well hidden! I've only had a cursory look at ceph-deploy's underlying code -- is this a feature missing from there, or is there a reason it's left out of the osd prepare phase? It would be good to either document this on the ceph-deploy quickstart page or incorporate it into ceph-deploy. I can confirm sgdisk is in the path on login. Matthew -- If google has done it, Google did it right! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] The Ceph User Committee is born
The founding members of the Ceph User Committee (see below) are pleased to announce its creation as of December 10th, 2013. We are actively engaged in organizing meetups, collecting use cases, and more. Any Ceph user is welcome to join, simply by sending an email to our mailing list (ceph-commun...@lists.ceph.com). Ceph promises to be a major force in the future of storage and the intention is that the Ceph Community will be an integral part of that. The Ceph User Committee is founded by people and organizations from all backgrounds to discover innovative ways to leverage Ceph to its full potential. Its home is http://ceph.com/ and the founding members agreed on the following mission statement * Organize meetups, events, booth, talks... :Meeting in person is a useful way to exchange experience and tips and to discover new ways to use Ceph. The User Committee helps with the logistics. It is also responsible for maintaining a calendar of upcoming events. * Collect use cases and user stories : A diverse collection of existing use cases is a precious source of inspiration for the community. The User Committee approaches users and conducts interviews. A well articulated user story that describes how Ceph could be used in the future is an effective way for developers to figure out what should be included in the roadmap. The User Committee will organize user stories published in a wiki and sort them according to their popularity. * Focus on the Free and Open Source ecosystem : Ceph is included in proprietary solutions used by a growing number of companies to run successful commercial activities. Their ecosystem is not in the scope of the User Committee. The User Committee's focus is on the Open standards, formats, and the Free and Open Source software that can be assembled with Ceph and allow users to freely use, modify, distribute and study the code. The Ceph User Committee is a nonprofit initiative and favors diversity of all types. The founding members include industry leaders, academics, research organizations, SME, NGOs, and individuals from around the world. It will concern itself with all matters that may hinder the widespread adoption of Ceph by the general public, including technical, operational, and legal concerns. Any individual is welcome to join the Ceph User Committee : the only requirement is to agree on the mission statement. Volunteer participation is encouraged and a number of the founding members agreed to contribute by organizing meetups, editing ceph.com, etc. If the organization employing a member of the Ceph User Committee also agrees to the mission statement, it will be listed next to the member name in the directory. It does not cost anything to become a member (just ask to become a member by sending an email to the list at ceph-commun...@lists.ceph.com). The bureaucracy is kept to a minimum so that the Ceph User Committee can grow organically from the good will and the needs of its members. In six months from now an election will be organized to distribute roles and responsibilities. And, it will also probably be an appropriate time to establish more formal rules and a more stable mission statement. Loic Dachary l...@dachary.org Acting head of the committee, in charge of organizing the election of the board in may 2014. 
Li Wang (Kylin) liw...@ubuntukylin.com Meetup Changsha, China
Jiangang Duan (Intel) jiangang.d...@intel.com Meetup Shanghai, China
Patrick McGarry (Inktank) patr...@inktank.com Kindly agreed to assist with all details that will help the User Committee grow and prosper.
Ross Turk (Inktank) r...@inktank.com Can help with HTML/CSS, WordPress needs, general infrastructure, swag, coordination
Marie-Claude Lanchantin (Cloudwatt) marie-claude.lanchan...@cloudwatt.com Logistics, from Meetup organization to shipping goodies for events and conferences.
Nathan Regola (Comcast) nathan_reg...@cable.comcast.com
Aaron Ten Clay aaro...@aarontc.com http://ceph.com/ editors
Peter Matulis peter.matu...@canonical.com IRC operator
Warren Wang (Comcast) warren_w...@cable.comcast.com Meetup Washington DC, USA
Joao Luis jecl...@gmail.com Meetup Lisbon, Portugal
Sebastien Han (eNovance) sebastien@enovance.com ceph-brag
Ricardo Rocha (Catalyst) rica...@catalyst.net.nz
David Clarke (Catalyst) dav...@catalyst.net.nz Meetup Wellington / Auckland, New Zealand
Eric Mourgaya-Virapatrin (Crédit Mutuel Arkea) eric.mourgaya-virapat...@arkea.com Global events coordination Meetup France
Paul Gadi (Acaleph) p...@acale.ph Meetup Philippines
Kurt Bauer (ACOnet/VIX) kurt.ba...@univie.ac.at Use Cases Meetup Vienna, Austria
Robert Sander (Heinlein Support GmbH) r.san...@heinlein-support.de
Stephan Seitz (Heinlein Support GmbH) s.se...@heinlein-support.de Meetups Berlin, Germany
Jens-Christian Fischer (SWITCH) jens-christian.fisc...@switch.ch
Simon Leinen (SWITCH) simon.lei...@switch.ch Meetups
Re: [ceph-users] many meta files in osd
Is there any possibility to remove these meta files? (without recreating the cluster) File names: {path}.bucket.meta.test1:default.4110.{sequence number}__head_... -- Regards Dominik 2013/12/8 Dominik Mostowiec dominikmostow...@gmail.com: Hi, My api app to put files to s3/ceph checks if a bucket exists by creating this bucket. Each bucket create command adds 2 meta files. - root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l 44 root@vm-1:/vol0/ceph/osd# s3 -u create test1 Bucket successfully created. root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l 46 - Unfortunately: - root@vm-1:/vol0/ceph/osd# s3 -u delete test1 root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l 46 - Is there some way to remove these meta files from ceph? -- Regards Dominik -- Regards, Dominik ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
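Those objects are RGW bucket-instance metadata, so one way to at least see what is accumulating is to list them at the RADOS level; a sketch assuming the default metadata pool name (.rgw -- adjust if your zone uses a different domain root pool):

  rados -p .rgw ls | grep 'bucket.meta.test1'

Individual objects can be deleted with "rados -p .rgw rm <object>", but removing RGW metadata behind radosgw's back risks leaving the gateway's cache and index inconsistent, so treat that as a last resort.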
[ceph-users] Anybody doing Ceph for OpenStack with OSDs across compute/hypervisor nodes?
We're running OpenStack (KVM) with local disk for ephemeral storage. Currently we use local RAID10 arrays of 10k SAS drives, so we're quite rich for IOPS and have 20GE across the board. Some recent patches in OpenStack Havana make it possible to use Ceph RBD as the source of ephemeral VM storage, so I'm interested in the potential for clustered storage across our hypervisors for this purpose. Any experience out there? -- Cheers, ~Blairo ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Anybody doing Ceph for OpenStack with OSDs across compute/hypervisor nodes?
We're running OpenStack (KVM) with local disk for ephemeral storage. Currently we use local RAID10 arrays of 10k SAS drives, so we're quite rich for IOPS and have 20GE across the board. Some recent patches in OpenStack Havana make it possible to use Ceph RBD as the source of ephemeral VM storage, so I'm interested in the potential for clustered storage across our hypervisors for this purpose. Any experience out there? I believe Piston converges their storage/compute, they refer to it as a null-tier architecture. http://www.pistoncloud.com/openstack-cloud-software/technology/#storage -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com