Re: [ceph-users] meet up in shanghai? or user group in China?
Hi,

I updated http://pad.ceph.com/p/user-committee-announce to show that you're involved in organizing Shanghai-based meetups. Hopefully this is displayed properly. If not, feel free to update it, or let me know and I'll do it for you.

Cheers

On 26/11/2013 03:59, jiangang duan wrote: After talking with Sage, Ross, Patrick and Loic, I am thinking of building up a Ceph user group in China - for Ceph developers/users to talk, learn and have fun together - and to promote Ceph in China. Is anybody on the list interested in this? Please drop me a mail for further discussion. I can arrange something in Shanghai (if you are OK with it, we can use a meeting room in the Intel office, with snacks provided), or we can pick an industry forum to gather at. -jiangang

--
Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Emergency! Production Cluster is down
Hello Howie,

Is your cluster still down? If you have a support contract with us, please make sure to submit a support ticket so that our professional services team sees it. If not, I'd suggest looking through the logs on the hosts that still have monitors and seeing if they say anything. You can also set "debug mon = 20" in your ceph.conf file and restart the mons to get more debugging info.

Mark

On 12/08/2013 12:39 AM, Mark Kirkwood wrote:

On 08/12/13 19:28, Howie C. wrote: Hello Guys, Tonight when I was trying to remove 2 monitors from the production cluster, everything seemed fine, but all of a sudden I could not connect to the cluster anymore, showing:

    root@mon01:~# ceph mon dump
    2013-12-07 22:24:57.693246 7f7ee21cc700 0 monclient(hunting): authenticate timed out after 300
    2013-12-07 22:24:57.693291 7f7ee21cc700 0 librados: client.admin authentication error (110) Connection timed out
    Error connecting to cluster: TimedOut

I tried to call Inktank, but no one was there. Any suggestions? Please help!

How many monitors did you have (before removing the 2)? Check that your ceph.conf on the host where you are running the mon dump has them all listed (otherwise use the -m switch to specify one you know is still there)! It might be that the remaining ones are just taking a few moments to decide on a quorum.

Regards

Mark
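For reference, the two suggestions above look roughly like this in practice - a minimal sketch, with a hypothetical monitor address and monitor id (adjust both to your own cluster):

    # Point the client at a monitor you know is still up (address is a placeholder):
    ceph -m 10.20.1.11:6789 mon dump

    # In ceph.conf, turn up monitor debugging, then restart the mons:
    [mon]
        debug mon = 20

    # On a surviving mon host, the admin socket reports quorum state even when
    # the normal cluster connection times out (substitute your monitor's id for mon01):
    ceph --admin-daemon /var/run/ceph/ceph-mon.mon01.asok mon_status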
[ceph-users] ceph reliability in large RBD setups
Hi,

I am trying to wrap my head around large RBD-on-RADOS clusters and their reliability, and would love some community feedback.

Firstly, for the RADOS-only case, reliability for a single object should be (only looking at node failures, assuming an MTTR of 1 day and a node MTBF of 20,000h (~2.3 years)):

MTBF 20,000h == annualized failure rate (AFR) of ~32%; broken down to a daily rate, that means every day there is a ~0.09% chance for a single node to break down (assuming, simplistically, that daily failure rate = AFR/365).

My chance of losing all object-holding nodes at the same time, for the single-object case, is DFR^(number of replicas), so:

    # replicas   prob. of total system failure
    1            0.089033220%
    2            0.79269%
    3            0.00071%
    4            0.006%

(though I think I need to take the number of nodes into account as well - the more nodes, the less likely it becomes that the single object's peer nodes will crash simultaneously)

That means that even on hardware with a high chance of failure, my single objects (when using 3 replicas) should be fine - unsurprisingly, seeing as this is one of the design goals for RADOS.

Now, let's bring RBD into play. Using sufficiently large disks (assumed 10TB RBD disk size) and the default block size of 4MB, on a 10% filled disk (1TB written) we end up with 1TB/4MB = 250,000 objects. That means that every Ceph OSD node participating in that disk's RBD pool holds parts of this disk, so every OSD node failure means that this disk (and actually all RBD disks, since pretty much all of the RBD disks will have objects on every node) is at risk of having blocks lost. My gut tells me there is a much higher risk of data loss in the RBD case than in the single-object case, but maybe I am mistaken? Can one of you enlighten me with some probability calculation magic? Probably best to start with plain RADOS, then move into RBD territory.

My fear is that really large (3000+ nodes) RBD clusters will become too risky to run, and I would love for someone to dispel my fear with math ;)

Kind regards,

Felix

--
Felix Schüren
Senior Infrastructure Architect
Host Europe Group - http://www.hosteuropegroup.com/
Mail: felix.schue...@hosteuropegroup.com
Tel: +49 2203 1045 7350
Mobile: +49 162 2323 988
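As a back-of-the-envelope check of the single-object numbers above, the stated assumption (daily failure rate = AFR/365, loss probability = DFR^replicas) can be evaluated directly - a small awk sketch; it ignores node count and recovery time, so the results are illustrative only and will differ slightly from the figures quoted above depending on the exact AFR used:

    awk 'BEGIN {
        afr = 0.32            # annualized failure rate (~32%)
        dfr = afr / 365       # simplistic daily failure rate per node
        for (r = 1; r <= 4; r++)
            printf "%d replica(s): %.9f%%\n", r, (dfr ^ r) * 100
    }'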
[ceph-users] many meta files in osd
Hi,

My API app that puts files into s3/ceph checks whether a bucket exists by creating that bucket. Each bucket-create command adds 2 meta files:

    root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
    44
    root@vm-1:/vol0/ceph/osd# s3 -u create test1
    Bucket successfully created.
    root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
    46

Unfortunately:

    root@vm-1:/vol0/ceph/osd# s3 -u delete test1
    root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
    46

Is there some way to remove these meta files from ceph?

--
Regards
Dominik
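One way to avoid accumulating bucket metadata in the first place is to test for the bucket's existence rather than re-creating it on every check - a sketch, assuming the libs3 s3 client used above (its "test" subcommand issues a HEAD on the bucket, if memory serves) and access to a radosgw admin node; "test1" is just the example bucket from this thread:

    # Client side: check the bucket without creating anything
    s3 -u test test1

    # Gateway/admin side: inspect the bucket's metadata directly
    radosgw-admin bucket stats --bucket=test1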
Re: [ceph-users] Ceph User Committee Formal Announcement Format
On 09/12/2013 00:13, Regola, Nathan (Contractor) wrote: Hi Loic, I made a few changes to the text. Feel free to comment/change it.

Better indeed :-) Do you see a way to avoid the repetition of "future"? Cheers

Best, Nate

On 12/7/13 11:19 AM, Loic Dachary l...@dachary.org wrote: Hi Nathan, I worked some more on the announcement. I feel the structure and the content are good enough. This is my humble opinion though; feel free to change it, substantially even. Since I'm not a native English speaker and not much of a writer, the quality of the content is not great ;-) http://pad.ceph.com/p/user-committee-announce Cheers

On 05/12/2013 16:21, Loic Dachary wrote: Hi Nathan, Here is a very rough draft of the announcement which is going to be released next Monday. It is more a discussion starter than a draft. Feel free to modify it at will :-) It includes the names and affiliations of all founding members. There may be more in the days to come, and I'll add to it when I receive new applications: http://pad.ceph.com/p/user-committee-announce

It is basically a reiteration of what has been said during the past few weeks. I added two sentences about the scope, in an attempt to say that it is not just about companies but also academics, individuals and non-profits (there are no governmental agencies yet), and that it's not just technical: the legal environment in which Ceph can prosper is something we should also care about (not just software patents, but also the endless amendments to copyright law that may be detrimental to Free Software in general). Not being a native English speaker, it's difficult to get it right ;-)

As for the personalized version of the announcement for each founding member, I would love to have one to remember this date. The graphics used in http://www.slideshare.net/Inktank_Ceph/erasure-codeceph are under a Free Software license and you're welcome to use them if you want. I can send you high-resolution versions. Cheers

On 02/12/2013 15:52, Regola, Nathan (Contractor) wrote: I'm looking forward to working with everyone involved with the Ceph User Committee (http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph_User_Committee#Detailed_Description). I believe that all of the members of the Ceph User Committee should have received an email from Loic asking them to confirm their organization's interest in being named a founding member. The formal announcement is currently being planned for 10 December, and we are working on drafting it.

Would members prefer a single general announcement or a personalized announcement? A personalized announcement would probably be something like an automatically generated PDF file containing a letter (with the member's name/affiliation) so that members could distribute it. We are open to suggestions. If you have a preference for a general announcement listing all of the members or a personalized announcement welcoming the user (which obviously could include a list of all members), please reply.

Best Regards, Nate Regola

--
Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Ceph User Committee Formal Announcement Format
Hi Loic,

I made a few changes to the text. Feel free to comment/change it.

Best,

Nate

On 12/7/13 11:19 AM, Loic Dachary l...@dachary.org wrote: Hi Nathan, I worked some more on the announcement. I feel the structure and the content are good enough. This is my humble opinion though; feel free to change it, substantially even. Since I'm not a native English speaker and not much of a writer, the quality of the content is not great ;-) http://pad.ceph.com/p/user-committee-announce Cheers

On 05/12/2013 16:21, Loic Dachary wrote: Hi Nathan, Here is a very rough draft of the announcement which is going to be released next Monday. It is more a discussion starter than a draft. Feel free to modify it at will :-) It includes the names and affiliations of all founding members. There may be more in the days to come, and I'll add to it when I receive new applications: http://pad.ceph.com/p/user-committee-announce

It is basically a reiteration of what has been said during the past few weeks. I added two sentences about the scope, in an attempt to say that it is not just about companies but also academics, individuals and non-profits (there are no governmental agencies yet), and that it's not just technical: the legal environment in which Ceph can prosper is something we should also care about (not just software patents, but also the endless amendments to copyright law that may be detrimental to Free Software in general). Not being a native English speaker, it's difficult to get it right ;-)

As for the personalized version of the announcement for each founding member, I would love to have one to remember this date. The graphics used in http://www.slideshare.net/Inktank_Ceph/erasure-codeceph are under a Free Software license and you're welcome to use them if you want. I can send you high-resolution versions. Cheers

On 02/12/2013 15:52, Regola, Nathan (Contractor) wrote: I'm looking forward to working with everyone involved with the Ceph User Committee (http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph_User_Committee#Detailed_Description). I believe that all of the members of the Ceph User Committee should have received an email from Loic asking them to confirm their organization's interest in being named a founding member. The formal announcement is currently being planned for 10 December, and we are working on drafting it.

Would members prefer a single general announcement or a personalized announcement? A personalized announcement would probably be something like an automatically generated PDF file containing a letter (with the member's name/affiliation) so that members could distribute it. We are open to suggestions. If you have a preference for a general announcement listing all of the members or a personalized announcement welcoming the user (which obviously could include a list of all members), please reply.

Best Regards,

Nate Regola

--
Loïc Dachary, Artisan Logiciel Libre
[ceph-users] Blocked requests during and after CephFS delete
Hello Ceph-Gurus,

a short while ago I reported some trouble we had with our cluster suddenly going into a state of blocked requests. We did a few tests, and we can reproduce the problem: during/after deleting a substantial chunk of data on CephFS (a few TB), ceph health shows blocked requests like

    HEALTH_WARN 222 requests are blocked > 32 sec

This goes on for a couple of minutes, during which the cluster is pretty much unusable. The number of blocked requests jumps around (but seems to go down on average), until finally (after about 15 minutes in my last test) health is back to OK. I upgraded the cluster to Ceph Emperor (0.72.1) and repeated the test, but the problem persists.

Is this normal - and if not, what might be the reason? Obviously, having the cluster go on strike for a while after data deletion is a bit of a problem, especially with a mixed application load. The VMs running on RBDs aren't too happy about it, for example. ;-)

Our cluster structure: 6 nodes, 6x 3TB disks plus 1x system/journal SSD per node, one OSD per disk. We're running ceph version 0.72.1-1precise on Ubuntu 12.04.3 with kernel 3.8.0-33-generic (x86_64). All active pools use replication factor 3.

Any ideas?

Cheers,

Oliver
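For anyone trying to reproduce this, the blocked-request count can be watched while the delete runs using nothing but the standard CLI - a minimal sketch:

    # Show which OSDs the blocked requests are attributed to:
    ceph health detail

    # Poll the blocked-request lines every few seconds during the test:
    watch -n 5 'ceph health detail | grep -i blocked'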
Re: [ceph-users] Emergency! Production Cluster is down
Hi Howie,

Our support team is available 24/7 - you might have called the wrong number. We conduct onboarding sessions with our subscription customers where we brief new customers on how to get assistance, even at 1am on a Saturday night. I will send you a PM with more information.

Regards,

Wolfgang
VP Services, Inktank

On 12/8/13 9:02 AM, Mark Nelson mark.nel...@inktank.com wrote: Hello Howie, Is your cluster still down? If you have a support contract with us, please make sure to submit a support ticket so that our professional services team sees it. If not, I'd suggest looking through the logs on the hosts that still have monitors and seeing if they say anything. You can also set "debug mon = 20" in your ceph.conf file and restart the mons to get more debugging info. Mark

On 12/08/2013 12:39 AM, Mark Kirkwood wrote: On 08/12/13 19:28, Howie C. wrote: Hello Guys, Tonight when I was trying to remove 2 monitors from the production cluster, everything seemed fine, but all of a sudden I could not connect to the cluster anymore, showing:

    root@mon01:~# ceph mon dump
    2013-12-07 22:24:57.693246 7f7ee21cc700 0 monclient(hunting): authenticate timed out after 300
    2013-12-07 22:24:57.693291 7f7ee21cc700 0 librados: client.admin authentication error (110) Connection timed out
    Error connecting to cluster: TimedOut

I tried to call Inktank, but no one was there. Any suggestions? Please help!

How many monitors did you have (before removing the 2)? Check that your ceph.conf on the host where you are running the mon dump has them all listed (otherwise use the -m switch to specify one you know is still there)! It might be that the remaining ones are just taking a few moments to decide on a quorum.

Regards

Mark
Re: [ceph-users] Blocked requests during and after CephFS delete
On Sun, Dec 8, 2013 at 7:16 AM, Oliver Schulz osch...@mpp.mpg.de wrote: Hello Ceph-Gurus, a short while ago I reported some trouble we had with our cluster suddenly going into a state of blocked requests. We did a few tests, and we can reproduce the problem: during/after deleting a substantial chunk of data on CephFS (a few TB), ceph health shows blocked requests like

    HEALTH_WARN 222 requests are blocked > 32 sec

This goes on for a couple of minutes, during which the cluster is pretty much unusable. The number of blocked requests jumps around (but seems to go down on average), until finally (after about 15 minutes in my last test) health is back to OK. I upgraded the cluster to Ceph Emperor (0.72.1) and repeated the test, but the problem persists. Is this normal - and if not, what might be the reason? Obviously, having the cluster go on strike for a while after data deletion is a bit of a problem, especially with a mixed application load. The VMs running on RBDs aren't too happy about it, for example. ;-)

Nobody's reported it before, but I think the CephFS MDS is sending out too many delete requests. When you delete something in CephFS, it's just marked as deleted, and the MDS is supposed to do the actual deletion asynchronously in the background, but I'm not sure if there are any throttles on how quickly it does so. If you remove several terabytes worth of data and the MDS is sending out RADOS object deletes for each 4MB as fast as it can, that's a lot of unfiltered traffic on the OSDs.

That's all speculation on my part, though; can you go sample the slow requests and see what their makeup looks like? Do you have logs from the MDS or OSDs during that time period?

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
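For sampling the slow requests Greg asks about, the OSD admin socket is one place to look - a sketch assuming the default socket path and an example OSD id of 12 (substitute an OSD named in the health output):

    # Requests currently in flight on one OSD:
    ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok dump_ops_in_flight

    # Recently completed operations, with per-step timing:
    ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok dump_historic_ops

    # Slow-request warnings also land in the OSD logs:
    grep "slow request" /var/log/ceph/ceph-osd.12.log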
[ceph-users] 1MB/s throughput to 33-ssd test cluster
Hi. So, I have a test cluster made up of ludicrously overpowered machines with nothing but SSDs in them. Bonded 10Gbps NICs (802.3ad, layer 2+3 xmit hash policy, confirmed ~19.8 Gbps throughput with 32+ threads). I'm running rados bench, and I am currently getting less than 1 MB/s throughput:

    sudo rados -N `hostname` bench 600 write -b 4096 -p volumes --no-cleanup -t 32 > bench_write_4096_volumes_1_32.out 2>&1

Journals are colocated on the same disks, so I'm not expecting optimum throughput, but previous tests on spinning disks have shown reasonable speeds (23 MB/s, 4000-6000 iops) as opposed to the 150-450 iops I'm currently getting.

    ceph_deploy@ssd-1001:~$ sudo ceph -s
      cluster 4167d5f2-2b9e-4bde-a653-f24af68a45f8
       health HEALTH_WARN clock skew detected on mon.ssd-1003
       monmap e1: 3 mons at {ssd-1001=10.20.69.101:6789/0,ssd-1002=10.20.69.102:6789/0,ssd-1003=10.20.69.103:6789/0}, election epoch 20, quorum 0,1,2 ssd-1001,ssd-1002,ssd-1003
       osdmap e344: 33 osds: 33 up, 33 in
        pgmap v10600: 1650 pgs, 6 pools, 289 MB data, 74029 objects
              466 GB used, 17621 GB / 18088 GB avail
              1650 active+clean
      client io 1263 kB/s wr, 315 op/s

    ceph_deploy@ssd-1001:~$ sudo ceph osd tree
    # id    weight  type name       up/down reweight
    -1      30.03   root default
    -2      10.01           host ssd-1001
    0       0.91                    osd.0   up      1
    1       0.91                    osd.1   up      1
    2       0.91                    osd.2   up      1
    3       0.91                    osd.3   up      1
    4       0.91                    osd.4   up      1
    5       0.91                    osd.5   up      1
    6       0.91                    osd.6   up      1
    7       0.91                    osd.7   up      1
    8       0.91                    osd.8   up      1
    9       0.91                    osd.9   up      1
    10      0.91                    osd.10  up      1
    -3      10.01           host ssd-1002
    11      0.91                    osd.11  up      1
    12      0.91                    osd.12  up      1
    13      0.91                    osd.13  up      1
    14      0.91                    osd.14  up      1
    15      0.91                    osd.15  up      1
    16      0.91                    osd.16  up      1
    17      0.91                    osd.17  up      1
    18      0.91                    osd.18  up      1
    19      0.91                    osd.19  up      1
    20      0.91                    osd.20  up      1
    21      0.91                    osd.21  up      1
    -4      10.01           host ssd-1003
    22      0.91                    osd.22  up      1
    23      0.91                    osd.23  up      1
    24      0.91                    osd.24  up      1
    25      0.91                    osd.25  up      1
    26      0.91                    osd.26  up      1
    27      0.91                    osd.27  up      1
    28      0.91                    osd.28  up      1
    29      0.91                    osd.29  up      1
    30      0.91                    osd.30  up      1
    31      0.91                    osd.31  up      1
    32      0.91                    osd.32  up      1

The clock skew error can safely be ignored; it's something like 2-3 ms of skew, and I just haven't bothered configuring away the warning. This is with a newly-created pool, after deleting the last pool used for testing.

Any suggestions on where to start debugging? Thanks.
Re: [ceph-users] 1MB/s throughput to 33-ssd test cluster
On 09/12/13 17:07, Greg Poirier wrote: Hi. So, I have a test cluster made up of ludicrously overpowered machines with nothing but SSDs in them. Bonded 10Gbps NICs (802.3ad, layer 2+3 xmit hash policy, confirmed ~19.8 Gbps throughput with 32+ threads). I'm running rados bench, and I am currently getting less than 1 MB/s throughput. [...] Any suggestions on where to start debugging?

I'd suggest testing the components separately - try to rule out NIC (and switch) issues and SSD performance issues, then, when you are sure the bits all go fast individually, test how Ceph performs again.

What make and model of SSD? I'd check that the firmware is up to date (that sometimes makes a huge difference). I'm also wondering if you might get better performance by having (say) 7 OSDs and using 4 of the SSDs as journals for them.

Cheers

Mark
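For the "test the components separately" suggestion, a couple of quick checks - a rough sketch with hypothetical paths and host names; point the dd/fio targets at scratch files on one of the SSD-backed OSD filesystems:

    # Small sync writes to one SSD - roughly the pattern the OSD journal produces:
    dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/dd-test bs=4k count=10000 oflag=direct,dsync

    # The same idea with fio, if it is installed:
    fio --name=journal-test --filename=/var/lib/ceph/osd/ceph-0/fio-test --size=1G \
        --rw=write --bs=4k --iodepth=1 --direct=1 --sync=1 --runtime=60

    # Raw network throughput between two hosts (run "iperf -s" on ssd-1002 first):
    iperf -c ssd-1002 -P 32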