Re: [ceph-users] Size of RBD images
Thanks Josh! This is a lot clearer now.

I understand that librbd is low-level, but still, a warning wouldn't hurt, would it? Just check whether the size parameter is larger than the cluster capacity, no?

Thank you for pointing out the trick of simply deleting the rbd_header; I will try that now.

Best regards,
Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)

On 11/20/2013 06:33 PM, Josh Durgin wrote:
> On 11/20/2013 06:53 AM, nicolasc wrote:
>> Thank you Bernhard and Wogri. My old kernel version also explains the format issue. Once again, sorry to have mixed that into the problem. Back to my original inquiries, I hope someone can help me understand why:
>> * it is possible to create an RBD image larger than the total capacity of the cluster
>
> There's simply no checking of the size of the cluster by librbd. rbd does not know whether you're about to add a bunch of capacity to the cluster, or whether you want your storage overcommitted (and by how much). Higher-level tools like OpenStack Cinder can provide that kind of logic, but 'rbd create' is more of a low-level tool at this time.
>
>> * a large empty image takes longer to shrink/delete than a small one
>
> rbd doesn't keep an index of which objects exist (since doing so would hurt write performance). The downside is, as you saw, that when shrinking or deleting an image it must look for all objects above the shrink size (deleting is like shrinking to 0, of course). In dumpling or later, rbd can do this in parallel, controlled by --rbd-concurrent-management-ops, which defaults to 10.
>
> If you've never written to the image, you can just delete the rbd_header and rbd_id objects for it (or just the $imagename.rbd object for format 1 images); then 'rbd rm' will be fast, since it will just remove the image's entry from the rbd_directory object.
>
> Josh

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Size of RBD images
On Nov 21, 2013, at 10:30 AM, nicolasc <nicolas.cance...@surfsara.nl> wrote:
> Thanks Josh! This is a lot clearer now. I understand that librbd is low-level, but still, a warning wouldn't hurt, would it? Just check if the size parameter is larger than the cluster capacity, no?

Maybe I want to create a huge image now, and add the OSD capacity later. So this makes sense.

> Thank you for pointing out the trick of simply deleting the rbd_header, I will try that now.

--
http://www.wogri.at
Re: [ceph-users] Size of RBD images
Yes, I understand that creating an image larger than the cluster may sometimes be considered a feature. I am not suggesting it should be forbidden, simply that a warning message should be displayed to the operator.

Full disclosure: I am not a Ceph dev; this is a simple user's opinion.

Best regards,
Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)
Re: [ceph-users] Size of RBD images
That might be, but the manpage of ceph version 0.72.1 tells me it isn't. Anyhow, still running kernel 3.8.xx.

Bernhard

On 19.11.2013 20:10:04, Wolfgang Hennerbichler wrote:
> On Nov 19, 2013, at 3:47 PM, Bernhard Glomm <bernhard.gl...@ecologic.eu> wrote:
>> Hi Nicolas, just fyi: rbd format 2 is not supported yet by the linux kernel (module)
>
> I believe this is wrong. I think linux supports rbd format 2 images since 3.10.
> wogri

--
Bernhard Glomm
IT Administration
Phone: +49 (30) 86880 134
Fax: +49 (30) 86880 100
Skype: bernhard.glomm.ecologic
Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464
Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH
Re: [ceph-users] Size of RBD images
Thank you Bernhard and Wogri. My old kernel version also explains the format issue. Once again, sorry to have mixed that into the problem.

Back to my original inquiries, I hope someone can help me understand why:
* it is possible to create an RBD image larger than the total capacity of the cluster
* a large empty image takes longer to shrink/delete than a small one

Best regards,
Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)

On 11/20/2013 01:56 PM, Bernhard Glomm wrote:
> That might be, but the manpage of ceph version 0.72.1 tells me it isn't. Anyhow, still running kernel 3.8.xx.
> Bernhard
Re: [ceph-users] Size of RBD images
On 11/20/2013 06:53 AM, nicolasc wrote:
> Thank you Bernhard and Wogri. My old kernel version also explains the format issue. Once again, sorry to have mixed that into the problem. Back to my original inquiries, I hope someone can help me understand why:
> * it is possible to create an RBD image larger than the total capacity of the cluster

There's simply no checking of the size of the cluster by librbd. rbd does not know whether you're about to add a bunch of capacity to the cluster, or whether you want your storage overcommitted (and by how much). Higher-level tools like OpenStack Cinder can provide that kind of logic, but 'rbd create' is more of a low-level tool at this time.

> * a large empty image takes longer to shrink/delete than a small one

rbd doesn't keep an index of which objects exist (since doing so would hurt write performance). The downside is, as you saw, that when shrinking or deleting an image it must look for all objects above the shrink size (deleting is like shrinking to 0, of course). In dumpling or later, rbd can do this in parallel, controlled by --rbd-concurrent-management-ops, which defaults to 10.

If you've never written to the image, you can just delete the rbd_header and rbd_id objects for it (or just the $imagename.rbd object for format 1 images); then 'rbd rm' will be fast, since it will just remove the image's entry from the rbd_directory object.

Josh
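A minimal sketch of the shortcut Josh describes, with hypothetical pool and image names; it assumes a format 2 image that has never been written to (so no rbd_data.* objects exist), and that the header object is named after the id shown in block_name_prefix:

```shell
# Hypothetical names: pool "rbd", image "test_img"; 19f76b8b4567 stands for
# whatever 'rbd info' reports as the block_name_prefix id (rbd_data.<id>).
rbd info test_img

# Remove the two metadata objects of the (never-written) format 2 image.
rados -p rbd rm rbd_header.19f76b8b4567
rados -p rbd rm rbd_id.test_img

# For a format 1 image it would instead be:
#   rados -p rbd rm test_img.rbd

# 'rbd rm' now only drops the entry from the rbd_directory object,
# so it returns quickly instead of probing every possible data object.
rbd rm test_img
```

Only do this for an image that was truly never written to; if any rbd_data.* objects exist, they would be orphaned.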
Re: [ceph-users] Size of RBD images
Hi Nicolas,

Just FYI: rbd format 2 is not supported yet by the linux kernel (module); it can only be used as a target for virtual machines using librbd. See: man rbd -- --image-format

Shrinking time: the same happened to me. An rbd (v1) device took about a week to shrink from 1PB to 10TB. The good news: I already had about 5TB of data on it, with ongoing processes using the device, and there was neither any data loss nor a significant performance issue. (3 mons + 4 machines with a different number of OSDs each.)

Bernhard

> EDIT: sorry about the "No such file" error. Now, it seems this is a separate issue: the system I was using was apparently unable to map devices to images in format 2. I will be investigating that further before mentioning it again. I would still appreciate answers about the 1PB image and the time to shrink it.
>
> Best regards,
> Nicolas Canceill
> Scalable Storage Systems
> SURFsara (Amsterdam, NL)
>
> On 11/19/2013 03:20 PM, nicolasc wrote:
>> Hi everyone,
>>
>> In the course of playing with RBD, I noticed a few things:
>>
>> * The RBD images are so thin-provisioned, you can create arbitrarily large ones. On my 0.72.1 freshly-installed empty 200TB cluster, I was able to create a 1PB image:
>>
>>     $ rbd create --image-format 2 --size 1073741824 test_img
>>
>> This command is successful, and I can check the image status:
>>
>>     $ rbd info test_img
>>     rbd image 'test_img':
>>         size 1024 TB in 268435456 objects
>>         order 22 (4096 kB objects)
>>         block_name_prefix: rbd_data.19f76b8b4567
>>         format: 2
>>         features: layering
>>
>> * Such an oversized image seems unmountable on my 3.2.46 kernel. However, the error message is not very explicit:
>>
>>     $ rbd map test_img
>>     rbd: add failed: (2) No such file or directory
>>
>> There is no error or explanation to be seen anywhere in the logs. dmesg reports the connection to the cluster through RBD as usual, and that's it. Using the exact same commands with an image size of 32GB will successfully map the device.
>>
>> * Such an oversized image takes an awfully long time to shrink or remove. However, the image has just been created and is empty. In RADOS, I only see the corresponding rbd_id and rbd_header, but no data object at all. Still, removing the 1PB image takes roughly 8 hours.
>>
>> Cluster config: 3 mons, 8 nodes * 72 osds, about 4800 PGs (2400 PGs in pool rbd); cluster and public network are 10GbE; each node has 8 cores and 64GB mem.
>>
>> Oh, so my questions:
>> - why is it possible to create an image five times the size of the cluster without warning?
>> - where could this "No such file" error come from?
>> - why does it take so long to shrink/delete a large-but-empty-and-thin-provisioned image?
>>
>> I know that 1PB is oversized ("No such file" when trying to map), and 32GB is not, so I am currently looking for the oversize threshold. More info coming soon.
>>
>> Best regards,
>> Nicolas Canceill
>> Scalable Storage Systems
>> SURFsara (Amsterdam, NL)
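As a back-of-the-envelope check on the numbers quoted above (the --size argument is in megabytes, and order 22 means 4 MB objects), a small shell sketch; the implied lookup rate is only a rough average derived from the reported 8-hour figure, not a measured value:

```shell
# Why 'rbd create --size 1073741824' reports 268435456 objects, and what
# an 8-hour removal of the empty image implies about the lookup rate.
SIZE_MB=1073741824            # --size is in MB: 1073741824 MB = 1 PB
OBJ_MB=4                      # order 22 => 2^22-byte (4 MB) objects
OBJECTS=$((SIZE_MB / OBJ_MB))
echo "$OBJECTS objects"       # 268435456, matching the 'rbd info' output

# Deletion has to probe every possible object name, whether or not it exists:
SECS=$((8 * 3600))
echo "~$((OBJECTS / SECS)) object lookups/sec"   # ~9320
```

This is why an empty 1PB image is slow to delete: the cost scales with the provisioned size, not with the data actually written.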
Re: [ceph-users] Size of RBD images
On 11/19/13, 2:10 PM, Wolfgang Hennerbichler <wo...@wogri.com> wrote:
> On Nov 19, 2013, at 3:47 PM, Bernhard Glomm <bernhard.gl...@ecologic.eu> wrote:
>> Hi Nicolas, just fyi: rbd format 2 is not supported yet by the linux kernel (module)
>
> I believe this is wrong. I think linux supports rbd format 2 images since 3.10.

One more reason to cross our fingers for official Saucy Salamander support soon…
Re: [ceph-users] Size of RBD images
So is there any size limit on RBD images? I had a failure this morning mounting a 1TB RBD. Deleting it now (why does it take so long to delete if it was never even mapped, much less written to?) and will retry with smaller images. See output below. This is 0.72 on Ubuntu 13.04 with a 3.12 kernel.

ceph@joceph-client01:~$ rbd info testrbd
rbd image 'testrbd':
        size 1024 GB in 262144 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.1770.6b8b4567
        format: 1
ceph@joceph-client01:~$ rbd map testrbd -p testpool01
rbd: add failed: (13) Permission denied
ceph@joceph-client01:~$ sudo rbd map testrbd -p testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rados df
pool name       category        KB      objects clones  degraded unfound rd      rd KB   wr      wr KB
data            -               0       0       0       0       0       0       0       0       0
metadata        -               0       0       0       0       0       0       0       0       0
rbd             -               1       2       0       0       0       1       0       78      8
testpool01      -               0       0       0       0       0       0       0       0       0
testpool02      -               0       0       0       0       0       0       0       0       0
testpool03      -               0       0       0       0       0       0       0       0       0
testpool04      -               0       0       0       0       0       0       0       0       0
  total used      2328785160       2
  total avail     9218978040
  total space    11547763200

ceph@joceph-client01:/etc/ceph$ sudo modprobe rbd
ceph@joceph-client01:/etc/ceph$ sudo rbd map testrbd --pool testpool01
rbd: add failed: (2) No such file or directory
ceph@joceph-client01:/etc/ceph$ rbd info testrbd
rbd image 'testrbd':
        size 1024 GB in 262144 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.1770.6b8b4567
        format: 1
Re: [ceph-users] Size of RBD images
-----Original Message-----
From: Gruher, Joseph R
Sent: Tuesday, November 19, 2013 12:24 PM
To: 'Wolfgang Hennerbichler'; Bernhard Glomm
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Size of RBD images

> So is there any size limit on RBD images? I had a failure this morning mounting a 1TB RBD. [...]
>
> ceph@joceph-client01:~$ rbd map testrbd -p testpool01
> rbd: add failed: (13) Permission denied
> ceph@joceph-client01:~$ sudo rbd map testrbd -p testpool01
> rbd: add failed: (2) No such file or directory

I think I figured out where I went wrong here. I had thought that if you didn't specify the pool on the 'rbd create' command line, you could then later map the image in any pool. In retrospect that probably doesn't make a lot of sense, and it appears that if you don't specify the pool at the create step, the image just defaults to the 'rbd' pool. See the example below.
ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage5 --pool testpool01
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage5 --pool testpool01
ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage6
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool testpool01
rbd: add failed: (2) No such file or directory
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool rbd
ceph@joceph-client01:/etc/ceph$
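The rule of thumb, then, is to name the same pool at create and map time. A minimal sketch with hypothetical image and pool names (the `pool/image` spec form is an assumption about the rbd CLI's image-spec syntax; `--pool` on both commands works the same way):

```shell
# Create and map in the same pool; 'rbd create' without --pool defaults to 'rbd'.
rbd create --size 1048576 --pool testpool01 testimage
rbd map testpool01/testimage        # same as: rbd map testimage --pool testpool01

# Without --pool at create time, the image lands in the default 'rbd' pool,
# so it must also be mapped from there:
rbd create --size 1048576 testimage2
rbd map rbd/testimage2
```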