Re: [ceph-users] Size of RBD images

2013-11-21 Thread nicolasc

Thanks Josh! This is a lot clearer now.

I understand that librbd is low-level, but still, a warning wouldn't 
hurt, would it? Just check if the size parameter is larger than the 
cluster capacity, no?


Thank you for pointing out the trick of simply deleting the rbd_header, 
I will try that now.


Best regards,

Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)



On 11/20/2013 06:33 PM, Josh Durgin wrote:

On 11/20/2013 06:53 AM, nicolasc wrote:

Thank you Bernhard and Wogri. My old kernel version also explains the
format issue. Once again, sorry for mixing that into the problem.

Back to my original inquiries, I hope someone can help me understand 
why:

* it is possible to create an RBD image larger than the total capacity
of the cluster


There's simply no checking of the size of the cluster by librbd.
rbd does not know whether you're about to add a bunch of capacity to 
the cluster, or whether you want your storage overcommitted (and by 
how much).


Higher-level tools like OpenStack Cinder can provide that kind of
logic, but 'rbd create' is more of a low-level tool at this time.



* a large empty image takes longer to shrink/delete than a small one


rbd doesn't keep an index of which objects exist (since doing so would
hurt write performance). The downside, as you saw, is that when shrinking
or deleting an image it must look for all objects above the shrink size
(deleting is like shrinking to 0, of course).

In Dumpling or later, rbd can do this in parallel, controlled by
--rbd-concurrent-management-ops (which defaults to 10).


If you've never written to the image, you can just delete the rbd_header
and rbd_id objects for it (or just the $imagename.rbd object for format 1
images); then 'rbd rm' will be fast, since it'll just remove its entry
from the rbd_directory object.

Josh


Best regards,

Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)



On 11/20/2013 01:56 PM, Bernhard Glomm wrote:

That might be; the manpage of ceph version 0.72.1 tells me it isn't, though.
Anyhow, still running kernel 3.8.xx.

Bernhard

On 19.11.2013 20:10:04, Wolfgang Hennerbichler wrote:


On Nov 19, 2013, at 3:47 PM, Bernhard Glomm
bernhard.gl...@ecologic.eu wrote:

Hi Nicolas
just fyi
rbd format 2 is not supported yet by the linux kernel (module)


I believe this is wrong. I think Linux supports rbd format 2
images since 3.10.

wogri






Re: [ceph-users] Size of RBD images

2013-11-21 Thread Wolfgang Hennerbichler

-- 
http://www.wogri.at

On Nov 21, 2013, at 10:30 AM, nicolasc nicolas.cance...@surfsara.nl wrote:

 Thanks Josh! This is a lot clearer now.
 
 I understand that librbd is low-level, but still, a warning wouldn't hurt, 
 would it? Just check if the size parameter is larger than the cluster 
 capacity, no?

Maybe I want to create a huge image now and add the OSD capacity later, so
this makes sense.

 Thank you for pointing out the trick of simply deleting the rbd_header, I 
 will try that now.




Re: [ceph-users] Size of RBD images

2013-11-21 Thread nicolasc
Yes, I understand that creating an image larger than the cluster may 
sometimes be considered a feature. I am not suggesting it should be 
forbidden, simply that it should display a warning message to the operator.
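
In the meantime, the check is easy enough to script around 'rbd create'. A
minimal sketch of the wrapper I have in mind, keying off the "total space"
line of 'rados df' (raw KB, ignoring replication; the script name and
argument convention are mine):

#!/bin/sh
# warn-create.sh SIZE_MB IMAGE -- warn (but do not refuse) when the requested
# image size exceeds the raw cluster capacity reported by 'rados df'
SIZE_MB=$1; IMG=$2
TOTAL_KB=$(rados df | awk '/total space/ {print $3}')
if [ "$(( SIZE_MB * 1024 ))" -gt "$TOTAL_KB" ]; then
    echo "warning: ${SIZE_MB} MB is larger than the raw cluster capacity" >&2
fi
rbd create --image-format 2 --size "$SIZE_MB" "$IMG"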


Full disclosure: I am not a Ceph dev; this is just a user's opinion.

Best regards,

Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)



Re: [ceph-users] Size of RBD images

2013-11-20 Thread Bernhard Glomm
That might be; the manpage of ceph version 0.72.1 tells me it isn't, though.
Anyhow, still running kernel 3.8.xx.

Bernhard

On 19.11.2013 20:10:04, Wolfgang Hennerbichler wrote:

 On Nov 19, 2013, at 3:47 PM, Bernhard Glomm  bernhard.gl...@ecologic.eu  
 wrote:
 
  Hi Nicolas
  just fyi
  rbd format 2 is not supported yet by the linux kernel (module)

 I believe this is wrong. I think Linux supports rbd format 2 images since
 3.10.
 
 wogri



-- 
Bernhard Glomm
IT Administration
Phone: +49 (30) 86880 134 | Fax: +49 (30) 86880 100 | Skype: bernhard.glomm.ecologic
Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464
Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH



Re: [ceph-users] Size of RBD images

2013-11-20 Thread nicolasc
Thank you Bernhard and Wogri. My old kernel version also explains the
format issue. Once again, sorry for mixing that into the problem.


Back to my original inquiries, I hope someone can help me understand why:
* it is possible to create an RBD image larger than the total capacity 
of the cluster

* a large empty image takes longer to shrink/delete than a small one

Best regards,

Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)



On 11/20/2013 01:56 PM, Bernhard Glomm wrote:

That might be; the manpage of ceph version 0.72.1 tells me it isn't, though.
Anyhow, still running kernel 3.8.xx.

Bernhard

On 19.11.2013 20:10:04, Wolfgang Hennerbichler wrote:


On Nov 19, 2013, at 3:47 PM, Bernhard Glomm
bernhard.gl...@ecologic.eu wrote:

Hi Nicolas
just fyi
rbd format 2 is not supported yet by the linux kernel (module)


I believe this is wrong. I think Linux supports rbd format 2
images since 3.10.

wogri






Re: [ceph-users] Size of RBD images

2013-11-20 Thread Josh Durgin

On 11/20/2013 06:53 AM, nicolasc wrote:

Thank you Bernhard and Wogri. My old kernel version also explains the
format issue. Once again, sorry for mixing that into the problem.

Back to my original inquiries, I hope someone can help me understand why:
* it is possible to create an RBD image larger than the total capacity
of the cluster


There's simply no checking of the size of the cluster by librbd.
rbd does not know whether you're about to add a bunch of capacity to the 
cluster, or whether you want your storage overcommitted (and by how much).


Higher-level tools like OpenStack Cinder can provide that kind of logic,
but 'rbd create' is more of a low-level tool at this time.



* a large empty image takes longer to shrink/delete than a small one


rbd doesn't keep an index of which objects exist (since doing so would
hurt write performance). The downside, as you saw, is that when shrinking
or deleting an image it must look for all objects above the shrink size
(deleting is like shrinking to 0, of course).

In Dumpling or later, rbd can do this in parallel, controlled by
--rbd-concurrent-management-ops (which defaults to 10).
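
For example, to speed up removal of a large empty image (the image name
and the value 20 here are illustrative):

$ rbd rm test_img --rbd-concurrent-management-ops 20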


If you've never written to the image, you can just delete the rbd_header
and rbd_id objects for it (or just the $imagename.rbd object for format 1
images); then 'rbd rm' will be fast, since it'll just remove its entry
from the rbd_directory object.
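
Concretely, for a format 2 image that looks something like this (a sketch;
the id suffix comes from the image's block_name_prefix, here the one shown
earlier in this thread):

$ rbd info test_img | grep block_name_prefix
        block_name_prefix: rbd_data.19f76b8b4567
$ rados -p rbd rm rbd_header.19f76b8b4567
$ rados -p rbd rm rbd_id.test_img
$ rbd rm test_img    # fast now: only the rbd_directory entry remains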

Josh


Best regards,

Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)



On 11/20/2013 01:56 PM, Bernhard Glomm wrote:

That might be; the manpage of ceph version 0.72.1 tells me it isn't, though.
Anyhow, still running kernel 3.8.xx.

Bernhard

On 19.11.2013 20:10:04, Wolfgang Hennerbichler wrote:


On Nov 19, 2013, at 3:47 PM, Bernhard Glomm
bernhard.gl...@ecologic.eu wrote:

Hi Nicolas
just fyi
rbd format 2 is not supported yet by the linux kernel (module)


I believe this is wrong. I think Linux supports rbd format 2
images since 3.10.

wogri






Re: [ceph-users] Size of RBD images

2013-11-19 Thread Bernhard Glomm
Hi Nicolas,
just FYI: rbd format 2 is not supported yet by the Linux kernel (module);
it can only be used as a target for virtual machines using librbd.
See: man rbd, under --image-format.
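
On a kernel that old, a format 1 image should map fine; a minimal sketch
(image name and size are illustrative):

$ rbd create --image-format 1 --size 32768 test_img_v1
$ sudo rbd map test_img_v1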

Shrinking time: the same happened to me. An rbd (v1) device
took about a week to shrink from 1PB to 10TB.
The good news: I already had about 5TB of data on it,
and processes kept using the device the whole time;
there was neither any data loss nor a significant
performance issue.
(3 mons + 4 machines, each with a different number of OSDs)

Bernhard

 EDIT: sorry about the No such file error
 
 Now, it seems this is a separate issue: the system I was using was 
 apparently unable to map devices to images in format 2. I will be 
 investigating that further before mentioning it again.
 
 I would still appreciate answers about the 1PB image and the time to 
 shrink it.
 
 Best regards,
 
 Nicolas Canceill
 Scalable Storage Systems
 SURFsara (Amsterdam, NL)
 
 
 On 11/19/2013 03:20 PM, nicolasc wrote:
  Hi everyone,
  
  In the course of playing with RBD, I noticed a few things:
  
  * The RBD images are so thin-provisioned, you can create arbitrarily 
  large ones.
  On my 0.72.1 freshly-installed empty 200TB cluster, I was able to 
  create a 1PB image:
  
  $ rbd create --image-format 2 --size 1073741824 test_img
  
  This command is successful, and I can check the image status:
  
  $ rbd info test_img
  rbd image 'test_img':
  size 1024 TB in 268435456 objects
  order 22 (4096 kB objects)
  block_name_prefix: rbd_data.19f76b8b4567
  format: 2
  features: layering
  
  * Such an oversized image seems unmountable on my 3.2.46 kernel.
  However, the error message is not very explicit:
  
  $ rbd map test_img
  rbd: add failed: (2) No such file or directory
  
  There is no error or explanation to be seen anywhere in the logs.
  dmesg reports the connection to the cluster through RBD as usual, 
  and that's it.
  Using the exact same commands with image size 32GB will successfully 
  map the device.
  
  * Such an oversized image takes an awfully long time to shrink or remove.
  However, the image has just been created and is empty.
  In RADOS, I only see the corresponding rbd_id and rbd_header, but no 
  data object at all.
  Still, removing the 1PB image takes roughly 8 hours.
  
  Cluster config:
  3 mons, 8 nodes * 72 OSDs, about 4800 PGs (2400 PGs in pool rbd);
  cluster and public networks are 10GbE; each node has 8 cores and 64GB RAM
  
  Oh, so my questions:
  - why is it possible to create an image five times the size of the 
  cluster without warning?
  - where could this "No such file" error come from?
  - why does it take so long to shrink/delete a
  large-but-empty-and-thin-provisioned image?
  
  I know that 1PB is oversized (No such file when trying to map), and 
  32GB is not, so I am currently looking for the oversize threshold. 
  More info coming soon.
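  
  The probing is just a loop over candidate sizes (a sketch; the sizes, image
  name, and /dev/rbd0 path are illustrative, --size is in MB, and removing the
  larger empty probes will itself be slow, as noted above):
  
  $ for tb in 16 32 64 128 256; do
  >   rbd create --image-format 2 --size $(( tb * 1024 * 1024 )) probe_img
  >   sudo rbd map probe_img && echo "${tb} TB maps" && sudo rbd unmap /dev/rbd0
  >   rbd rm probe_img
  > done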
  
  Best regards,
  
  Nicolas Canceill
  Scalable Storage Systems
  SURFsara (Amsterdam, NL)
  




-- 
Bernhard Glomm
IT Administration
Phone: +49 (30) 86880 134 | Fax: +49 (30) 86880 100 | Skype: bernhard.glomm.ecologic
Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464
Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH



Re: [ceph-users] Size of RBD images

2013-11-19 Thread LaSalle, Jurvis
On 11/19/13, 2:10 PM, Wolfgang Hennerbichler wo...@wogri.com wrote:


On Nov 19, 2013, at 3:47 PM, Bernhard Glomm bernhard.gl...@ecologic.eu
wrote:

 Hi Nicolas
 just fyi
 rbd format 2 is not supported yet by the linux kernel (module)

I believe this is wrong. I think Linux supports rbd format 2 images since
3.10.

One more reason to cross our fingers for official Saucy Salamander support
soon…



Re: [ceph-users] Size of RBD images

2013-11-19 Thread Gruher, Joseph R
So is there any size limit on RBD images? I had a failure this morning
mounting a 1TB RBD image. Deleting now (why does it take so long to delete if
it was never even mapped, much less written to?) and will retry with smaller
images. See output below. This is 0.72 on Ubuntu 13.04 with a 3.12 kernel.

ceph@joceph-client01:~$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1

ceph@joceph-client01:~$ rbd map testrbd -p testpool01
rbd: add failed: (13) Permission denied

ceph@joceph-client01:~$ sudo rbd map testrbd -p testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rados df
pool name       category            KB   objects   clones   degraded   unfound     rd   rd KB     wr   wr KB
data            -                    0         0        0          0         0      0       0      0       0
metadata        -                    0         0        0          0         0      0       0      0       0
rbd             -                    1         2        0          0         0     10       7      8       8
testpool01      -                    0         0        0          0         0      0       0      0       0
testpool02      -                    0         0        0          0         0      0       0      0       0
testpool03      -                    0         0        0          0         0      0       0      0       0
testpool04      -                    0         0        0          0         0      0       0      0       0
  total used      2328785160            2
  total avail     9218978040
  total space    11547763200

ceph@joceph-client01:/etc/ceph$ sudo modprobe rbd

ceph@joceph-client01:/etc/ceph$ sudo rbd map testrbd --pool testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1




Re: [ceph-users] Size of RBD images

2013-11-19 Thread Gruher, Joseph R


-Original Message-
From: Gruher, Joseph R
Sent: Tuesday, November 19, 2013 12:24 PM
To: 'Wolfgang Hennerbichler'; Bernhard Glomm
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Size of RBD images

So is there any size limit on RBD images? I had a failure this morning
mounting a 1TB RBD image. Deleting now (why does it take so long to delete if
it was never even mapped, much less written to?) and will retry with smaller
images. See output below. This is 0.72 on Ubuntu 13.04 with a 3.12 kernel.

ceph@joceph-client01:~$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1

ceph@joceph-client01:~$ rbd map testrbd -p testpool01
rbd: add failed: (13) Permission denied

ceph@joceph-client01:~$ sudo rbd map testrbd -p testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rados df
pool name       category            KB   objects   clones   degraded   unfound     rd   rd KB     wr   wr KB
data            -                    0         0        0          0         0      0       0      0       0
metadata        -                    0         0        0          0         0      0       0      0       0
rbd             -                    1         2        0          0         0     10       7      8       8
testpool01      -                    0         0        0          0         0      0       0      0       0
testpool02      -                    0         0        0          0         0      0       0      0       0
testpool03      -                    0         0        0          0         0      0       0      0       0
testpool04      -                    0         0        0          0         0      0       0      0       0
  total used      2328785160            2
  total avail     9218978040
  total space    11547763200

ceph@joceph-client01:/etc/ceph$ sudo modprobe rbd

ceph@joceph-client01:/etc/ceph$ sudo rbd map testrbd --pool testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1


I think I figured out where I went wrong here. I had thought that if you didn't
specify the pool on the 'rbd create' command line, you could then later map the
image in any pool. In retrospect that doesn't make much sense, and it appears
that if you don't specify the pool at the create step, the image just goes into
the default 'rbd' pool. See the example below.

ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage5 --pool testpool01
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage5 --pool testpool01

ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage6
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool rbd
ceph@joceph-client01:/etc/ceph$