Re: [ceph-users] xattrs vs omap
On Tue, Jul 14, 2015 at 10:53 AM, Jan Schermer j...@schermer.cz wrote:

> Thank you for your reply. Comments inline. I’m still hoping to get some more input, but there are many people running Ceph on ext4, and it sounds like it works pretty well out of the box. Maybe I’m overthinking this, then?

I think so — somebody did a lot of work making sure we were well-tuned on the standard filesystems; I believe it was David.
-Greg
Re: [ceph-users] xattrs vs omap
Thank you for your reply. Comments inline. I’m still hoping to get some more input, but there are many people running Ceph on ext4, and it sounds like it works pretty well out of the box. Maybe I’m overthinking this, then?

Jan

On 13 Jul 2015, at 21:04, Somnath Roy somnath@sandisk.com wrote:

inline

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer
Sent: Monday, July 13, 2015 2:32 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] xattrs vs omap

Sorry for reviving an old thread, but could I get some input on this, pretty please?

ext4 has 256-byte inodes by default (at least according to the docs), but the fragment below says:
OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
The default of 512 B is too much if the inode is just 256 B, so shouldn’t that be 256 B in case people use the default ext4 inode size? Anyway, is it better to format ext4 with larger inodes (say 2048 B) and set filestore_max_inline_xattr_size_other=1536, or leave it at the defaults?

[Somnath] Why 1536? Why not 1024 or any power of 2? I am not seeing any harm, though; just curious.

AFAIK there is other information in the inode besides the xattrs, and you also need to count the xattr names towards this limit - so if I want to store 1536 B of “values” it would cost more, and there still needs to be some space left. (As I understand it, on ext4 xattrs are limited to the space in the inode plus one extra block - maybe someone knows better.)

[Somnath] The xattr size (_) is now more than 256 bytes and it will spill over, so a bigger inode size will be good. But I would suggest you do your benchmark before putting it into production.

Good point, and I am going to do that, but I’d like to avoid the guesswork. Also, not all patterns are always replicable….

Is filestore_max_inline_xattr_size an absolute limit, or is it filestore_max_inline_xattr_size * filestore_max_inline_xattrs in reality?

[Somnath] The *_size tracks the xattr size per attribute, and *inline_xattrs tracks the maximum number of inline attributes allowed. So if an xattr's size exceeds *_size it will go to omap, and likewise if the total number of xattrs exceeds *inline_xattrs it will go to omap. If you are only using RBD, the number of inline xattrs will always be 2 and it will not cross the default max limit.

If I’m reading this correctly, then with my setting of filestore_max_inline_xattr_size_other=1536 it could actually consume 3072 B (2 xattrs), so I should in reality use 4 K inodes…?

Does the OSD do the sane thing if for some reason the xattrs do not fit? What are the performance implications of storing the xattrs in leveldb?

[Somnath] I don't have exact numbers, but there is a significant overhead if the xattrs go to leveldb.

And lastly - what size of xattrs should I really expect if all I use is RBD for OpenStack instances? (No radosgw, no CephFS, but heavy on RBD image and pool snapshots.) This overhead is quite large.

[Somnath] It will be 2 xattrs; the default _ will be a little bigger than 256 bytes, and _snapset is small - it depends on the number of snaps/clones, but is unlikely to cross the 256-byte range.

I have a few pool snapshots and lots (hundreds) of (nested) snapshots for RBD volumes. Does this come into play somehow?

My plan so far is to format the drives like this:
mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256
(2048 B inode, 4096 B block size, one inode per 512 KiB of space) and set filestore_max_inline_xattr_size_other=1536.

[Somnath] Not much idea on ext4, sorry..

Does that make sense?

Thanks!
Jan
Re: [ceph-users] xattrs vs omap
Instead of guessing, I took a look at one of my OSDs.

TL;DR: I’m going to bump the inode size to 512, which should fit the majority of xattrs; no need to touch the filestore parameters.

Short news first - I can’t find a file with more than 2 xattrs (and that’s good). Then I extracted all the xattrs on all the ~100K files, counted their sizes and counted the occurrences. The largest xattrs I have are 705 chars in base64 (so let’s say half that raw), and that particular file has about 512 B total in xattrs (that’s more than was expected with an RBD-only workload, right?)

# file: var/lib/ceph/osd/ceph-55//current/4.1ad7_head/rbd134udata.1a785181f15746a.0005a578__head_E5C51AD7__4
117
user.ceph._=0sCwjyBANKACkAAAByYmRfZGF0YS4xYTc4NTE4MWYxNTc0NmEuMDAwMDAwMDAwMDA1YTU3OP7/1xrF5QAABAAFAxQEAP8AAADrEKMAADB2DQAiDaMAAG11DQACAhUI1xSoAQD9CwAMAEAAABAgpFWoa6QVAgIV6xCjAAAwdg0=
347
user.ceph.snapset=0sAgL5AQAAgt8HAAABBgAAAILfBwAAb94HAAC23AcAAEnPBwAA470HAAB4ugcAAAQAAAC1ugcAAOO9BwAAStAHAACC3wcAAAQAAAC1ugcAAAQAAABQFGAUwAowHwAAAJAZ4DggBwAA470HAAAFEA8gDwAAACAFSBQAAABADgAAAJAioAI4JQAAAMgaAABK0AcAAAQAAADgAQAAAOgBeCYAAACAKHAAACkAFwAAgt8HAAAFoAEAAADAAQAAAIAMUA4QBgAAAIAU4ACAFQAAAIAqAAAEtboHAEAAAOO9BwBAAABK0AcAQAAAgt8HAE==
705

(If anyone wants to enlighten me on the contents, that would be great - is this expected to grow much?)

BUT most of the files have much smaller xattrs, and if I researched it correctly, ext4 uses the free space in the inode (which should be something like inode_size - 128 - 28 = free), and if that’s not enough it will allocate one more block. In other words, if I format ext4 with a 2048 B inode size and 4096 B block size, there will be 2048 - (128 + 28) = 1892 bytes available in the inode, and 4096 bytes can be allocated from another block. With the default format, there will be just 256 - (128 + 28) = 100 bytes in the inode + 4096 bytes in another block.
In my case, the majority of the files have an xattr size of ~200 B, which is larger than fits inside one default inode, but not really that large, so it should be beneficial to bump the inode size to 512 B (that leaves a good 356 bytes for xattrs).

Jan
[Somnath] The xttr size (_) is now more than 256 bytes and it will spill over, so, bigger inode size will be good. But, I would suggest do your benchmark before putting it into production. Good poin and I am going to do that, but I’d like to avoid the guesswork. Also, not all patterns are always replicable…. Is filestore_max_inline_xattr_size and absolute limit, or is it filestore_max_inline_xattr_size*filestore_max_inline_xattrs in reality? [Somnath] The *_size is tracking
Re: [ceph-users] xattrs vs omap
Sorry for reviving an old thread, but could I get some input on this, pretty please?

ext4 has 256-byte inodes by default (at least according to the docs), but the fragment below says:
OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
The default of 512 B is too much if the inode is just 256 B, so shouldn’t that be 256 B in case people use the default ext4 inode size? Anyway, is it better to format ext4 with larger inodes (say 2048 B) and set filestore_max_inline_xattr_size_other=1536, or leave it at the defaults? (As I understand it, on ext4 xattrs are limited to the space in the inode plus one extra block - maybe someone knows better.)

Is filestore_max_inline_xattr_size an absolute limit, or is it filestore_max_inline_xattr_size * filestore_max_inline_xattrs in reality? Does the OSD do the sane thing if for some reason the xattrs do not fit? What are the performance implications of storing the xattrs in leveldb?

And lastly - what size of xattrs should I really expect if all I use is RBD for OpenStack instances? (No radosgw, no CephFS, but heavy on RBD image and pool snapshots.) This overhead is quite large.

My plan so far is to format the drives like this:
mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256
(2048 B inode, 4096 B block size, one inode per 512 KiB of space) and set filestore_max_inline_xattr_size_other=1536.

Does that make sense?

Thanks!
Jan

On 02 Jul 2015, at 12:18, Jan Schermer j...@schermer.cz wrote:

Does anyone have a known-good set of parameters for ext4? I want to try it as well, but I’m a bit worried what happens if I get it wrong.

Thanks
Jan

On 02 Jul 2015, at 09:40, Nick Fisk n...@fisk.me.uk wrote:

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Christian Balzer
Sent: 02 July 2015 02:23
To: Ceph Users
Subject: Re: [ceph-users] xattrs vs omap

On Thu, 2 Jul 2015 00:36:18 +0000, Somnath Roy wrote:

It is replaced with the following config option..
// Use omap for xattrs for attrs over
// filestore_max_inline_xattr_size or
OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) //Override
OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
// for more than filestore_max_inline_xattrs attrs
OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override
OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)
OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10)
OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2)

If these limits are crossed, xattrs will be stored in omap.

For ext4, you can use either the filestore_max_*_other options or filestore_max_inline_xattrs / filestore_max_inline_xattr_size. In any case, the latter two will override everything.

Thanks & Regards
Somnath
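Somnath's description of the spill rule above can be sketched as a toy function: an attribute goes to omap when its value exceeds the per-attribute size limit, or when the inline slots are already exhausted. This is an illustration of the stated rule only, not Ceph's actual FileStore code; the function name is invented, and the defaults shown are the "_other" (ext4) values from the config listing:

```python
# Illustrative sketch (not Ceph source): split a set of xattrs into the
# ones that stay inline in the filesystem and the ones that spill to omap,
# per the filestore_max_inline_* rules quoted above. Defaults are the
# "_other" values (512 bytes per attr, at most 2 inline attrs).
def split_xattrs(xattrs, max_inline_size=512, max_inline_count=2):
    inline, omap = {}, {}
    for name, value in xattrs.items():
        if len(value) > max_inline_size or len(inline) >= max_inline_count:
            omap[name] = value   # too big, or inline slots exhausted
        else:
            inline[name] = value
    return inline, omap

# An RBD object's two attrs: a 300 B "_" fits inline, a 600 B "_snapset"
# exceeds the 512 B per-attribute limit and spills.
inline, omap = split_xattrs({"_": b"x" * 300, "_snapset": b"y" * 600})
print(sorted(inline), sorted(omap))  # ['_'] ['_snapset']
```

The real FileStore logic may differ in details (e.g. evaluation order); this only mirrors the rule as stated in the thread.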
Re: [ceph-users] xattrs vs omap
inline

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer
Sent: Monday, July 13, 2015 2:32 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] xattrs vs omap

Sorry for reviving an old thread, but could I get some input on this, pretty please?

ext4 has 256-byte inodes by default (at least according to the docs), but the fragment below says:
OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
The default of 512 B is too much if the inode is just 256 B, so shouldn’t that be 256 B in case people use the default ext4 inode size? Anyway, is it better to format ext4 with larger inodes (say 2048 B) and set filestore_max_inline_xattr_size_other=1536, or leave it at the defaults?

[Somnath] Why 1536? Why not 1024 or any power of 2? I am not seeing any harm, though; just curious.

(As I understand it, on ext4 xattrs are limited to the space in the inode plus one extra block - maybe someone knows better.)

[Somnath] The xattr size (_) is now more than 256 bytes and it will spill over, so a bigger inode size will be good. But I would suggest you do your benchmark before putting it into production.

Is filestore_max_inline_xattr_size an absolute limit, or is it filestore_max_inline_xattr_size * filestore_max_inline_xattrs in reality?

[Somnath] The *_size tracks the xattr size per attribute, and *inline_xattrs tracks the maximum number of inline attributes allowed. So if an xattr's size exceeds *_size it will go to omap, and likewise if the total number of xattrs exceeds *inline_xattrs it will go to omap. If you are only using RBD, the number of inline xattrs will always be 2 and it will not cross the default max limit.

Does the OSD do the sane thing if for some reason the xattrs do not fit? What are the performance implications of storing the xattrs in leveldb?

[Somnath] I don't have exact numbers, but there is a significant overhead if the xattrs go to leveldb.

And lastly - what size of xattrs should I really expect if all I use is RBD for OpenStack instances? (No radosgw, no CephFS, but heavy on RBD image and pool snapshots.) This overhead is quite large.

[Somnath] It will be 2 xattrs; the default _ will be a little bigger than 256 bytes, and _snapset is small - it depends on the number of snaps/clones, but is unlikely to cross the 256-byte range.

My plan so far is to format the drives like this:
mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256
(2048 B inode, 4096 B block size, one inode per 512 KiB of space) and set filestore_max_inline_xattr_size_other=1536.

[Somnath] Not much idea on ext4, sorry..

Does that make sense?

Thanks!
Jan
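As a sanity check on the mkfs.ext4 recipe proposed in this thread: -I sets the inode size and -i sets bytes-of-disk-per-inode, so the inode table consumes roughly inode_size / bytes_per_inode of the device. A rough sketch (figures are approximate and the helper names are mine, not from any tool):

```python
# Rough check of the mkfs.ext4 plan quoted in the thread
# (mkfs.ext4 -I 2048 -i 524288): -I is the inode size in bytes,
# -i is bytes-of-disk-per-inode, so the inode table costs roughly
# inode_size / bytes_per_inode of the whole device.
def inode_count(disk_bytes: int, bytes_per_inode: int) -> int:
    """Approximate number of inodes mke2fs will create for this device."""
    return disk_bytes // bytes_per_inode

def inode_table_fraction(inode_size: int, bytes_per_inode: int) -> float:
    """Approximate fraction of the device consumed by the inode table."""
    return inode_size / bytes_per_inode

four_tib = 4 * 1024**4  # a hypothetical 4 TiB OSD drive
print(inode_count(four_tib, 524288))                        # 8388608 inodes
print(round(inode_table_fraction(2048, 524288) * 100, 2))   # 0.39 (% of disk)
```

So the 2048-byte inodes cost well under half a percent of the disk at one inode per 512 KiB, which is why bumping -I is cheap here.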
Re: [ceph-users] xattrs vs omap
Does anyone have a known-good set of parameters for ext4? I want to try it as well, but I’m a bit worried what happens if I get it wrong.

Thanks
Jan

On 02 Jul 2015, at 09:40, Nick Fisk n...@fisk.me.uk wrote:

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Christian Balzer
Sent: 02 July 2015 02:23
To: Ceph Users
Subject: Re: [ceph-users] xattrs vs omap

On Thu, 2 Jul 2015 00:36:18 +0000, Somnath Roy wrote:

> It is replaced with the following config option..

Sounds fair. Since I only use RBD I don't think it will ever exceed this.

Possibly, see my thread about the performance difference between new and old pools. Still not quite sure what's going on, but for some reason some of the objects behind RBDs have larger xattrs, which is causing really poor performance.

Thanks,
Chibi
[ceph-users] xattrs vs omap
Hello all,

I've got a coworker who put filestore_xattr_use_omap = true in the ceph.conf when we first started building the cluster. Now he can't remember why. He thinks it may be a holdover from our first Ceph cluster (running Dumpling on ext4, IIRC).

In the newly built cluster, we are using XFS with 2048-byte inodes, running Ceph 0.94.2. It currently has production data in it. From my reading of other threads, it looks like this is probably not something you want set to true (at least on XFS), due to performance implications.

Is this something you can change on a running cluster? Is it worth the hassle?

Thanks,
Adam

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] xattrs vs omap
It doesn't matter; I think filestore_xattr_use_omap is a 'noop' and not used in Hammer.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Adam Tygart
Sent: Wednesday, July 01, 2015 8:20 AM
To: Ceph Users
Subject: [ceph-users] xattrs vs omap

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
Re: [ceph-users] xattrs vs omap
Hello,

On Wed, 1 Jul 2015 15:24:13 +0000, Somnath Roy wrote:

> It doesn't matter, I think filestore_xattr_use_omap is a 'noop' and not used in Hammer.

Then what was this functionality replaced with, esp. considering ext4-based OSDs?

Chibi
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
Re: [ceph-users] xattrs vs. omap with radosgw
> We've since merged something that stripes over several small xattrs so that we can keep things inline, but it hasn't been backported to hammer yet. See c6cdb4081e366f471b372102905a1192910ab2da.

Hi Sage:

You wrote "yet" - should we earmark it for a hammer backport?

Nathan
Re: [ceph-users] xattrs vs. omap with radosgw
On Wed, Jun 17, 2015 at 1:02 PM, Nathan Cutler ncut...@suse.cz wrote:
>> We've since merged something that stripes over several small xattrs so that we can keep things inline, but it hasn't been backported to hammer yet. See c6cdb4081e366f471b372102905a1192910ab2da.
>
> Hi Sage: You wrote "yet" - should we earmark it for a hammer backport?

I'm guessing https://github.com/ceph/ceph/pull/4973 is the backport for hammer (issue: http://tracker.ceph.com/issues/11981).

Regards
Abhishek
Re: [ceph-users] xattrs vs. omap with radosgw
On Wed, 17 Jun 2015, Nathan Cutler wrote:
>> We've since merged something that stripes over several small xattrs so that we can keep things inline, but it hasn't been backported to hammer yet. See c6cdb4081e366f471b372102905a1192910ab2da.
>
> Hi Sage: You wrote "yet" - should we earmark it for a hammer backport?

Yes, please!

sage
[ceph-users] xattrs vs. omap with radosgw
Hi Cephers,

While looking at disk utilization on the OSDs, I noticed the disks were constantly busy with a large number of small writes. Further investigation showed that radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), which made the xattrs spill from inline ("local") storage into extents, incurring extra I/O. I would like to check if anybody has experience with offloading the metadata to omap:

1. Offload everything to omap? If this is the case, should we make the inode size 512 bytes (instead of 2k)?
2. Partially offload the metadata to omap, e.g. only offload the rgw-specific metadata to omap.

Any sharing is deeply appreciated. Thanks!

Thanks,
Guang
Re: [ceph-users] xattrs vs. omap with radosgw
Guang,

Try to play around with the following conf attributes, especially filestore_max_inline_xattr_size and filestore_max_inline_xattrs:

// Use omap for xattrs for attrs over
// filestore_max_inline_xattr_size or
OPTION(filestore_max_inline_xattr_size, OPT_U32, 0)         //Override
OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)

// for more than filestore_max_inline_xattrs attrs
OPTION(filestore_max_inline_xattrs, OPT_U32, 0)             //Override
OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)
OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10)
OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2)

I think the behavior for XFS is that if there are more than 10 xattrs, it will use omap.

Thanks & Regards
Somnath
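For reference, a ceph.conf sketch of where these options would be overridden. The values simply restate the built-in defaults from the OPTION() listing above; they are not tuning recommendations, and as noted elsewhere in the thread you should benchmark before changing them:

```ini
[osd]
; per-attribute value size (bytes) above which an xattr spills to omap;
; 0 means "use the filesystem-specific _xfs/_btrfs/_other default below"
filestore_max_inline_xattr_size = 0
filestore_max_inline_xattr_size_xfs = 65536
filestore_max_inline_xattr_size_btrfs = 2048
filestore_max_inline_xattr_size_other = 512

; max number of inline xattrs before spilling to omap; 0 = fs default
filestore_max_inline_xattrs = 0
filestore_max_inline_xattrs_xfs = 10
filestore_max_inline_xattrs_btrfs = 10
filestore_max_inline_xattrs_other = 2
```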
Re: [ceph-users] xattrs vs. omap with radosgw
On Wed, 17 Jun 2015, Zhou, Yuan wrote:
> FWIW, there was some discussion in OpenStack Swift and their performance tests showed 255 is not the best in recent XFS. They decided to use a large xattr boundary size (65535). https://gist.github.com/smerritt/5e7e650abaa20599ff34

If I read this correctly, the total metadata they are setting is pretty big:

    PILE_O_METADATA = pickle.dumps(dict(
        ("attribute%d" % i, hashlib.sha512("thingy %d" % i).hexdigest())
        for i in range(200)))

So lots of small attrs won't really help, since they'll have to spill out into other extents eventually no matter what. In our case, we have big (2k) inodes and can easily fit everything in there... as long as it is in 255-byte pieces.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
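The striping idea discussed here (splitting one large xattr value across several small ones so each piece stays inline in the inode) can be sketched roughly as below. This is an illustration only, not Ceph's actual implementation (see commit c6cdb4081e366f471b372102905a1192910ab2da for that), and the "@N" chunk-naming scheme is made up for the example:

```python
XFS_INLINE_MAX = 254  # max xattr value size XFS keeps inline, per this thread


def stripe_xattr(name, value, chunk=XFS_INLINE_MAX):
    """Split one large xattr into several small ones: name, name@1, name@2, ..."""
    pieces = [value[i:i + chunk] for i in range(0, len(value), chunk)] or [b""]
    return {
        (name if i == 0 else "%s@%d" % (name, i)): piece
        for i, piece in enumerate(pieces)
    }


def unstripe_xattr(name, attrs):
    """Reassemble the original value from its striped chunks."""
    value = attrs[name]
    i = 1
    while "%s@%d" % (name, i) in attrs:
        value += attrs["%s@%d" % (name, i)]
        i += 1
    return value
```

A 381-byte manifest, for instance, would become two chunks (254 + 127 bytes), both small enough to stay in a large inode instead of forcing an attribute extent.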
Re: [ceph-users] xattrs vs. omap with radosgw
After back-porting Sage's patch to Giant, with radosgw the xattrs can get inline. I haven't run extensive testing yet; I will update once I have some performance data to share.

Thanks,
Guang

> Date: Tue, 16 Jun 2015 15:51:44 -0500
> From: mnel...@redhat.com
> To: yguan...@outlook.com; s...@newdream.net
> CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> Subject: Re: xattrs vs. omap with radosgw
>
> On 06/16/2015 03:48 PM, GuangYang wrote:
>> Thanks Sage for the quick response. It is on Firefly v0.80.4. While trying to put with *rados* directly, the xattrs can be inline. The problem comes to light when using radosgw, since we have a bunch of metadata to keep via xattrs, including:
>>
>>     rgw.idtag    : 15 bytes
>>     rgw.manifest : 381 bytes
>
> Ah, that manifest will push us over the limit AFAIK, resulting in every inode getting a new extent.
>
>>     rgw.acl      : 121 bytes
>>     rgw.etag     : 33 bytes
>>
>> Given the background, it looks like the problem is that rgw.manifest is too large, so XFS pushes it out to extents. If I understand correctly, if we port the change to Firefly, we should be able to keep the xattrs inline in the inode, since the accumulated size is still less than 2K (please correct me if I am wrong here).
>
> I think you are correct, so long as the patch breaks that manifest down into 254-byte or smaller chunks.
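Using the sizes Guang reports, a quick back-of-the-envelope check: do the rgw xattrs fit inline in a 2 KB inode once values are striped into 254-byte-or-smaller pieces? This sketch ignores the overhead of attribute names and inode bookkeeping, so treat it as a rough estimate only:

```python
XFS_INLINE_MAX = 254  # max inline xattr value size, per this thread

rgw_xattrs = {  # value sizes in bytes, as reported by Guang
    "rgw.idtag": 15,
    "rgw.manifest": 381,
    "rgw.acl": 121,
    "rgw.etag": 33,
}


def chunks_needed(size, limit=XFS_INLINE_MAX):
    # Number of <=limit-byte pieces a striped value occupies (at least 1).
    return max(1, -(-size // limit))  # ceiling division


total_bytes = sum(rgw_xattrs.values())                              # 550
total_chunks = sum(chunks_needed(s) for s in rgw_xattrs.values())   # 5
```

550 bytes of values in 5 small chunks comfortably fits a 2 KB inode, which matches Guang's expectation that porting the striping patch keeps the radosgw metadata inline.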
Re: [ceph-users] xattrs vs. omap with radosgw
Hi Yuan,

Thanks for sharing the link, it is interesting to read. My understanding of the test results is that, for a fixed total size of xattrs, using a smaller stripe size incurs larger read latency, which kind of makes sense since there are more k-v pairs, and at that total size it needs to go to extents anyway. Correct me if I am wrong here...

Thanks,
Guang

> From: yuan.z...@intel.com
> To: s...@newdream.net; yguan...@outlook.com
> CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> Subject: RE: xattrs vs. omap with radosgw
> Date: Wed, 17 Jun 2015 01:32:35 +
>
> FWIW, there was some discussion in OpenStack Swift and their performance tests showed 255 is not the best in recent XFS. They decided to use a large xattr boundary size (65535). https://gist.github.com/smerritt/5e7e650abaa20599ff34