Re: [ceph-users] xattrs vs omap

2015-07-14 Thread Gregory Farnum
On Tue, Jul 14, 2015 at 10:53 AM, Jan Schermer j...@schermer.cz wrote:
 Thank you for your reply.
 Comments inline.

 I’m still hoping to get some more input, but there are many people running 
 ceph on ext4, and it sounds like it works pretty well out of the box. Maybe 
 I’m overthinking this, then?

I think so — somebody did a lot of work making sure we were well-tuned
on the standard filesystems; I believe it was David.
-Greg


 Jan

 On 13 Jul 2015, at 21:04, Somnath Roy somnath@sandisk.com wrote:

 inline

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan 
 Schermer
 Sent: Monday, July 13, 2015 2:32 AM
 To: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] xattrs vs omap

 Sorry for reviving an old thread, but could I get some input on this, pretty 
 please?

 ext4 has 256-byte inodes by default (at least according to docs) but the 
 fragment below says:
 OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)

 The default 512b is too much if the inode is just 256b, so shouldn’t that be 
 256b in case people use the default ext4 inode size?

 Anyway, is it better to format ext4 with larger inodes (say 2048b) and set 
 filestore_max_inline_xattr_size_other=1536, or leave it at defaults?
 [Somnath] Why 1536? Why not 1024 or some power of 2? I am not seeing any 
 harm, but I am curious.

 AFAIK the inode holds other information besides xattrs, and you also 
 need to count the xattr labels into this - so if I want to store 1536B of 
 “values” it would cost more, and there still needs to be some space left.

 (As I understand it, on ext4 xattrs are limited to one block, inode size + 
 something can spill to one different inode - maybe someone knows better).


 [Somnath] The xattr size (_) is now more than 256 bytes and it will spill 
 over, so a bigger inode size will be good. But I would suggest doing your own 
 benchmark before putting it into production.


 Good point, and I am going to do that, but I’d like to avoid the guesswork. 
 Also, not all patterns are always replicable….

 Is filestore_max_inline_xattr_size an absolute limit, or is it 
 filestore_max_inline_xattr_size*filestore_max_inline_xattrs in reality?

 [Somnath] The *_size tracks the xattr size per attribute and 
 *inline_xattrs keeps track of the max number of inline attributes allowed. So, if 
 an xattr's size is > *_size, it will go to omap, and also if the total number 
 of xattrs is > *inline_xattrs, they will go to omap.
 If you are only using rbd, the number of inline xattrs will always be 2 and 
 it will not cross that default max limit.

 If I’m reading this correctly then with my setting of  
 filestore_max_inline_xattr_size_other=1536, it could actually consume 3072B 
 (2 xattrs), so I should in reality use 4K inodes…?



 Does the OSD do the sane thing if for some reason the xattrs do not fit? What 
 are the performance implications of storing the xattrs in leveldb?

 [Somnath] I don't have the exact numbers, but it has a 
 significant overhead if the xattrs go to leveldb.

 And lastly - what size of xattrs should I really expect if all I use is RBD 
 for OpenStack instances? (No radosgw, no cephfs, but heavy on rbd image and 
 pool snapshots). This overhead is quite large

 [Somnath] It will be 2 xattrs; the default _ will be a little bigger than 256 
 bytes and _snapset is small, depending on the number of snaps/clones, but 
 unlikely to cross the 256-byte range.

 I have a few pool snapshots and lots (hundreds) of (nested) snapshots for rbd 
 volumes. Does this come into play somehow?


 My plan so far is to format the drives like this:
 mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256 (2048b 
 inode, 4096b block size, one inode per 512k of space) and set 
 filestore_max_inline_xattr_size_other=1536
 [Somnath] Not much idea on ext4, sorry..

 Does that make sense?

 Thanks!

 Jan



 On 02 Jul 2015, at 12:18, Jan Schermer j...@schermer.cz wrote:

 Does anyone have a known-good set of parameters for ext4? I want to try it 
 as well but I’m a bit worried what happens if I get it wrong.

 Thanks

 Jan



 On 02 Jul 2015, at 09:40, Nick Fisk n...@fisk.me.uk wrote:

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
 Behalf Of Christian Balzer
 Sent: 02 July 2015 02:23
 To: Ceph Users
 Subject: Re: [ceph-users] xattrs vs omap

 On Thu, 2 Jul 2015 00:36:18 + Somnath Roy wrote:

 It is replaced with the following config option..

 // Use omap for xattrs for attrs over
 // filestore_max_inline_xattr_size or
 OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) //Override
 OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
 OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
 OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)

 // for more than filestore_max_inline_xattrs attrs
 OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override
 OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)

Re: [ceph-users] xattrs vs omap

2015-07-14 Thread Jan Schermer
Thank you for your reply.
Comments inline.

I’m still hoping to get some more input, but there are many people running ceph 
on ext4, and it sounds like it works pretty well out of the box. Maybe I’m 
overthinking this, then?

Jan

 On 13 Jul 2015, at 21:04, Somnath Roy somnath@sandisk.com wrote:
 
 inline
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan 
 Schermer
 Sent: Monday, July 13, 2015 2:32 AM
 To: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] xattrs vs omap
 
 Sorry for reviving an old thread, but could I get some input on this, pretty 
 please?
 
 ext4 has 256-byte inodes by default (at least according to docs) but the 
 fragment below says:
 OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
 
 The default 512b is too much if the inode is just 256b, so shouldn’t that be 
 256b in case people use the default ext4 inode size?
 
 Anyway, is it better to format ext4 with larger inodes (say 2048b) and set 
 filestore_max_inline_xattr_size_other=1536, or leave it at defaults?
 [Somnath] Why 1536? Why not 1024 or some power of 2? I am not seeing any 
 harm, but I am curious.

AFAIK the inode holds other information besides xattrs, and you also need 
to count the xattr labels into this - so if I want to store 1536B of “values” 
it would cost more, and there still needs to be some space left.

 (As I understand it, on ext4 xattrs are limited to one block, inode size + 
 something can spill to one different inode - maybe someone knows better).
 
 
 [Somnath] The xattr size (_) is now more than 256 bytes and it will spill 
 over, so a bigger inode size will be good. But I would suggest doing your own 
 benchmark before putting it into production.
 

Good point, and I am going to do that, but I’d like to avoid the guesswork. Also, 
not all patterns are always replicable….

 Is filestore_max_inline_xattr_size an absolute limit, or is it 
 filestore_max_inline_xattr_size*filestore_max_inline_xattrs in reality?
 
 [Somnath] The *_size tracks the xattr size per attribute and 
 *inline_xattrs keeps track of the max number of inline attributes allowed. So, if 
 an xattr's size is > *_size, it will go to omap, and also if the total number of 
 xattrs is > *inline_xattrs, they will go to omap.
 If you are only using rbd, the number of inline xattrs will always be 2 and 
 it will not cross that default max limit.

If I’m reading this correctly then with my setting of  
filestore_max_inline_xattr_size_other=1536, it could actually consume 3072B (2 
xattrs), so I should in reality use 4K inodes…?


 
 Does the OSD do the sane thing if for some reason the xattrs do not fit? What are 
 the performance implications of storing the xattrs in leveldb?
 
 [Somnath] I don't have the exact numbers, but it has a 
 significant overhead if the xattrs go to leveldb.
 
 And lastly - what size of xattrs should I really expect if all I use is RBD 
 for OpenStack instances? (No radosgw, no cephfs, but heavy on rbd image and 
 pool snapshots). This overhead is quite large
 
 [Somnath] It will be 2 xattrs; the default _ will be a little bigger than 256 
 bytes and _snapset is small, depending on the number of snaps/clones, but 
 unlikely to cross the 256-byte range.

I have a few pool snapshots and lots (hundreds) of (nested) snapshots for rbd 
volumes. Does this come into play somehow?

 
 My plan so far is to format the drives like this:
 mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256 (2048b 
 inode, 4096b block size, one inode per 512k of space) and set 
 filestore_max_inline_xattr_size_other=1536
 [Somnath] Not much idea on ext4, sorry..
 
 Does that make sense?
 
 Thanks!
 
 Jan
 
 
 
 On 02 Jul 2015, at 12:18, Jan Schermer j...@schermer.cz wrote:
 
 Does anyone have a known-good set of parameters for ext4? I want to try it 
 as well but I’m a bit worried what happens if I get it wrong.
 
 Thanks
 
 Jan
 
 
 
 On 02 Jul 2015, at 09:40, Nick Fisk n...@fisk.me.uk wrote:
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
 Behalf Of Christian Balzer
 Sent: 02 July 2015 02:23
 To: Ceph Users
 Subject: Re: [ceph-users] xattrs vs omap
 
 On Thu, 2 Jul 2015 00:36:18 + Somnath Roy wrote:
 
 It is replaced with the following config option..
 
 // Use omap for xattrs for attrs over
 // filestore_max_inline_xattr_size or
 OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) //Override
 OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
 OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
 OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
 
 // for more than filestore_max_inline_xattrs attrs
 OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override
 OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)
 OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10)
 OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2)
 
 
 If these limits are crossed, xattrs will be stored in omap..
 
 Sounds fair.

Re: [ceph-users] xattrs vs omap

2015-07-14 Thread Jan Schermer
Instead of guessing I took a look at one of my OSDs.

TL;DR: I’m going to bump the inode size to 512, which should fit the majority of 
xattrs; no need to touch filestore parameters.

Short news first: I can’t find a file with more than 2 xattrs (and that’s 
good).

Then I extracted all the xattrs on all the ~100K files, counted their sizes and 
counted the occurrences.
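For reference, the tally can be sketched roughly like this in Python (names and the walk path are illustrative, not the exact commands I used):

```python
import os
from collections import Counter

def xattr_sizes(attrs):
    """attrs: mapping of xattr name -> raw value (bytes).
    Returns (total_bytes, Counter of per-attribute sizes).
    Both name and value are counted, since ext4 stores both."""
    sizes = {name: len(name) + len(value) for name, value in attrs.items()}
    return sum(sizes.values()), Counter(sizes.values())

def read_xattrs(path):
    """Read all xattrs of one file (Linux-only os.listxattr/os.getxattr)."""
    return {name: os.getxattr(path, name) for name in os.listxattr(path)}

# Walking a whole OSD data dir would then look roughly like:
#   for root, _, files in os.walk("/var/lib/ceph/osd/ceph-55/current"):
#       for f in files:
#           total, hist = xattr_sizes(read_xattrs(os.path.join(root, f)))
```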

The largest xattrs I have are 705 chars in base64 (so let’s say it’s half), and 
that particular file has about 512B total in xattrs (that’s more than was 
expected with an RBD-only workload, right?)

# file: 
var/lib/ceph/osd/ceph-55//current/4.1ad7_head/rbd134udata.1a785181f15746a.0005a578__head_E5C51AD7__4
 117
user.ceph._=0sCwjyBANKACkAAAByYmRfZGF0YS4xYTc4NTE4MWYxNTc0NmEuMDAwMDAwMDAwMDA1YTU3OP7/1xrF5QAABAAFAxQEAP8AAADrEKMAADB2DQAiDaMA
AG11DQACAhUI1xSoAQD9CwAMAEAAABAgpFWoa6QVAgIV6xCjAAAwdg0=
 347
user.ceph.snapset=0sAgL5AQAAgt8HAAABBgAAAILfBwAAb94HAAC23AcAAEnPBwAA470HAAB4ugcAAAQAAAC1ugcAAOO9BwAAStAHAACC3wcAAAQAAAC1ugcAAAQAAABQFGAUwAowHwAAAJAZ4DggBwAA470HAAAFEA8gDwAAACAFSBQAAABADgAAAJAioAI4JQAAAMgaAABK0AcAAAQAAADgAQAAAOgBeCYAAACAKHAAACkAFwAAgt8HAAAFoAEAAADAAQAAAIAMUA4QBgAAAIAU4ACAFQAAAIAqAAAEtboHAEAAAOO9BwBAAABK0AcAQAAAgt8HAE==
 705

(If anyone wants to enlighten me on the contents that would be great - is this 
expected to grow much?)


BUT most of the files have much smaller xattrs, and if I researched it 
correctly it seems ext4 uses the free space in the inode (which should be something 
like inode_size - 128 - 28 = free) and if that’s not enough it will allocate one 
more block.

In other words, if I format ext4 with a 2048 inode size and 4096 block size, 
there will be 2048-(128+28)=1892 bytes available in the inode, and 4096 bytes 
can be allocated from another block. With the default format, there will be just 
256-(128+28)=100 bytes in the inode + 4096 bytes in another block.
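As a sanity check on that arithmetic, a throwaway snippet (the 128-byte base inode and ~28-byte xattr header are this thread's approximation, not the exact on-disk layout):

```python
# Approximate in-inode space available for xattrs on ext4:
# everything past the 128-byte base inode fields, minus a
# ~28-byte extended-attribute header. These constants are an
# approximation from the discussion above, not the exact layout.
INODE_BASE = 128
XATTR_HEADER = 28

def inline_xattr_space(inode_size):
    return inode_size - (INODE_BASE + XATTR_HEADER)

for size in (256, 512, 2048):
    print(size, "->", inline_xattr_space(size))
```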


In my case, the majority of the files have an xattr size of around 200B, which is 
larger than what fits inside one inode, but not really that large, so it should be 
beneficial to bump the inode size to 512B (that leaves a comfortable 356 bytes for 
xattrs).

Jan


 On 14 Jul 2015, at 12:18, Gregory Farnum g...@gregs42.com wrote:
 
 On Tue, Jul 14, 2015 at 10:53 AM, Jan Schermer j...@schermer.cz wrote:
 Thank you for your reply.
 Comments inline.
 
 I’m still hoping to get some more input, but there are many people running 
 ceph on ext4, and it sounds like it works pretty well out of the box. Maybe 
 I’m overthinking this, then?
 
 I think so — somebody did a lot of work making sure we were well-tuned
 on the standard filesystems; I believe it was David.
 -Greg
 
 
 Jan
 
 On 13 Jul 2015, at 21:04, Somnath Roy somnath@sandisk.com wrote:
 
 inline
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
 Jan Schermer
 Sent: Monday, July 13, 2015 2:32 AM
 To: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] xattrs vs omap
 
 Sorry for reviving an old thread, but could I get some input on this, 
 pretty please?
 
 ext4 has 256-byte inodes by default (at least according to docs) but the 
 fragment below says:
 OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
 
 The default 512b is too much if the inode is just 256b, so shouldn’t that 
 be 256b in case people use the default ext4 inode size?
 
 Anyway, is it better to format ext4 with larger inodes (say 2048b) and set 
 filestore_max_inline_xattr_size_other=1536, or leave it at defaults?
 [Somnath] Why 1536? Why not 1024 or some power of 2? I am not seeing any 
 harm, but I am curious.
 
 AFAIK the inode holds other information besides xattrs, and you also 
 need to count the xattr labels into this - so if I want to store 1536B of 
 “values” it would cost more, and there still needs to be some space left.
 
 (As I understand it, on ext4 xattrs are limited to one block, inode size + 
 something can spill to one different inode - maybe someone knows better).
 
 
 [Somnath] The xattr size (_) is now more than 256 bytes and it will spill 
 over, so a bigger inode size will be good. But I would suggest doing your own 
 benchmark before putting it into production.
 
 
 Good point, and I am going to do that, but I’d like to avoid the guesswork. 
 Also, not all patterns are always replicable….
 
 Is filestore_max_inline_xattr_size an absolute limit, or is it 
 filestore_max_inline_xattr_size*filestore_max_inline_xattrs in reality?
 
 [Somnath] The *_size is tracking

Re: [ceph-users] xattrs vs omap

2015-07-13 Thread Jan Schermer
Sorry for reviving an old thread, but could I get some input on this, pretty 
please?

ext4 has 256-byte inodes by default (at least according to docs)
but the fragment below says:
OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)

The default 512b is too much if the inode is just 256b, so shouldn’t that be 
256b in case people use the default ext4 inode size?

Anyway, is it better to format ext4 with larger inodes (say 2048b) and set 
filestore_max_inline_xattr_size_other=1536, or leave it at defaults?
(As I understand it, on ext4 xattrs are limited to one block, inode size + 
something can spill to one different inode - maybe someone knows better).

Is filestore_max_inline_xattr_size an absolute limit, or is it 
filestore_max_inline_xattr_size*filestore_max_inline_xattrs in reality?

Does the OSD do the sane thing if for some reason the xattrs do not fit? What are 
the performance implications of storing the xattrs in leveldb?

And lastly - what size of xattrs should I really expect if all I use is RBD for 
OpenStack instances? (No radosgw, no cephfs, but heavy on rbd image and pool 
snapshots). This overhead is quite large

My plan so far is to format the drives like this:
mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256
(2048b inode, 4096b block size, one inode per 512k of space) 
and set  filestore_max_inline_xattr_size_other=1536

Does that make sense?

Thanks!

Jan



 On 02 Jul 2015, at 12:18, Jan Schermer j...@schermer.cz wrote:
 
 Does anyone have a known-good set of parameters for ext4? I want to try it as 
 well but I’m a bit worried what happens if I get it wrong.
 
 Thanks
 
 Jan
 
 
 
 On 02 Jul 2015, at 09:40, Nick Fisk n...@fisk.me.uk wrote:
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Christian Balzer
 Sent: 02 July 2015 02:23
 To: Ceph Users
 Subject: Re: [ceph-users] xattrs vs omap
 
 On Thu, 2 Jul 2015 00:36:18 + Somnath Roy wrote:
 
 It is replaced with the following config option..
 
 // Use omap for xattrs for attrs over
 // filestore_max_inline_xattr_size or
 OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) //Override
 OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
 OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
 OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
 
 // for more than filestore_max_inline_xattrs attrs
 OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override
 OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)
 OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10)
 OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2)
 
 
 If these limits are crossed, xattrs will be stored in omap..
 
 Sounds fair.
 
 Since I only use RBD I don't think it will ever exceed this.
 
 Possibly, see my thread  about performance difference between new and old
 pools. Still not quite sure what's going on, but for some reasons some of
 the objects behind RBD's have larger xattrs which is causing really poor
 performance.
 
 
 Thanks,
 
 Chibi
 For ext4, you can use either filestore_max*_other or
 filestore_max_inline_xattrs/ filestore_max_inline_xattr_size. In any
 case, the latter two will override everything.
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: Christian Balzer [mailto:ch...@gol.com]
 Sent: Wednesday, July 01, 2015 5:26 PM
 To: Ceph Users
 Cc: Somnath Roy
 Subject: Re: [ceph-users] xattrs vs omap
 
 
 Hello,
 
 On Wed, 1 Jul 2015 15:24:13 + Somnath Roy wrote:
 
 It doesn't matter, I think filestore_xattr_use_omap is a 'noop' and is
 not used in Hammer.
 
 Then what was this functionality replaced with, esp. considering EXT4
 based OSDs?
 
 Chibi
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
 Behalf Of Adam Tygart Sent: Wednesday, July 01, 2015 8:20 AM
 To: Ceph Users
 Subject: [ceph-users] xattrs vs omap
 
 Hello all,
 
 I've got a coworker who put filestore_xattr_use_omap = true in the
 ceph.conf when we first started building the cluster. Now he can't
 remember why. He thinks it may be a holdover from our first Ceph
 cluster (running dumpling on ext4, iirc).
 
 In the newly built cluster, we are using XFS with 2048 byte inodes,
 running Ceph 0.94.2. It currently has production data in it.
 
 From my reading of other threads, it looks like this is probably not
 something you want set to true (at least on XFS), due to performance
 implications. Is this something you can change on a running cluster?
 Is it worth the hassle?
 
 Thanks,
 Adam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 PLEASE NOTE: The information contained in this electronic mail
 message is intended only for the use of the designated recipient(s)
 named above. If the reader of this message is not the intended
 recipient, you are hereby notified

Re: [ceph-users] xattrs vs omap

2015-07-13 Thread Somnath Roy
inline

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan 
Schermer
Sent: Monday, July 13, 2015 2:32 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] xattrs vs omap

Sorry for reviving an old thread, but could I get some input on this, pretty 
please?

ext4 has 256-byte inodes by default (at least according to docs) but the 
fragment below says:
OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)

The default 512b is too much if the inode is just 256b, so shouldn’t that be 
256b in case people use the default ext4 inode size?

Anyway, is it better to format ext4 with larger inodes (say 2048b) and set 
filestore_max_inline_xattr_size_other=1536, or leave it at defaults?
[Somnath] Why 1536? Why not 1024 or some power of 2? I am not seeing any harm, 
but I am curious.
(As I understand it, on ext4 xattrs are limited to one block, inode size + 
something can spill to one different inode - maybe someone knows better).


[Somnath] The xattr size (_) is now more than 256 bytes and it will spill 
over, so a bigger inode size will be good. But I would suggest doing your own 
benchmark before putting it into production.

Is filestore_max_inline_xattr_size an absolute limit, or is it 
filestore_max_inline_xattr_size*filestore_max_inline_xattrs in reality?

[Somnath] The *_size tracks the xattr size per attribute and *inline_xattrs 
keeps track of the max number of inline attributes allowed. So, if an xattr's 
size is > *_size, it will go to omap, and also if the total number of xattrs is 
> *inline_xattrs, they will go to omap.
If you are only using rbd, the number of inline xattrs will always be 2 and it 
will not cross that default max limit.
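In pseudo-Python, the rule described above amounts to roughly the following (a sketch of the behavior only, not Ceph's actual FileStore code; the defaults shown are the *_other values):

```python
def split_xattrs(xattrs, max_inline_size=512, max_inline_count=2):
    """Sketch of the FileStore inline-vs-omap decision described above.
    xattrs: dict of name -> value (bytes). Any attribute larger than
    max_inline_size, or beyond the first max_inline_count inline slots,
    goes to omap. Illustrative only, not Ceph's actual code."""
    inline, omap = {}, {}
    for name, value in xattrs.items():
        if len(value) > max_inline_size or len(inline) >= max_inline_count:
            omap[name] = value
        else:
            inline[name] = value
    return inline, omap
```

So with the _other defaults (512/2), an RBD object's two xattrs stay inline as long as each value fits in 512 bytes.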

Does the OSD do the sane thing if for some reason the xattrs do not fit? What are 
the performance implications of storing the xattrs in leveldb?

[Somnath] I don't have the exact numbers, but it has a significant 
overhead if the xattrs go to leveldb.

And lastly - what size of xattrs should I really expect if all I use is RBD for 
OpenStack instances? (No radosgw, no cephfs, but heavy on rbd image and pool 
snapshots). This overhead is quite large

[Somnath] It will be 2 xattrs; the default _ will be a little bigger than 256 bytes 
and _snapset is small, depending on the number of snaps/clones, but unlikely to 
cross the 256-byte range.

My plan so far is to format the drives like this:
mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256 (2048b inode, 
4096b block size, one inode per 512k of space) and set 
filestore_max_inline_xattr_size_other=1536
[Somnath] Not much idea on ext4, sorry..

Does that make sense?

Thanks!

Jan



 On 02 Jul 2015, at 12:18, Jan Schermer j...@schermer.cz wrote:

 Does anyone have a known-good set of parameters for ext4? I want to try it as 
 well but I’m a bit worried what happens if I get it wrong.

 Thanks

 Jan



 On 02 Jul 2015, at 09:40, Nick Fisk n...@fisk.me.uk wrote:

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
 Behalf Of Christian Balzer
 Sent: 02 July 2015 02:23
 To: Ceph Users
 Subject: Re: [ceph-users] xattrs vs omap

 On Thu, 2 Jul 2015 00:36:18 + Somnath Roy wrote:

 It is replaced with the following config option..

 // Use omap for xattrs for attrs over
 // filestore_max_inline_xattr_size or
 OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) //Override
 OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
 OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
 OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)

 // for more than filestore_max_inline_xattrs attrs
 OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override
 OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)
 OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10)
 OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2)


 If these limits are crossed, xattrs will be stored in omap..

 Sounds fair.

 Since I only use RBD I don't think it will ever exceed this.

 Possibly, see my thread  about performance difference between new and
 old pools. Still not quite sure what's going on, but for some reasons
 some of the objects behind RBD's have larger xattrs which is causing
 really poor performance.


 Thanks,

 Chibi
 For ext4, you can use either filestore_max*_other or
 filestore_max_inline_xattrs/ filestore_max_inline_xattr_size. In any
 case, the latter two will override everything.

 Thanks & Regards
 Somnath

 -Original Message-
 From: Christian Balzer [mailto:ch...@gol.com]
 Sent: Wednesday, July 01, 2015 5:26 PM
 To: Ceph Users
 Cc: Somnath Roy
 Subject: Re: [ceph-users] xattrs vs omap


 Hello,

 On Wed, 1 Jul 2015 15:24:13 + Somnath Roy wrote:

 It doesn't matter, I think filestore_xattr_use_omap is a 'noop' and is
 not used in Hammer.

 Then what was this functionality replaced with, esp. considering
 EXT4 based OSDs?

 Chibi
 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph

Re: [ceph-users] xattrs vs omap

2015-07-02 Thread Jan Schermer
Does anyone have a known-good set of parameters for ext4? I want to try it as 
well but I’m a bit worried what happens if I get it wrong.

Thanks

Jan



 On 02 Jul 2015, at 09:40, Nick Fisk n...@fisk.me.uk wrote:
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Christian Balzer
 Sent: 02 July 2015 02:23
 To: Ceph Users
 Subject: Re: [ceph-users] xattrs vs omap
 
 On Thu, 2 Jul 2015 00:36:18 + Somnath Roy wrote:
 
 It is replaced with the following config option..
 
 // Use omap for xattrs for attrs over
 // filestore_max_inline_xattr_size or
 OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) //Override
 OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
 OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
 OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
 
 // for more than filestore_max_inline_xattrs attrs
 OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override
 OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)
 OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10)
 OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2)
 
 
 If these limits are crossed, xattrs will be stored in omap..
 
 Sounds fair.
 
 Since I only use RBD I don't think it will ever exceed this.
 
 Possibly, see my thread  about performance difference between new and old
 pools. Still not quite sure what's going on, but for some reasons some of
 the objects behind RBD's have larger xattrs which is causing really poor
 performance.
 
 
 Thanks,
 
 Chibi
 For ext4, you can use either filestore_max*_other or
 filestore_max_inline_xattrs/ filestore_max_inline_xattr_size. In any
 case, the latter two will override everything.
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: Christian Balzer [mailto:ch...@gol.com]
 Sent: Wednesday, July 01, 2015 5:26 PM
 To: Ceph Users
 Cc: Somnath Roy
 Subject: Re: [ceph-users] xattrs vs omap
 
 
 Hello,
 
 On Wed, 1 Jul 2015 15:24:13 + Somnath Roy wrote:
 
 It doesn't matter, I think filestore_xattr_use_omap is a 'noop' and is
 not used in Hammer.
 
 Then what was this functionality replaced with, esp. considering EXT4
 based OSDs?
 
 Chibi
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
 Behalf Of Adam Tygart Sent: Wednesday, July 01, 2015 8:20 AM
 To: Ceph Users
 Subject: [ceph-users] xattrs vs omap
 
 Hello all,
 
 I've got a coworker who put filestore_xattr_use_omap = true in the
 ceph.conf when we first started building the cluster. Now he can't
 remember why. He thinks it may be a holdover from our first Ceph
 cluster (running dumpling on ext4, iirc).
 
 In the newly built cluster, we are using XFS with 2048 byte inodes,
 running Ceph 0.94.2. It currently has production data in it.
 
 From my reading of other threads, it looks like this is probably not
 something you want set to true (at least on XFS), due to performance
 implications. Is this something you can change on a running cluster?
 Is it worth the hassle?
 
 Thanks,
 Adam
 
 
 
 
 
 
 
 --
 Christian Balzer    Network/Systems Engineer
 ch...@gol.com   Global OnLine Japan/Fusion Communications
 http://www.gol.com/
 
 
 
 
 
 
 
 --
 Christian Balzer    Network/Systems Engineer
 ch...@gol.com    Global OnLine Japan/Fusion

[ceph-users] xattrs vs omap

2015-07-01 Thread Adam Tygart
Hello all,

I've got a coworker who put filestore_xattr_use_omap = true in the
ceph.conf when we first started building the cluster. Now he can't
remember why. He thinks it may be a holdover from our first Ceph
cluster (running dumpling on ext4, iirc).

In the newly built cluster, we are using XFS with 2048 byte inodes,
running Ceph 0.94.2. It currently has production data in it.

From my reading of other threads, it looks like this is probably not
something you want set to true (at least on XFS), due to performance
implications. Is this something you can change on a running cluster?
Is it worth the hassle?

Thanks,
Adam


Re: [ceph-users] xattrs vs omap

2015-07-01 Thread Somnath Roy
It doesn't matter, I think filestore_xattr_use_omap is a 'noop' and is not used 
in Hammer.

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Adam 
Tygart
Sent: Wednesday, July 01, 2015 8:20 AM
To: Ceph Users
Subject: [ceph-users] xattrs vs omap

Hello all,

I've got a coworker who put filestore_xattr_use_omap = true in the ceph.conf 
when we first started building the cluster. Now he can't remember why. He 
thinks it may be a holdover from our first Ceph cluster (running dumpling on 
ext4, iirc).

In the newly built cluster, we are using XFS with 2048 byte inodes, running 
Ceph 0.94.2. It currently has production data in it.

From my reading of other threads, it looks like this is probably not something 
you want set to true (at least on XFS), due to performance implications. Is 
this something you can change on a running cluster?
Is it worth the hassle?

Thanks,
Adam






Re: [ceph-users] xattrs vs omap

2015-07-01 Thread Christian Balzer

Hello,

On Wed, 1 Jul 2015 15:24:13 + Somnath Roy wrote:

 It doesn't matter, I think filestore_xattr_use_omap is a 'noop'  and not
 used in the Hammer.
 
Then what was this functionality replaced with, especially considering EXT4-based
OSDs?

Chibi
 Thanks  Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Adam Tygart Sent: Wednesday, July 01, 2015 8:20 AM
 To: Ceph Users
 Subject: [ceph-users] xattrs vs omap
 
 Hello all,
 
 I've got a coworker who put filestore_xattr_use_omap = true in the
 ceph.conf when we first started building the cluster. Now he can't
 remember why. He thinks it may be a holdover from our first Ceph cluster
 (running dumpling on ext4, iirc).
 
 In the newly built cluster, we are using XFS with 2048 byte inodes,
 running Ceph 0.94.2. It currently has production data in it.
 
 From my reading of other threads, it looks like this is probably not
 something you want set to true (at least on XFS), due to performance
 implications. Is this something you can change on a running cluster? Is
 it worth the hassle?
 
 Thanks,
 Adam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 
 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-17 Thread Nathan Cutler
 We've since merged something 
 that stripes over several small xattrs so that we can keep things inline, 
 but it hasn't been backported to hammer yet.  See
 c6cdb4081e366f471b372102905a1192910ab2da.

Hi Sage:

You wrote "yet" - should we earmark it for a hammer backport?

Nathan


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-17 Thread Abhishek L
On Wed, Jun 17, 2015 at 1:02 PM, Nathan Cutler ncut...@suse.cz wrote:
 We've since merged something
 that stripes over several small xattrs so that we can keep things inline,
 but it hasn't been backported to hammer yet.  See
 c6cdb4081e366f471b372102905a1192910ab2da.

 Hi Sage:

 You wrote yet - should we earmark it for hammer backport?

I'm guessing https://github.com/ceph/ceph/pull/4973 is the backport for hammer
(issue http://tracker.ceph.com/issues/11981)

Regards
Abhishek


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-17 Thread Sage Weil
On Wed, 17 Jun 2015, Nathan Cutler wrote:
  We've since merged something 
  that stripes over several small xattrs so that we can keep things inline, 
  but it hasn't been backported to hammer yet.  See
  c6cdb4081e366f471b372102905a1192910ab2da.
 
 Hi Sage:
 
 You wrote yet - should we earmark it for hammer backport?

Yes, please!

sage


[ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread GuangYang
Hi Cephers,
While looking at disk utilization on the OSDs, I noticed the disks were 
constantly busy with a large number of small writes. Further investigation 
showed that, because radosgw uses xattrs to store metadata (e.g. etag, 
content-type, etc.), the xattrs spill from the inode's local storage into 
separate extents, which incurs extra I/O.

I would like to check if anybody has experience with offloading the metadata to 
omap:
  1. Offload everything to omap? If this is the case, should we make the inode 
size 512 bytes (instead of 2k)?
  2. Partially offload the metadata to omap, e.g. only offloading the 
rgw-specific metadata.

Any sharing is deeply appreciated. Thanks!

Thanks,
Guang 


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread Somnath Roy
Guang,
Try to play around with the following conf attributes specially 
filestore_max_inline_xattr_size and filestore_max_inline_xattrs

// Use omap for xattrs for attrs over
// filestore_max_inline_xattr_size or
OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) //Override
OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)

// for more than filestore_max_inline_xattrs attrs
OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override
OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)
OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10)
OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2)

I think the behavior for XFS is that if there are more than 10 xattrs, it will use OMAP.
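
The way these limits interact can be sketched with a small model. This is not
the actual FileStore implementation, just an illustration of the two
thresholds the options above control:

```python
# Simplified model of the inline-vs-omap decision governed by
# filestore_max_inline_xattr_size and filestore_max_inline_xattrs.
# XFS defaults (65536 bytes, 10 attrs) are used here; this is a
# sketch, not the real FileStore code path.

def spills_to_omap(xattrs, max_inline_size=65536, max_inline_count=10):
    """xattrs: dict of name -> value (bytes)."""
    if len(xattrs) > max_inline_count:
        return True
    return any(len(v) > max_inline_size for v in xattrs.values())

# 10 small attrs stay inline; an 11th pushes past the count limit.
small = {"attr%d" % i: b"x" * 100 for i in range(10)}
assert not spills_to_omap(small)
small["attr10"] = b"x"
assert spills_to_omap(small)

# On "other" filesystems (e.g. ext4) the defaults are far tighter:
# 512 bytes per value, 2 attrs total.
assert spills_to_omap({"_": b"x" * 600},
                      max_inline_size=512, max_inline_count=2)
```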

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
GuangYang
Sent: Tuesday, June 16, 2015 11:31 AM
To: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
Subject: [ceph-users] xattrs vs. omap with radosgw

Hi Cephers,
While looking at disk utilization on the OSDs, I noticed the disks were 
constantly busy with a large number of small writes. Further investigation 
showed that, because radosgw uses xattrs to store metadata (e.g. etag, 
content-type, etc.), the xattrs spill from the inode's local storage into 
separate extents, which incurs extra I/O.

I would like to check if anybody has experience with offloading the metadata to 
omap:
  1. Offload everything to omap? If this is the case, should we make the inode 
size 512 bytes (instead of 2k)?
  2. Partially offload the metadata to omap, e.g. only offloading the 
rgw-specific metadata.

Any sharing is deeply appreciated. Thanks!

Thanks,
Guang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread Sage Weil
On Wed, 17 Jun 2015, Zhou, Yuan wrote:
 FWIW, there was some discussion in OpenStack Swift and their performance 
 tests showed 255 is not the best in recent XFS. They decided to use a large 
 xattr boundary size (65535).
 
 https://gist.github.com/smerritt/5e7e650abaa20599ff34

If I read this correctly the total metadata they are setting is pretty 
big:

PILE_O_METADATA = pickle.dumps(dict(
    ("attribute%d" % i, hashlib.sha512("thingy %d" % i).hexdigest())
    for i in range(200)))

So lots of small attrs won't really help since they'll have to spill out 
into other extents eventually no matter what.

In our case, we have big (2k) inodes and can easily fit everything in 
there.. as long as it is in 255 byte pieces.
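
The striping approach from the commit referenced above can be illustrated with
a small sketch: split one large value into pieces no larger than 255 bytes,
stored under suffixed keys. The key naming here ("name", "name@1", ...) is
illustrative, only loosely modeled on Ceph's chain_xattr helpers:

```python
# Sketch of striping one large xattr value over several small xattrs
# so each piece stays within XFS's 255-byte inline limit.  Key naming
# is hypothetical; the real code lives in Ceph's chain_xattr layer.
CHUNK = 255

def stripe(name, value):
    """Split value into <=255-byte chunks under suffixed key names."""
    chunks = [value[i:i + CHUNK] for i in range(0, len(value), CHUNK)] or [b""]
    keys = [name] + ["%s@%d" % (name, i) for i in range(1, len(chunks))]
    return dict(zip(keys, chunks))

def unstripe(name, xattrs):
    """Reassemble the original value by walking the suffix chain."""
    out, i, key = [], 0, name
    while key in xattrs:
        out.append(xattrs[key])
        i += 1
        key = "%s@%d" % (name, i)
    return b"".join(out)

manifest = b"m" * 381                       # e.g. an rgw manifest attr
striped = stripe("user.ceph._", manifest)
assert all(len(v) <= CHUNK for v in striped.values())
assert len(striped) == 2                    # 381 bytes -> two pieces
assert unstripe("user.ceph._", striped) == manifest
```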

sage


 
 
 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
 Sent: Wednesday, June 17, 2015 3:43 AM
 To: GuangYang
 Cc: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
 Subject: Re: xattrs vs. omap with radosgw
 
 On Tue, 16 Jun 2015, GuangYang wrote:
  Hi Cephers,
  While looking at disk utilization on OSD, I noticed the disk was constantly 
  busy with large number of small writes, further investigation showed that, 
  as radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), 
  which made the xattrs get from local to extents, which incurred extra I/O.
  
  I would like to check if anybody has experience with offloading the 
  metadata to omap:
    1 Offload everything to omap? If this is the case, should we make the 
  inode size as 512 (instead of 2k)?
    2 Partial offload the metadata to omap, e.g. only offloading the rgw 
  specified metadata to omap.
  
  Any sharing is deeply appreciated. Thanks!
 
 Hi Guang,
 
 Is this hammer or firefly?
 
 With hammer the size of object_info_t crossed the 255 byte boundary, which is 
 the max xattr value that XFS can inline.  We've since merged something that 
 stripes over several small xattrs so that we can keep things inline, but it 
 hasn't been backported to hammer yet.  See 
 c6cdb4081e366f471b372102905a1192910ab2da.  Perhaps this is what you're seeing?
 
 I think we're still better off with larger XFS inodes and inline xattrs if it 
 means we avoid leveldb at all for most objects.
 
 sage
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread GuangYang
After back-porting Sage's patch to Giant, the radosgw xattrs are inlined again. 
I haven't run extensive testing yet; I will update once I have some performance 
data to share.

Thanks,
Guang
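
The attr sizes quoted below in this message make the problem concrete; a quick
back-of-the-envelope check (the 255-byte and 2k figures come from the earlier
discussion in this thread):

```python
# Back-of-the-envelope check using the rgw attr sizes reported in this
# thread: the manifest alone exceeds XFS's 255-byte inline-value limit,
# but the whole set fits comfortably inside a 2k inode once striped.
attrs = {
    "rgw.idtag": 15,
    "rgw.manifest": 381,
    "rgw.acl": 121,
    "rgw.etag": 33,
}

XFS_INLINE_VALUE_LIMIT = 255   # max single xattr value XFS keeps inline
INODE_SIZE = 2048              # inode size used by the cluster in question

assert attrs["rgw.manifest"] > XFS_INLINE_VALUE_LIMIT   # this one spills
total = sum(attrs.values())                             # 550 bytes of values
assert total < INODE_SIZE   # plenty of room once the value is chunked
```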

 Date: Tue, 16 Jun 2015 15:51:44 -0500
 From: mnel...@redhat.com
 To: yguan...@outlook.com; s...@newdream.net
 CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
 Subject: Re: xattrs vs. omap with radosgw
 
 
 
 On 06/16/2015 03:48 PM, GuangYang wrote:
  Thanks Sage for the quick response.
 
  It is on Firefly v0.80.4.
 
  While trying to put with *rados* directly, the xattrs can be inline. The 
  problem comes to light when using radosgw, since we have a bunch of 
  metadata to keep via xattrs, including:
  rgw.idtag  : 15 bytes
  rgw.manifest :  381 bytes
 
 Ah, that manifest will push us over the limit afaik resulting in every 
 inode getting a new extent.
 
  rgw.acl : 121 bytes
  rgw.etag : 33 bytes
 
  Given the background, it looks like the problem is that the rgw.manifest is 
  too large so that XFS make it extents. If I understand correctly, if we 
  port the change to Firefly, we should be able to inline the inode since the 
  accumulated size is still less than 2K (please correct me if I am wrong 
  here).
 
 I think you are correct so long as the patch breaks that manifest down 
 into 254 byte or smaller chunks.
 
 
  Thanks,
  Guang
 
 
  
  Date: Tue, 16 Jun 2015 12:43:08 -0700
  From: s...@newdream.net
  To: yguan...@outlook.com
  CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
  Subject: Re: xattrs vs. omap with radosgw
 
  On Tue, 16 Jun 2015, GuangYang wrote:
  Hi Cephers,
  While looking at disk utilization on OSD, I noticed the disk was 
  constantly busy with large number of small writes, further investigation 
  showed that, as radosgw uses xattrs to store metadata (e.g. etag, 
  content-type, etc.), which made the xattrs get from local to extents, 
  which incurred extra I/O.
 
  I would like to check if anybody has experience with offloading the 
  metadata to omap:
  1 Offload everything to omap? If this is the case, should we make the 
  inode size as 512 (instead of 2k)?
  2 Partial offload the metadata to omap, e.g. only offloading the rgw 
  specified metadata to omap.
 
  Any sharing is deeply appreciated. Thanks!
 
  Hi Guang,
 
  Is this hammer or firefly?
 
  With hammer the size of object_info_t crossed the 255 byte boundary, which
  is the max xattr value that XFS can inline. We've since merged something
  that stripes over several small xattrs so that we can keep things inline,
  but it hasn't been backported to hammer yet. See
  c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're
  seeing?
 
  I think we're still better off with larger XFS inodes and inline xattrs if
  it means we avoid leveldb at all for most objects.
 
  sage
 


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread GuangYang
Hi Yuan,
Thanks for sharing the link; it is an interesting read. My understanding of the 
test results is that, for a fixed total xattr size, a smaller stripe size incurs 
higher read latency, which kind of makes sense: there are more k-v pairs to 
fetch, and at that size the data needs extra extents anyway.

Correct me if I am wrong here...

Thanks,
Guang
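
That reading can be made concrete with a quick count of how many xattr pieces a
fixed-size value produces at different stripe sizes. The 4 KiB value is a made-up
figure, just to illustrate the tradeoff Swift's tests were probing:

```python
import math

# Number of xattr key/value pairs needed to store one value of a given
# size at a given stripe size: smaller stripes mean more pairs to read.
def num_pieces(value_size, stripe_size):
    return max(1, math.ceil(value_size / stripe_size))

value = 4096                            # hypothetical 4 KiB of metadata
assert num_pieces(value, 255) == 17     # many small pieces at XFS's limit
assert num_pieces(value, 65535) == 1    # one piece at Swift's chosen boundary
```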

 From: yuan.z...@intel.com
 To: s...@newdream.net; yguan...@outlook.com
 CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
 Subject: RE: xattrs vs. omap with radosgw
 Date: Wed, 17 Jun 2015 01:32:35 +
 
 FWIW, there was some discussion in OpenStack Swift and their performance 
 tests showed 255 is not the best in recent XFS. They decided to use a large 
 xattr boundary size (65535).
 
 https://gist.github.com/smerritt/5e7e650abaa20599ff34
 
 
 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
 Sent: Wednesday, June 17, 2015 3:43 AM
 To: GuangYang
 Cc: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
 Subject: Re: xattrs vs. omap with radosgw
 
 On Tue, 16 Jun 2015, GuangYang wrote:
 Hi Cephers,
 While looking at disk utilization on OSD, I noticed the disk was constantly 
 busy with large number of small writes, further investigation showed that, 
 as radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), 
 which made the xattrs get from local to extents, which incurred extra I/O.
 
 I would like to check if anybody has experience with offloading the metadata 
 to omap:
   1 Offload everything to omap? If this is the case, should we make the 
 inode size as 512 (instead of 2k)?
   2 Partial offload the metadata to omap, e.g. only offloading the rgw 
 specified metadata to omap.
 
 Any sharing is deeply appreciated. Thanks!
 
 Hi Guang,
 
 Is this hammer or firefly?
 
 With hammer the size of object_info_t crossed the 255 byte boundary, which is 
 the max xattr value that XFS can inline. We've since merged something that 
 stripes over several small xattrs so that we can keep things inline, but it 
 hasn't been backported to hammer yet. See 
 c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're seeing?
 
 I think we're still better off with larger XFS inodes and inline xattrs if it 
 means we avoid leveldb at all for most objects.
 
 sage