Thanks Sage for the quick response.

It is on Firefly v0.80.4.

When putting objects with *rados* directly, the xattrs can be kept inline. The 
problem comes to light when using radosgw, since it keeps a bunch of metadata 
in xattrs, including:
   rgw.idtag    :  15 bytes
   rgw.manifest : 381 bytes
   rgw.acl      : 121 bytes
   rgw.etag     :  33 bytes

Given that background, it looks like the problem is that rgw.manifest is too 
large, so XFS moves the xattrs out to extents. If I understand correctly, if we 
port the change to Firefly, we should be able to keep the xattrs inline in the 
inode, since the accumulated size is still less than 2K (please correct me if I 
am wrong here).
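
To make the arithmetic concrete, here is a rough sketch in Python. It assumes
the 255-byte figure from Sage's message applies to each xattr value, and it
counts value bytes only: xattr names, per-entry headers, and the other xattrs
the OSD stores (such as object_info_t) are ignored. The helper names are made
up for illustration and are not Ceph or XFS APIs.

```python
# Sizes of the radosgw xattr values listed above, in bytes.
RGW_XATTR_SIZES = {
    "rgw.idtag": 15,
    "rgw.manifest": 381,
    "rgw.acl": 121,
    "rgw.etag": 33,
}

# Assumption: largest xattr value XFS keeps inline, per Sage's message.
XFS_INLINE_VALUE_MAX = 255
# Assumption: 2K inodes, as mentioned in this thread.
INODE_SIZE = 2048


def oversized(sizes, limit=XFS_INLINE_VALUE_MAX):
    """Return the xattrs whose value alone exceeds the per-value limit."""
    return {name: size for name, size in sizes.items() if size > limit}


def chunks_needed(size, limit=XFS_INLINE_VALUE_MAX):
    """Pieces needed if a value is striped into chunks of at most `limit` bytes."""
    return -(-size // limit)  # ceiling division


total = sum(RGW_XATTR_SIZES.values())
print("total value bytes:", total)
print("over the per-value limit:", oversized(RGW_XATTR_SIZES))
print("chunks for rgw.manifest:", chunks_needed(RGW_XATTR_SIZES["rgw.manifest"]))
print("values fit under inode size:", total < INODE_SIZE)
```

With these numbers only rgw.manifest is over 255 bytes, and the four values 
add up to 550 bytes, well under 2K before overhead is counted.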

Thanks,
Guang


----------------------------------------
> Date: Tue, 16 Jun 2015 12:43:08 -0700
> From: s...@newdream.net
> To: yguan...@outlook.com
> CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> Subject: Re: xattrs vs. omap with radosgw
>
> On Tue, 16 Jun 2015, GuangYang wrote:
>> Hi Cephers,
>> While looking at disk utilization on an OSD, I noticed the disk was 
>> constantly busy with a large number of small writes. Further investigation 
>> showed that, because radosgw uses xattrs to store metadata (e.g. etag, 
>> content-type, etc.), the xattrs moved from local to extents, which incurred 
>> extra I/O.
>>
>> I would like to check if anybody has experience with offloading the metadata 
>> to omap:
>> 1> Offload everything to omap? If this is the case, should we set the inode 
>> size to 512 (instead of 2k)?
>> 2> Partially offload the metadata to omap, e.g. only offloading the 
>> rgw-specific metadata to omap.
>>
>> Any sharing is deeply appreciated. Thanks!
>
> Hi Guang,
>
> Is this hammer or firefly?
>
> With hammer the size of object_info_t crossed the 255 byte boundary, which
> is the max xattr value that XFS can inline. We've since merged something
> that stripes over several small xattrs so that we can keep things inline,
> but it hasn't been backported to hammer yet. See
> c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're
> seeing?
>
> I think we're still better off with larger XFS inodes and inline xattrs if
> it means we avoid leveldb at all for most objects.
>
> sage