On Wed, 17 Jun 2015, Zhou, Yuan wrote:
> FWIW, there was some discussion in OpenStack Swift and their performance 
> tests showed 255 is not the best in recent XFS. They decided to use large 
> xattr boundary size(65535).
> 
> https://gist.github.com/smerritt/5e7e650abaa20599ff34

If I read this correctly, the total metadata they are setting is pretty 
big:

import hashlib
import pickle

PILE_O_METADATA = pickle.dumps(dict(
    ("attribute%d" % i, hashlib.sha512("thingy %d" % i).hexdigest())
    for i in range(200)))
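For scale, a quick size check of that payload (a Python 3 rework of the gist's Python 2 snippet; the variable name is mine) shows why no inlining scheme could hold it:

```python
import hashlib
import pickle

# Rebuild the Swift benchmark's metadata blob (Python 3: hashlib wants bytes).
pile = pickle.dumps(dict(
    ("attribute%d" % i, hashlib.sha512(("thingy %d" % i).encode()).hexdigest())
    for i in range(200)))

print(len(pile))  # tens of kilobytes of serialized metadata
```

200 sha512 hexdigests at 128 characters each, plus keys and pickle framing, comes to roughly 30 KB -- orders of magnitude past what any inode can hold inline.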

So lots of small attrs won't really help: with that much metadata they'll 
have to spill out into extents eventually no matter what.

In our case, we have big (2k) inodes and can easily fit everything in 
there... as long as it is in <255 byte pieces.
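The striping trick is simple to picture. A minimal sketch (my own illustration, not the actual Ceph code) of chopping one oversized value into <255-byte pieces:

```python
XFS_INLINE_MAX = 255  # largest xattr value XFS keeps inline (per this thread)

def stripe_value(value, chunk=XFS_INLINE_MAX):
    """Split one oversized xattr value into <=255-byte pieces so each
    piece can stay inline in a big (2k) inode.  The real Ceph change
    stores the pieces under distinct xattr names and reassembles on read."""
    return [value[off:off + chunk] for off in range(0, len(value), chunk)]

pieces = stripe_value(b"x" * 600)
print([len(p) for p in pieces])  # [255, 255, 90]
```

As long as the total still fits in the inode's attr space, every piece stays inline and no extra extent I/O is needed.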

sage


> 
> 
> -----Original Message-----
> From: ceph-devel-ow...@vger.kernel.org 
> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Wednesday, June 17, 2015 3:43 AM
> To: GuangYang
> Cc: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> Subject: Re: xattrs vs. omap with radosgw
> 
> On Tue, 16 Jun 2015, GuangYang wrote:
> > Hi Cephers,
> > While looking at disk utilization on the OSDs, I noticed the disks were 
> > constantly busy with a large number of small writes. Further investigation 
> > showed that since radosgw uses xattrs to store metadata (e.g. etag, 
> > content-type, etc.), the xattrs spill from the inode's inline (local) 
> > format into separate extents, which incurs extra I/O.
> > 
> > I would like to check if anybody has experience with offloading the 
> > metadata to omap:
> >   1> Offload everything to omap? If so, should we reduce the inode size 
> > to 512 bytes (instead of 2k)?
> >   2> Partially offload the metadata to omap, e.g. only offloading the 
> > rgw-specific metadata.
> > 
> > Any sharing is deeply appreciated. Thanks!
> 
> Hi Guang,
> 
> Is this hammer or firefly?
> 
> With hammer, the size of object_info_t crossed the 255-byte boundary, which 
> is the maximum xattr value size that XFS can store inline.  We've since 
> merged a change that stripes the value over several small xattrs so that 
> everything stays inline, but it hasn't been backported to hammer yet.  See 
> commit c6cdb4081e366f471b372102905a1192910ab2da.  Perhaps this is what 
> you're seeing?
> 
> I think we're still better off with larger XFS inodes and inline xattrs if it 
> means we avoid leveldb at all for most objects.
> 
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
