On 06/22/15 11:27, Jan Schermer wrote:
> I don’t run Ceph on btrfs, but isn’t this related to the btrfs
> snapshotting feature ceph uses to ensure a consistent journal?

It's possible: if I understand the code correctly, the btrfs filestore
backend creates a snapshot when syncing the journal. I'm a little
surprised that btrfs would need approximately 120MB written to disk to
snapshot a subvolume with ~160k files (and to remove the oldest
snapshot, as the OSD keeps 2 active), but snapshots aren't guaranteed
to be dirt cheap and probably weren't optimised for this frequency. I
was under the impression that a snapshot on btrfs was only a matter of
keeping a reference to the root of the filesystem btree, which (at
least in theory) seems cheap. In fact, thinking while writing this, I
realise it may well be the release of the previous snapshot, with its
associated cleanups, that is costly rather than the snapshot creation.

We are about to add Intel DC SSDs for journals and I believe Krzysztof
is right: we should be able to disable the snapshots safely then. The
main reasons for us to use btrfs are compression and CRC at the fs
level. Performance could be one too: we get consistently better
latencies vs xfs in our configuration. So I'm not particularly bothered
by this, but it may be something useful to document (and at least leave
a trace here for others to find): btrfs with the default filestore max
sync interval (5 seconds) may have serious performance problems in most
configurations.
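
For anyone who wants to experiment, the two settings discussed above
would look roughly like this in ceph.conf. This is only a sketch: the
option names are the filestore ones as I understand them from the
config reference, the 30 second interval is an arbitrary illustrative
value, and we have not tested either change ourselves yet.

```ini
[osd]
# Assumption: stop using btrfs snapshots for filestore syncs
# (enabled by default on btrfs-backed OSDs)
filestore btrfs snap = false

# Default is 5 seconds; 30 is an arbitrary example value, not a
# recommendation
filestore max sync interval = 30
```

Disabling the snapshots only makes sense once the journal is on a
separate device, and a longer sync interval trades fewer snapshots for
more data held in the journal between syncs.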

I'm not sure I will have the time to trace the OSD processes to check
whether I see what Erik saw with CephFS (lots of xattr activity,
including setxattr and removexattr): I'm not using CephFS, and his
findings didn't specify whether he was using btrfs- and/or xfs-backed
OSDs (we only see this behaviour on btrfs).

Best regards,

Lionel
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
