> On 10 Dec 2015, at 15:14, Sage Weil <s...@newdream.net> wrote:
> 
> On Thu, 10 Dec 2015, Jan Schermer wrote:
>> Removing a snapshot means looking for every *potential* object the snapshot 
>> can have, and this takes a very long time (a 6TB snapshot will consist of 
>> 1.5M objects (in one replica), assuming the default 4MB object size). The 
>> same applies to large thin volumes (don't try creating and then dropping a 
>> 1 EiB volume, even if you only have 1GB of physical space :)).
>> Doing this is simply expensive and might saturate your OSDs. If you don't 
>> have enough RAM to cache the structure then all the "is there a file 
>> /var/lib/ceph/...." lookups will go to disk and that can hurt a lot.
>> I don't think there's any priority to this (is there?), so it competes with 
>> everything else.
>> 
>> I'm not sure exactly how snapshots are coded in Ceph, but in a COW 
>> filesystem you simply don't dereference blocks of the parent of the 
>> snapshot when doing writes to it and that's cheap, but Ceph stores "blocks" 
>> in files with computable names and has no pointers to them that could be 
>> modified, so by creating a snapshot you hurt the performance a lot (you 
>> need to create a copy of the 4MB object into the snapshot(s) when you dirty 
>> a byte in there). Though I remember reading that the logic is actually 
>> reversed and it is the snapshot that gets the original blocks(??)...
>> Anyway, if you are removing a snapshot at the same time as writing to the 
>> parent, there could potentially be a problem in what gets done first. Is 
>> Ceph smart enough to not care about snapshots that are getting deleted? I 
>> have no idea, but I think it must be, because we use snapshots a lot and 
>> haven't had any issues with it.
> 
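For reference, the object counts quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes binary units (TiB, MiB), which is what makes the 1.5M figure come out, and the helper name potential_objects is made up for illustration.

```python
# Sizes in bytes. Assumption: binary units, which matches the 1.5M figure.
MiB = 2**20
TiB = 2**40
EiB = 2**60


def potential_objects(image_bytes, object_bytes=4 * MiB):
    """Number of object slots an image of this size can span (ceiling division)."""
    return -(-image_bytes // object_bytes)


print(potential_objects(6 * TiB))  # 1572864, i.e. roughly 1.5M
print(potential_objects(1 * EiB))  # 274877906944, i.e. 2**38
```

So a 1 EiB thin volume has about 175,000 times as many potential objects as the 6TB one, regardless of how little data was actually written.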
> It's not quite so bad... the OSD maintains a map (in leveldb) of the 
> objects that are referenced by a snapshot, so the amount of work is 
> proportional to the number of objects that were cloned for that snapshot.
> 
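To make the difference in cost concrete, here is a toy model of the two approaches discussed above. It is not Ceph code: the in-memory dicts stand in for the object store and for the per-snapshot map, and every name in it (trim_by_scanning, trim_by_index, make_store) is hypothetical.

```python
def trim_by_scanning(store, image, snap, num_potential_objects):
    """Probe every object name the snapshot could have.

    Work is proportional to the number of potential objects.
    """
    probes = 0
    removed = 0
    for idx in range(num_potential_objects):
        probes += 1
        if store.pop((image, idx, snap), None) is not None:
            removed += 1
    return probes, removed


def trim_by_index(store, snap_index, image, snap):
    """Visit only the clones recorded for this snapshot.

    Work is proportional to the number of objects that were cloned.
    """
    probes = 0
    removed = 0
    for idx in sorted(snap_index.pop(snap, set())):
        probes += 1
        if store.pop((image, idx, snap), None) is not None:
            removed += 1
    return probes, removed


def make_store(image, snap, cloned):
    """Build a toy object store plus a snapshot -> cloned-objects map."""
    store = {(image, idx, snap): b"old data" for idx in cloned}
    snap_index = {snap: set(cloned)}
    return store, snap_index


cloned = [3, 17, 42]
store, snap_index = make_store("vol", "snap1", cloned)

# The scan runs on a copy so that both calls start from the same state.
print(trim_by_scanning(dict(store), "vol", "snap1", 1_572_864))  # (1572864, 3)
print(trim_by_index(store, snap_index, "vol", "snap1"))          # (3, 3)
```

Both calls remove the same three clones, but the scan pays for 1.5M probes while the indexed version pays for three.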


Nice. I saw a blueprint somewhere earlier this year, so that's a pretty new 
thing (Hammer or Infernalis?)
And is it a map (with pointers to objects) or just a bitmap of the overlay?

Jan

> There is certainly room for improvement in terms of the impact on client 
> IO, though.  :)
> 
> sage

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
