Re: rados_clone_range for different pgs
We use Ceph to store huge files striped into small (4 MB) objects. Because files can be changed unpredictably (data insertion, modification, or deletion in any part of a file), we have to copy parts of objects, and this is currently done via the client. I see the following ways to solve this problem:

- implement a client that runs on the same host as the source OSD and handles the copy process
- add functionality to the OSD so that it can copy data to other OSDs

Which approach fits best with the Ceph design philosophy?

2013/8/2 Sage Weil s...@inktank.com:
> Hi Oleg,
>
> On Fri, 2 Aug 2013, Oleg Krasnianskiy wrote:
>> Hi, I asked this question on ceph-users but did not get any response, so I'll try my luck again, this time on ceph-devel =)
>
> Sorry about that!
>
>> Is there any way to copy part of one object into another if they reside in different PGs? There is rados_clone_range, but it requires both objects to be in the same PG.
>
> There is no way currently. clone_range can only work (reliably) on an OSD if both objects are stored with the same locator key; otherwise there is only a ~R/N chance that they happen to be on the same OSD (where N is the number of OSDs and R is the number of replicas), which isn't worth optimizing for. If the objects aren't stored together, you need to read the data and then write it; that way we avoid adding complexity to the OSD for minimal gain.
>
> Do you have a use case in mind where this functionality is important?
>
> sage

-- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
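To make the use case concrete: with fixed-size objects, inserting n bytes at offset pos shifts every later byte of the file forward, so data has to move across object boundaries. The sketch below plans those object-to-object copies. It assumes the simplest layout, where file offset x lives in object x // object_size at offset x % object_size (no striping across several objects at once); `locate` and `plan_tail_shift` are hypothetical helpers for illustration, not part of librados.

```python
from typing import List, Tuple

# Assumption: 4 MiB objects and the simplest layout, where file offset x is
# stored in object x // OBJECT_SIZE at offset x % OBJECT_SIZE.
OBJECT_SIZE = 4 * 1024 * 1024

CopyOp = Tuple[int, int, int, int, int]  # (src_obj, src_off, dst_obj, dst_off, length)


def locate(offset: int, object_size: int = OBJECT_SIZE) -> Tuple[int, int]:
    """Map a file offset to (object index, offset within that object)."""
    return divmod(offset, object_size)


def plan_tail_shift(file_size: int, pos: int, n: int,
                    object_size: int = OBJECT_SIZE) -> List[CopyOp]:
    """Plan the copies that move bytes [pos, file_size) forward by n bytes,
    e.g. to make room for an n-byte insertion at pos.

    No returned range crosses an object boundary on either side, so each one
    is a single object-to-object copy. Ops are ordered from the end of the
    file backwards (like memmove with dst > src), so applying them in order
    never overwrites bytes that a later op still has to read.
    """
    ops: List[CopyOp] = []
    end = file_size  # exclusive end of the source bytes not yet planned
    while end > pos:
        # The chunk must stay inside the source object holding byte end - 1...
        src_obj, _ = locate(end - 1, object_size)
        # ...and its destination must stay inside one destination object.
        dst_obj, _ = locate(end - 1 + n, object_size)
        start = max(pos, src_obj * object_size, dst_obj * object_size - n)
        ops.append((src_obj, start - src_obj * object_size,
                    dst_obj, start + n - dst_obj * object_size,
                    end - start))
        end = start
    return ops
```

Every object from the insertion point to the end of the file appears in the plan, which is why a small insertion near the start of a large file turns into many partial copies between objects, and why a cheap cross-object (and therefore cross-PG) copy matters for this workload.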
Re: rados_clone_range for different pgs
On Tue, Oct 8, 2013 at 7:40 AM, Oleg Krasnianskiy oleg.krasnians...@gmail.com wrote:
> We use Ceph to store huge files striped into small (4 MB) objects. Because files can be changed unpredictably (data insertion, modification, or deletion in any part of a file), we have to copy parts of objects, and this is currently done via the client. I see the following ways to solve this problem:
> - implement a client that runs on the same host as the source OSD and handles the copy process
> - add functionality to the OSD so that it can copy data to other OSDs
> Which approach fits best with the Ceph design philosophy?

I'm a bit confused; why does chunking files into objects require copying between objects?

In any case, I suspect you will want to do this via OSD commands rather than by putting a client next to the OSD (that approach is subject to races if an OSD dies, for instance). We are currently implementing similar functionality for the first time, in order to support caching and tiering pools. It's not yet exposed to clients, but it shouldn't be difficult to extend our new copyfrom interface (used by the OSD) into a copy_chunk interface that we can expose to clients and that copies part of one object into another, if somebody wants to take a stab at it!

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
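Until an OSD-side operation like the copy_chunk interface Greg describes exists, a copy between objects in different PGs has to go through a client: read the source range, then write it to the destination. A minimal sketch of that client-side path, assuming an in-memory dictionary stands in for the pool; with librados the `read` and `write` methods would correspond to `rados_read()` and `rados_write()` calls. `InMemoryObjectStore` and `copy_range` are hypothetical names.

```python
from typing import Dict


class InMemoryObjectStore:
    """Hypothetical stand-in for a RADOS pool: object name -> bytes.

    With librados, read() and write() would be rados_read() and
    rados_write() calls on an I/O context; here they are plain dict access.
    """

    def __init__(self) -> None:
        self.objects: Dict[str, bytearray] = {}

    def read(self, oid: str, length: int, offset: int) -> bytes:
        # Returns fewer than `length` bytes if the object ends early.
        data = self.objects.get(oid, bytearray())
        return bytes(data[offset:offset + length])

    def write(self, oid: str, buf: bytes, offset: int) -> None:
        obj = self.objects.setdefault(oid, bytearray())
        if len(obj) < offset:
            obj.extend(b"\0" * (offset - len(obj)))  # zero-fill any gap
        obj[offset:offset + len(buf)] = buf


def copy_range(store: InMemoryObjectStore, src: str, src_off: int,
               dst: str, dst_off: int, length: int,
               bufsize: int = 1 << 20) -> int:
    """Copy up to `length` bytes from src@src_off to dst@dst_off through the
    client, at most `bufsize` bytes at a time. Assumes the source and
    destination ranges do not overlap (e.g. two different objects).
    Returns the number of bytes actually copied.
    """
    copied = 0
    while copied < length:
        want = min(bufsize, length - copied)
        chunk = store.read(src, want, src_off + copied)   # OSD -> client
        if not chunk:
            break  # source object ended
        store.write(dst, chunk, dst_off + copied)         # client -> OSD
        copied += len(chunk)
        if len(chunk) < want:
            break  # short read: source ended inside this chunk
    return copied
```

The bounded buffer keeps client memory flat regardless of range size, but every copied byte still crosses the network twice (OSD to client, then client to OSD); that round trip is the cost an OSD-side copy would avoid.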
rados_clone_range for different pgs
Hi,

I asked this question on ceph-users but did not get any response, so I'll try my luck again, this time on ceph-devel =)

Is there any way to copy part of one object into another if they reside in different PGs? There is rados_clone_range, but it requires both objects to be in the same PG.

Thanks!
Re: rados_clone_range for different pgs
Hi Oleg,

On Fri, 2 Aug 2013, Oleg Krasnianskiy wrote:
> Hi, I asked this question on ceph-users but did not get any response, so I'll try my luck again, this time on ceph-devel =)

Sorry about that!

> Is there any way to copy part of one object into another if they reside in different PGs? There is rados_clone_range, but it requires both objects to be in the same PG.

There is no way currently. clone_range can only work (reliably) on an OSD if both objects are stored with the same locator key; otherwise there is only a ~R/N chance that they happen to be on the same OSD (where N is the number of OSDs and R is the number of replicas), which isn't worth optimizing for. If the objects aren't stored together, you need to read the data and then write it; that way we avoid adding complexity to the OSD for minimal gain.

Do you have a use case in mind where this functionality is important?

sage
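Sage's ~R/N estimate can be checked with a quick simulation. This is a simplified model, assuming each object's R replicas land on R distinct OSDs chosen uniformly at random (real CRUSH placement is weighted and hierarchical) and that the clone would run on the destination object's primary OSD, which therefore needs a local copy of the source. `colocation_probability` is a hypothetical helper.

```python
import random


def colocation_probability(n_osds: int, replicas: int,
                           trials: int = 100_000, seed: int = 1) -> float:
    """Estimate how often the primary OSD of a destination object also holds
    a replica of an unrelated source object.

    Simplifying assumption: every object's replicas go to `replicas` distinct
    OSDs drawn uniformly at random (real CRUSH placement is weighted and
    hierarchical), and source and destination are placed independently.
    """
    rng = random.Random(seed)
    osds = range(n_osds)
    hits = 0
    for _ in range(trials):
        src_osds = rng.sample(osds, replicas)  # OSDs storing the source
        dst_primary = rng.choice(osds)         # OSD that would run the clone
        if dst_primary in src_osds:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n, r in [(10, 3), (100, 3)]:
        p = colocation_probability(n, r)
        print(f"N={n}, R={r}: simulated {p:.4f}, R/N = {r / n:.4f}")
```

Under this model the probability is exactly R/N, so with 100 OSDs and 3 replicas a cross-PG clone could be served locally only about 3% of the time, which supports the conclusion that it is not worth a dedicated OSD code path.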