Re: libcephfs create file with layout and replication
On Mon, Nov 19, 2012 at 7:28 PM, Sage Weil s...@inktank.com wrote:
> We could avoid the whole issue by passing 4 arguments to the function...

I pushed a new patch that takes each of the 4 new arguments.

wip-client-open-layout

Thanks,
-Noah

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: libcephfs create file with layout and replication
On Sun, Nov 18, 2012 at 12:05 PM, Noah Watkins jayh...@cs.ucsc.edu wrote:
> Wanna have a look at a first pass on this patch?
>
> wip-client-open-layout
>
> Thanks,
> Noah

Just glanced over this, and I'm curious:
1) Why symlink another reference to your file_layout.h?
2) There's already a ceph_file_layout struct which is used widely (MDS, kernel, userspace client). It also has an accompanying function that does basic validity checks.

> On Sat, Nov 17, 2012 at 5:20 PM, Noah Watkins jayh...@cs.ucsc.edu wrote:
>> On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil s...@inktank.com wrote:
>>> We ignore that for the purposes of getting the libcephfs API correct, though...
>>
>> Ok, makes sense. Thanks.
>>
>> Noah

FYI, there's an unused __le32 in the open struct (used to be for preferred PG). We should be able to steal that away without too much pain or massaging! :)

-Greg
Re: libcephfs create file with layout and replication
On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum g...@inktank.com wrote:
> Just glanced over this, and I'm curious:
> 1) Why symlink another reference to your file_layout.h?

I followed the same pattern as page.h in librados, but may have misunderstood its use. When libcephfs.h is installed, it contains #include file_layout.h and we assume the user has -Iprefix/cephfs/. But in the build tree, include/cephfs isn't on the include path, hence the symlink.

> 2) There's already a ceph_file_layout struct which is used widely (MDS, kernel, userspace client). It also has an accompanying function that does basic validity checks.

I avoided ceph_file_layout because I was under the impression that all of the __le64 stuff in it was very much Linux-specific. I had run into a lot of this while hacking on an OS X port.

> FYI, there's an unused __le32 in the open struct (used to be for preferred PG). We should be able to steal that away without too much pain or massaging! :)

Nice. Do you think I should revert to using ceph_file_layout?

Thanks,
Noah
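On the portability concern raised above, here is a minimal sketch, assuming only C99 <stdint.h>, of storing a layout field as little-endian bytes without the Linux-specific __le32/__le64 annotations. The helper names put_le32 and get_le32 are hypothetical and are not part of Ceph or of the patch under discussion.

```c
#include <stdint.h>

/* Hypothetical helper (not Ceph API): write a 32-bit value as four
 * little-endian bytes, independent of the host byte order. */
static void put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical helper (not Ceph API): read four little-endian bytes
 * back into a host-order 32-bit value. */
static uint32_t get_le32(const uint8_t in[4])
{
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

Because the byte order is fixed by the shifts rather than by a kernel type, the same code should behave identically on Linux and OS X hosts.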
Re: libcephfs create file with layout and replication
On Mon, 19 Nov 2012, Noah Watkins wrote:
> On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum g...@inktank.com wrote:
>> Just glanced over this, and I'm curious:
>> 1) Why symlink another reference to your file_layout.h?
>
> I followed the same pattern as page.h in librados, but may have misunderstood its use. When libcephfs.h is installed, it contains #include file_layout.h and we assume the user has -Iprefix/cephfs/. But in the build tree, include/cephfs isn't on the include path, hence the symlink.
>
>> 2) There's already a ceph_file_layout struct which is used widely (MDS, kernel, userspace client). It also has an accompanying function that does basic validity checks.
>
> I avoided ceph_file_layout because I was under the impression that all of the __le64 stuff in it was very much Linux-specific. I had run into a lot of this while hacking on an OS X port.
>
>> FYI, there's an unused __le32 in the open struct (used to be for preferred PG). We should be able to steal that away without too much pain or massaging! :)
>
> Nice. Do you think I should revert to using ceph_file_layout?

We could avoid the whole issue by passing 4 arguments to the function...
Re: libcephfs create file with layout and replication
Wanna have a look at a first pass on this patch?

wip-client-open-layout

Thanks,
Noah

On Sat, Nov 17, 2012 at 5:20 PM, Noah Watkins jayh...@cs.ucsc.edu wrote:
> On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil s...@inktank.com wrote:
>> We ignore that for the purposes of getting the libcephfs API correct, though...
>
> Ok, makes sense. Thanks.
>
> Noah
Re: libcephfs create file with layout and replication
On 11/17/2012 12:13 PM, Noah Watkins wrote:
> The Hadoop VFS layer assumes that block size and replication can be set on a per-file basis, which is important to users for file layout/workload optimizations. The libcephfs interface doesn't make this entirely easy. Here is one approach, but it isn't thread safe as the default values are global variables in the client.
>
>   orig_obj_size = ceph_get_default_object_size() //save
>   set_default_object_size(new size)
>   open(path, O_CREAT)
>   set_default_object_size(orig_obj_size) //reset
>
> Something more convenient might be:
>
>   ceph_open_layout(path, flags, mode, layout, replication)

I think this makes the most sense, since the layout of a file can't be changed after it's been created, and this interface makes that the most clear. It also avoids maintaining extra state in libcephfs between calls.

Since replication count is a per-pool setting, I think the Hadoop bindings would have to translate from a VFS request to a pool with the requested replication level. So something like this, where layout is a struct containing stripe unit, stripe count, and object size (the subset of struct ceph_file_layout related to objects that's useful currently):

  ceph_open_layout(path, flags, mode, layout, pool_name)

BTW, for anyone interested, there's a nice description of the layout parameters here:

http://ceph.com/docs/master/dev/file-striping/

> where layout and replication are used with O_CREAT | O_EXCL, or an interface for setting these values explicitly on newly created files:
>
>   ceph_open(path, O_CREAT|O_EXCL)
>   ceph_set_layout(path, layout, replication)
>
> where ceph_set_layout would succeed ostensibly on zero-length files.
>
> Any thoughts on how to handle this?
>
> Thanks,
> Noah
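To make the proposed layout argument concrete, here is a minimal sketch, assuming a struct that carries only stripe unit, stripe count, and object size, together with the kind of basic validity check that already accompanies ceph_file_layout. The names sketch_layout, sketch_layout_is_valid, and SKETCH_MIN_STRIPE_UNIT are hypothetical, and the specific rules (64 KiB granularity, object size a multiple of stripe unit, non-zero stripe count) are assumptions for illustration rather than something stated in this thread.

```c
#include <stdint.h>

/* Hypothetical struct: the object-related subset of a file layout.
 * This is not the real struct ceph_file_layout. */
struct sketch_layout {
    uint32_t stripe_unit;   /* bytes written to one object before moving on */
    uint32_t stripe_count;  /* number of objects striped across */
    uint32_t object_size;   /* maximum size of each object, in bytes */
};

/* Assumed granularity for stripe unit and object size (64 KiB). */
#define SKETCH_MIN_STRIPE_UNIT 65536u

/* Returns 1 if the layout passes the assumed basic checks, 0 otherwise. */
static int sketch_layout_is_valid(const struct sketch_layout *l)
{
    if (l->stripe_unit == 0 || l->stripe_unit % SKETCH_MIN_STRIPE_UNIT != 0)
        return 0;
    if (l->object_size == 0 || l->object_size % SKETCH_MIN_STRIPE_UNIT != 0)
        return 0;
    /* An object must hold a whole number of stripe units. */
    if (l->object_size % l->stripe_unit != 0)
        return 0;
    if (l->stripe_count == 0)
        return 0;
    return 1;
}
```

A binding such as the Hadoop one could run a check like this before issuing the create, so that a bad block-size request fails locally instead of at the MDS.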
Re: libcephfs create file with layout and replication
On Sat, 17 Nov 2012, Noah Watkins wrote:
> The Hadoop VFS layer assumes that block size and replication can be set on a per-file basis, which is important to users for file layout/workload optimizations. The libcephfs interface doesn't make this entirely easy. Here is one approach, but it isn't thread safe as the default values are global variables in the client.
>
>   orig_obj_size = ceph_get_default_object_size() //save
>   set_default_object_size(new size)
>   open(path, O_CREAT)
>   set_default_object_size(orig_obj_size) //reset
>
> Something more convenient might be:
>
>   ceph_open_layout(path, flags, mode, layout, replication)
>
> where layout and replication are used with O_CREAT | O_EXCL, or an interface for setting these values explicitly on newly created files:
>
>   ceph_open(path, O_CREAT|O_EXCL)
>   ceph_set_layout(path, layout, replication)

This is basically what we have now... at least that's how things work for the kernel client. We should make sure there is a clean way via libcephfs to do that.

The client/MDS protocol also allows you to specify the layout on file creation. This is better since it has one less round trip to the MDS. Let's just create a new open call with those additional arguments.

FWIW, the striping parameters are object size, stripe unit, stripe count, and data pool.

sage

> where ceph_set_layout would succeed ostensibly on zero-length files.
>
> Any thoughts on how to handle this?
>
> Thanks,
> Noah
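A rough sketch of the shape such an open call could take, with the four striping parameters passed per call so that no client-wide defaults have to be saved and restored. The function name sketch_open_layout and its exact signature are hypothetical, the body is a stub that only checks its arguments, and treating a NULL data pool as "use the default pool" is an assumption.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stub (not the libcephfs implementation): every layout
 * value arrives as an argument, so concurrent callers share no state. */
static int sketch_open_layout(const char *path, int flags, mode_t mode,
                              int stripe_unit, int stripe_count,
                              int object_size, const char *data_pool)
{
    (void)flags;
    (void)mode;
    (void)data_pool;  /* assumption: NULL would mean the default pool */

    if (path == NULL)
        return -EINVAL;
    if (stripe_unit <= 0 || stripe_count <= 0 || object_size <= 0)
        return -EINVAL;

    /* A real client would send a single create request to the MDS here,
     * carrying the layout with it. This stub just reports success. */
    return 0;
}
```

A caller would pass something like sketch_open_layout("/f", O_CREAT | O_EXCL, 0644, 1 << 22, 1, 1 << 22, "data"), which keeps the layout tied to the creation itself rather than to a follow-up call.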
Re: libcephfs create file with layout and replication
On Sat, 17 Nov 2012, Noah Watkins wrote:
> On Sat, Nov 17, 2012 at 3:23 PM, Sage Weil s...@inktank.com wrote:
>> On Sat, 17 Nov 2012, Noah Watkins wrote:
>> FWIW, the striping parameters are object size, stripe unit, stripe count, and data pool.
>
> In ceph_mds_request_args.open I see all the striping parameters except data pool, and I don't see any places where the file_replication parameter is being used. Should a pg_pool field be added?

Yeah, I think this bit needs to be fixed in the on-wire protocol. That is a delicate fix. We ignore that for the purposes of getting the libcephfs API correct, though...

sage
Re: libcephfs create file with layout and replication
On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil s...@inktank.com wrote:
> We ignore that for the purposes of getting the libcephfs API correct, though...

Ok, makes sense. Thanks.

Noah