Re: libcephfs create file with layout and replication

2012-11-20 Thread Noah Watkins
On Mon, Nov 19, 2012 at 7:28 PM, Sage Weil s...@inktank.com wrote:

 We could avoid the whole issue by passing 4 arguments to the function...

I pushed a new patch that takes each of the 4 new arguments.

  wip-client-open-layout

Thanks,
-Noah
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: libcephfs create file with layout and replication

2012-11-19 Thread Gregory Farnum
On Sun, Nov 18, 2012 at 12:05 PM, Noah Watkins jayh...@cs.ucsc.edu wrote:
 Wanna have a look at a first pass on this patch?

wip-client-open-layout

 Thanks,
 Noah

Just glanced over this, and I'm curious:
1) Why symlink another reference to your file_layout.h?
2) There's already a ceph_file_layout struct which is used widely
(MDS, kernel, userspace client). It also has an accompanying function
that does basic validity checks.


 On Sat, Nov 17, 2012 at 5:20 PM, Noah Watkins jayh...@cs.ucsc.edu wrote:
 On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil s...@inktank.com wrote:

 We can ignore that for the purposes of getting the libcephfs API correct,
 though...

 Ok, makes sense. Thanks.

 Noah

FYI, there's an unused __le32 in the open struct (used to be for
preferred PG). We should be able to steal that away without too much
pain or massaging! :)
-Greg


Re: libcephfs create file with layout and replication

2012-11-19 Thread Noah Watkins
On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum g...@inktank.com wrote:

 Just glanced over this, and I'm curious:
 1) Why symlink another reference to your file_layout.h?

I followed the same pattern as page.h in librados, but may have
misunderstood its use. When libcephfs.h is installed, it includes

  #include <file_layout.h>

and we assume the user has -I<prefix>/cephfs/.

But in the build tree, include/cephfs isn't on the include path,
hence the symlink.

 2) There's already a ceph_file_layout struct which is used widely
 (MDS, kernel, userspace client). It also has an accompanying function
 that does basic validity checks.

I avoided ceph_file_layout because I was under the impression that all
of the __le64 stuff in it was very much Linux-specific. I had run into
a lot of this hacking on an OSX port.
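
For reference, one portable way around the Linux-only __le32/__le64 typedefs is to encode fields byte by byte, which behaves the same on any host byte order. This is only a sketch: put_le32, get_le32, put_le64 and get_le64 are hypothetical helper names, not functions from the Ceph tree.

```c
#include <stdint.h>

/* Store v as 4 little-endian bytes, independent of host byte order. */
static void put_le32(uint8_t out[4], uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Read 4 little-endian bytes back into a host-order value. */
static uint32_t get_le32(const uint8_t in[4])
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)in[i] << (8 * i);
    return v;
}

/* Same idea for 64-bit fields. */
static void put_le64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t get_le64(const uint8_t in[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}
```

Because the shifts operate on values rather than on memory layout, no platform-specific header or byte-swap macro is needed.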

 FYI, there's an unused __le32 in the open struct (used to be for
 preferred PG). We should be able to steal that away without too much
 pain or massaging! :)

Nice. Do you think I should revert back to using ceph_file_layout?

Thanks,
Noah


Re: libcephfs create file with layout and replication

2012-11-19 Thread Sage Weil
On Mon, 19 Nov 2012, Noah Watkins wrote:
 On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum g...@inktank.com wrote:
 
  Just glanced over this, and I'm curious:
  1) Why symlink another reference to your file_layout.h?
 
 I followed the same pattern as page.h in librados, but may have
 misunderstood its use. When libcephfs.h is installed, it includes
 
   #include <file_layout.h>
 
 and we assume the user has -I<prefix>/cephfs/.
 
 But in the build tree, include/cephfs isn't on the include path,
 hence the symlink.
 
  2) There's already a ceph_file_layout struct which is used widely
  (MDS, kernel, userspace client). It also has an accompanying function
  that does basic validity checks.
 
 I avoided ceph_file_layout because I was under the impression that all
 of the __le64 stuff in it was very much Linux-specific. I had run into
 a lot of this hacking on an OSX port.
 
  FYI, there's an unused __le32 in the open struct (used to be for
  preferred PG). We should be able to steal that away without too much
  pain or massaging! :)
 
 Nice. Do you think I should revert back to using ceph_file_layout?

We could avoid the whole issue by passing 4 arguments to the function...
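
A rough sketch of what checking those four arguments could look like when they are passed as plain values instead of a struct. The name check_layout_args is hypothetical, and the 64 KiB granularity is an assumption modeled on the kind of basic validity check mentioned for ceph_file_layout, not something confirmed in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed granularity: stripe unit and object size in 64 KiB increments. */
#define MIN_STRIPE_UNIT 65536u

/*
 * Hypothetical pre-flight check for an open call that takes the four
 * striping parameters directly. Returns 0 if they are consistent and
 * -EINVAL otherwise. A NULL data_pool means "use the default pool".
 */
static int check_layout_args(uint32_t stripe_unit, uint32_t stripe_count,
                             uint32_t object_size, const char *data_pool)
{
    if (stripe_unit == 0 || stripe_unit % MIN_STRIPE_UNIT != 0)
        return -EINVAL;
    if (object_size == 0 || object_size % MIN_STRIPE_UNIT != 0)
        return -EINVAL;
    if (object_size % stripe_unit != 0)   /* whole stripe units per object */
        return -EINVAL;
    if (stripe_count == 0)
        return -EINVAL;
    if (data_pool != NULL && data_pool[0] == '\0')  /* empty pool name */
        return -EINVAL;
    return 0;
}
```

Passing scalars keeps endian-annotated types out of the public header entirely, at the cost of a longer argument list.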


Re: libcephfs create file with layout and replication

2012-11-18 Thread Noah Watkins
Wanna have a look at a first pass on this patch?

   wip-client-open-layout

Thanks,
Noah

On Sat, Nov 17, 2012 at 5:20 PM, Noah Watkins jayh...@cs.ucsc.edu wrote:
 On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil s...@inktank.com wrote:

 We can ignore that for the purposes of getting the libcephfs API correct,
 though...

 Ok, makes sense. Thanks.

 Noah


Re: libcephfs create file with layout and replication

2012-11-17 Thread Josh Durgin

On 11/17/2012 12:13 PM, Noah Watkins wrote:

The Hadoop VFS layer assumes that block size and replication can be
set on a per-file basis, which is important to users for file
layout/workload optimizations.

The libcephfs interface doesn't make this entirely easy. Here is one
approach, but it isn't thread safe as the default values are global
variables in the client.

   orig_obj_size = ceph_get_default_object_size() //save
   set_default_object_size(new size)
   open(path, O_CREAT)
   set_default_object_size(orig_obj_size) //reset

Something more convenient might be:

   ceph_open_layout(path, flags, mode, layout, replication)


I think this makes the most sense, since changing the layout of a
file after it's been created can't happen, and this interface
makes that the most clear. It also avoids maintaining extra state
in libcephfs between calls.

Since replication count is a per-pool setting, I think the hadoop
bindings would have to translate from a vfs request to a pool
with the requested replication level. So something like this,
where layout is a struct containing stripe unit, stripe count,
and object size (the subset of struct ceph_file_layout related to
objects that's useful currently):

ceph_open_layout(path, flags, mode, layout, pool_name)

BTW, for anyone interested, there's a nice description of
the layout parameters here:

http://ceph.com/docs/master/dev/file-striping/
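
To make the three object-related parameters concrete, here is a minimal sketch of how a file offset maps to an object number and an offset within that object. The struct and function names (object_extent, map_offset) are made up for illustration, and the sketch assumes object size is a non-zero multiple of stripe unit.

```c
#include <assert.h>
#include <stdint.h>

/* Where a file byte lands: which object, and at what offset inside it. */
struct object_extent {
    uint64_t object_no;   /* index of the object within the file */
    uint64_t object_off;  /* byte offset within that object */
};

/*
 * Map a file offset to an object using stripe unit, stripe count and
 * object size. Stripe units are dealt round-robin across stripe_count
 * objects (an object set) until each object in the set is full.
 */
static struct object_extent
map_offset(uint64_t off, uint32_t stripe_unit, uint32_t stripe_count,
           uint32_t object_size)
{
    assert(stripe_unit > 0 && stripe_count > 0);
    assert(object_size >= stripe_unit && object_size % stripe_unit == 0);

    uint64_t su_per_object = object_size / stripe_unit;
    uint64_t block_no   = off / stripe_unit;         /* stripe-unit index in file */
    uint64_t stripe_no  = block_no / stripe_count;   /* which stripe (row) */
    uint64_t stripe_pos = block_no % stripe_count;   /* which object in the set */
    uint64_t set_no     = stripe_no / su_per_object; /* which object set */

    struct object_extent e;
    e.object_no  = set_no * stripe_count + stripe_pos;
    e.object_off = (stripe_no % su_per_object) * stripe_unit + off % stripe_unit;
    return e;
}
```

With stripe_count set to 1 and stripe_unit equal to object_size, this reduces to simple chunking: object_no is off / object_size and object_off is off % object_size.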


where layout and replication are used with O_CREAT | O_EXCL, or an
interface for setting these values explicitly on newly created files:

   ceph_open(path, O_CREAT|O_EXCL)
   ceph_set_layout(path, layout, replication)

where ceph_set_layout would succeed ostensibly on zero-length files.

Any thoughts on how to handle this?

Thanks,
Noah




Re: libcephfs create file with layout and replication

2012-11-17 Thread Sage Weil
On Sat, 17 Nov 2012, Noah Watkins wrote:
 The Hadoop VFS layer assumes that block size and replication can be
 set on a per-file basis, which is important to users for file
 layout/workload optimizations.
 
 The libcephfs interface doesn't make this entirely easy. Here is one
 approach, but it isn't thread safe as the default values are global
 variables in the client.
 
   orig_obj_size = ceph_get_default_object_size() //save
   set_default_object_size(new size)
   open(path, O_CREAT)
   set_default_object_size(orig_obj_size) //reset
 
 Something more convenient might be:
 
   ceph_open_layout(path, flags, mode, layout, replication)
 
 where layout and replication are used with O_CREAT | O_EXCL, or an
 interface for setting these values explicitly on newly created files:
 
   ceph_open(path, O_CREAT|O_EXCL)
   ceph_set_layout(path, layout, replication)

This is basically what we have now... at least that's how things work for 
the kernel client.  We should make sure there is a clean way via libcephfs 
to do that.

The client/mds protocol also allows you to specify the layout on file 
creation.  This is better since it has one less round trip to the MDS.  
Let's just create a new open call with those additional arguments.

FWIW, the striping parameters are object size, stripe unit, stripe count, 
and data pool.

sage



 
 where ceph_set_layout would succeed ostensibly on zero-length files.
 
 Any thoughts on how to handle this?
 
 Thanks,
 Noah
 
 


Re: libcephfs create file with layout and replication

2012-11-17 Thread Sage Weil
On Sat, 17 Nov 2012, Noah Watkins wrote:
 On Sat, Nov 17, 2012 at 3:23 PM, Sage Weil s...@inktank.com wrote:
  On Sat, 17 Nov 2012, Noah Watkins wrote:
 
  FWIW, the striping parameters are object size, stripe unit, stripe count,
  and data pool.
 
 In ceph_mds_request_args.open I see all the striping parameters
 except data pool, and I don't see any places where the file_replication
 parameter is being used. Should a pg_pool field be added?

Yeah, I think this bit needs to be fixed in the on-wire protocol.  That
is a delicate fix.

We can ignore that for the purposes of getting the libcephfs API correct,
though...

sage


Re: libcephfs create file with layout and replication

2012-11-17 Thread Noah Watkins
On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil s...@inktank.com wrote:

 We can ignore that for the purposes of getting the libcephfs API correct,
 though...

Ok, makes sense. Thanks.

Noah