Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-20 Thread Andreas Bluemle
Hi Sean,

I will re-check by the end of the week; there is
a test scheduling issue with our test system which
affects my access times.

Thanks

Andreas


On Mon, 19 Aug 2013 17:10:11 +
Hefty, Sean sean.he...@intel.com wrote:

 Can you see if the patch below fixes the hang?
 
 Signed-off-by: Sean Hefty sean.he...@intel.com
 ---
  src/rsocket.c |   11 ++-
  1 files changed, 10 insertions(+), 1 deletions(-)
 
 diff --git a/src/rsocket.c b/src/rsocket.c
 index d544dd0..e45b26d 100644
 --- a/src/rsocket.c
 +++ b/src/rsocket.c
 @@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd *rfds, struct pollfd *fds, nfds_t nfds)
  
  		rs = idm_lookup(&idm, fds[i].fd);
  		if (rs) {
 +			fastlock_acquire(&rs->cq_wait_lock);
  			if (rs->type == SOCK_STREAM)
  				rs_get_cq_event(rs);
  			else
  				ds_get_cq_event(rs);
 +			fastlock_release(&rs->cq_wait_lock);
  			fds[i].revents = rs_poll_rs(rs, fds[i].events, 1, rs_poll_all);
  		} else {
  			fds[i].revents = rfds[i].revents;
 @@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set *writefds,
  
  /*
   * For graceful disconnect, notify the remote side that we're
 - * disconnecting and wait until all outstanding sends complete.
 + * disconnecting and wait until all outstanding sends complete, provided
 + * that the remote side has not sent a disconnect message.
   */
  int rshutdown(int socket, int how)
  {
 @@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how)
  	if (rs->state & rs_connected)
  		rs_process_cq(rs, 0, rs_conn_all_sends_done);
  
 +	if (rs->state & rs_disconnected) {
 +		/* Generate event by flushing receives to unblock rpoll */
 +		ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
 +		rdma_disconnect(rs->cm_id);
 +	}
 +
  	if ((rs->fd_flags & O_NONBLOCK) && (rs->state & rs_connected))
  		rs_set_nonblocking(rs, rs->fd_flags);
  
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma
 in the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 



-- 
Andreas Bluemle mailto:andreas.blue...@itxperts.de
Heinrich Boell Strasse 88   Phone: (+49) 89 4317582
D-81829 Muenchen (Germany)  Mobil: (+49) 177 522 0151
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RGW blueprint for plugin architecture

2013-08-20 Thread Roald van Loon
On Tue, Aug 20, 2013 at 2:58 AM, Yehuda Sadeh yeh...@inktank.com wrote:
 Well, practically I'd like to have such work done in baby steps rather
 than as sweeping changes. Smaller changes have a higher chance of getting
 completed and eventually merged upstream.  That's why I prefer the
 current model of directly linking the plugins (whether statically or
 dynamically), with (relatively) minor internal adjustments.

Which current model of directly linking plugins are you referring to, exactly?

 Maybe start with thinking about the use cases, and then figure what
 kind of api that would be. As I said, I'm not sure that an internal
 api is the way to go, but rather exposing some lower level
 functionality externally. The big difference is that with the former
 we tie in the internal architecture, while the latter hides the [gory]
 details.

The problem is that right now basically everything is 'lower level
functionality', because a lot of generic stuff depends on S3 stuff,
which in turn depends on generic stuff. Take for example the
following:

class RGWHandler_Usage : public RGWHandler_Auth_S3 { }
class RGWHandler_Auth_S3 : public RGWHandler_ObjStore { }

This basically ties usage statistics collection + authentication
handling + object store all together.

I think this needs to be completely unravelled, but before drafting all
kinds of use cases (like usage statistics collection or
authentication in this case) it might be wise to know which design
decisions led to the S3 API being so tightly integrated into
everything else. Or is this just legacy?

Roald
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-20 Thread Andreas Bluemle
Hi,

I have applied the patch and re-tested: I still encounter
hangs of my application. I am not quite sure whether
I am hitting the same error on shutdown, because I no longer
hit the error every time, only every now and then.

When applying the patch to my code base (git tag v1.0.17) I noticed
an offset of -34 lines. Which code base are you using?


Best Regards

Andreas Bluemle

On Tue, 20 Aug 2013 09:21:13 +0200
Andreas Bluemle andreas.blue...@itxperts.de wrote:

 Hi Sean,
 
 I will re-check by the end of the week; there is
 a test scheduling issue with our test system which
 affects my access times.
 
 Thanks
 
 Andreas
 
 
 On Mon, 19 Aug 2013 17:10:11 +
 Hefty, Sean sean.he...@intel.com wrote:
 
  Can you see if the patch below fixes the hang?
  
  Signed-off-by: Sean Hefty sean.he...@intel.com
  ---
   src/rsocket.c |   11 ++-
   1 files changed, 10 insertions(+), 1 deletions(-)
  
  diff --git a/src/rsocket.c b/src/rsocket.c
  index d544dd0..e45b26d 100644
  --- a/src/rsocket.c
  +++ b/src/rsocket.c
  @@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd *rfds, struct pollfd *fds, nfds_t nfds)
   
   		rs = idm_lookup(&idm, fds[i].fd);
   		if (rs) {
  +			fastlock_acquire(&rs->cq_wait_lock);
   			if (rs->type == SOCK_STREAM)
   				rs_get_cq_event(rs);
   			else
   				ds_get_cq_event(rs);
  +			fastlock_release(&rs->cq_wait_lock);
   			fds[i].revents = rs_poll_rs(rs, fds[i].events, 1, rs_poll_all);
   		} else {
   			fds[i].revents = rfds[i].revents;
  @@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set *writefds,
   
   /*
    * For graceful disconnect, notify the remote side that we're
  - * disconnecting and wait until all outstanding sends complete.
  + * disconnecting and wait until all outstanding sends complete, provided
  + * that the remote side has not sent a disconnect message.
    */
   int rshutdown(int socket, int how)
   {
  @@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how)
   	if (rs->state & rs_connected)
   		rs_process_cq(rs, 0, rs_conn_all_sends_done);
   
  +	if (rs->state & rs_disconnected) {
  +		/* Generate event by flushing receives to unblock rpoll */
  +		ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
  +		rdma_disconnect(rs->cm_id);
  +	}
  +
   	if ((rs->fd_flags & O_NONBLOCK) && (rs->state & rs_connected))
   		rs_set_nonblocking(rs, rs->fd_flags);
   
  
  
  --
  To unsubscribe from this list: send the line unsubscribe
  linux-rdma in the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
  
 
 
 



-- 
Andreas Bluemle mailto:andreas.blue...@itxperts.de
Heinrich Boell Strasse 88   Phone: (+49) 89 4317582
D-81829 Muenchen (Germany)  Mobil: (+49) 177 522 0151
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Review request : Erasure Code plugin loader implementation

2013-08-20 Thread Loic Dachary
Hi Sage,

I created "erasure code : convenience functions to code / decode" 
(http://tracker.ceph.com/issues/6064) to implement the suggested functions. 
Please let me know if this should be merged with another task.

Cheers

On 19/08/2013 17:06, Loic Dachary wrote:
 
 
 On 19/08/2013 02:01, Sage Weil wrote:
 On Sun, 18 Aug 2013, Loic Dachary wrote:
 Hi Sage,

 Unless I misunderstood something (which is still possible at this stage 
 ;-) decode() is used both for recovery of missing chunks and retrieval of 
 the original buffer. Decoding the M data chunks is a special case of 
 decoding N >= M chunks out of the M+K chunks that were produced by 
 encode(). It can be used to recover parity chunks as well as data chunks.

 https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api

 map<int, buffer> decode(const set<int> &want_to_read, const map<int, buffer> &chunks)

 decode chunks to read the content of the want_to_read chunks and return 
 a map associating the chunk number with its decoded content. For instance, 
 in the simplest case M=2,K=1 for an encoded payload of data A and B with 
 parity Z, calling

 decode([1,2], { 1 => 'A', 2 => 'B', 3 => 'Z' })
 => { 1 => 'A', 2 => 'B' }

 If, however, chunk B is to be read but is missing, it will be:

 decode([2], { 1 => 'A', 3 => 'Z' })
 => { 2 => 'B' }
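
 To make this concrete, here is a toy illustration of that M=2,K=1 behaviour,
 using plain std::string chunks and XOR parity instead of Ceph buffers
 (illustrative only, not the proposed interface):

   #include <cassert>
   #include <map>
   #include <set>
   #include <string>

   // Chunk 3 is the byte-wise XOR of chunks 1 and 2, so any single missing
   // chunk can be rebuilt from the other two.
   static std::string xor_chunks(const std::string &a, const std::string &b) {
     std::string out(a.size(), '\0');
     for (size_t i = 0; i < a.size(); i++)
       out[i] = a[i] ^ b[i];
     return out;
   }

   std::map<int, std::string> decode(const std::set<int> &want_to_read,
                                     const std::map<int, std::string> &chunks) {
     std::map<int, std::string> result;
     for (int i : want_to_read) {
       std::map<int, std::string>::const_iterator c = chunks.find(i);
       if (c != chunks.end()) {
         result[i] = c->second;          // chunk available: return it as is
       } else {
         // chunk i is missing: rebuild it from the two chunks we do have
         std::map<int, std::string>::const_iterator a = chunks.begin();
         std::map<int, std::string>::const_iterator b = a;
         ++b;
         result[i] = xor_chunks(a->second, b->second);
       }
     }
     return result;
   }

   int main() {
     std::map<int, std::string> all = {{1, "A"}, {2, "B"}, {3, xor_chunks("A", "B")}};
     assert(decode({1, 2}, all).at(2) == "B");    // plain read
     std::map<int, std::string> degraded = all;
     degraded.erase(2);                           // chunk B is lost
     assert(decode({2}, degraded).at(2) == "B");  // recovered from A xor Z
     return 0;
   }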

 Ah, I guess this works when some of the chunks contain the original 
 data (as with a parity code).  There are codes that don't work that way, 
 although I suspect we won't use them.

 Regardless, I wonder if we should generalize slightly and have some 
 methods work in terms of (offset,length) of the original stripe to 
 generalize that bit.  Then we would have something like

  map<int, buffer> transcode(const set<int> &want_to_read, const map<int, buffer> &chunks);

 to go from chunks -> chunks (as we would want to do with, say, an LRC-like 
 code where we can rebuild some shards from a subset of the other shards).  
 And then also have

  int decode(const map<int, buffer> &chunks, unsigned offset, unsigned len, bufferlist *out);
 
 This function would be implemented more or less as:
 
   set<int> want_to_read = range_to_chunks(offset, len); // compute what chunks must be retrieved
   set<int> available = the up set
   set<int> minimum = minimum_to_decode(want_to_read, available);
   map<int, buffer> available_chunks = retrieve_chunks_from_osds(minimum);
   map<int, buffer> chunks = transcode(want_to_read, available_chunks); // repairs if necessary
   out = bufferptr(concat_chunks(chunks), offset - offset of the first chunk, len)
 
 or do you have something else in mind ?
 

 that recovers the original data.

 In our case, the read path would use decode, and for recovery we would use 
 transcode.  

 We'd also want to have alternate minimum_to_decode* methods, like

 virtual set<int> minimum_to_decode(unsigned offset, unsigned len, const set<int> &available_chunks) = 0;
 
 I also have a convenience wrapper in mind for this but I feel I'm missing 
 something.
 
 Cheers
 

 What do you think?

 sage





 Cheers

 On 18/08/2013 19:34, Sage Weil wrote:
 On Sun, 18 Aug 2013, Loic Dachary wrote:
 Hi Ceph,

 I've implemented a draft of the Erasure Code plugin loader in the context 
 of http://tracker.ceph.com/issues/5878. It has a trivial unit test and an 
 example plugin. It would be great if someone could do a quick review. The 
 general idea is that the erasure code pool calls something like:

 ErasureCodePlugin::factory(erasure_code, example, parameters)

 as shown at

 https://github.com/ceph/ceph/blob/5a2b1d66ae17b78addc14fee68c73985412f3c8c/src/test/osd/TestErasureCode.cc#L28

 to get an object implementing the interface

 https://github.com/ceph/ceph/blob/5a2b1d66ae17b78addc14fee68c73985412f3c8c/src/osd/ErasureCodeInterface.h

 which matches the proposal described at

 https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api

 The draft is at

 https://github.com/ceph/ceph/commit/5a2b1d66ae17b78addc14fee68c73985412f3c8c

 Thanks in advance :-)

 I haven't been following this discussion too closely, but taking a look 
 now, the first 3 make sense, but

 virtual map<int, bufferptr> decode(const set<int> &want_to_read, const map<int, bufferptr> &chunks) = 0;

 it seems like this one should be more like

 virtual int decode(const map<int, bufferptr> &chunks, bufferlist *out);

 As in, you'd decode the chunks you have to get the actual data.  If you 
 want to get (missing) chunks for recovery, you'd do

   minimum_to_decode(...);  // see what we need
   fetch those chunks from other nodes
   decode(...);   // reconstruct original buffer
   encode(...);   // encode missing chunks from original data

 sage
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Erasure Code plugin system with an example : review request

2013-08-20 Thread Loic Dachary
Hi Ceph,

Yesterday I implemented a simple erasure code plugin that can sustain the loss 
of a single chunk.

https://github.com/dachary/ceph/blob/wip-5878/src/osd/ErasureCodeExample.h

and it works as shown in the unit test

https://github.com/dachary/ceph/blob/wip-5878/src/test/osd/TestErasureCodeExample.cc

It would be of limited use in a production environment because it only saves 
25% space ( M=2 K=1 ) over a 2 replica pool, but it would work.
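
For reference, the 25% figure comes from comparing raw-space overhead: with
M=2 data chunks plus K=1 coding chunk, each object is stored as 3 half-size
chunks, versus 2 full copies in a 2-replica pool:

  \text{erasure overhead} = \frac{M+K}{M} = \frac{3}{2} = 1.5\times, \qquad
  \text{replica overhead} = 2\times, \qquad
  \text{saving} = 1 - \frac{1.5}{2} = 25\%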

I would very much appreciate a review of the erasure code plugin system and the 
associated example plugin :

https://github.com/ceph/ceph/pull/515
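
For readers who have not looked at the branch yet: the loader follows the
usual dlopen() factory pattern, roughly along these lines (the names and the
entry-point symbol below are illustrative only; see the pull request for the
real code):

  #include <cerrno>
  #include <dlfcn.h>
  #include <map>
  #include <string>

  class ErasureCodeInterface;   // abstract encode/decode interface (see the links above)

  // Hypothetical entry point every plugin shared object would export.
  typedef int (*erasure_code_init_t)(
      const std::map<std::string, std::string> &parameters,
      ErasureCodeInterface **erasure_code);

  // Load libec_<name>.so from <dir> and ask it to build an ErasureCodeInterface.
  int load_erasure_code_plugin(const std::string &dir, const std::string &name,
                               const std::map<std::string, std::string> &parameters,
                               ErasureCodeInterface **erasure_code) {
    std::string path = dir + "/libec_" + name + ".so";
    void *handle = dlopen(path.c_str(), RTLD_NOW);
    if (!handle)
      return -EIO;                                    // dlerror() has the details
    void *entry = dlsym(handle, "erasure_code_init"); // illustrative symbol name
    if (!entry) {
      dlclose(handle);
      return -ENOENT;
    }
    erasure_code_init_t init = reinterpret_cast<erasure_code_init_t>(entry);
    return init(parameters, erasure_code);            // the plugin builds the object
  }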

When it's good enough, creating a jerasure plugin will be next :-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.





RE: [ceph-users] Help needed porting Ceph to RSockets

2013-08-20 Thread Hefty, Sean
 I have applied the patch and re-tested: I still encounter
 hangs of my application. I am not quite sure whether
 I am hitting the same error on shutdown, because I no longer
 hit the error every time, only every now and then.

I guess this is at least some progress... :/
 
 When applying the patch to my code base (git tag v1.0.17) I noticed
 an offset of -34 lines. Which code base are you using?

This patch was generated against the tip of the git tree. 

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


libvirt: Removing RBD volumes with snapshots, auto purge or not?

2013-08-20 Thread Wido den Hollander

Hi,

The current [0] libvirt storage pool code simply calls rbd_remove 
without anything else.


As far as I know rbd_remove will fail if the image still has snapshots, 
you have to remove those snapshots first before you can remove the image.


The problem is that libvirt's storage pools do not support listing 
snapshots, so we can't integrate that.


Libvirt however has a flag you can pass down to indicate that you want the 
device to be zeroed.


The normal procedure is that the device is filled with zeros before 
actually removing it.


I was thinking about abusing this flag to use it as a snap purge for RBD.

So a regular volume removal will call only rbd_remove, but when the flag 
VIR_STORAGE_VOL_DELETE_ZEROED is passed it will purge all snapshots 
prior to calling rbd_remove.
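
To make the idea concrete: when the flag is set, the purge before removal
would boil down to roughly these librbd calls (a sketch against the librbd
C++ API, not an actual libvirt patch):

  #include <rados/librados.hpp>
  #include <rbd/librbd.hpp>
  #include <string>
  #include <vector>

  // Sketch: purge all snapshots of an image, then remove the image.
  // Assumes `ioctx` is an IoCtx already open on the pool backing the storage pool.
  int purge_and_remove(librados::IoCtx &ioctx, const std::string &name) {
    librbd::RBD rbd;
    {
      librbd::Image image;   // closed automatically when it goes out of scope
      int r = rbd.open(ioctx, image, name.c_str());
      if (r < 0)
        return r;

      std::vector<librbd::snap_info_t> snaps;
      r = image.snap_list(snaps);                        // list all snapshots
      if (r < 0)
        return r;
      for (size_t i = 0; i < snaps.size(); i++) {
        r = image.snap_remove(snaps[i].name.c_str());    // drop each snapshot
        if (r < 0)
          return r;
      }
    }
    return rbd.remove(ioctx, name.c_str());              // now removal can succeed
  }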


Another way would be to always purge snapshots, but I'm afraid that 
could make somebody very unhappy at some point.


Currently virsh doesn't support flags, but that could be fixed in a 
different patch.


Does my idea sound sane?

[0]: 
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/storage/storage_backend_rbd.c;h=e3340f63f412c22d025f615beb7cfed25f00107b;hb=master#l407


--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: libvirt: Removing RBD volumes with snapshots, auto purge or not?

2013-08-20 Thread Andrey Korolyov
On Tue, Aug 20, 2013 at 7:36 PM, Wido den Hollander w...@42on.com wrote:
 Hi,

 The current [0] libvirt storage pool code simply calls rbd_remove without
 anything else.

 As far as I know rbd_remove will fail if the image still has snapshots, you
 have to remove those snapshots first before you can remove the image.

 The problem is that libvirt's storage pools do not support listing
 snapshots, so we can't integrate that.

 Libvirt however has a flag you can pass down to indicate that you want the device to
 be zeroed.

 The normal procedure is that the device is filled with zeros before actually
 removing it.

 I was thinking about abusing this flag to use it as a snap purge for RBD.

 So a regular volume removal will call only rbd_remove, but when the flag
 VIR_STORAGE_VOL_DELETE_ZEROED is passed it will purge all snapshots prior to
 calling rbd_remove.

 Another way would be to always purge snapshots, but I'm afraid that could
 make somebody very unhappy at some point.

 Currently virsh doesn't support flags, but that could be fixed in a
 different patch.

 Does my idea sound sane?

 [0]:
 http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/storage/storage_backend_rbd.c;h=e3340f63f412c22d025f615beb7cfed25f00107b;hb=master#l407

 --
 Wido den Hollander
 42on B.V.

Hi Wido,


You had mentioned not so long ago the same idea I had about a year
and a half ago, about placing memory dumps along with the regular
snapshot in Ceph using libvirt mechanisms. That sounds pretty nice,
since we'll have something other than qcow2 with the same snapshot
functionality, but your current proposal does not extend to this.
Placing a custom side hook seems much more extensible than putting the
snapshot purge behind a specific flag.


 Phone: +31 (0)20 700 9902
 Skype: contact42on
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RGW blueprint for plugin architecture

2013-08-20 Thread Roald van Loon
On Tue, Aug 20, 2013 at 4:49 PM, Yehuda Sadeh yeh...@inktank.com wrote:
 I was referring to your work at wip-rgw-plugin, where the plugin code
 itself still needs to rely on the rgw utility code.

Right. So we can agree on ditching the dynamic loading thing and a clean
internal API (for now), but at least start separating code into
plugins like this?

 That's not quite a hard dependency. At the moment it's like that, as
 we made a decision to use the S3 auth for the admin utilities.
 Switching to a different auth system (atm) would require defining a
 new auth class and inheriting from it instead. It's not very flexible,
 but it's not very intrusive.
 I'd certainly be interested in removing this inheritance relationship
 and switch to a different pipeline model.

I don't know if you looked at it in detail, but for the wip-rgw-plugin
work I created an RGWAuthManager / RGWAuthPipeline relation to
segregate authentication-specific stuff from the REST handlers. Is
that, in general, a model you would like to see discussed in more detail? If
so, it would probably be wise to start a separate blueprint for it.
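
To give an idea of the shape I have in mind, a rough sketch (illustrative
only; the actual wip-rgw-plugin code differs in the details):

  #include <cerrno>
  #include <map>
  #include <memory>
  #include <string>

  struct RGWAuthResult {
    bool authenticated;
    std::string user_id;
  };

  // One pipeline per auth scheme (S3 signatures, Swift tokens, ...).
  class RGWAuthPipeline {
  public:
    virtual ~RGWAuthPipeline() {}
    // 0 = authenticated, -EACCES = definitely rejected, -ENOENT = not my scheme.
    virtual int authenticate(const std::map<std::string, std::string> &req_env,
                             RGWAuthResult *result) = 0;
  };

  // The REST handlers only talk to the manager; they never see scheme details.
  class RGWAuthManager {
  public:
    void register_pipeline(const std::string &name,
                           std::shared_ptr<RGWAuthPipeline> pipeline) {
      pipelines[name] = pipeline;
    }
    int authenticate(const std::map<std::string, std::string> &req_env,
                     RGWAuthResult *result) {
      for (std::map<std::string, std::shared_ptr<RGWAuthPipeline> >::iterator
             it = pipelines.begin(); it != pipelines.end(); ++it) {
        int r = it->second->authenticate(req_env, result);
        if (r != -ENOENT)
          return r;               // accepted or definitively rejected
      }
      return -EACCES;             // no scheme claimed the request
    }
  private:
    std::map<std::string, std::shared_ptr<RGWAuthPipeline> > pipelines;
  };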

 As I said, I don't see it as such. We do use it all over the place,
 but the same way you could just switch these to use
 RGWHandler_Auth_Swift and it should work (give or take a few tweaks).

IMHO, REST handlers should leave
authentication/authorization/accounting specific tasks to a separate
component (like the aforementioned pipelining system, maybe
integrated with the current RGWUser-related code), although this will
likely never be purely abstracted (at least for authentication). This
just makes the whole system more modular (albeit just a bit).

But for now I propose to implement a small plugin system where
plugins are still linked into the rgw core (but, code-wise, separated
as much as possible), and keep the auth stuff for later.

Roald
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


app design recommendations

2013-08-20 Thread Nulik Nol
Hi,
I am creating an email system which will handle a whole company's email,
mostly internal mail. There will be thousands of companies and
hundreds of users per company, so I am planning to use one pool per
company to store email messages. Can Ceph manage thousands or maybe
hundreds of thousands of pools? Could there be any slowdown in production
with such a design after some growth?

Every email will be stored as an individual ceph object (emails will
average 512 bytes and rarely have attachments). Is it ok to store
them as individual ceph objects, or will it be less efficient than
storing multiple emails in one ceph object? At what object size does
storing an email as an individual ceph object become preferable to
writing it through omap with leveldb? (kind of a ceph object vs
omap benchmark question)

I will also be putting mini-chat sessions between users in a ceph
object: each time a user sends a message to another user, I will
append the text to the ceph object. So my question is: will Ceph
rewrite the whole object to a new physical location on disk when I
do an append, or will it just rewrite the block that was modified?

And the last questions: which is faster, storing small key/value pairs in
omap or in xattrs? Will storing key/value pairs in xattrs waste
space by allocating a block for a zero-sized object on the OSD? (I
won't write any data to the object, just use xattrs.)
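
For reference, the three access patterns I am comparing look roughly like
this with the librados C++ API (pool, object and key names are made up):

  #include <rados/librados.hpp>
  #include <map>
  #include <string>

  // Sketch of the three access patterns discussed above, using librados.
  // Assumes `ioctx` is an IoCtx already opened on some pool (e.g. the per-company pool).
  int demo(librados::IoCtx &ioctx) {
    // 1. Append a chat line to an object (RADOS supports append natively).
    librados::bufferlist chat;
    chat.append("alice -> bob: hi\n");
    int r = ioctx.append("chat.alice.bob", chat, chat.length());
    if (r < 0)
      return r;

    // 2. Store a small key/value pair as an xattr on an (otherwise empty) object.
    librados::bufferlist flag;
    flag.append("1");
    r = ioctx.setxattr("mail.12345", "seen", flag);
    if (r < 0)
      return r;

    // 3. Store the same kind of key/value pair in the object's omap instead.
    std::map<std::string, librados::bufferlist> kv;
    kv["seen"] = flag;
    r = ioctx.omap_set("mail.12345", kv);
    return r;
  }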

Will appreciate very much your comments.

Best Regards
Nulik
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: libvirt: Removing RBD volumes with snapshots, auto purge or not?

2013-08-20 Thread Josh Durgin

On 08/20/2013 08:36 AM, Wido den Hollander wrote:

Hi,

The current [0] libvirt storage pool code simply calls rbd_remove
without anything else.

As far as I know rbd_remove will fail if the image still has snapshots,
you have to remove those snapshots first before you can remove the image.

The problem is that libvirt's storage pools do not support listing
snapshots, so we can't integrate that.


libvirt's storage pools don't have any concept of snapshots, which is
the real problem. Ideally they would have functions to at least create,
list and delete snapshots (and probably rollback and create a volume 
from a snapshot too).



Libvirt however has a flag you can pass down to indicate that you want the
device to be zeroed.

The normal procedure is that the device is filled with zeros before
actually removing it.

I was thinking about abusing this flag to use it as a snap purge for RBD.

So a regular volume removal will call only rbd_remove, but when the flag
VIR_STORAGE_VOL_DELETE_ZEROED is passed it will purge all snapshots
prior to calling rbd_remove.


I don't think we should reinterpret the flag like that. A new flag
for that purpose could work, but since libvirt storage pools don't
manage snapshots at all right now I'd rather CloudStack delete the
snapshots via librbd, since it's the service creating them in this case.
You could see what the libvirt devs think about a new flag though.


Another way would be to always purge snapshots, but I'm afraid that
could make somebody very unhappy at some point.


I agree this would be too unsafe for a default. It seems that's what
the LVM storage pool does now, maybe because it doesn't expect
snapshots to be used.


Currently virsh doesn't support flags, but that could be fixed in a
different patch.


No backend actually uses the flags yet either.

Josh
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Need some help with the RBD Java bindings

2013-08-20 Thread Noah Watkins
Wido,

I pushed up a patch to

   
https://github.com/ceph/rados-java/commit/ca16d82bc5b596620609880e429ec9f4eaa4d5ce

That includes a fix for this problem. The fix is a bit hacky, but the
tests pass now. I included more details about the hack in the code.

On Thu, Aug 15, 2013 at 9:57 AM, Noah Watkins noah.watk...@inktank.com wrote:
 On Thu, Aug 15, 2013 at 8:51 AM, Wido den Hollander w...@42on.com wrote:

 public List<RbdSnapInfo> snapList() throws RbdException {
     IntByReference numSnaps = new IntByReference(16);
     PointerByReference snaps = new PointerByReference();
     List<RbdSnapInfo> list = new ArrayList<RbdSnapInfo>();
     RbdSnapInfo snapInfo, snapInfos[];

     while (true) {
         int r = rbd.rbd_snap_list(this.getPointer(), snaps, numSnaps);

 I think you need to allocate the memory for `snaps` yourself. Here is
 the RBD wrapper for Python which does that:

   self.snaps = (rbd_snap_info_t * num_snaps.value)()
   ret = self.librbd.rbd_snap_list(image.image, byref(self.snaps),
                                   byref(num_snaps))

 - Noah
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


do not upgrade bobtail -> dumpling directly until 0.67.2

2013-08-20 Thread Sage Weil
We've identified a problem when upgrading directly from bobtail to 
dumpling; please wait until 0.67.2 before doing so.

Upgrades from bobtail -> cuttlefish -> dumpling are fine.  It is only the 
long jump between versions that is problematic.

The fix is already in the dumpling branch.  Another point release will be 
out in the next day or two.

Thanks!
sage
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RGW blueprint for plugin architecture

2013-08-20 Thread Yehuda Sadeh
On Tue, Aug 20, 2013 at 9:03 AM, Roald van Loon roaldvanl...@gmail.com wrote:
 On Tue, Aug 20, 2013 at 4:49 PM, Yehuda Sadeh yeh...@inktank.com wrote:
 I was referring to your work at wip-rgw-plugin, where the plugin code
 itself still needs to rely on the rgw utility code.

 Right. So we can agree on ditching the dynamic loading thing and a clean
 internal API (for now), but at least start separating code into
 plugins like this?

 That's not quite a hard dependency. At the moment it's like that, as
 we made a decision to use the S3 auth for the admin utilities.
 Switching to a different auth system (atm) would require defining a
 new auth class and inheriting from it instead. It's not very flexible,
 but it's not very intrusive.
 I'd certainly be interested in removing this inheritance relationship
 and switch to a different pipeline model.

 I don't know if you looked at it in detail, but for the wip-rgw-plugin
 work I created an RGWAuthManager / RGWAuthPipeline relation to
 segregate authentication-specific stuff from the REST handlers. Is
 that, in general, a model you would like to see discussed in more detail? If
 so, it would probably be wise to start a separate blueprint for it.

I didn't look closely at all the details, but yeah, something along
those lines. But it'll need to be clearly defined.


 As I said, I don't see it as such. We do use it all over the place,
 but the same way you could just switch these to use
 RGWHandler_Auth_Swift and it should work (give or take a few tweaks).

 IMHO, REST handlers should leave
 authentication/authorization/accounting specific tasks to a separate
 component (like the aforementioned pipelining system, maybe
 integrated with the current RGWUser-related code), although this will
 likely never be purely abstracted (at least for authentication). This
 just makes the whole system more modular (albeit just a bit).

Can't think of examples off the top of my head right now, but the
devil's always in the details. Hopefully wrt the auth system there
aren't many hidden issues.

 But for now I propose to implement a small plugin system where
 plugins are still linked into the rgw core (but code wise as much
 separated as possible), and keep the auth stuff for later.

Sounds good.

Yehuda
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


New Defects reported by Coverity Scan for ceph (fwd)

2013-08-20 Thread Sage Weil
Coverity picked up some issues with the filestore code.  These are mostly 
old issues that appear new because code moved around, but this is probably 
a good opportunity to fix them... :)

sage
---BeginMessage---


Hi,

Please find the latest report on new defect(s) introduced to ceph found with 
Coverity Scan

Defect(s) Reported-by: Coverity Scan
Showing 7 of 9 defects

** CID 1063704: Uninitialized scalar field (UNINIT_CTOR)
/os/BtrfsFileStoreBackend.cc: 57

** CID 1063703: Time of check time of use (TOCTOU)
/os/GenericFileStoreBackend.cc: 170

** CID 1063702: Time of check time of use (TOCTOU)
/os/BtrfsFileStoreBackend.cc: 246

** CID 1063701: Copy into fixed size buffer (STRING_OVERFLOW)
/os/BtrfsFileStoreBackend.cc: 458

** CID 1063700: Copy into fixed size buffer (STRING_OVERFLOW)
/os/BtrfsFileStoreBackend.cc: 370

** CID 1063699: Resource leak (RESOURCE_LEAK)
/os/BtrfsFileStoreBackend.cc: 345

** CID 1063698: Improper use of negative value (NEGATIVE_RETURNS)



CID 1063704: Uninitialized scalar field (UNINIT_CTOR)

/os/BtrfsFileStoreBackend.h: 25 ( member_decl)
   22  private:
   23    bool has_clone_range;    ///< clone range ioctl is supported
   24    bool has_snap_create;    ///< snap create ioctl is supported
 Class member declaration for has_snap_destroy.
   25    bool has_snap_destroy;   ///< snap destroy ioctl is supported
   26    bool has_snap_create_v2; ///< snap create v2 ioctl (async!) is supported
   27    bool has_wait_sync;      ///< wait sync ioctl is supported
   28    bool stable_commits;
   29    bool m_filestore_btrfs_clone_range;
  

/os/BtrfsFileStoreBackend.cc: 57 ( uninit_member)
   54    GenericFileStoreBackend(fs), has_clone_range(false), has_snap_create(false),
   55    has_snap_create_v2(false), has_wait_sync(false), stable_commits(false),
   56    m_filestore_btrfs_clone_range(g_conf->filestore_btrfs_clone_range),
 CID 1063704: Uninitialized scalar field (UNINIT_CTOR)
 Non-static class member has_snap_destroy is not initialized in this 
 constructor nor in any functions that it calls.
   57    m_filestore_btrfs_snap (g_conf->filestore_btrfs_snap) { }
   58
   59  int BtrfsFileStoreBackend::detect_features()
   60  {
   61  int r;
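
A minimal sketch of the likely fix for this one (assuming the constructor
signature is BtrfsFileStoreBackend(FileStore *fs)) is simply to initialize
has_snap_destroy alongside its siblings:

  BtrfsFileStoreBackend::BtrfsFileStoreBackend(FileStore *fs) :
    GenericFileStoreBackend(fs), has_clone_range(false), has_snap_create(false),
    has_snap_destroy(false),    /* previously missing */
    has_snap_create_v2(false), has_wait_sync(false), stable_commits(false),
    m_filestore_btrfs_clone_range(g_conf->filestore_btrfs_clone_range),
    m_filestore_btrfs_snap(g_conf->filestore_btrfs_snap) { }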
  

CID 1063703: Time of check time of use (TOCTOU)

/os/GenericFileStoreBackend.cc: 170 ( fs_check_call)
   167  int GenericFileStoreBackend::create_current()
   168  {
   169    struct stat st;
 CID 1063703: Time of check time of use (TOCTOU)
 Calling function stat(char const *, stat *) to perform check on 
 this->get_current_path()->c_str().
   170    int ret = ::stat(get_current_path().c_str(), &st);
   171    if (ret == 0) {
   172      // current/ exists
   173      if (!S_ISDIR(st.st_mode)) {
   174        dout(0) << "_create_current: current/ exists but is not a directory" << dendl;
  

/os/GenericFileStoreBackend.cc: 178 ( toctou)
   175        ret = -EINVAL;
   176      }
   177    } else {
 Calling function mkdir(char const *, __mode_t) that uses 
 this->get_current_path()->c_str() after a check function. This can cause 
 a time-of-check, time-of-use race condition.
   178      ret = ::mkdir(get_current_path().c_str(), 0755);
   179      if (ret < 0) {
   180        ret = -errno;
   181        dout(0) << "_create_current: mkdir " << get_current_path() << " failed: " << cpp_strerror(ret) << dendl;
   182      }
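
The usual way to make this class of warning go away would be to attempt the
operation first and handle EEXIST, rather than stat()-then-mkdir(); a rough
sketch of that shape (error handling abbreviated):

  int ret = ::mkdir(get_current_path().c_str(), 0755);
  if (ret < 0 && errno == EEXIST) {
    // something already exists at current/: check that it is a directory
    struct stat st;
    ret = ::stat(get_current_path().c_str(), &st);
    if (ret == 0 && !S_ISDIR(st.st_mode))
      ret = -EINVAL;
  } else if (ret < 0) {
    ret = -errno;
  }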
  

CID 1063702: Time of check time of use (TOCTOU)

/os/BtrfsFileStoreBackend.cc: 246 ( fs_check_call)
   243  int BtrfsFileStoreBackend::create_current()
   244  {
   245    struct stat st;
 CID 1063702: Time of check time of use (TOCTOU)
 Calling function stat(char const *, stat *) to perform check on 
 this->get_current_path()->c_str().
   246    int ret = ::stat(get_current_path().c_str(), &st);
   247    if (ret == 0) {
   248      // current/ exists
   249      if (!S_ISDIR(st.st_mode)) {
   250        dout(0) << "create_current: current/ exists but is not a directory" << dendl;
  

/os/BtrfsFileStoreBackend.cc: 288 ( toctou)
   285    }
   286
   287    dout(2) << "create_current: created btrfs subvol " << get_current_path() << dendl;
 Calling function chmod(char const *, __mode_t) that uses 
 this->get_current_path()->c_str() after a check function. This can cause 
 a time-of-check, time-of-use race condition.
   288    if (::chmod(get_current_path().c_str(), 0755) < 0) {
   289      ret = -errno;
   290      dout(0) << "create_current: failed to chmod " << get_current_path() << " to 0755: "
   291           << cpp_strerror(ret) << dendl;
   292      return ret;
  

CID 1063701: Copy into fixed size buffer (STRING_OVERFLOW)

/os/BtrfsFileStoreBackend.cc: 458 (