Fwd: OSD daemon changes port no

2012-11-29 Thread hemant surale
 does 'ceph mds dump' list pool 3 in the data_pools line?

Yes. It lists the desired poolids I wanted to put data in.


-- Forwarded message --
From: hemant surale hemant.sur...@gmail.com
Date: Thu, Nov 29, 2012 at 2:59 PM
Subject: Re: OSD daemon changes port no
To: Sage Weil s...@inktank.com


I used a slightly different version of the cephfs command, as cephfs
/home/hemant/a set_layout --pool 3 -c 1 -u  4194304 -s  4194304
 and cephfs /home/hemant/b set_layout --pool 5 -c 1 -u  4194304 -s  4194304.


Now the command didn't show any error, but when I put data into dirs a & b,
ideally it should go to different pools; it's not working as of now.
Is what I am doing possible (using 2 dirs pointing to 2
different pools for data placement)?



-
Hemant Surale.

On Tue, Nov 27, 2012 at 10:21 PM, Sage Weil s...@inktank.com wrote:
 On Tue, 27 Nov 2012, hemant surale wrote:
 I did mkdir a && chmod 777 a, so dir a is /home/hemant/a.
 then I used mount.ceph 10.72.148.245:/ /ho

 root@hemantsec-virtual-machine:/home/hemant# cephfs /home/hemant/a
 set_layout --pool 3
 Error setting layout: Invalid argument

 does 'ceph mds dump' list pool 3 in the data_pools line?

 sage


 On Mon, Nov 26, 2012 at 9:56 PM, Sage Weil s...@inktank.com wrote:
  On Mon, 26 Nov 2012, hemant surale wrote:
  While I was using cephfs, the following error was observed -
  
  root@hemantsec-virtual-machine:~# cephfs /mnt/ceph/a --pool 3
  invalid command
 
  Try
 
   cephfs /mnt/ceph/a set_layout --pool 3
 
  (set_layout is the command)
 
  sage
 
   usage: cephfs path command [options]*
   Commands:
      show_layout    -- view the layout information on a file or dir
      set_layout     -- set the layout on an empty file,
                        or the default layout on a directory
      show_location  -- view the location information on a file
   Options:
      Useful for setting layouts:
      --stripe_unit, -u:  set the size of each stripe
      --stripe_count, -c: set the number of objects to stripe across
      --object_size, -s:  set the size of the objects to stripe across
      --pool, -p:         set the pool to use
 
      Useful for getting location data:
      --offset, -l:       the offset to retrieve location data for
 
  
  It may be a silly question but I am unable to figure it out.
 
  :(
 
 
 
 
  On Wed, Nov 21, 2012 at 8:59 PM, Sage Weil s...@inktank.com wrote:
   On Wed, 21 Nov 2012, hemant surale wrote:
Oh I see.  Generally speaking, the only way to guarantee separation is to
put them in different pools and distribute the pools across different sets
of OSDs.
  
    yeah, that is the correct approach, but I found a problem doing so at the
    abstract level, i.e. when I put a file inside the mounted dir
    /home/hemant/cephfs (mounted using the mount.ceph cmd). At that
    time ceph is going to use the default pool 'data' to store files (the
    files were striped into different objects and then sent to the
    appropriate osds).
   So how do I tell ceph to use different pools in this case?
  
    Goal: separate read and write operations, where reads will be served
    from one group of OSDs and writes go to another group of OSDs.
  
   First create the other pool,
  
ceph osd pool create name
  
   and then adjust the CRUSH rule to distributed to a different set of OSDs
   for that pool.
  
    To allow cephfs to use it,
  
ceph mds add_data_pool poolid
  
   and then:
  
cephfs /mnt/ceph/foo --pool poolid
  
   will set the policy on the directory such that new files beneath that
   point will be stored in a different pool.
  
   Hope that helps!
   sage
  
  
  
  
  
  
   -
   Hemant Surale.
  
  
   On Wed, Nov 21, 2012 at 12:33 PM, Sage Weil s...@inktank.com wrote:
On Wed, 21 Nov 2012, hemant surale wrote:
It's a little confusing question, I believe.
   
Actually there are two files, X & Y.  When I am reading X from its
primary, I want to make sure a simultaneous write of Y goes to
any other OSD except the primary OSD for X (from where my current read
is getting served).
   
Oh I see.  Generally speaking, the only way to guarantee separation is to
put them in different pools and distribute the pools across different sets
of OSDs.  Otherwise, it's all (pseudo)random and you never know.  Usually,
they will be different, particularly as the cluster size increases, but
sometimes they will be the same.
   
sage
   
   
   
   
-
Hemant Surale.
   
On Wed, Nov 21, 2012 at 11:50 AM, Sage Weil s...@inktank.com wrote:
 On Wed, 21 Nov 2012, hemant surale wrote:
  and one more thing: how can it be possible to read from one osd and
  then direct a simultaneous write to another osd with less/no traffic?
 
  I'm not sure I understand the 

[PATCHv3] rbd block driver fix race between aio completion and aio cancel

2012-11-29 Thread Stefan Priebe
This one fixes a race which qemu also had in the iscsi block driver,
between cancellation and io completion.

qemu_rbd_aio_cancel was not synchronously waiting for the end of
the command.

To achieve this it introduces a new status flag which uses
-EINPROGRESS.

Changes since last PATCH:
- fixed missing braces
- added vfree for bounce

Signed-off-by: Stefan Priebe s.pri...@profihost.ag
---
 block/rbd.c |   27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 0aaacaf..917c64c 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -77,6 +77,7 @@ typedef struct RBDAIOCB {
     int error;
     struct BDRVRBDState *s;
     int cancelled;
+    int status;
 } RBDAIOCB;
 
 typedef struct RADOSCB {
@@ -376,12 +377,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
     RBDAIOCB *acb = rcb->acb;
     int64_t r;
 
-    if (acb->cancelled) {
-        qemu_vfree(acb->bounce);
-        qemu_aio_release(acb);
-        goto done;
-    }
-
     r = rcb->ret;
 
     if (acb->cmd == RBD_AIO_WRITE ||
@@ -406,10 +401,11 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
             acb->ret = r;
         }
     }
+    acb->status = 0;
+
     /* Note that acb->bh can be NULL in case where the aio was cancelled */
     acb->bh = qemu_bh_new(rbd_aio_bh_cb, acb);
     qemu_bh_schedule(acb->bh);
-done:
     g_free(rcb);
 }
 
@@ -568,6 +564,13 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
     RBDAIOCB *acb = (RBDAIOCB *) blockacb;
     acb->cancelled = 1;
+
+    while (acb->status == -EINPROGRESS) {
+        qemu_aio_wait();
+    }
+
+    qemu_vfree(acb->bounce);
+    qemu_aio_release(acb);
 }
 
 static const AIOCBInfo rbd_aiocb_info = {
@@ -640,7 +643,9 @@ static void rbd_aio_bh_cb(void *opaque)
     qemu_bh_delete(acb->bh);
     acb->bh = NULL;
 
-    qemu_aio_release(acb);
+    if (!acb->cancelled) {
+        qemu_aio_release(acb);
+    }
 }
 
 static int rbd_aio_discard_wrapper(rbd_image_t image,
@@ -685,6 +690,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
     acb->s = s;
     acb->cancelled = 0;
     acb->bh = NULL;
+    acb->status = -EINPROGRESS;
 
     if (cmd == RBD_AIO_WRITE) {
         qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
@@ -731,7 +737,10 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
 failed:
     g_free(rcb);
     s->qemu_aio_count--;
-    qemu_aio_release(acb);
+    if (!acb->cancelled) {
+        qemu_vfree(acb->bounce);
+        qemu_aio_release(acb);
+    }
     return NULL;
 }
 
-- 
1.7.10.4

--


Re: rbd STDIN import does not work / wip-rbd-export-stdout

2012-11-29 Thread Stefan Priebe - Profihost AG

Hi Josh,

On 28.11.2012 19:51, Josh Durgin wrote:


No idea how to achieve this with git send-email ;-( But still more
important is also the patch for discards...


Use git format-patch, edit the patch file (it includes the basic
headers already), then send it with git send-email.


done / fixed


Your return value type change was already merged into master of
qemu.git as 08448d5195aeff49bf25fb62b4a6218f079f5284.


Oh thanks, I didn't get a reply that it was applied.

Stefan
--


RBD: periodic cephx issue ? CephxAuthorizeHandler::verify_authorizer isvalid=0

2012-11-29 Thread Sylvain Munaut
Hi,

I'm using RBD to store VM images and they're accessed through the
kernel client (xen vms).


In the client dmesg log, I see periodically :

Nov 29 10:46:48 b53-04 kernel: [160055.012206] libceph: osd8
10.208.2.213:6806 socket closed
Nov 29 10:46:48 b53-04 kernel: [160055.013635] libceph: osd8
10.208.2.213:6806 socket error on read


And in the matching osd log I find :

2012-11-29 10:46:48.130673 7f6018127700  0 -- 192.168.2.213:6806/944
 >> 192.168.2.28:0/869804615 pipe(0xcf80600 sd=46 pgs=0 cs=0
l=0).accept peer addr is really 192.168.2.28:0/869804615 (socket is
192.168.2.28:40567/0)
2012-11-29 10:46:48.130902 7f6018127700  0 auth: could not find secret_id=0
2012-11-29 10:46:48.130912 7f6018127700  0 cephx: verify_authorizer
could not get service secret for service osd secret_id=0
2012-11-29 10:46:48.130915 7f6018127700  1
CephxAuthorizeHandler::verify_authorizer isvalid=0
2012-11-29 10:46:48.130917 7f6018127700  0 -- 192.168.2.213:6806/944
 >> 192.168.2.28:0/869804615 pipe(0xcf80600 sd=46 pgs=0 cs=0
l=1).accept bad authorizer
2012-11-29 10:46:48.131132 7f6018127700  0 auth: could not find secret_id=0
2012-11-29 10:46:48.131146 7f6018127700  0 cephx: verify_authorizer
could not get service secret for service osd secret_id=0
2012-11-29 10:46:48.131151 7f6018127700  1
CephxAuthorizeHandler::verify_authorizer isvalid=0
2012-11-29 10:46:48.131154 7f6018127700  0 -- 192.168.2.213:6806/944
 >> 192.168.2.28:0/869804615 pipe(0xcf80600 sd=46 pgs=0 cs=0
l=1).accept bad authorizer
2012-11-29 10:46:48.824180 7f6018127700  0 -- 192.168.2.213:6806/944
 >> 192.168.2.28:0/869804615 pipe(0xaf5de00 sd=46 pgs=0 cs=0
l=0).accept peer addr is really 192.168.2.28:0/869804615 (socket is
192.168.2.28:40568/0)
2012-11-29 10:46:48.824585 7f6018127700  1
CephxAuthorizeHandler::verify_authorizer isvalid=1
2012-11-29 10:46:48.825013 7f601f484700  0 osd.8 951 pg[3.514( v
950'1138 (223'137,950'1138] n=15 ec=10 les/c 941/948 916/916/916)
[8,7] r=0 lpr=916 mlcod 950'1137 active+clean] watch:
ctx->obc=0xb72e340 cookie=2 oi.version=1109 ctx->at_version=951'1139
2012-11-29 10:46:48.825024 7f601f484700  0 osd.8 951 pg[3.514( v
950'1138 (223'137,950'1138] n=15 ec=10 les/c 941/948 916/916/916)
[8,7] r=0 lpr=916 mlcod 950'1137 active+clean] watch:
oi.user_version=755


Note that this doesn't seem to pose any operational issue (i.e. it
works ... when it retries it eventually connects).

My configuration: The client currently runs on a debian wheezy and use
a custom built 3.6.8 kernel that contains all the latest ceph rbd
patch AFAIK but the problem was also showing up with earlier kernel
versions. The cluster is a 0.48.2 running on Ubuntu 12.04 LTS.

Cheers,

Sylvain
--


Re: [PATCH] rbd block driver fix race between aio completion and aio cancel

2012-11-29 Thread Stefan Hajnoczi
On Thu, Nov 22, 2012 at 11:00:19AM +0100, Stefan Priebe wrote:
 @@ -406,10 +401,11 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
              acb->ret = r;
          }
      }
 +    acb->status = 0;
 +

I suggest doing this in the BH.  The qemu_aio_wait() loop in
qemu_rbd_aio_cancel() needs to wait until the BH has executed.  By
clearing status in the BH we ensure that no matter in which order
qemu_aio_wait() invokes BHs and callbacks, we'll always wait until the
BH has completed before ending the while loop in qemu_rbd_aio_cancel().

 @@ -737,7 +741,8 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
  failed:
      g_free(rcb);
      s->qemu_aio_count--;
 -    qemu_aio_release(acb);
 +    if (!acb->cancelled)
 +        qemu_aio_release(acb);
      return NULL;
  }

This scenario is impossible.  We haven't returned the acb back to the
caller yet so they could not have invoked qemu_aio_cancel().

Stefan
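
To make the ordering Stefan suggests concrete, here is a minimal, self-contained
C sketch of the pattern (the toy_* names and the tiny event loop are illustrative
stand-ins, not the real qemu API): the request is marked -EINPROGRESS when
submitted, completion only schedules a bottom half, the BH clears the status, and
cancel keeps draining BHs until the status changes.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy model of the AIO lifecycle discussed above.                   */
/* All names here are illustrative stand-ins, not the real qemu API. */

struct toy_acb {
    int cancelled;
    int status;            /* -EINPROGRESS while the BH has not run */
};

/* Single pending "bottom half"; normally a queue inside the event loop. */
static struct toy_acb *pending_bh;

static void toy_bh_cb(struct toy_acb *acb)
{
    /* The completion callback would run here (acb->common.cb in qemu). */
    acb->status = 0;       /* cleared only in the BH, as suggested      */
    if (!acb->cancelled) {
        free(acb);         /* stand-in for qemu_aio_release()           */
    }
}

/* Stand-in for qemu_aio_wait(): run one pending BH, if any. */
static void toy_aio_wait(void)
{
    if (pending_bh) {
        struct toy_acb *acb = pending_bh;
        pending_bh = NULL;
        toy_bh_cb(acb);
    }
}

/* rados completion path: only schedules the BH. */
static void toy_complete(struct toy_acb *acb)
{
    pending_bh = acb;
}

/* qemu_rbd_aio_cancel() equivalent: wait synchronously for the BH. */
static void toy_cancel(struct toy_acb *acb)
{
    acb->cancelled = 1;
    while (acb->status == -EINPROGRESS) {
        toy_aio_wait();
    }
    free(acb);             /* safe now: the BH saw 'cancelled' and kept acb alive */
}

int main(void)
{
    struct toy_acb *acb = calloc(1, sizeof(*acb));
    acb->status = -EINPROGRESS;   /* set at submission time */

    toy_complete(acb);            /* completion races with cancel */
    toy_cancel(acb);              /* returns only after the BH ran */

    printf("cancel returned after completion was handled\n");
    return 0;
}

Because the status is cleared only in the BH, the while loop in the cancel path
cannot return before the BH has run, which is exactly the property asked for above.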
--


[PATCHv4] rbd block driver fix race between aio completion and aio cancel

2012-11-29 Thread Stefan Priebe
This one fixes a race which qemu also had in the iscsi block driver,
between cancellation and io completion.

qemu_rbd_aio_cancel was not synchronously waiting for the end of
the command.

To achieve this it introduces a new status flag which uses
-EINPROGRESS.

Changes since PATCHv3:
- removed unnecessary if condition in rbd_start_aio as we
  haven't started io yet
- moved acb->status = 0 to rbd_aio_bh_cb so qemu_aio_wait always
  waits until the BH was executed

Changes since PATCHv2:
- fixed missing braces
- added vfree for bounce

Signed-off-by: Stefan Priebe s.pri...@profihost.ag
---
 block/rbd.c |   16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f3becc7..28e94ab 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -77,6 +77,7 @@ typedef struct RBDAIOCB {
     int error;
     struct BDRVRBDState *s;
     int cancelled;
+    int status;
 } RBDAIOCB;
 
 typedef struct RADOSCB {
@@ -376,12 +377,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
     RBDAIOCB *acb = rcb->acb;
     int64_t r;
 
-    if (acb->cancelled) {
-        qemu_vfree(acb->bounce);
-        qemu_aio_release(acb);
-        goto done;
-    }
-
     r = rcb->ret;
 
     if (acb->cmd == RBD_AIO_WRITE ||
@@ -409,7 +404,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
     /* Note that acb->bh can be NULL in case where the aio was cancelled */
     acb->bh = qemu_bh_new(rbd_aio_bh_cb, acb);
     qemu_bh_schedule(acb->bh);
-done:
     g_free(rcb);
 }
 
@@ -568,6 +562,12 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
     RBDAIOCB *acb = (RBDAIOCB *) blockacb;
     acb->cancelled = 1;
+
+    while (acb->status == -EINPROGRESS) {
+        qemu_aio_wait();
+    }
+
+    qemu_vfree(acb->bounce);
 }
 
 static const AIOCBInfo rbd_aiocb_info = {
@@ -639,6 +639,7 @@ static void rbd_aio_bh_cb(void *opaque)
     acb->common.cb(acb->common.opaque, (acb->ret < 0 ? 0 : acb->ret));
     qemu_bh_delete(acb->bh);
     acb->bh = NULL;
+    acb->status = 0;
 
     qemu_aio_release(acb);
 }
@@ -685,6 +686,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
     acb->s = s;
     acb->cancelled = 0;
     acb->bh = NULL;
+    acb->status = -EINPROGRESS;
 
     if (cmd == RBD_AIO_WRITE) {
         qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
-- 
1.7.10.4

--


Re: [PATCH] rbd block driver fix race between aio completion and aio cancel

2012-11-29 Thread Stefan Priebe - Profihost AG

Hi,

I hope I've done everything correctly. I've sent a new v4 patch.

On 29.11.2012 14:58, Stefan Hajnoczi wrote:

On Thu, Nov 22, 2012 at 11:00:19AM +0100, Stefan Priebe wrote:

@@ -406,10 +401,11 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
             acb->ret = r;
         }
     }
+    acb->status = 0;
+


I suggest doing this in the BH.  The qemu_aio_wait() loop in
qemu_rbd_aio_cancel() needs to wait until the BH has executed.  By
clearing status in the BH we ensure that no matter in which order
qemu_aio_wait() invokes BHs and callbacks, we'll always wait until the
BH has completed before ending the while loop in qemu_rbd_aio_cancel().


@@ -737,7 +741,8 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
 failed:
     g_free(rcb);
     s->qemu_aio_count--;
-    qemu_aio_release(acb);
+    if (!acb->cancelled)
+        qemu_aio_release(acb);
     return NULL;
 }


This scenario is impossible.  We haven't returned the acb back to the
caller yet so they could not have invoked qemu_aio_cancel().


Greets,
Stefan
--


[PATCH] ceph: don't reference req after put

2012-11-29 Thread Alex Elder
In __unregister_request(), there is a call to list_del_init()
referencing a request that was the subject of a call to
ceph_osdc_put_request() on the previous line.  This is not
safe, because the request structure could have been freed
by the time we reach the list_del_init().

Fix this by reversing the order of these lines.
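
As a tiny self-contained C illustration of the hazard being fixed (the names are
made up for the example; this is not the net/ceph code): once the final reference
is dropped the object may already be freed, so any list unlinking has to happen
before the put, which is what the reordering below does.

#include <stdio.h>
#include <stdlib.h>

/* Minimal illustration of "unlink before the final put". */

struct obj {
    int refcount;
    struct obj *next;      /* stand-in for a list linkage */
};

static void obj_put(struct obj *o)
{
    if (--o->refcount == 0) {
        free(o);           /* after this, o must not be touched */
    }
}

static void unregister_obj(struct obj **list, struct obj *o)
{
    /* Correct order: unlink first, then drop the reference. */
    if (*list == o) {
        *list = o->next;
    }
    o->next = NULL;

    obj_put(o);            /* may free o; we no longer dereference it */
}

int main(void)
{
    struct obj *head = calloc(1, sizeof(*head));
    head->refcount = 1;

    unregister_obj(&head, head);
    printf("unregistered safely: head=%p\n", (void *)head);
    return 0;
}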

Signed-off-by: Alex Elder el...@inktank.com
---
 net/ceph/osd_client.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 9b6f0e4..d1177ec 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -797,9 +797,9 @@ static void __unregister_request(struct ceph_osd_client *osdc,
         req->r_osd = NULL;
     }
 
+    list_del_init(&req->r_req_lru_item);
     ceph_osdc_put_request(req);
 
-    list_del_init(&req->r_req_lru_item);
     if (osdc->num_requests == 0) {
         dout(" no requests, canceling timeout\n");
         __cancel_osd_timeout(osdc);
-- 
1.7.9.5

--


Re: [PATCHv4] rbd block driver fix race between aio completion and aio cancel

2012-11-29 Thread Paolo Bonzini


- Original Message -
 From: Stefan Priebe s.pri...@profihost.ag
 To: qemu-de...@nongnu.org
 Cc: stefa...@gmail.com, josh durgin josh.dur...@inktank.com, 
 ceph-devel@vger.kernel.org, pbonz...@redhat.com,
 Stefan Priebe s.pri...@profihost.ag
 Sent: Thursday, 29 November 2012 15:28:35
 Subject: [PATCHv4] rbd block driver fix race between aio completion and aio 
 cancel
 
 This one fixes a race which qemu also had in the iscsi block driver,
 between cancellation and io completion.
 
 qemu_rbd_aio_cancel was not synchronously waiting for the end of
 the command.
 
 To achieve this it introduces a new status flag which uses
 -EINPROGRESS.
 
 Changes since PATCHv3:
 - removed unnecessary if condition in rbd_start_aio as we
   haven't started io yet
 - moved acb->status = 0 to rbd_aio_bh_cb so qemu_aio_wait always
   waits until the BH was executed
 
 Changes since PATCHv2:
 - fixed missing braces
 - added vfree for bounce
 
 Signed-off-by: Stefan Priebe s.pri...@profihost.ag
 ---
  block/rbd.c |   16 +---
  1 file changed, 9 insertions(+), 7 deletions(-)
 
 diff --git a/block/rbd.c b/block/rbd.c
 index f3becc7..28e94ab 100644
 --- a/block/rbd.c
 +++ b/block/rbd.c
 @@ -77,6 +77,7 @@ typedef struct RBDAIOCB {
      int error;
      struct BDRVRBDState *s;
      int cancelled;
 +    int status;
  } RBDAIOCB;
  
  typedef struct RADOSCB {
 @@ -376,12 +377,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
      RBDAIOCB *acb = rcb->acb;
      int64_t r;
  
 -    if (acb->cancelled) {
 -        qemu_vfree(acb->bounce);
 -        qemu_aio_release(acb);
 -        goto done;
 -    }
 -
      r = rcb->ret;
  
      if (acb->cmd == RBD_AIO_WRITE ||
 @@ -409,7 +404,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
      /* Note that acb->bh can be NULL in case where the aio was cancelled */
      acb->bh = qemu_bh_new(rbd_aio_bh_cb, acb);
      qemu_bh_schedule(acb->bh);
 -done:
      g_free(rcb);
  }
  
 @@ -568,6 +562,12 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
  {
      RBDAIOCB *acb = (RBDAIOCB *) blockacb;
      acb->cancelled = 1;
 +
 +    while (acb->status == -EINPROGRESS) {
 +        qemu_aio_wait();
 +    }
 +
 +    qemu_vfree(acb->bounce);

This vfree is not needed, since the BH will run and do the free.

Otherwise looks ok.

  }
  
  static const AIOCBInfo rbd_aiocb_info = {
 @@ -639,6 +639,7 @@ static void rbd_aio_bh_cb(void *opaque)
      acb->common.cb(acb->common.opaque, (acb->ret < 0 ? 0 : acb->ret));
      qemu_bh_delete(acb->bh);
      acb->bh = NULL;
 +    acb->status = 0;
  
      qemu_aio_release(acb);
  }
 @@ -685,6 +686,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
      acb->s = s;
      acb->cancelled = 0;
      acb->bh = NULL;
 +    acb->status = -EINPROGRESS;
  
      if (cmd == RBD_AIO_WRITE) {
          qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
 --
 1.7.10.4
 
 
--


Re: [PATCH] rbd block driver fix race between aio completion and aio cancel

2012-11-29 Thread Paolo Bonzini

   @@ -574,6 +570,12 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
    {
        RBDAIOCB *acb = (RBDAIOCB *) blockacb;
        acb->cancelled = 1;
   +
   +    while (acb->status == -EINPROGRESS) {
   +        qemu_aio_wait();
   +    }
   +
  
  There should be a qemu_vfree(acb->bounce); here

No, because the BH will have run at this point and you'd doubly-free
the buffer.

Paolo

   +    qemu_aio_release(acb);
    }
  
    static AIOPool rbd_aio_pool = {
   @@ -646,7 +648,8 @@ static void rbd_aio_bh_cb(void *opaque)
        qemu_bh_delete(acb->bh);
        acb->bh = NULL;
  
   -    qemu_aio_release(acb);
   +    if (!acb->cancelled)
   +        qemu_aio_release(acb);
    }
  
    static int rbd_aio_discard_wrapper(rbd_image_t image,
   @@ -691,6 +694,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
        acb->s = s;
        acb->cancelled = 0;
        acb->bh = NULL;
   +    acb->status = -EINPROGRESS;
  
        if (cmd == RBD_AIO_WRITE) {
            qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
   @@ -737,7 +741,8 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
    failed:
        g_free(rcb);
        s->qemu_aio_count--;
   -    qemu_aio_release(acb);
   +    if (!acb->cancelled)
  
  qemu_vfree(acb->bounce) should be here as well, although that's a
  separate bug that's probably never hit.
  
   +    qemu_aio_release(acb);
        return NULL;
    }
 
 
 
 
--


Re: parsing in the ceph osd subsystem

2012-11-29 Thread Sage Weil
On Thu, 29 Nov 2012, Andrey Korolyov wrote:
 $ ceph osd down -
 osd.0 is already down
 $ ceph osd down ---
 osd.0 is already down
 
 the same for ``+'', ``/'', ``%'' and so - I think that for osd subsys
 ceph cli should explicitly work only with positive integers plus zero,
 refusing all other input.

which branch is this?  this parsing is cleaned up in the latest
next/master.



 --


Re: RBD: periodic cephx issue ? CephxAuthorizeHandler::verify_authorizer isvalid=0

2012-11-29 Thread Sage Weil
Hi Sylvain,

Can you attach/post the whole log somewhere?  I'm curious what is leading 
up to it not having secret_id=0.  Ideally with 'debug auth = 20' and 
'debug osd = 20' and 'debug ms = 1'.

Thanks!
sage


On Thu, 29 Nov 2012, Sylvain Munaut wrote:

 Hi,
 
 I'm using RBD to store VM images and they're accessed through the
 kernel client (xen vms).
 
 
 In the client dmesg log, I see periodically :
 
 Nov 29 10:46:48 b53-04 kernel: [160055.012206] libceph: osd8
 10.208.2.213:6806 socket closed
 Nov 29 10:46:48 b53-04 kernel: [160055.013635] libceph: osd8
 10.208.2.213:6806 socket error on read
 
 
 And in the matching osd log I find :
 
 2012-11-29 10:46:48.130673 7f6018127700  0 -- 192.168.2.213:6806/944
  >> 192.168.2.28:0/869804615 pipe(0xcf80600 sd=46 pgs=0 cs=0
 l=0).accept peer addr is really 192.168.2.28:0/869804615 (socket is
 192.168.2.28:40567/0)
 2012-11-29 10:46:48.130902 7f6018127700  0 auth: could not find secret_id=0
 2012-11-29 10:46:48.130912 7f6018127700  0 cephx: verify_authorizer
 could not get service secret for service osd secret_id=0
 2012-11-29 10:46:48.130915 7f6018127700  1
 CephxAuthorizeHandler::verify_authorizer isvalid=0
 2012-11-29 10:46:48.130917 7f6018127700  0 -- 192.168.2.213:6806/944
  >> 192.168.2.28:0/869804615 pipe(0xcf80600 sd=46 pgs=0 cs=0
 l=1).accept bad authorizer
 2012-11-29 10:46:48.131132 7f6018127700  0 auth: could not find secret_id=0
 2012-11-29 10:46:48.131146 7f6018127700  0 cephx: verify_authorizer
 could not get service secret for service osd secret_id=0
 2012-11-29 10:46:48.131151 7f6018127700  1
 CephxAuthorizeHandler::verify_authorizer isvalid=0
 2012-11-29 10:46:48.131154 7f6018127700  0 -- 192.168.2.213:6806/944
  >> 192.168.2.28:0/869804615 pipe(0xcf80600 sd=46 pgs=0 cs=0
 l=1).accept bad authorizer
 2012-11-29 10:46:48.824180 7f6018127700  0 -- 192.168.2.213:6806/944
  >> 192.168.2.28:0/869804615 pipe(0xaf5de00 sd=46 pgs=0 cs=0
 l=0).accept peer addr is really 192.168.2.28:0/869804615 (socket is
 192.168.2.28:40568/0)
 2012-11-29 10:46:48.824585 7f6018127700  1
 CephxAuthorizeHandler::verify_authorizer isvalid=1
 2012-11-29 10:46:48.825013 7f601f484700  0 osd.8 951 pg[3.514( v
 950'1138 (223'137,950'1138] n=15 ec=10 les/c 941/948 916/916/916)
 [8,7] r=0 lpr=916 mlcod 950'1137 active+clean] watch:
 ctx->obc=0xb72e340 cookie=2 oi.version=1109 ctx->at_version=951'1139
 2012-11-29 10:46:48.825024 7f601f484700  0 osd.8 951 pg[3.514( v
 950'1138 (223'137,950'1138] n=15 ec=10 les/c 941/948 916/916/916)
 [8,7] r=0 lpr=916 mlcod 950'1137 active+clean] watch:
 oi.user_version=755
 
 
 Note that this doesn't seem to pose any operational issue (i.e. it
 works ... when it retries it eventually connects).
 
 My configuration: The client currently runs on a debian wheezy and use
 a custom built 3.6.8 kernel that contains all the latest ceph rbd
 patch AFAIK but the problem was also showing up with earlier kernel
 versions. The cluster is a 0.48.2 running on Ubuntu 12.04 LTS.
 
 Cheers,
 
 Sylvain
 --


Re: [PATCH] ceph: don't reference req after put

2012-11-29 Thread Sage Weil
Reviewed-by: Sage Weil s...@inktank.com

On Thu, 29 Nov 2012, Alex Elder wrote:

 In __unregister_request(), there is a call to list_del_init()
 referencing a request that was the subject of a call to
 ceph_osdc_put_request() on the previous line.  This is not
 safe, because the request structure could have been freed
 by the time we reach the list_del_init().
 
 Fix this by reversing the order of these lines.
 
 Signed-off-by: Alex Elder el...@inktank.com
 ---
  net/ceph/osd_client.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
 index 9b6f0e4..d1177ec 100644
 --- a/net/ceph/osd_client.c
 +++ b/net/ceph/osd_client.c
 @@ -797,9 +797,9 @@ static void __unregister_request(struct ceph_osd_client *osdc,
          req->r_osd = NULL;
      }
 
 +    list_del_init(&req->r_req_lru_item);
      ceph_osdc_put_request(req);
 
 -    list_del_init(&req->r_req_lru_item);
      if (osdc->num_requests == 0) {
          dout(" no requests, canceling timeout\n");
          __cancel_osd_timeout(osdc);
 -- 
 1.7.9.5
 
 --


Re: parsing in the ceph osd subsystem

2012-11-29 Thread Andrey Korolyov
On Thu, Nov 29, 2012 at 8:34 PM, Sage Weil s...@inktank.com wrote:
 On Thu, 29 Nov 2012, Andrey Korolyov wrote:
 $ ceph osd down -
 osd.0 is already down
 $ ceph osd down ---
 osd.0 is already down

 the same for ``+'', ``/'', ``%'' and so - I think that for osd subsys
 ceph cli should explicitly work only with positive integers plus zero,
 refusing all other input.

 which branch is this?  this parsing is cleaned up in the latest
 next/master.



It was produced by the 0.54 tag. I have built
dd3a24a647d0b0f1153cf1b102ed1f51d51be2f2 today and the problem is
gone (except that ``-0'' is parsed as 0, and 0/001 as 0 and 1
respectively).


 --


Re: parsing in the ceph osd subsystem

2012-11-29 Thread Joao Eduardo Luis
On 11/29/2012 07:01 PM, Andrey Korolyov wrote:
 On Thu, Nov 29, 2012 at 8:34 PM, Sage Weil s...@inktank.com wrote:
 On Thu, 29 Nov 2012, Andrey Korolyov wrote:
 $ ceph osd down -
 osd.0 is already down
 $ ceph osd down ---
 osd.0 is already down

 the same for ``+'', ``/'', ``%'' and so - I think that for osd subsys
 ceph cli should explicitly work only with positive integers plus zero,
 refusing all other input.

 which branch is this?  this parsing is cleaned up in the latest
 next/master.


 
 It was produced by 0.54-tag. I have built
 dd3a24a647d0b0f1153cf1b102ed1f51d51be2f2 today and problem has
 gone(except parsing ``-0'' as 0 and 0/001 as 0 and 1
 correspondingly).

We use strtol() to parse numeric values, and '-0', '0' or '1'
are valid numeric values. I suppose we could force the argument to be
numeric only, hence getting rid of '-0', and enforce stricter checks on
the parameters to rule out valid numeric values that look funny, which
in the '0*\d' case should be fairly simple.

  -Joao
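
For what it's worth, a rough sketch of the stricter checks Joao describes could
look like the following (illustrative C only, not the actual ceph CLI parser); it
rejects signs, trailing garbage and, optionally, redundant leading zeros:

#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse an OSD id as a plain non-negative integer.
 * Returns the id on success, -1 on malformed input.
 * Illustrative only; the real ceph tool uses its own parsing. */
static long parse_osd_id(const char *s)
{
    char *end;
    long val;

    if (s == NULL || *s == '\0') {
        return -1;                      /* empty string */
    }
    if (!isdigit((unsigned char)s[0])) {
        return -1;                      /* rejects "-0", "+1", "%", "---", ... */
    }
    if (s[0] == '0' && s[1] != '\0') {
        return -1;                      /* rejects "01", "000" (optional) */
    }

    errno = 0;
    val = strtol(s, &end, 10);
    if (errno == ERANGE || val < 0 || val > INT_MAX) {
        return -1;                      /* overflow */
    }
    if (*end != '\0') {
        return -1;                      /* trailing garbage, e.g. "1abc" */
    }
    return val;
}

int main(void)
{
    const char *tests[] = { "0", "1", "-0", "---", "01", "12", "1x" };
    for (size_t i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
        printf("%-4s -> %ld\n", tests[i], parse_osd_id(tests[i]));
    }
    return 0;
}

Whether '01' should really be rejected is a separate question; see Joao's
follow-up further down in this digest.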
--


Client crash on getcwd with non-default root mount

2012-11-29 Thread Noah Watkins
I'm getting the assert failure below with the following test:

  ceph_mount(cmount, "/otherdir");
  ceph_getcwd(cmount);

--

client/Inode.h: In function 'Dentry* Inode::get_first_parent()' thread
7fded47c8780 time 2012-11-29 11:49:00.890184
client/Inode.h: 165: FAILED assert(!dn_set.empty())
 ceph version 0.54-808-g1ed5a1f (1ed5a1f984d8260d86cc25b1ae95ffedf597e579)
 1: (()+0x11ee89) [0x7fded36fae89]
 2: (()+0x1429d3) [0x7fded371e9d3]
 3: (ceph_getcwd()+0x11) [0x7fded36fdb41]
 4: (MountedTest2_XYZ_Test::TestBody()+0x63a) [0x42563a]
 5: (testing::Test::Run()+0xaa) [0x45017a]
 6: (testing::internal::TestInfoImpl::Run()+0x100) [0x450280]
 7: (testing::TestCase::Run()+0xbd) [0x45034d]
 8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x4505b7]
 9: (main()+0x35) [0x423115]
 10: (__libc_start_main()+0xed) [0x7fded2d2876d]
 11: /home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs() [0x423171]
 NOTE: a copy of the executable, or `objdump -rdS executable` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted (core dumped)

Thanks,
Noah
--


Re: Client crash on getcwd with non-default root mount

2012-11-29 Thread Sam Lang

On 11/29/2012 01:52 PM, Noah Watkins wrote:

I'm getting the assert failure below with the following test:

   ceph_mount(cmount, /otherdir);


This should fail with ENOENT if you check the return code.
-sam
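
A minimal sketch of what Sam suggests, using only the libcephfs calls that already
appear in this thread (the header path and the error-handling style are assumptions
for the example, not taken from the original mails):

#include <stdio.h>
#include <string.h>
#include <cephfs/libcephfs.h>

/* Mount a non-default root and report the error instead of crashing later. */
int main(void)
{
    struct ceph_mount_info *cmount;
    int ret;

    ceph_create(&cmount, NULL);
    ceph_conf_read_file(cmount, NULL);

    ret = ceph_mount(cmount, "/otherdir");
    if (ret < 0) {
        /* e.g. -ENOENT if /otherdir does not exist */
        fprintf(stderr, "ceph_mount failed: %s\n", strerror(-ret));
        ceph_shutdown(cmount);
        return 1;
    }

    printf("cwd after mount: %s\n", ceph_getcwd(cmount));
    ceph_shutdown(cmount);
    return 0;
}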


   ceph_getcwd(cmount);

--

client/Inode.h: In function 'Dentry* Inode::get_first_parent()' thread
7fded47c8780 time 2012-11-29 11:49:00.890184
client/Inode.h: 165: FAILED assert(!dn_set.empty())
  ceph version 0.54-808-g1ed5a1f (1ed5a1f984d8260d86cc25b1ae95ffedf597e579)
  1: (()+0x11ee89) [0x7fded36fae89]
  2: (()+0x1429d3) [0x7fded371e9d3]
  3: (ceph_getcwd()+0x11) [0x7fded36fdb41]
  4: (MountedTest2_XYZ_Test::TestBody()+0x63a) [0x42563a]
  5: (testing::Test::Run()+0xaa) [0x45017a]
  6: (testing::internal::TestInfoImpl::Run()+0x100) [0x450280]
  7: (testing::TestCase::Run()+0xbd) [0x45034d]
  8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x4505b7]
  9: (main()+0x35) [0x423115]
  10: (__libc_start_main()+0xed) [0x7fded2d2876d]
  11: /home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs() [0x423171]
  NOTE: a copy of the executable, or `objdump -rdS executable` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted (core dumped)

Thanks,
Noah
--


Re: Client crash on getcwd with non-default root mount

2012-11-29 Thread Noah Watkins
Oh, let me clarify. /otherdir exists, and the mount succeeds.

- Noah

On Thu, Nov 29, 2012 at 11:58 AM, Sam Lang sam.l...@inktank.com wrote:
 On 11/29/2012 01:52 PM, Noah Watkins wrote:

 I'm getting the assert failure below with the following test:

ceph_mount(cmount, /otherdir);


 This should fail with ENOENT if you check the return code.
 -sam

ceph_getcwd(cmount);

 --

 client/Inode.h: In function 'Dentry* Inode::get_first_parent()' thread
 7fded47c8780 time 2012-11-29 11:49:00.890184
 client/Inode.h: 165: FAILED assert(!dn_set.empty())
   ceph version 0.54-808-g1ed5a1f
 (1ed5a1f984d8260d86cc25b1ae95ffedf597e579)
   1: (()+0x11ee89) [0x7fded36fae89]
   2: (()+0x1429d3) [0x7fded371e9d3]
   3: (ceph_getcwd()+0x11) [0x7fded36fdb41]
   4: (MountedTest2_XYZ_Test::TestBody()+0x63a) [0x42563a]
   5: (testing::Test::Run()+0xaa) [0x45017a]
   6: (testing::internal::TestInfoImpl::Run()+0x100) [0x450280]
   7: (testing::TestCase::Run()+0xbd) [0x45034d]
   8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x4505b7]
   9: (main()+0x35) [0x423115]
   10: (__libc_start_main()+0xed) [0x7fded2d2876d]
   11: /home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs()
 [0x423171]
   NOTE: a copy of the executable, or `objdump -rdS executable` is
 needed to interpret this.
 terminate called after throwing an instance of 'ceph::FailedAssertion'
 Aborted (core dumped)

 Thanks,
 Noah
 --


Re: Client crash on getcwd with non-default root mount

2012-11-29 Thread Noah Watkins
Here is the full test case:

TEST(LibCephFS, MountRootChdir) {
  struct ceph_mount_info *cmount;

  /* create mount and new directory */
  ASSERT_EQ(ceph_create(&cmount, NULL), 0);
  ASSERT_EQ(ceph_conf_read_file(cmount, NULL), 0);
  ASSERT_EQ(ceph_mount(cmount, "/"), 0);
  ASSERT_EQ(ceph_mkdir(cmount, "/xyz", 0700), 0);
  ceph_shutdown(cmount);

  /* create mount with non-/ root */
  ASSERT_EQ(ceph_create(&cmount, NULL), 0);
  ASSERT_EQ(ceph_conf_read_file(cmount, NULL), 0);
  ASSERT_EQ(ceph_mount(cmount, "/xyz"), 0);

  /* should be at root directory, but blows up */
  ASSERT_STREQ(ceph_getcwd(cmount), "/");
}

On Thu, Nov 29, 2012 at 12:02 PM, Noah Watkins jayh...@cs.ucsc.edu wrote:
 Oh, let me clarify. /otherdir exists, and the mount succeeds.

 - Noah

 On Thu, Nov 29, 2012 at 11:58 AM, Sam Lang sam.l...@inktank.com wrote:
 On 11/29/2012 01:52 PM, Noah Watkins wrote:

 I'm getting the assert failure below with the following test:

ceph_mount(cmount, /otherdir);


 This should fail with ENOENT if you check the return code.
 -sam

ceph_getcwd(cmount);

 --

 client/Inode.h: In function 'Dentry* Inode::get_first_parent()' thread
 7fded47c8780 time 2012-11-29 11:49:00.890184
 client/Inode.h: 165: FAILED assert(!dn_set.empty())
   ceph version 0.54-808-g1ed5a1f
 (1ed5a1f984d8260d86cc25b1ae95ffedf597e579)
   1: (()+0x11ee89) [0x7fded36fae89]
   2: (()+0x1429d3) [0x7fded371e9d3]
   3: (ceph_getcwd()+0x11) [0x7fded36fdb41]
   4: (MountedTest2_XYZ_Test::TestBody()+0x63a) [0x42563a]
   5: (testing::Test::Run()+0xaa) [0x45017a]
   6: (testing::internal::TestInfoImpl::Run()+0x100) [0x450280]
   7: (testing::TestCase::Run()+0xbd) [0x45034d]
   8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x4505b7]
   9: (main()+0x35) [0x423115]
   10: (__libc_start_main()+0xed) [0x7fded2d2876d]
   11: /home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs()
 [0x423171]
   NOTE: a copy of the executable, or `objdump -rdS executable` is
 needed to interpret this.
 terminate called after throwing an instance of 'ceph::FailedAssertion'
 Aborted (core dumped)

 Thanks,
 Noah
 --


Re: rbd map command hangs for 15 minutes during system start up

2012-11-29 Thread Alex Elder
On 11/22/2012 12:04 PM, Nick Bartos wrote:
 Here are the ceph log messages (including the libceph kernel debug
 stuff you asked for) from a node boot with the rbd command hung for a
 couple of minutes:

Nick, I have put together a branch that includes two fixes
that might be helpful.  I don't expect these fixes will
necessarily *fix* what you're seeing, but one of them
pulls a big hunk of processing out of the picture and
might help eliminate some potential causes.  I had to
pull in several other patches as prerequisites in order
to get those fixes to apply cleanly.

Would you be able to give it a try, and let us know what
results you get?  The branch contains:
- Linux 3.5.5
- Plus the first 49 patches you listed
- Plus four patches, which are prerequisites...
libceph: define ceph_extract_encoded_string()
rbd: define some new format constants
rbd: define rbd_dev_image_id()
rbd: kill create_snap sysfs entry
- ...for these two bug fixes:
libceph: remove 'osdtimeout' option
ceph: don't reference req after put

The branch is available in the ceph-client git repository
under the name wip-nick and has commit id dd9323aa.
https://github.com/ceph/ceph-client/tree/wip-nick

 https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt

This full debug output is very helpful.  Please supply
that again as well.

Thanks.

-Alex

 On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos n...@pistoncloud.com wrote:
 It's very easy to reproduce now with my automated install script, the
 most I've seen it succeed with that patch is 2 in a row, and hanging
 on the 3rd, although it hangs on most builds.  So it shouldn't take
 much to get it to do it again.  I'll try and get to that tomorrow,
 when I'm a bit more rested and my brain is working better.

 Yes, during this the OSDs are probably all syncing up.  All the osd and
 mon daemons have started by the time the rbd commands are run, though.

 On Wed, Nov 21, 2012 at 8:47 PM, Sage Weil s...@inktank.com wrote:
 On Wed, 21 Nov 2012, Nick Bartos wrote:
 FYI the build which included all 3.5 backports except patch #50 is
 still going strong after 21 builds.

 Okay, that one at least makes some sense.  I've opened

 http://tracker.newdream.net/issues/3519

 How easy is this to reproduce?  If it is something you can trigger with
 debugging enabled ('echo module libceph +p >
 /sys/kernel/debug/dynamic_debug/control') that would help tremendously.

 I'm guessing that during this startup time the OSDs are still in the
 process of starting?

 Alex, I bet that a test that does a lot of map/unmap stuff in a loop while
 thrashing OSDs could hit this.

 Thanks!
 sage



 On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos n...@pistoncloud.com wrote:
 With 8 successful installs already done, I'm reasonably confident that
 it's patch #50.  I'm making another build which applies all patches
 from the 3.5 backport branch, excluding that specific one.  I'll let
 you know if that turns up any unexpected failures.

 What will the potential fall out be for removing that specific patch?


 On Wed, Nov 21, 2012 at 9:02 AM, Nick Bartos n...@pistoncloud.com wrote:
 It's really looking like it's the
 libceph_resubmit_linger_ops_when_pg_mapping_changes commit.  When
 patches 1-50 (listed below) are applied to 3.5.7, the hang is present.
  So far I have gone through 4 successful installs with no hang with
 only 1-49 applied.  I'm still leaving my test run to make sure it's
 not a fluke, but since previously it hangs within the first couple of
 builds, it really looks like this is where the problem originated.

 1-libceph_eliminate_connection_state_DEAD.patch
 2-libceph_kill_bad_proto_ceph_connection_op.patch
 3-libceph_rename_socket_callbacks.patch
 4-libceph_rename_kvec_reset_and_kvec_add_functions.patch
 5-libceph_embed_ceph_messenger_structure_in_ceph_client.patch
 6-libceph_start_separating_connection_flags_from_state.patch
 7-libceph_start_tracking_connection_socket_state.patch
 8-libceph_provide_osd_number_when_creating_osd.patch
 9-libceph_set_CLOSED_state_bit_in_con_init.patch
 10-libceph_embed_ceph_connection_structure_in_mon_client.patch
 11-libceph_drop_connection_refcounting_for_mon_client.patch
 12-libceph_init_monitor_connection_when_opening.patch
 13-libceph_fully_initialize_connection_in_con_init.patch
 14-libceph_tweak_ceph_alloc_msg.patch
 15-libceph_have_messages_point_to_their_connection.patch
 16-libceph_have_messages_take_a_connection_reference.patch
 17-libceph_make_ceph_con_revoke_a_msg_operation.patch
 18-libceph_make_ceph_con_revoke_message_a_msg_op.patch
 19-libceph_fix_overflow_in___decode_pool_names.patch
 20-libceph_fix_overflow_in_osdmap_decode.patch
 21-libceph_fix_overflow_in_osdmap_apply_incremental.patch
 22-libceph_transition_socket_state_prior_to_actual_connect.patch
 23-libceph_fix_NULL_dereference_in_reset_connection.patch
 24-libceph_use_con_get_put_methods.patch
 

Re: Client crash on getcwd with non-default root mount

2012-11-29 Thread Sam Lang

On 11/29/2012 02:12 PM, Noah Watkins wrote:

Here is the full test case:


Sorry - I was assuming it was just an issue with checking the return 
code.  I've pushed a one-line fix to wip-mount-subdir.  You can 
cherry-pick to your branch if you want.


-sam



TEST(LibCephFS, MountRootChdir) {
   struct ceph_mount_info *cmount;

   /* create mount and new directory */
   ASSERT_EQ(ceph_create(cmount, NULL), 0);
   ASSERT_EQ(ceph_conf_read_file(cmount, NULL), 0);
   ASSERT_EQ(ceph_mount(cmount, /), 0);
   ASSERT_EQ(ceph_mkdir(cmount, /xyz, 0700), 0);
   ceph_shutdown(cmount);

   /* create mount with non-/ root */
   ASSERT_EQ(ceph_create(cmount, NULL), 0);
   ASSERT_EQ(ceph_conf_read_file(cmount, NULL), 0);
   ASSERT_EQ(ceph_mount(cmount, /xyz), 0);

/* should be at root directory, but blows up */
   ASSERT_STREQ(ceph_getcwd(cmount), /);
}

On Thu, Nov 29, 2012 at 12:02 PM, Noah Watkins jayh...@cs.ucsc.edu wrote:

Oh, let me clarify. /otherdir exists, and the mount succeeds.

- Noah

On Thu, Nov 29, 2012 at 11:58 AM, Sam Lang sam.l...@inktank.com wrote:

On 11/29/2012 01:52 PM, Noah Watkins wrote:


I'm getting the assert failure below with the following test:

ceph_mount(cmount, /otherdir);



This should fail with ENOENT if you check the return code.
-sam


ceph_getcwd(cmount);

--

client/Inode.h: In function 'Dentry* Inode::get_first_parent()' thread
7fded47c8780 time 2012-11-29 11:49:00.890184
client/Inode.h: 165: FAILED assert(!dn_set.empty())
   ceph version 0.54-808-g1ed5a1f
(1ed5a1f984d8260d86cc25b1ae95ffedf597e579)
   1: (()+0x11ee89) [0x7fded36fae89]
   2: (()+0x1429d3) [0x7fded371e9d3]
   3: (ceph_getcwd()+0x11) [0x7fded36fdb41]
   4: (MountedTest2_XYZ_Test::TestBody()+0x63a) [0x42563a]
   5: (testing::Test::Run()+0xaa) [0x45017a]
   6: (testing::internal::TestInfoImpl::Run()+0x100) [0x450280]
   7: (testing::TestCase::Run()+0xbd) [0x45034d]
   8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x4505b7]
   9: (main()+0x35) [0x423115]
   10: (__libc_start_main()+0xed) [0x7fded2d2876d]
   11: /home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs()
[0x423171]
   NOTE: a copy of the executable, or `objdump -rdS executable` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted (core dumped)

Thanks,
Noah
--


Re: RBD: periodic cephx issue ? CephxAuthorizeHandler::verify_authorizer isvalid=0

2012-11-29 Thread Sylvain Munaut
Forwarding to the list as I forgot to hit reply-all ...

 Can you attach/post the whole log somewhere?  I'm curious what is leading
 up to it not having secret_id=0.  Ideally with 'debug auth = 20' and
 'debug osd = 20' and 'debug ms = 1'.

 Well, without the debug options there isn't anything else than that in
 the log, just this snippet several times.

 I haven't caught it with the log active ... unfortunately I can't
 leave it on for more than a couple of hours with those options, as it fills
 up the root filesystem with logs way too fast, and it seems to not happen
 when you have just restarted an OSD ...

 Usually it shows up in bursts (like 10 times or so in an hour, then
 nothing for a day, for example).

 Cheers,

 Sylvain
--


[PATCHv5] rbd block driver fix race between aio completion and aio cancel

2012-11-29 Thread Stefan Priebe
This one fixes a race which qemu also had in the iscsi block driver,
between cancellation and io completion.

qemu_rbd_aio_cancel was not synchronously waiting for the end of
the command.

To achieve this it introduces a new status flag which uses
-EINPROGRESS.

Changes since PATCHv4:
- removed unnecessary qemu_vfree of acb->bounce as the BH will always
  run

Changes since PATCHv3:
- removed unnecessary if condition in rbd_start_aio as we
  haven't started io yet
- moved acb->status = 0 to rbd_aio_bh_cb so qemu_aio_wait always
  waits until the BH was executed

Changes since PATCHv2:
- fixed missing braces
- added vfree for bounce

Signed-off-by: Stefan Priebe s.pri...@profihost.ag

---
 block/rbd.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f3becc7..3bc9c7a 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -77,6 +77,7 @@ typedef struct RBDAIOCB {
     int error;
     struct BDRVRBDState *s;
     int cancelled;
+    int status;
 } RBDAIOCB;
 
 typedef struct RADOSCB {
@@ -376,12 +377,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
     RBDAIOCB *acb = rcb->acb;
     int64_t r;
 
-    if (acb->cancelled) {
-        qemu_vfree(acb->bounce);
-        qemu_aio_release(acb);
-        goto done;
-    }
-
     r = rcb->ret;
 
     if (acb->cmd == RBD_AIO_WRITE ||
@@ -409,7 +404,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
     /* Note that acb->bh can be NULL in case where the aio was cancelled */
     acb->bh = qemu_bh_new(rbd_aio_bh_cb, acb);
     qemu_bh_schedule(acb->bh);
-done:
     g_free(rcb);
 }
 
@@ -568,6 +562,10 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
     RBDAIOCB *acb = (RBDAIOCB *) blockacb;
     acb->cancelled = 1;
+
+    while (acb->status == -EINPROGRESS) {
+        qemu_aio_wait();
+    }
 }
 
 static const AIOCBInfo rbd_aiocb_info = {
@@ -639,6 +637,7 @@ static void rbd_aio_bh_cb(void *opaque)
     acb->common.cb(acb->common.opaque, (acb->ret < 0 ? 0 : acb->ret));
     qemu_bh_delete(acb->bh);
     acb->bh = NULL;
+    acb->status = 0;
 
     qemu_aio_release(acb);
 }
@@ -685,6 +684,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
     acb->s = s;
     acb->cancelled = 0;
     acb->bh = NULL;
+    acb->status = -EINPROGRESS;
 
     if (cmd == RBD_AIO_WRITE) {
         qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
-- 
1.7.10.4

--


Re: [PATCHv4] rbd block driver fix race between aio completion and aio cancel

2012-11-29 Thread Stefan Priebe

Hi Paolo,

On 29.11.2012 16:23, Paolo Bonzini wrote:

+    qemu_vfree(acb->bounce);


This vfree is not needed, since the BH will run and do the free.


new patch v5 sent.

Greets,
Stefan
--


Re: parsing in the ceph osd subsystem

2012-11-29 Thread Joao Eduardo Luis
On 11/29/2012 07:01 PM, Andrey Korolyov wrote:
 On Thu, Nov 29, 2012 at 8:34 PM, Sage Weil s...@inktank.com wrote:
 On Thu, 29 Nov 2012, Andrey Korolyov wrote:
 $ ceph osd down -
 osd.0 is already down
 $ ceph osd down ---
 osd.0 is already down

 the same for ``+'', ``/'', ``%'' and so - I think that for osd subsys
 ceph cli should explicitly work only with positive integers plus zero,
 refusing all other input.

 which branch is this?  this parsing is cleaned up in the latest
 next/master.


 
 It was produced by 0.54-tag. I have built
 dd3a24a647d0b0f1153cf1b102ed1f51d51be2f2 today and problem has
 gone(except parsing ``-0'' as 0 and 0/001 as 0 and 1
 correspondingly).

A fix for the signed parameter has been pushed to next. However, after
consideration, when it comes to the '0+\d' parameters, that kind of
input was considered valid; Greg put it best on IRC, and I quote:

<gregaf> joao: not sure we want to prevent 01 from parsing as 1, I
suspect some people with large clusters will find that useful so they
can conflate the name and ID while keeping everything three digits

Hope this makes sense to you.

  -Joao
--


Re: What is the new command to add osd to the crushmap to enable it to receive data

2012-11-29 Thread Joao Eduardo Luis
On 11/30/2012 01:22 AM, Isaac Otsiabah wrote:
 The command below, which adds a new osd to the crushmap to enable it to receive 
 data, has changed and does not work anymore.
 
 
 ceph osd crush set {id} {name}
 
 
 Please, what is the new command to add a new osd to the crushmap to enable it 
 to receive data?

You must specify a weight and a location. For instance,

  ceph osd crush set 0 osd.0 1.0 root=default

Also, you can check up on the docs for more info.

From the docs at [1]:

Add the OSD to the CRUSH map so that it can begin receiving data. You
may also decompile the CRUSH map, add the OSD to the device list, add
the host as a bucket (if it’s not already in the CRUSH map), add the
device as an item in the host, assign it a weight, recompile it and set
it. See Add/Move an OSD for details.

ceph osd crush set {id} {name} {weight} pool={pool-name}
[{bucket-type}={bucket-name} ...]


[1] http://ceph.com/docs/master/rados/operations/add-or-rm-osds/


  -Joao
--


[PATCH 1/5] mds: fix journaling issue regarding rstat accounting

2012-11-29 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com

Rename operation can call predirty_journal_parents() twice. So
directory fragment's rstat can be modified twice. But only the
first modification is journaled because EMetaBlob::add_dir() does
not update existing dirlump.

Signed-off-by: Yan, Zheng zheng.z@intel.com
---
 src/mds/events/EMetaBlob.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mds/events/EMetaBlob.h b/src/mds/events/EMetaBlob.h
index 9c281e9..116b704 100644
--- a/src/mds/events/EMetaBlob.h
+++ b/src/mds/events/EMetaBlob.h
@@ -635,12 +635,12 @@ private:
 		   dirty, complete, isnew);
   }
   dirlump& add_dir(dirfrag_t df, fnode_t *pf, version_t pv, bool dirty, bool complete=false, bool isnew=false) {
-    if (lump_map.count(df) == 0) {
+    if (lump_map.count(df) == 0)
       lump_order.push_back(df);
-      lump_map[df].fnode = *pf;
-      lump_map[df].fnode.version = pv;
-    }
+
     dirlump& l = lump_map[df];
+    l.fnode = *pf;
+    l.fnode.version = pv;
     if (complete) l.mark_complete();
     if (dirty) l.mark_dirty();
     if (isnew) l.mark_new();
 if (isnew) l.mark_new();
-- 
1.7.11.7

--


[PATCH 0/5] mds: fixes for mds

2012-11-29 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com

Hi,

The 1st patch fixes an rstat accounting bug. The 5th patch fixes a journal
replay bug; the fix requires a minor disk format change.

These patches are also in:
  git://github.com/ukernel/ceph.git wip-mds

Regards
Yan, Zheng
--


[PATCH 4/5] mds: call eval() after cap is imported

2012-11-29 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com

The migrator calls eval() for imported caps after importing a
directory tree. We should do the same thing after importing an
inode.

Signed-off-by: Yan, Zheng zheng.z@intel.com
---
 src/mds/Migrator.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mds/Migrator.cc b/src/mds/Migrator.cc
index 41d97e9..fcc06cd 100644
--- a/src/mds/Migrator.cc
+++ b/src/mds/Migrator.cc
@@ -2613,6 +2613,7 @@ void Migrator::logged_import_caps(CInode *in,
 
   assert(cap_imports.count(in));
   finish_import_inode_caps(in, from, cap_imports[in]);  
+  mds->locker->eval(in, CEPH_CAP_LOCKS);
 
   mds->send_message_mds(new MExportCapsAck(in->ino()), from);
 }
-- 
1.7.11.7

--


[PATCH 2/5] mds: fix handle_client_openc() hang

2012-11-29 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com

handle_client_openc() calls handle_client_open() if the linkage isn't
null. handle_client_open() calls rdlock_path_pin_ref() which returns
mdr->in[0] directly because mdr->done_locking is true. The problem here
is that mdr->in[0] can be NULL if the linkage is remote.

Signed-off-by: Yan, Zheng zheng.z@intel.com
---
 src/mds/Server.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 4c66f4a..59d7d3c 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -2637,6 +2637,12 @@ void Server::handle_client_openc(MDRequest *mdr)
       reply_request(mdr, -EEXIST, dnl->get_inode(), dn);
       return;
     } 
+
+    mdcache->request_drop_non_rdlocks(mdr);
+
+    // remote link, avoid rdlock_path_pin_ref() returning null
+    if (!mdr->in[0])
+      mdr->done_locking = false;
 
     handle_client_open(mdr);
     return;
-- 
1.7.11.7

--


[PATCH 3/5] mds: allow handle_client_readdir() fetching freezing dir.

2012-11-29 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com

At that point, the request already auth pins some objects.
So CDir::fetch() should ignore can_auth_pin check.

Signed-off-by: Yan, Zheng zheng.z@intel.com
---
 src/mds/Server.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 59d7d3c..2c59f25 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -2741,9 +2741,14 @@ void Server::handle_client_readdir(MDRequest *mdr)
   assert(dir->is_auth());
 
   if (!dir->is_complete()) {
+    if (dir->is_frozen()) {
+      dout(7) << "dir is frozen " << *dir << dendl;
+      dir->add_waiter(CDir::WAIT_UNFREEZE, new C_MDS_RetryRequest(mdcache, mdr));
+      return;
+    }
     // fetch
     dout(10) << " incomplete dir contents for readdir on " << *dir << ", fetching" << dendl;
-    dir->fetch(new C_MDS_RetryRequest(mdcache, mdr));
+    dir->fetch(new C_MDS_RetryRequest(mdcache, mdr), true);
     return;
   }
 
-- 
1.7.11.7
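
The distinction the patch relies on can be summarised in a few lines. This is
a toy model, not CDir; the freezing/frozen flags and the extra bool argument
mirror the description above and nothing more.

  // Toy model only -- not CDir.
  #include <cstdio>

  struct Dir {
    bool freezing = false;       // freeze requested, rejects *new* auth pins
    bool frozen   = false;       // fully frozen, must be waited on
    bool can_auth_pin() const { return !freezing && !frozen; }
  };

  // second argument mirrors the extra 'true' passed to fetch() in the patch
  static bool try_fetch(const Dir &d, bool ignore_authpinnability) {
    if (d.frozen)
      return false;              // caller queues a WAIT_UNFREEZE waiter instead
    if (!d.can_auth_pin() && !ignore_authpinnability)
      return false;              // old behaviour: a freezing dir blocked readdir
    return true;                 // request already holds its auth pins, go ahead
  }

  int main() {
    Dir d;
    d.freezing = true;
    printf("freezing dir, old call: %s\n", try_fetch(d, false) ? "fetch" : "blocked");
    printf("freezing dir, new call: %s\n", try_fetch(d, true)  ? "fetch" : "blocked");
    return 0;
  }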



[PATCH 5/5] mds: compare sessionmap version before replaying imported sessions

2012-11-29 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com

Otherwise we may wrongly increase mds->sessionmap.version, which
will confuse future journal replays that involve the sessionmap.

Signed-off-by: Yan, Zheng zheng.z@intel.com
---
 src/mds/Server.cc|  2 ++
 src/mds/events/EUpdate.h |  8 ++--
 src/mds/journal.cc   | 24 +---
 3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 2c59f25..a10c503 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -5425,6 +5425,8 @@ void Server::handle_client_rename(MDRequest *mdr)
   }
   
   _rename_prepare(mdr, &le->metablob, &le->client_map, srcdn, destdn, straydn);
+  if (le->client_map.length())
+    le->cmapv = mds->sessionmap.projected;
 
   // -- commit locally --
   C_MDS_rename_finish *fin = new C_MDS_rename_finish(mds, mdr, srcdn, destdn, 
straydn);
diff --git a/src/mds/events/EUpdate.h b/src/mds/events/EUpdate.h
index 6ce18fe..a302a5a 100644
--- a/src/mds/events/EUpdate.h
+++ b/src/mds/events/EUpdate.h
@@ -23,13 +23,14 @@ public:
   EMetaBlob metablob;
   string type;
   bufferlist client_map;
+  version_t cmapv;
   metareqid_t reqid;
   bool had_slaves;
 
   EUpdate() : LogEvent(EVENT_UPDATE) { }
   EUpdate(MDLog *mdlog, const char *s) : 
 LogEvent(EVENT_UPDATE), metablob(mdlog),
-type(s), had_slaves(false) { }
+type(s), cmapv(0), had_slaves(false) { }
   
   void print(ostream& out) {
 if (type.length())
@@ -38,12 +39,13 @@ public:
   }
 
   void encode(bufferlist& bl) const {
-__u8 struct_v = 2;
+__u8 struct_v = 3;
 ::encode(struct_v, bl);
 ::encode(stamp, bl);
 ::encode(type, bl);
 ::encode(metablob, bl);
 ::encode(client_map, bl);
+::encode(cmapv, bl);
 ::encode(reqid, bl);
 ::encode(had_slaves, bl);
   } 
@@ -55,6 +57,8 @@ public:
 ::decode(type, bl);
 ::decode(metablob, bl);
 ::decode(client_map, bl);
+    if (struct_v >= 3)
+  ::decode(cmapv, bl);
 ::decode(reqid, bl);
 ::decode(had_slaves, bl);
   }
diff --git a/src/mds/journal.cc b/src/mds/journal.cc
index 04b1a92..3f6d5eb 100644
--- a/src/mds/journal.cc
+++ b/src/mds/journal.cc
@@ -999,14 +999,24 @@ void EUpdate::replay(MDS *mds)
     mds->mdcache->add_uncommitted_master(reqid, _segment, slaves);
   }
   
-  // open client sessions?
-  map<client_t,entity_inst_t> cm;
-  map<client_t, uint64_t> seqm;
   if (client_map.length()) {
-    bufferlist::iterator blp = client_map.begin();
-    ::decode(cm, blp);
-    mds->server->prepare_force_open_sessions(cm, seqm);
-    mds->server->finish_force_open_sessions(cm, seqm);
+    if (mds->sessionmap.version >= cmapv) {
+      dout(10) << "EUpdate.replay sessionmap v " << cmapv
+               << " <= table " << mds->sessionmap.version << dendl;
+    } else {
+      dout(10) << "EUpdate.replay sessionmap " << mds->sessionmap.version
+               << " < " << cmapv << dendl;
+      // open client sessions?
+      map<client_t,entity_inst_t> cm;
+      map<client_t, uint64_t> seqm;
+      bufferlist::iterator blp = client_map.begin();
+      ::decode(cm, blp);
+      mds->server->prepare_force_open_sessions(cm, seqm);
+      mds->server->finish_force_open_sessions(cm, seqm);
+
+      assert(mds->sessionmap.version >= cmapv);
+      mds->sessionmap.projected = mds->sessionmap.version;
+    }
   }
 }
 
-- 
1.7.11.7
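
A compact way to see why the guard matters: without the version check, every
replay of the event re-opens the sessions and bumps the sessionmap version
again, so later events no longer line up. The toy model below uses invented
names (it is not the MDS SessionMap) and only illustrates that intent.

  // Toy model only -- not the MDS SessionMap.
  #include <cstdint>
  #include <cstdio>

  struct SessionMap { uint64_t version = 0; uint64_t projected = 0; };

  // stand-in for prepare/finish_force_open_sessions(): opening sessions
  // advances the sessionmap version
  static void replay_open_sessions(SessionMap &sm, uint64_t nsessions) {
    sm.version += nsessions;
  }

  // cmapv is the sessionmap version recorded in the journaled event
  static void replay_update(SessionMap &sm, uint64_t cmapv, uint64_t nsessions) {
    if (sm.version >= cmapv) {
      printf("sessionmap %llu >= %llu, nothing to do\n",
             (unsigned long long)sm.version, (unsigned long long)cmapv);
      return;                    // the event is already reflected in the map
    }
    replay_open_sessions(sm, nsessions);
    sm.projected = sm.version;   // replay keeps projected in step with version
  }

  int main() {
    SessionMap sm;
    replay_update(sm, 1, 1);     // first replay applies the session open
    replay_update(sm, 1, 1);     // replaying the same event again is a no-op
    printf("final sessionmap version: %llu\n", (unsigned long long)sm.version);
    return 0;
  }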



[PATCH] ceph: re-calculate truncate_size for strip object

2012-11-29 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com

Otherwise the OSD may truncate the object to a larger size.

Signed-off-by: Yan, Zheng zheng.z@intel.com
---
 net/ceph/osd_client.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index ccbdfbb..f8b0e56 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -76,8 +76,16 @@ int ceph_calc_raw_layout(struct ceph_osd_client *osdc,
 orig_len - *plen, off, *plen);
 
 	if (op_has_extent(op->op)) {
+		u32 osize = le32_to_cpu(layout->fl_object_size);
 		op->extent.offset = objoff;
 		op->extent.length = objlen;
+		if (op->extent.truncate_size <= off - objoff) {
+			op->extent.truncate_size = 0;
+		} else {
+			op->extent.truncate_size -= off - objoff;
+			if (op->extent.truncate_size > osize)
+				op->extent.truncate_size = osize;
+		}
 	}
 	req->r_num_pages = calc_pages_for(off, *plen);
 	req->r_page_alignment = off & ~PAGE_MASK;
-- 
1.7.11.7
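
For readers wondering what the new clamping computes: the file-level
truncate_size has to be rebased to each object's own offset range. The
self-contained example below shows the arithmetic for the simple
stripe_count == 1 layout, where object N holds file bytes
[N*object_size, (N+1)*object_size); all names are local to the example,
not kernel symbols.

  // Worked example of the per-object truncate_size remapping (toy code).
  #include <algorithm>
  #include <cstdint>
  #include <cstdio>

  static uint64_t object_truncate_size(uint64_t file_truncate_size,
                                       uint64_t object_base,   // off - objoff
                                       uint64_t object_size) {
    if (file_truncate_size <= object_base)
      return 0;                  // object lies entirely past the truncation point
    // otherwise rebase, but never beyond the object's own size
    return std::min(file_truncate_size - object_base, object_size);
  }

  int main() {
    const uint64_t osize = 4194304;            // 4 MiB objects
    const uint64_t truncate_size = 6291456;    // file truncated at 6 MiB
    // object 0 keeps all 4 MiB, object 1 keeps 2 MiB, object 2 keeps nothing
    for (uint64_t n = 0; n < 3; n++)
      printf("object %llu: truncate_size %llu\n",
             (unsigned long long)n,
             (unsigned long long)object_truncate_size(truncate_size, n * osize, osize));
    return 0;
  }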



Re: OSD daemon changes port no

2012-11-29 Thread hemant surale
Hi Sage, Community,
   I am unable to use 2 directories to direct data to 2 different
pools. I did the following experiment.

Created 2 pools, host & ghost, to separate data placement.
--//crushmap file
---
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 pool
type 7 ghost

# buckets
host hemantone-mirror-virtual-machine {
id -6   # do not change unnecessarily
# weight 1.000
alg straw
hash 0  # rjenkins1
item osd.2 weight 1.000
}
host hemantone-virtual-machine {
id -7   # do not change unnecessarily
# weight 1.000
alg straw
hash 0  # rjenkins1
item osd.1 weight 1.000
}
rack one {
id -2   # do not change unnecessarily
# weight 2.000
alg straw
hash 0  # rjenkins1
item hemantone-mirror-virtual-machine weight 1.000
item hemantone-virtual-machine weight 1.000
}
ghost hemant-virtual-machine {
id -4   # do not change unnecessarily
# weight 1.000
alg straw
hash 0  # rjenkins1
item osd.0 weight 1.000
}
ghost hemant-mirror-virtual-machine {
id -5   # do not change unnecessarily
# weight 1.000
alg straw
hash 0  # rjenkins1
item osd.3 weight 1.000
}
rack two {
id -3   # do not change unnecessarily
# weight 2.000
alg straw
hash 0  # rjenkins1
item hemant-virtual-machine weight 1.000
item hemant-mirror-virtual-machine weight 1.000
}
pool default {
id -1   # do not change unnecessarily
# weight 4.000
alg straw
hash 0  # rjenkins1
item one weight 2.000
item two weight 2.000
}

# rules
rule data {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step take one
step chooseleaf firstn 0 type host
step emit
}
rule metadata {
ruleset 1
type replicated
min_size 1
max_size 10
step take default
step take one
step chooseleaf firstn 0 type host
step emit
}
rule rbd {
ruleset 2
type replicated
min_size 1
max_size 10
step take default
step take one
step chooseleaf firstn 0 type host
step emit
}
rule forhost {
ruleset 3
type replicated
min_size 1
max_size 10
step take default
step take one
step chooseleaf firstn 0 type host
step emit
}
rule forghost {
ruleset 4
type replicated
min_size 1
max_size 10
step take default
step take two
step chooseleaf firstn 0 type ghost
step emit
}

# end crush map

1) Set the replication factor to 2 and the crush rules accordingly (the host
pool got crush_ruleset = 3 & the ghost pool got crush_ruleset = 4).
2) Now I mounted the dirs using mount.ceph 10.72.148.245:6789:/
/home/hemant/x && mount.ceph 10.72.148.245:6789:/ /home/hemant/y
3) then mds add_data_pool 5 && mds add_data_pool 6 (here the pool ids
are host = 5, ghost = 6)
4) cephfs /home/hemant/x set_layout --pool 5 -c 1 -u 4194304 -s
4194304 && cephfs /home/hemant/y set_layout --pool 6 -c 1 -u 4194304
-s 4194304

PROBLEM:
 $ cephfs /home/hemant/x show_layout
layout.data_pool: 6
layout.object_size:   4194304
layout.stripe_unit:   4194304
layout.stripe_count:  1
 $ cephfs /home/hemant/y show_layout
layout.data_pool: 6
layout.object_size:   4194304
layout.stripe_unit:   4194304
layout.stripe_count:  1

Both dirs are using the same pool to place data even after I specified
separate pools using the cephfs command.
Please help me figure this out.

-
Hemant Surale.


On Thu, Nov 29, 2012 at 3:45 PM, hemant surale hemant.sur...@gmail.com wrote:
 does 'ceph mds dump' list pool 3 in teh data_pools line?

 Yes. It lists the desired poolids I wanted to put data in.


 -- Forwarded message --
 From: hemant surale hemant.sur...@gmail.com
 Date: Thu, Nov 29, 2012 at 2:59 PM
 Subject: Re: OSD daemon changes port no
 To: Sage Weil s...@inktank.com


 I used a little different version of cephfs as cephfs
 /home/hemant/a set_layout --pool 3 -c 1 -u  4194304 -s  4194304
  and cephfs /home/hemant/b set_layout --pool 5 -c 1 -u  4194304 -s  4194304.


 Now cmd didnt showed any error but When I put data to dir a  b
 ideally it should go to different pool but its not working as of now.
 Whatever I am doing is it possible (to use 2 dir pointing to 2
 different pools for data placement) ?



 -
 Hemant Surale.

 On Tue, Nov 27, 2012 at 10:21 PM, Sage Weil