Re: [ceph-users] OSD Restarts cause excessively high load average and requests are blocked > 32 sec

2014-03-23 Thread Martin B Nielsen
Hi,

I can see ~17% hardware interrupts which I find a little high - can you
make sure all load is spread over all your cores (/proc/interrupts)?

What about disk util once you restart them? Are they all 100% utilized or
is it 'only' mostly cpu-bound?
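
A quick way to check both (a rough sketch, assuming the sysstat tools
mpstat and iostat are installed) is:

# per-core interrupt counters; one very hot column means IRQs are not spread out
cat /proc/interrupts

# per-CPU utilization including the %irq column, refreshed every second
mpstat -P ALL 1

# per-disk utilization (%util) while the OSDs start up
iostat -x 1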

Also you're running a monitor on this node - how is the load on the nodes
where you run a monitor compared to those where you don't?

Cheers,
Martin


On Thu, Mar 20, 2014 at 10:18 AM, Quenten Grasso qgra...@onq.com.au wrote:

  Hi All,



 I left out my OS/kernel version, Ubuntu 12.04.4 LTS w/ Kernel
 3.10.33-031033-generic (We upgrade our kernels to 3.10 due to Dell Drivers).



 Here's an example of starting all the OSD's after a reboot.



 top - 09:10:51 up 2 min,  1 user,  load average: 332.93, 112.28, 39.96

 Tasks: 310 total,   1 running, 309 sleeping,   0 stopped,   0 zombie

 Cpu(s): 50.3%us, 32.5%sy,  0.0%ni,  0.0%id,  0.0%wa, 17.2%hi,  0.0%si,
 0.0%st

 Mem:  32917276k total,  6331224k used, 26586052k free, 1332k buffers

 Swap: 33496060k total,        0k used, 33496060k free,  1474084k cached



   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

 15875 root  20   0  910m 381m  50m S   60  1.2   0:50.57 ceph-osd

 2996 root  20   0  867m 330m  44m S   59  1.0   0:58.32 ceph-osd

 4502 root  20   0  907m 372m  47m S   58  1.2   0:55.14 ceph-osd

 12465 root  20   0  949m 418m  55m S   58  1.3   0:51.79 ceph-osd

 4171 root  20   0  886m 348m  45m S   57  1.1   0:56.17 ceph-osd

 3707 root  20   0  941m 405m  50m S   57  1.3   0:59.68 ceph-osd

 3560 root  20   0  924m 394m  51m S   56  1.2   0:59.37 ceph-osd

 4318 root  20   0  965m 435m  55m S   56  1.4   0:54.80 ceph-osd

 3337 root  20   0  935m 407m  51m S   56  1.3   1:01.96 ceph-osd

 3854 root  20   0  897m 366m  48m S   55  1.1   1:00.55 ceph-osd

 3143 root  20   0 1364m 424m  24m S   16  1.3   1:08.72 ceph-osd

 2509 root  20   0  652m 261m  62m S    2  0.8   0:26.42 ceph-mon

 4 root      20   0     0    0    0 S    0  0.0   0:00.08 kworker/0:0



 Regards,

 Quenten Grasso



 *From:* ceph-users-boun...@lists.ceph.com [mailto:
 ceph-users-boun...@lists.ceph.com] *On Behalf Of *Quenten Grasso
 *Sent:* Tuesday, 18 March 2014 10:19 PM
 *To:* 'ceph-users@lists.ceph.com'
 *Subject:* [ceph-users] OSD Restarts cause excessively high load average
 and requests are blocked > 32 sec



 Hi All,



 I'm trying to troubleshoot a strange issue with my Ceph cluster.



 We're Running Ceph Version 0.72.2

 All Nodes are Dell R515's w/ 6C AMD CPU w/ 32GB Ram, 12 x 3TB NearlineSAS
 Drives and 2 x 100GB Intel DC S3700 SSD's for Journals.

 All pools have a replica count of 2 or better, e.g. the metadata pool has a replica count of 3.



 I have 55 OSD's in the cluster across 5 nodes. When I restart the OSD's on
 a single node (any node), the load average of that node shoots up to 230+
 and the whole cluster starts blocking IO requests until it settles down and
 it's fine again.



 Any ideas on why the load average goes so crazy & starts to block IO?





 snips from my ceph.conf

 [osd]

 osd data = /var/ceph/osd.$id

 osd journal size = 15000

 osd mkfs type = xfs

 osd mkfs options xfs = -i size=2048 -f

 osd mount options xfs =
 rw,noexec,nodev,noatime,nodiratime,barrier=0,inode64,logbufs=8,logbsize=256k

 osd max backfills = 5

 osd recovery max active = 3



 [osd.0]

 host = pbnerbd01

 public addr = 10.100.96.10

 cluster addr = 10.100.128.10

 osd journal =
 /dev/disk/by-id/scsi-36b8ca3a0eaa2660019deaf8d3a40bec4-part1

 devs = /dev/sda4

 /end
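
 With backfill and recovery limits like the ones above, one knob that is
 often lowered to soften the load spike right after a restart is the
 backfill/recovery concurrency. A rough sketch of adjusting it at runtime
 (the values are only illustrative, not a recommendation):

 # temporarily throttle backfill/recovery on all OSDs
 ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

 # check whether requests are still being reported as blocked
 ceph health detail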



 Thanks,

 Quenten



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mounting with dmcrypt still fails

2014-03-23 Thread Michael Lukzak
Hi,

After looking at the code in ceph-disk I came to the same conclusion: the problem is
with the mapping.

Here is a quote from ceph-disk:

def get_partition_dev(dev, pnum):
    """
    get the device name for a partition

    assume that partitions are named like the base dev, with a number, and
    optionally some intervening characters (like 'p').  e.g.,

       sda 1 -> sda1
       cciss/c0d1 1 -> cciss!c0d1p1
    """


The script looks for partitions named like the base device plus a number, e.g.
sdb[X] or ...p[X], where [X] is the partition number (counted from 1).
dm-crypt creates new mappings in /dev/mapper/, for example
/dev/mapper/osd0 as the main block device, /dev/mapper/osd0p1 as the first partition
and /dev/mapper/osd0p2 as the second partition.

But the real path of the osd0 device is NOT /dev/mapper/osd0 but /dev/dm-0 (sic!), and
/dev/dm-1 is the first partition (osd0p1), /dev/dm-2 the second partition
(osd0p2).

Conclusion: if we are using dm-crypt, the script in ceph-disk should not look for
partitions like sda partition 1 -> sda1 or osd0 partition 1 -> osd0p1, but should
look for partitions labeled /dev/dm-X (counted from 1).

Block device          Real path
/dev/mapper/osd0   -> /dev/dm-0

First partition       Real path
/dev/mapper/osd0p1 -> /dev/dm-1

Second partition      Real path
/dev/mapper/osd0p2 -> /dev/dm-2

Continuing, 'ceph-disk activate' should mount dm-crypted partitions not via
/dev/disk/by-partuuid but via /dev/disk/by-uuid.
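
On most systems the /dev/mapper entries are just symlinks to the dm-X
nodes, so the mapping is easy to verify by hand (a sketch, using the
device names from the example above):

# resolve the mapper name to the underlying dm node (e.g. /dev/dm-1)
readlink -f /dev/mapper/osd0p1

# dmsetup shows each mapping name together with its major:minor pair
dmsetup ls

# the filesystem UUIDs that by-uuid mounting would rely on
ls -l /dev/disk/by-uuid/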

--
Best regards,
Michel Lukzak



 ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir 
 /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb
 ceph-disk: Error: Device /dev/sdb2 is in use by a device-mapper mapping 
 (dm-crypt?): dm-0

 It sounds like device-mapper still thinks it's using the volume,
 you might be able to track it down with this:

 for i in `ls -1 /sys/block/ | grep sd`; do echo $i: `ls
 /sys/block/$i/${i}1/holders/`; done
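
 If that turns up a stale dm-crypt holder, one way to release it is
 roughly (a sketch; replace <mapping-name> with whatever dmsetup reports
 for the dm device, e.g. the one behind dm-0):

 # list active device-mapper targets; the (major:minor) pair shows which dm-X each is
 dmsetup ls

 # make sure nothing is mounted from it, then tear the mapping down
 umount /dev/dm-0
 cryptsetup remove <mapping-name>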

 Then it's a matter of making sure there are no open file handles on
 the encrypted volume and unmounting it. You will still need to
 completely clear out the partition table on that disk, which can be
 tricky with GPT because it's not as simple as dd'ing the start of the
 volume. This is what the zapdisk parameter is for in
 ceph-disk-prepare; I don't know enough about ceph-deploy to know if
 you can somehow pass it.
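
 Because GPT keeps a backup header at the end of the disk, zeroing the
 first sectors is not enough. A minimal sketch of wiping the partition
 table completely (assuming the gdisk package is installed and /dev/sdX
 really is the disk you want to wipe):

 # remove the primary and backup GPT structures plus the protective MBR
 sgdisk --zap-all /dev/sdX

 # ask the kernel to re-read the now-empty partition table
 partprobe /dev/sdX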

 After you know the device/dm mapping you can use udevadm to find out
 where it should map to (uuids replaced with xxx's):

 udevadm test /block/sdc/sdc1
 snip
 run: '/sbin/cryptsetup --key-file /etc/ceph/dmcrypt-keys/x
 --key-size 256 create  /dev/sdc1'
 run: '/bin/bash -c 'while [ ! -e /dev/mapper/x ];do sleep 1; done''
 run: '/usr/sbin/ceph-disk-activate /dev/mapper/x'


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re: why object can't be recovered when delete one replica

2014-03-23 Thread ljm李嘉敏
Hi Kyle,

Thank you very much for your explanation. I have triggered the relevant pg to
scrub, but the secondary replica which I removed manually isn't recovered;
it only shows "instructing pg xx.xxx on osd.x to scrub".

PS: I used ceph-deploy to deploy the ceph cluster, and ceph.conf is the
default configuration.

Thanks & Regards
Li JiaMin

-Original Message-
From: Kyle Bader [mailto:kyle.ba...@gmail.com]
Sent: 23 March 2014 10:05
To: ljm李嘉敏
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] why object can't be recovered when delete one replica

 I uploaded a file through the Swift API, then deleted it in the "current"
 directory on the secondary OSD manually. Why can't the object be recovered?

 If I delete it on the primary OSD, the object is deleted directly in
 the pool .rgw.bucket and it can't be recovered from the secondary OSD.

 Does anyone know about this behavior?

This is because the placement group containing that object likely needs to 
scrub (just a light scrub should do). The scrub will compare the two replicas, 
notice the replica is missing from the secondary and trigger recovery/backfill. 
Can you try scrubbing the placement group containing the removed object and let 
us know if it triggers recovery?
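
For reference, a rough sketch of driving this by hand (the pool name and
pg id below are placeholders; take the real ones from your cluster):

# find the pg and acting OSDs for the object in question
ceph osd map .rgw.buckets my-object-name

# trigger a light scrub, or a deep scrub, of that placement group
ceph pg scrub 3.45
ceph pg deep-scrub 3.45

# watch whether the pg recovers and returns to active+clean
ceph pg 3.45 query | grep state
ceph -w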

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS crash when client goes to sleep

2014-03-23 Thread Mohd Bazli Ab Karim
Hi Hong,

Could you apply the patch and see if it crashes after sleep?
This could lead us to the correct fix for the MDS/client too.

From what I can see here, this patch should fix the crash, but how do we fix the
MDS if the crash happens?
It happened to us: when it crashed, it was a total crash, and even restarting the
ceph-mds service with --reset-journal did not help.
Can anyone shed some light on this matter?

p/s: Are there any steps/tools to back up the MDS metadata? Say the MDS crashes and
refuses to run normally, can we restore the backed-up metadata? I'm thinking of it
as a preventive step, just in case it happens again in the future.
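
There is no dedicated backup tool mentioned here; one crude, unofficial
approach (an assumption on my side, and only sensible while the MDS is
stopped) is to copy the objects of the metadata pool out with rados so
they can at least be restored object by object later:

# list the objects in the CephFS metadata pool (default pool name 'metadata')
rados -p metadata ls > metadata-objects.txt

# dump each object into a local backup directory
mkdir -p mds-backup
while read obj; do
    rados -p metadata get "$obj" "mds-backup/$obj"
done < metadata-objects.txt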

Many thanks.
Bazli

-Original Message-
From: Yan, Zheng [mailto:uker...@gmail.com]
Sent: Sunday, March 23, 2014 2:53 PM
To: Sage Weil
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

On Sun, Mar 23, 2014 at 11:47 AM, Sage Weil s...@inktank.com wrote:
 Hi,

 I looked at this a bit earlier and wasn't sure why we would be getting
 a remote_reset event after a sleep/wake cycle.  The patch should fix
 the crash, but I'm a bit worried something is not quite right on the
 client side, too...


When client wakes up, it first tries reconnecting the old session. MDS refuses 
the reconnect request and sends a session close message to the client. After 
receiving the session close message, client closes the old session, then sends 
a session open message to the MDS.  The MDS receives the open request and 
triggers a remote reset
(Pipe.cc:466)

 sage

 On Sun, 23 Mar 2014, Yan, Zheng wrote:

 thank you for reporting this. The patch below should fix this issue:

 ---
 diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc
 index 57c7f4a..6b53c14 100644
 --- a/src/mds/MDS.cc
 +++ b/src/mds/MDS.cc
 @@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
      if (session->is_closed()) {
        dout(3) << "ms_handle_reset closing connection for session " << session->info.inst << dendl;
        messenger->mark_down(con);
 +      con->set_priv(NULL);
        sessionmap.remove_session(session);
      }
      session->put();
 @@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
      if (session->is_closed()) {
        dout(3) << "ms_handle_remote_reset closing connection for session " << session->info.inst << dendl;
        messenger->mark_down(con);
 +      con->set_priv(NULL);
        sessionmap.remove_session(session);
      }
      session->put();

 On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim
 bazli.abka...@mimos.my wrote:
  Hi Hong,
 
 
 
  How's the client now? Would it be able to mount the filesystem now?
  It looks similar to our case,
  http://www.spinics.net/lists/ceph-devel/msg18395.html
 
  However, you need to collect some logs to confirm this.
 
 
 
  Thanks.
 
 
 
 
 
  From: hjcho616 [mailto:hjcho...@yahoo.com]
  Sent: Friday, March 21, 2014 2:30 PM
 
 
  To: Luke Jing Yuan
  Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] MDS crash when client goes to sleep
 
 
 
  Luke,
 
 
 
   Not sure what a flapping ceph-mds daemon means, but when I connected
   to the MDS after this happened, there was no longer any ceph-mds
   process when I ran one daemon.  When I ran three, there was one
   left but it wasn't doing much.  I didn't record the logs, but the
   behavior was very similar in 0.72 Emperor.  I am using Debian packages.
 
 
 
  Client went to sleep for a while (like 8+ hours).  There was no I/O
  prior to the sleep other than the fact that cephfs was still mounted.
 
 
 
  Regards,
 
  Hong
 
 
 
  
 
  From: Luke Jing Yuan jyl...@mimos.my
 
 
  To: hjcho616 hjcho...@yahoo.com
  Cc: Mohd Bazli Ab Karim bazli.abka...@mimos.my;
  ceph-users@lists.ceph.com ceph-users@lists.ceph.com
  Sent: Friday, March 21, 2014 1:17 AM
 
  Subject: RE: [ceph-users] MDS crash when client goes to sleep
 
 
  Hi Hong,
 
   That's interesting. For Mr. Bazli and me, we ended up with the MDS
   stuck in (up:replay) and a flapping ceph-mds daemon, but then again
   we are using version 0.72.2. Having said so, the triggering point
   seems similar for us as well, which is the following line:
 
     -38 2014-03-20 20:08:44.495565 7fee3d7c4700  0 --
   192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3f03b80
   sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept we reset (peer
   sent cseq 2), sending RESETSESSION
 
  So how long did your client go into sleep? Was there any I/O prior
  to the sleep?
 
  Regards,
  Luke
 
  From: hjcho616 [mailto:hjcho...@yahoo.com]
  Sent: Friday, 21 March, 2014 12:09 PM
  To: Luke Jing Yuan
  Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] MDS crash when client goes to sleep
 
   Nope, just these segfaults.
 
  [149884.709608] ceph-mds[17366]: segfault at 200 ip
  7f09de9d60b8 sp
  7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
  [211263.265402] ceph-mds[17135]: segfault at 200 ip
  7f59eec280b8 sp
  7f59eb6b3520 error 4 in