Re: [ceph-users] OSD Restarts cause excessively high load average and requests are blocked > 32 sec
Hi,

I can see ~17% hardware interrupts, which I find a little high - can you make sure the load is spread over all your cores (/proc/interrupts)?

What about disk utilization once you restart them? Are the disks all 100% utilized, or is it 'only' mostly CPU-bound?

Also, you're running a monitor on this node - how does the load on the nodes running a monitor compare to those where you don't?

Cheers,
Martin

On Thu, Mar 20, 2014 at 10:18 AM, Quenten Grasso qgra...@onq.com.au wrote:

Hi All,

I left out my OS/kernel version: Ubuntu 12.04.4 LTS with kernel 3.10.33-031033-generic (we upgrade our kernels to 3.10 due to Dell drivers).

Here's an example of starting all the OSDs after a reboot:

top - 09:10:51 up 2 min, 1 user, load average: 332.93, 112.28, 39.96
Tasks: 310 total, 1 running, 309 sleeping, 0 stopped, 0 zombie
Cpu(s): 50.3%us, 32.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 17.2%hi, 0.0%si, 0.0%st
Mem: 32917276k total, 6331224k used, 26586052k free, 1332k buffers
Swap: 33496060k total, 0k used, 33496060k free, 1474084k cached

  PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
15875 root 20  0  910m 381m 50m S   60  1.2 0:50.57 ceph-osd
 2996 root 20  0  867m 330m 44m S   59  1.0 0:58.32 ceph-osd
 4502 root 20  0  907m 372m 47m S   58  1.2 0:55.14 ceph-osd
12465 root 20  0  949m 418m 55m S   58  1.3 0:51.79 ceph-osd
 4171 root 20  0  886m 348m 45m S   57  1.1 0:56.17 ceph-osd
 3707 root 20  0  941m 405m 50m S   57  1.3 0:59.68 ceph-osd
 3560 root 20  0  924m 394m 51m S   56  1.2 0:59.37 ceph-osd
 4318 root 20  0  965m 435m 55m S   56  1.4 0:54.80 ceph-osd
 3337 root 20  0  935m 407m 51m S   56  1.3 1:01.96 ceph-osd
 3854 root 20  0  897m 366m 48m S   55  1.1 1:00.55 ceph-osd
 3143 root 20  0 1364m 424m 24m S   16  1.3 1:08.72 ceph-osd
 2509 root 20  0  652m 261m 62m S    2  0.8 0:26.42 ceph-mon
    4 root 20  0     0    0   0 S    0  0.0 0:00.08 kworker/0:0

Regards,
Quenten Grasso

From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Quenten Grasso
Sent: Tuesday, 18 March 2014 10:19 PM
To: 'ceph-users@lists.ceph.com'
Subject: [ceph-users] OSD Restarts cause excessively high load average and requests are blocked > 32 sec

Hi All,

I'm trying to troubleshoot a strange issue with my Ceph cluster. We're running Ceph version 0.72.2. All nodes are Dell R515s with a 6-core AMD CPU, 32 GB RAM, 12 x 3 TB nearline SAS drives, and 2 x 100 GB Intel DC S3700 SSDs for journals. All pools have a replica count of 2 or higher; e.g., metadata has a replica count of 3. I have 55 OSDs in the cluster across 5 nodes.

When I restart the OSDs on a single node (any node), the load average of that node shoots up to 230+ and the whole cluster starts blocking IO requests until it settles down, after which it's fine again.

Any ideas on why the load average goes so crazy and IO starts to block?

Snips from my ceph.conf:

[osd]
osd data = /var/ceph/osd.$id
osd journal size = 15000
osd mkfs type = xfs
osd mkfs options xfs = -i size=2048 -f
osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,barrier=0,inode64,logbufs=8,logbsize=256k
osd max backfills = 5
osd recovery max active = 3

[osd.0]
host = pbnerbd01
public addr = 10.100.96.10
cluster addr = 10.100.128.10
osd journal = /dev/disk/by-id/scsi-36b8ca3a0eaa2660019deaf8d3a40bec4-part1
devs = /dev/sda4
/end

Thanks,
Quenten

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
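A concrete way to run Martin's checks, and to soften the restart spike, might look like this (a minimal sketch: device names vary per host, and the injectargs values are only illustrative - Quenten's ceph.conf uses 5 and 3):

# Per-CPU hardware interrupt counts; if a single CPU column dominates,
# IRQs are pinned to one core (irqbalance can help spread them).
cat /proc/interrupts

# Per-disk utilization while the OSDs come back up; %util near 100
# means the spindles, not the CPUs, are the bottleneck.
# (iostat ships in the sysstat package on Ubuntu.)
iostat -x 2

# Throttle recovery/backfill before a planned restart, then restore
# the original ceph.conf values afterwards.
ceph tell 'osd.*' injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'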
Re: [ceph-users] Mounting with dmcrypt still fails
Hi,

After looking at the code in ceph-disk I came to the same conclusion: the problem is with the mapping. Here is a quote from ceph-disk:

def get_partition_dev(dev, pnum):
    """
    get the device name for a partition

    assume that partitions are named like the base dev, with a number, and
    optionally some intervening characters (like 'p').  e.g.,

       sda 1 -> sda1
       cciss/c0d1 1 -> cciss!c0d1p1
    """

The script looks for partitions labeled sdb[X] or p[X], where [X] is the partition number (counted from 1). dm-crypt creates new mappings in /dev/mapper/, e.g. /dev/mapper/osd0 as the main block device, with /dev/mapper/osd0p1 as the first partition and /dev/mapper/osd0p2 as the second. But the real path of the osd0 device is NOT /dev/mapper/osd0 but /dev/dm-0 (sic!), with /dev/dm-1 as the first partition (osd0p1) and /dev/dm-2 as the second (osd0p2).

Conclusion: if we are using dm-crypt, the script in ceph-disk should not look for partitions like sda1 (sda partition 1) or osd0p1 (osd0 partition 1), but for partitions labeled /dev/dm-X (counted from 1).

Block device:     real path /dev/mapper/osd0   -> /dev/dm-0
First partition:  real path /dev/mapper/osd0p1 -> /dev/dm-1
Second partition: real path /dev/mapper/osd0p2 -> /dev/dm-2

Continuing, 'ceph-disk activate' should mount dm-crypted partitions not by using /dev/disk/by-partuuid, but /dev/disk/by-uuid.

--
Best regards,
Michel Lukzak

> ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb
> ceph-disk: Error: Device /dev/sdb2 is in use by a device-mapper mapping (dm-crypt?): dm-0

It sounds like device-mapper still thinks it's using the volume. You might be able to track it down with this:

for i in `ls -1 /sys/block/ | grep sd`; do echo $i: `ls /sys/block/$i/${i}1/holders/`; done

Then it's a matter of making sure there are no open file handles on the encrypted volume and unmounting it. You will still need to completely clear out the partition table on that disk, which can be tricky with GPT because it's not as simple as dd'ing the start of the volume. This is what the zapdisk parameter is for in ceph-disk-prepare; I don't know enough about ceph-deploy to know if you can somehow pass it.

After you know the device/dm mapping, you can use udevadm to find out where it should map to (uuids replaced with x's):

udevadm test /block/sdc/sdc1
<snip>
run: '/sbin/cryptsetup --key-file /etc/ceph/dmcrypt-keys/x --key-size 256 create /dev/sdc1'
run: '/bin/bash -c 'while [ ! -e /dev/mapper/x ];do sleep 1; done''
run: '/usr/sbin/ceph-disk-activate /dev/mapper/x'

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
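To verify the name-to-node mapping Michel describes on a live system, something like this should do (a sketch: the osd0 names follow his example, and it assumes udev has created the /dev/mapper entries as symlinks, as it normally does):

# Where the device-mapper names really point.
readlink -f /dev/mapper/osd0    # -> /dev/dm-0
readlink -f /dev/mapper/osd0p1  # -> /dev/dm-1

# All active mappings with their dm numbers.
dmsetup ls

# The mapping name behind a given dm node.
cat /sys/block/dm-1/dm/name     # -> osd0p1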
Re: [ceph-users] why object can't be recovered when deleting one replica
Hi Kyle,

Thank you very much for your explanation. I have triggered the relevant PG to scrub, but the secondary replica which I removed manually isn't recovered; it only shows "instructing pg xx.xxx on osd.x to scrub".

PS: I used ceph-deploy to deploy the cluster, and ceph.conf is the default configuration.

Thanks
Regards
Li JiaMin

-----Original Message-----
From: Kyle Bader [mailto:kyle.ba...@gmail.com]
Sent: 23 March 2014 10:05
To: ljm李嘉敏
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] why object can't be recovered when deleting one replica

> I uploaded a file through the Swift API, then manually deleted it from the "current" directory on the secondary OSD. Why can't the object be recovered? If I delete it on the primary OSD, the object is deleted directly from the pool .rgw.bucket and can't be recovered from the secondary OSD. Does anyone know this behavior?

This is because the placement group containing that object likely needs to scrub (just a light scrub should do). The scrub will compare the two replicas, notice the replica is missing from the secondary, and trigger recovery/backfill. Can you try scrubbing the placement group containing the removed object and let us know if it triggers recovery?

--
Kyle

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
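For reference, Kyle's suggestion can be carried out roughly like this (a sketch: the pool name is taken from the thread, while the object name and pgid are placeholders):

# Map the object to its placement group and acting OSDs.
ceph osd map .rgw.bucket my-object

# Light-scrub the pgid reported above, then watch for recovery.
ceph pg scrub 11.7c
ceph -w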
Re: [ceph-users] MDS crash when client goes to sleep
Hi Hong,

Could you apply the patch and see if it crashes after sleep? This could lead us to find the correct fix for the MDS/client too.

From what I can see here, this patch should fix the crash, but how do we fix the MDS if the crash happens? It happened to us: when it crashed, it was a total crash, and even restarting the ceph-mds service with --reset-journal did not help. Can anyone shed some light on this matter?

p/s: Are there any steps/tools to back up the MDS metadata? Say the MDS crashes and refuses to run normally - can we restore the backed-up metadata? I'm thinking of it as a preventive step, just in case it happens again in the future.

Many thanks.
Bazli

-----Original Message-----
From: Yan, Zheng [mailto:uker...@gmail.com]
Sent: Sunday, March 23, 2014 2:53 PM
To: Sage Weil
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

On Sun, Mar 23, 2014 at 11:47 AM, Sage Weil s...@inktank.com wrote:
> Hi,
>
> I looked at this a bit earlier and wasn't sure why we would be getting a
> remote_reset event after a sleep/wake cycle. The patch should fix the
> crash, but I'm a bit worried something is not quite right on the client
> side, too...

When the client wakes up, it first tries reconnecting the old session. The MDS refuses the reconnect request and sends a session close message to the client. After receiving the session close message, the client closes the old session, then sends a session open message to the MDS. The MDS receives the open request and triggers a remote reset (Pipe.cc:466).

> sage
>
> On Sun, 23 Mar 2014, Yan, Zheng wrote:
>> thank you for reporting this. Below patch should fix this issue
>>
>> ---
>> diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc
>> index 57c7f4a..6b53c14 100644
>> --- a/src/mds/MDS.cc
>> +++ b/src/mds/MDS.cc
>> @@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
>>      if (session->is_closed()) {
>>        dout(3) << "ms_handle_reset closing connection for session " << session->info.inst << dendl;
>>        messenger->mark_down(con);
>> +      con->set_priv(NULL);
>>        sessionmap.remove_session(session);
>>      }
>>      session->put();
>> @@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
>>      if (session->is_closed()) {
>>        dout(3) << "ms_handle_remote_reset closing connection for session " << session->info.inst << dendl;
>>        messenger->mark_down(con);
>> +      con->set_priv(NULL);
>>        sessionmap.remove_session(session);
>>      }
>>      session->put();

On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim bazli.abka...@mimos.my wrote:

Hi Hong,

How's the client now? Would it be able to mount the filesystem now? It looks similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html
However, you need to collect some logs to confirm this.
Thanks.

From: hjcho616 [mailto:hjcho...@yahoo.com]
Sent: Friday, March 21, 2014 2:30 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Luke,

Not sure what a "flapping ceph-mds daemon" means, but when I connected to the MDS when this happened, there was no longer any ceph-mds process when I ran one daemon. When I ran three, there was one left, but it wasn't doing much. I didn't record the logs, but the behavior was very similar in 0.72 emperor. I am using debian packages.

The client went to sleep for a while (8+ hours). There was no I/O prior to the sleep other than the fact that cephfs was still mounted.

Regards,
Hong

From: Luke Jing Yuan jyl...@mimos.my
To: hjcho616 hjcho...@yahoo.com
Cc: Mohd Bazli Ab Karim bazli.abka...@mimos.my; ceph-users@lists.ceph.com
Sent: Friday, March 21, 2014 1:17 AM
Subject: RE: [ceph-users] MDS crash when client goes to sleep

Hi Hong,

That's interesting. For Mr. Bazli and me, we ended up with the MDS stuck in (up:replay) and a flapping ceph-mds daemon, but then again we are using version 0.72.2. Having said so, the triggering point seems similar to ours, which is the following line:

-38> 2014-03-20 20:08:44.495565 7fee3d7c4700 0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION

So how long did your client go into sleep? Was there any I/O prior to the sleep?

Regards,
Luke

From: hjcho616 [mailto:hjcho...@yahoo.com]
Sent: Friday, 21 March, 2014 12:09 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Nope, just these segfaults.

[149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp 7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
[211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp 7f59eb6b3520 error 4 in
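As for the logs Bazli asked for, one way to catch the reconnect/RESETSESSION sequence is to raise the MDS debug level before the client goes to sleep (a sketch, assuming an MDS id of 'a'; the debug values are only suggestions):

# Bump MDS and messenger debugging at runtime...
ceph tell mds.a injectargs '--debug_mds 20 --debug_ms 1'

# ...reproduce the sleep/wake cycle, then restore the defaults.
ceph tell mds.a injectargs '--debug_mds 1 --debug_ms 0'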