Re: [ceph-users] Remove ceph
Hi All, Can anyone let me know how to accomplish this? Thanks, Kumar

From: Gnan Kumar, Yalla
Sent: Friday, March 21, 2014 5:04 PM
To: 'ceph-users@lists.ceph.com'
Subject: Remove ceph

Hi All,

I have a ceph cluster with four nodes, including the admin node, and I have integrated it with OpenStack. Now, when I create volumes or boot a VM from a volume, the block storage is created in Ceph. One of the OpenStack nodes acts as a Ceph client.

Now I want to remove Ceph from OpenStack. What is the procedure?

Thanks,
Kumar

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Re: Re: why object can't be recovered when delete one replica
Hi Sage,

I have run the repair command, and the warning disappears from the output of "ceph health detail", but the replica isn't recovered in the "current" directory. In short, the cluster status recovers (the PG goes from inconsistent back to active+clean), but the replica does not. I am sorry for not providing enough info about what I found in the ceph cluster.

Thanks & Regards
Li JiaMin

-----Original Message-----
From: Sage Weil [mailto:s...@inktank.com]
Sent: Monday, March 24, 2014 9:36
To: ljm李嘉敏
Cc: Kyle Bader; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Re: why object can't be recovered when delete one replica

When you do ceph pg scrub it will notice the missing object (you should see it go by with ceph -w, or the message in /var/log/ceph/ceph.log on a monitor node), and the PG will get an 'inconsistent' flag set. To trigger repair, you need to do ceph pg repair.

sage

On Mon, 24 Mar 2014, ljm李嘉敏 wrote:
> Hi Kyle,
>
> Thank you very much for your explanation. I have triggered the relevant
> PG to scrub, but the secondary replica which I removed manually isn't
> recovered; it only shows "instructing pg xx.xxx on osd.x to scrub".
>
> PS: I used ceph-deploy to deploy the ceph cluster, and the ceph.conf is
> the default configuration.
>
> Thanks & Regards
> Li JiaMin
>
> -----Original Message-----
> From: Kyle Bader [mailto:kyle.ba...@gmail.com]
> Sent: Sunday, March 23, 2014 10:05
> To: ljm李嘉敏
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] why object can't be recovered when delete one
> replica
>
> > I upload a file through the swift API, then delete it in the "current"
> > directory on the secondary OSD manually; why can't the object be recovered?
> >
> > If I delete it on the primary OSD, the object is deleted directly in
> > the pool .rgw.bucket and it can't be recovered from the secondary OSD.
> >
> > Does anyone know this behavior?
>
> This is because the placement group containing that object likely needs to
> scrub (just a light scrub should do).
> The scrub will compare the two replicas, notice that the replica is
> missing from the secondary, and trigger recovery/backfill. Can you try
> scrubbing the placement group containing the removed object and let us
> know if it triggers recovery?
>
> --
> Kyle
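Conceptually, what the light scrub does here can be modeled with a toy sketch (illustrative only, not Ceph code, and all names are made up): compare the object listings held by the primary and a replica, and flag anything missing so repair can re-copy it.

```python
def scrub_pg(primary_objects, replica_objects):
    """Toy model of a light scrub: compare object sets between the
    primary and a replica; anything on the primary but missing from
    the replica is flagged so recovery/backfill can re-copy it."""
    missing = sorted(set(primary_objects) - set(replica_objects))
    inconsistent = bool(missing)
    return inconsistent, missing

# The replica lost "obj2" (e.g. someone deleted it from the "current"
# directory by hand); scrub flags the PG inconsistent, and repair would
# then re-copy the missing object from the primary.
inconsistent, to_recover = scrub_pg(["obj1", "obj2", "obj3"],
                                    ["obj1", "obj3"])
print(inconsistent, to_recover)  # True ['obj2']
```

This is why deleting the object on the *primary* behaves differently: the primary's listing is the reference, so the object simply appears gone.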
Re: [ceph-users] MDS crash when client goes to sleep
Hi Hong,

Could you apply the patch and see if it crashes after sleep? This could lead us to the correct fix for the MDS/client too. As far as I can see, this patch should fix the crash, but how do we fix the MDS if the crash happens? It happened to us: when it crashed, it was a total crash, and even restarting the ceph-mds service with --reset-journal did not help. Can anyone shed some light on this matter?

p/s: Are there any steps/tools to back up the MDS metadata? Say the MDS crashes and refuses to run normally; can we restore the backed-up metadata? I'm thinking of it as a preventive step, just in case it happens again in the future.

Many thanks.
Bazli

-----Original Message-----
From: Yan, Zheng [mailto:uker...@gmail.com]
Sent: Sunday, March 23, 2014 2:53 PM
To: Sage Weil
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

On Sun, Mar 23, 2014 at 11:47 AM, Sage Weil wrote:
> Hi,
>
> I looked at this a bit earlier and wasn't sure why we would be getting
> a remote_reset event after a sleep/wake cycle. The patch should fix
> the crash, but I'm a bit worried something is not quite right on the
> client side, too...
>

When the client wakes up, it first tries to reconnect the old session. The MDS refuses the reconnect request and sends a session close message to the client. After receiving the session close message, the client closes the old session, then sends a session open message to the MDS. The MDS receives the open request and triggers a remote reset (Pipe.cc:466).

> sage
>
> On Sun, 23 Mar 2014, Yan, Zheng wrote:
>
>> thank you for reporting this.
>> The patch below should fix this issue.
>>
>> ---
>> diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc
>> index 57c7f4a..6b53c14 100644
>> --- a/src/mds/MDS.cc
>> +++ b/src/mds/MDS.cc
>> @@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
>>    if (session->is_closed()) {
>>      dout(3) << "ms_handle_reset closing connection for session " << session->info.inst << dendl;
>>      messenger->mark_down(con);
>> +    con->set_priv(NULL);
>>      sessionmap.remove_session(session);
>>    }
>>    session->put();
>> @@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
>>    if (session->is_closed()) {
>>      dout(3) << "ms_handle_remote_reset closing connection for session " << session->info.inst << dendl;
>>      messenger->mark_down(con);
>> +    con->set_priv(NULL);
>>      sessionmap.remove_session(session);
>>    }
>>    session->put();
>>
>> On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim wrote:
>> > Hi Hong,
>> >
>> > How's the client now? Is it able to mount the filesystem?
>> > It looks similar to our case:
>> > http://www.spinics.net/lists/ceph-devel/msg18395.html
>> >
>> > However, you need to collect some logs to confirm this.
>> >
>> > Thanks.
>> >
>> > From: hjcho616 [mailto:hjcho...@yahoo.com]
>> > Sent: Friday, March 21, 2014 2:30 PM
>> > To: Luke Jing Yuan
>> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
>> > Subject: Re: [ceph-users] MDS crash when client goes to sleep
>> >
>> > Luke,
>> >
>> > I'm not sure what a flapping ceph-mds daemon means, but when I connected
>> > to the MDS when this happened, there was no longer any ceph-mds process
>> > when I ran one daemon. When I ran three, there was one left, but it
>> > wasn't doing much. I didn't record the logs, but the behavior was very
>> > similar in 0.72 emperor. I am using Debian packages.
>> >
>> > The client went to sleep for a while (8+ hours). There was no I/O
>> > prior to the sleep other than the fact that cephfs was still mounted.
>> >
>> > Regards,
>> > Hong
>> >
>> > From: Luke Jing Yuan
>> > To: hjcho616
>> > Cc: Mohd Bazli Ab Karim; "ceph-users@lists.ceph.com"
>> > Sent: Friday, March 21, 2014 1:17 AM
>> > Subject: RE: [ceph-users] MDS crash when client goes to sleep
>> >
>> > Hi Hong,
>> >
>> > That's interesting. For Mr. Bazli and me, we ended up with the MDS stuck
>> > in (up:replay) and a flapping ceph-mds daemon, but then again we are
>> > using version 0.72.2. Having said so, the triggering point seems
>> > similar for us as well, which is the following line:
>> >
>> > -38> 2014-03-20 20:08:44.495565 7fee3d7c4700 0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
>> >
>> > So how long did your client go to sleep? Was there any I/O prior
>> > to the sleep?
>> >
>> > Regards,
>> > Luke
>> >
>> > From: hjcho616 [mailto:hjcho...@yahoo.com]
>> > Sent: Friday, 21 March, 2014 12:09 PM
>> > To: Luke Jing Yuan
>> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
>> > Subject: Re: [ceph-users] MDS crash when client goes to sleep
>> >
>> > Nope, just these segfaults.
>> >
Re: [ceph-users] Mounting with dmcrypt still fails
Hi,

After looking at the code in ceph-disk, I came to the same conclusion: the problem is with the mapping. Here is a quote from ceph-disk:

def get_partition_dev(dev, pnum):
    """
    get the device name for a partition

    assume that partitions are named like the base dev,
    with a number, and optionally
    some intervening characters (like 'p').  e.g.,

       sda 1 -> sda1
       cciss/c0d1 1 -> cciss!c0d1p1
    """

The script looks for partitions labeled "sdb[X]" or "p[X]", where [X] is the partition number (counted from 1). dm-crypt creates new mappings in /dev/mapper/, for example /dev/mapper/osd0 as the main block device, /dev/mapper/osd0p1 as the first partition and /dev/mapper/osd0p2 as the second partition. But the real path of the osd0 device is NOT /dev/mapper/osd0 but /dev/dm-0 (sic!), and /dev/dm-1 is the first partition (osd0p1), /dev/dm-2 the second partition (osd0p2).

Conclusion: if we are using dm-crypt, the script in ceph-disk should not look for partitions like "sda partition 1 -> sda1" or "osd0 partition 1 -> osd0p1", but should look for partitions labeled /dev/dm-X (counted from 1).

Block device                  Real path
/dev/mapper/osd0           -> /dev/dm-0   (main device)
/dev/mapper/osd0p1         -> /dev/dm-1   (first partition)
/dev/mapper/osd0p2         -> /dev/dm-2   (second partition)

Continuing, 'ceph-disk activate' should mount dm-crypted partitions not by using /dev/disk/by-partuuid, but /dev/disk/by-uuid.

--
Best regards,
Michel Lukzak

>> ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir
>> /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb
>> ceph-disk: Error: Device /dev/sdb2 is in use by a device-mapper mapping (dm-crypt?): dm-0

> It sounds like device-mapper still thinks it's using the volume;
> you might be able to track it down with this:
>
> for i in `ls -1 /sys/block/ | grep sd`; do echo $i: `ls /sys/block/$i/${i}1/holders/`; done
>
> Then it's a matter of making sure there are no open file handles on
> the encrypted volume and unmounting it.
> You will still need to completely clear out the partition table on that
> disk, which can be tricky with GPT because it's not as simple as dd'ing
> the start of the volume. This is what the zapdisk parameter is for in
> ceph-disk-prepare; I don't know enough about ceph-deploy to know if
> you can somehow pass it.
>
> After you know the device/dm mapping, you can use udevadm to find out
> where it should map to (uuids replaced with xxx's):
>
> udevadm test /block/sdc/sdc1
>
> run: '/sbin/cryptsetup --key-file /etc/ceph/dmcrypt-keys/x --key-size 256 create  /dev/sdc1'
> run: '/bin/bash -c 'while [ ! -e /dev/mapper/x ];do sleep 1; done''
> run: '/usr/sbin/ceph-disk-activate /dev/mapper/x'
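The naming mismatch described above can be sketched in Python. This is a simplified, hypothetical model of ceph-disk's get_partition_dev, not the actual code or the actual fix; the dm_partitions argument stands in for whatever mechanism would resolve /dev/mapper names to their real /dev/dm-N nodes.

```python
def get_partition_dev(dev, pnum, dm_partitions=None):
    """Simplified sketch of how a partition device could be named.

    Regular disks: sdb + 1 -> /dev/sdb1; devices whose name ends in a
    digit get a 'p' separator: cciss/c0d1 + 1 -> /dev/cciss/c0d1p1.
    For dm-crypt devices under /dev/mapper, the partition nodes are
    really /dev/dm-N, so the caller must resolve the mapping (passed
    in here as dm_partitions) instead of concatenating strings.
    """
    if dm_partitions is not None:
        # e.g. {1: "/dev/dm-1", 2: "/dev/dm-2"} for /dev/mapper/osd0
        return dm_partitions[pnum]
    name = dev[len("/dev/"):] if dev.startswith("/dev/") else dev
    sep = "p" if name[-1].isdigit() else ""
    return "/dev/" + name + sep + str(pnum)

print(get_partition_dev("/dev/sdb", 1))         # /dev/sdb1
print(get_partition_dev("/dev/cciss/c0d1", 1))  # /dev/cciss/c0d1p1
# Naive string concatenation gives a node that does not exist:
print(get_partition_dev("/dev/mapper/osd0", 1))  # /dev/mapper/osd0p1
# Resolving through the device-mapper table gives the real node:
print(get_partition_dev("/dev/mapper/osd0", 1,
                        {1: "/dev/dm-1", 2: "/dev/dm-2"}))  # /dev/dm-1
```

The last two calls show exactly the failure mode Michel describes: the naive name /dev/mapper/osd0p1 is what the script computes, while /dev/dm-1 is what actually exists.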
[ceph-users] cephfs fast on a single big file but very slow on many files
Hi list,

I'm new to ceph, so I installed a four-node ceph cluster for testing purposes. Each node has two 6-core Sandy Bridge Xeons, 64 GiB of RAM, 6 15k rpm SAS drives, one SSD drive for journals, and 10G ethernet. We're using Debian GNU/Linux 7.4 (Wheezy) with kernel 3.13 from the Debian backports repository and Ceph 0.72.2-1~bpo70+1.

Every node runs six OSDs (one for every SAS disk). The SSD is partitioned into six parts for journals. Three of the same nodes are monitors (no extra hardware for mons and MDS for testing). First, I used node #4 as an MDS, and later I installed Ceph-MDS on all four nodes with set_max_mds=3. I increased pg_num and pgp_num to 1200 each for both the data and metadata pools.

I mounted the cephfs on one node using the kernel client. Writing to a single big file is fast:

$ dd if=/dev/zero of=bigfile bs=1M count=1M
1048576+0 records in
1048576+0 records out
1099511627776 bytes (1.1 TB) copied, 1240.52 s, 886 MB/s

Reading is less fast:

$ dd if=bigfile of=/dev/null bs=1M
1048576+0 records in
1048576+0 records out
1099511627776 bytes (1.1 TB) copied, 3226.8 s, 341 MB/s

(During reading, the nodes are mostly idle (>90% idle, 1-1.8% wa).)

After this, I tried to copy the Linux kernel source tree (source and dest dirs both on cephfs, 600 MiB, 45k files):

$ time cp -a linux-3.13.6 linux-3.13.6-copy

real    35m34.184s
user    0m1.884s
sys     0m11.372s

That's much too slow. The same process takes just a few seconds on a single desktop-class SATA drive. I can't see any load or I/O wait on any of the four nodes.
I tried different mount options:

mon1,mon2,mon3:/ on /export type ceph (rw,relatime,name=someuser,secret=,nodcache,nofsc)
mon1,mon2,mon3:/ on /export type ceph (rw,relatime,name=someuser,secret=,dcache,fsc,wsize=10485760,rsize=10485760)

Output of 'ceph status':

$ ceph status
    cluster 32ea6593-8cd6-40d6-ac3b-7450f1d92d16
     health HEALTH_OK
     monmap e1: 3 mons at {howard=xxx.yyy.zzz.199:6789/0,leonard=xxx.yyy.zzz.196:6789/0,penny=xxx.yyy.zzz.198:6789/0}, election epoch 32, quorum 0,1,2 howard,leonard,penny
     mdsmap e107: 1/1/1 up {0=penny=up:active}, 3 up:standby
     osdmap e276: 24 osds: 24 up, 24 in
      pgmap v8932: 2464 pgs, 3 pools, 1028 GB data, 514 kobjects
            2061 GB used, 11320 GB / 13382 GB avail
                2464 active+clean
  client io 119 MB/s rd, 509 B/s wr, 43 op/s

I'd appreciate it if someone could help me find the reason for this odd behaviour.

Cheers,
Sascha
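One way to see whether such a copy is metadata-bound (roughly one MDS round-trip per file create) rather than bandwidth-bound is to time many small creates on the mounted filesystem. A rough, illustrative probe follows; it uses a temp directory so it runs anywhere, but to test cephfs you would point root at a directory under the cephfs mount.

```python
import os
import tempfile
import time


def time_small_files(root, n=1000, size=1024):
    """Create n small files under root and return elapsed seconds.
    On cephfs, each create involves the MDS, so per-file latency
    dominates; raw data bandwidth barely matters at this file size."""
    payload = b"x" * size
    start = time.time()
    for i in range(n):
        with open(os.path.join(root, "f%05d" % i), "wb") as f:
            f.write(payload)
    return time.time() - start


# Placeholder path: replace with a directory on the cephfs mount to
# measure the cluster instead of the local disk.
root = tempfile.mkdtemp()
elapsed = time_small_files(root, n=200)
print("200 creates took %.3f s (%.1f ms/file)" % (elapsed, 1000.0 * elapsed / 200))
```

If the per-file time on cephfs is tens of milliseconds while a local disk does the same in well under a millisecond, the bottleneck is metadata latency, which matches seeing no disk load on the OSD nodes during the copy.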
Re: [ceph-users] OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"
Hi,

I can see ~17% hardware interrupts, which I find a little high - can you make sure the load is spread over all your cores (/proc/interrupts)? What about disk util once you restart them? Are they all 100% utilized, or is it 'only' mostly CPU-bound? Also, you're running a monitor on this node - how does the load on the nodes where you run a monitor compare to those where you don't?

Cheers,
Martin

On Thu, Mar 20, 2014 at 10:18 AM, Quenten Grasso wrote:
> Hi All,
>
> I left out my OS/kernel version: Ubuntu 12.04.4 LTS with kernel
> 3.10.33-031033-generic (we upgrade our kernels to 3.10 due to Dell drivers).
>
> Here's an example of starting all the OSDs after a reboot:
>
> top - 09:10:51 up 2 min, 1 user, load average: 332.93, 112.28, 39.96
> Tasks: 310 total, 1 running, 309 sleeping, 0 stopped, 0 zombie
> Cpu(s): 50.3%us, 32.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 17.2%hi, 0.0%si, 0.0%st
> Mem:  32917276k total,  6331224k used, 26586052k free,     1332k buffers
> Swap: 33496060k total,        0k used, 33496060k free,  1474084k cached
>
>   PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
> 15875 root 20  0  910m 381m 50m S   60  1.2 0:50.57 ceph-osd
>  2996 root 20  0  867m 330m 44m S   59  1.0 0:58.32 ceph-osd
>  4502 root 20  0  907m 372m 47m S   58  1.2 0:55.14 ceph-osd
> 12465 root 20  0  949m 418m 55m S   58  1.3 0:51.79 ceph-osd
>  4171 root 20  0  886m 348m 45m S   57  1.1 0:56.17 ceph-osd
>  3707 root 20  0  941m 405m 50m S   57  1.3 0:59.68 ceph-osd
>  3560 root 20  0  924m 394m 51m S   56  1.2 0:59.37 ceph-osd
>  4318 root 20  0  965m 435m 55m S   56  1.4 0:54.80 ceph-osd
>  3337 root 20  0  935m 407m 51m S   56  1.3 1:01.96 ceph-osd
>  3854 root 20  0  897m 366m 48m S   55  1.1 1:00.55 ceph-osd
>  3143 root 20  0 1364m 424m 24m S   16  1.3 1:08.72 ceph-osd
>  2509 root 20  0  652m 261m 62m S    2  0.8 0:26.42 ceph-mon
>     4 root 20  0     0    0   0 S    0  0.0 0:00.08 kworker/0:0
>
> Regards,
> Quenten Grasso
>
> From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On
> Behalf Of Quenten Grasso
> Sent: Tuesday, 18 March 2014 10:19 PM
> To: 'ceph-users@lists.ceph.com'
> Subject: [ceph-users] OSD Restarts cause excessively high load average
> and "requests are blocked > 32 sec"
>
> Hi All,
>
> I'm trying to troubleshoot a strange issue with my Ceph cluster.
>
> We're running Ceph version 0.72.2.
> All nodes are Dell R515s with a 6-core AMD CPU, 32 GB RAM, 12 x 3TB
> NearlineSAS drives and 2 x 100GB Intel DC S3700 SSDs for journals.
> All pools have a replica of 2 or better, i.e. metadata has a replica of 3.
>
> I have 55 OSDs in the cluster across 5 nodes. When I restart the OSDs on
> a single node (any node), the load average of that node shoots up to 230+
> and the whole cluster starts blocking IO requests until it settles down,
> and then it's fine again.
>
> Any ideas on why the load average goes so crazy and starts to block IO?
>
> [osd]
>     osd data = /var/ceph/osd.$id
>     osd journal size = 15000
>     osd mkfs type = xfs
>     osd mkfs options xfs = "-i size=2048 -f"
>     osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,barrier=0,inode64,logbufs=8,logbsize=256k"
>     osd max backfills = 5
>     osd recovery max active = 3
>
> [osd.0]
>     host = pbnerbd01
>     public addr = 10.100.96.10
>     cluster addr = 10.100.128.10
>     osd journal = /dev/disk/by-id/scsi-36b8ca3a0eaa2660019deaf8d3a40bec4-part1
>     devs = /dev/sda4
>
> Thanks,
> Quenten
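Following up on Martin's /proc/interrupts suggestion: the check can be scripted. Below is a rough, illustrative parser (the column layout is assumed to match a typical x86 /proc/interrupts; device names in the sample are made up) that reports how concentrated each IRQ is on a single core - values near 1.0 mean one core is absorbing nearly all of that IRQ, which would explain a high %hi on that core.

```python
def irq_spread(interrupts_text):
    """Parse /proc/interrupts-style text and return, per IRQ, the
    fraction of that IRQ's interrupts handled by the busiest CPU.
    1.0 means a single core handles everything; 1/ncpus is ideal."""
    lines = interrupts_text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    spread = {}
    for line in lines[1:]:
        parts = line.split()
        counts = [int(p) for p in parts[1:1 + ncpus] if p.isdigit()]
        total = sum(counts)
        if total:
            spread[parts[0].rstrip(":")] = max(counts) / float(total)
    return spread


# Toy sample: IRQ 45 (NIC) lands almost entirely on CPU0, while IRQ 46
# (HBA) is evenly spread. In practice you would read /proc/interrupts.
sample = """\
       CPU0  CPU1
 45:   9000  1000  IR-PCI-MSI  eth0
 46:   5000  5000  IR-PCI-MSI  mpt2sas0
"""
print(irq_spread(sample))  # {'45': 0.9, '46': 0.5}
```

On a live node you would feed it open("/proc/interrupts").read() and, for skewed IRQs, either enable irqbalance or pin them via /proc/irq/N/smp_affinity.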