[ceph-users] ceph-deploy install on remote machine error
I am new to Ceph. I am trying to follow the official documentation to install Ceph on a machine. Everything works fine, but whenever I try to install Ceph using ceph-deploy install ceph-server I get the following error: http://pastebin.com/HxzP5Npi . I am behind a proxy server. Initially I thought the problem was ssh not using the environment variables, but ssh ceph-server env shows the required proxy variables. I also tried using -v to get more detailed output, but that does not seem to help. I still think the error is related to ssh not using environment variables. Can someone please look into it? Or is there a more verbose way to get the output? -- Kumar Rishabh UG2, BTech CS+MS IIIT-Hyderabad ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy install on remote machine error
Hi, I believe you need to tell apt about your proxy server: cat /etc/apt/apt.conf Acquire::http::Proxy "http://my.proxy.server:3142"; wogri On 09/11/2013 08:28 AM, kumar rishabh wrote: I am new to Ceph. I am trying to follow the official documentation to install Ceph on a machine. Everything works fine, but whenever I try to install Ceph using ceph-deploy install ceph-server I get the following error: http://pastebin.com/HxzP5Npi . I am behind a proxy server. Initially I thought the problem was ssh not using the environment variables, but ssh ceph-server env shows the required proxy variables. I also tried using -v to get more detailed output, but that does not seem to help. I still think the error is related to ssh not using environment variables. Can someone please look into it? Or is there a more verbose way to get the output? -- Kumar Rishabh UG2, BTech CS+MS IIIT-Hyderabad ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- http://www.wogri.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
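For reference, a minimal sketch of the proxy settings involved, with my.proxy.server:3142 as a placeholder for your own proxy host and port. In /etc/apt/apt.conf (or a snippet under /etc/apt/apt.conf.d/) on the target node:

Acquire::http::Proxy "http://my.proxy.server:3142";
Acquire::https::Proxy "http://my.proxy.server:3142";

Because ceph-deploy runs its remote commands over a non-interactive SSH session, where per-user shell exports are often not read, it can also help to put the variables into /etc/environment on the target node:

http_proxy=http://my.proxy.server:3142
https_proxy=http://my.proxy.server:3142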
Re: [ceph-users] errors after kernel-upgrade
Does no one have an idea? I can't mount the cluster anymore. Thank you, Markus

On 10.09.2013 09:43, Markus Goldberg wrote:
Hi, I made a 'stop ceph-all' on my ceph-admin-host and then a kernel upgrade from 3.9 to 3.11 on all of my 3 nodes. Ubuntu 13.04, ceph 0.68. The kernel upgrade required a reboot. Now, after rebooting, I get the following errors:

root@bd-a:~# ceph -s
  cluster e0dbf70d-af59-42a5-b834-7ad739a7f89b
   health HEALTH_WARN 133 pgs peering; 272 pgs stale; 265 pgs stuck unclean; 2 requests are blocked > 32 sec; mds cluster is degraded
   monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 782, quorum 0,1,2 bd-0,bd-1,bd-2
   mdsmap e451467: 1/1/1 up {0=bd-0=up:replay}, 2 up:standby
   osdmap e464358: 3 osds: 3 up, 3 in
   pgmap v1343477: 792 pgs, 9 pools, 15145 MB data, 4986 objects
         30927 MB used, 61372 GB / 61408 GB avail
              387 active+clean
              122 stale+active
              140 stale+active+clean
              133 peering
               10 stale+active+replay

root@bd-a:~# ceph -s
  cluster e0dbf70d-af59-42a5-b834-7ad739a7f89b
   health HEALTH_WARN 6 pgs down; 377 pgs peering; 296 pgs stuck unclean; mds cluster is degraded
   monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 782, quorum 0,1,2 bd-0,bd-1,bd-2
   mdsmap e451467: 1/1/1 up {0=bd-0=up:replay}, 2 up:standby
   osdmap e464400: 3 osds: 3 up, 3 in
   pgmap v1343586: 792 pgs, 9 pools, 15145 MB data, 4986 objects
         31046 MB used, 61372 GB / 61408 GB avail
              142 active
              270 active+clean
                3 active+replay
              371 peering
                6 down+peering

root@bd-a:~# ceph -s
  cluster e0dbf70d-af59-42a5-b834-7ad739a7f89b
   health HEALTH_WARN 257 pgs peering; 359 pgs stuck unclean; 1 requests are blocked > 32 sec; mds cluster is degraded
   monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 782, quorum 0,1,2 bd-0,bd-1,bd-2
   mdsmap e451467: 1/1/1 up {0=bd-0=up:replay}, 2 up:standby
   osdmap e464403: 3 osds: 3 up, 3 in
   pgmap v1343594: 792 pgs, 9 pools, 15145 MB data, 4986 objects
         31103 MB used, 61372 GB / 61408 GB avail
              373 active
              157 active+clean
                5 active+replay
              257 peering

root@bd-a:~#

As you can see above, the errors keep changing, so perhaps some self-repair is running in the background. But it has been like this for 12 hours. What should I do? Thank you, Markus

On 09.09.2013 13:52, Yan, Zheng wrote:
The bug has been fixed in 3.11 kernel by commit ccca4e37b1 (libceph: fix truncate size calculation). We don't backport cephfs bug fixes to old kernel. please update the kernel or use ceph-fuse. 
Regards Yan, Zheng Best regards, Tobi ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- MfG, Markus Goldberg Markus Goldberg | Universität Hildesheim | Rechenzentrum Tel +49 5121 883212 | Marienburger Platz 22, D-31141 Hildesheim, Germany Fax +49 5121 883205 | emailgoldb...@uni-hildesheim.de ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- MfG, Markus Goldberg Markus Goldberg | Universität Hildesheim | Rechenzentrum Tel +49 5121 883212 | Marienburger Platz 22, D-31141 Hildesheim, Germany Fax +49 5121 883205 | email goldb...@uni-hildesheim.de ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph space problem, garbage collector ?
Hi, do you need more information about that? Thanks, Olivier

On Tuesday, 10 September 2013 at 11:19 -0700, Samuel Just wrote:
Can you post the rest of your crush map? -Sam

On Tue, Sep 10, 2013 at 5:52 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
I also checked that all files in that PG are still on that PG:

for IMG in `find . -type f -printf '%f\n' | awk -F '__' '{ print $1 }' | sort --unique` ; do echo -n $IMG ; ceph osd map ssd3copies $IMG | grep -v 6\\.31f ; echo ; done

And all objects are referenced in rados (compared with rados --pool ssd3copies ls rados.ssd3copies.dump).

On Tuesday, 10 September 2013 at 13:46 +0200, Olivier Bonvalet wrote:
Some additional information: if I look at one PG only, for example 6.31f, ceph pg dump reports a size of 616GB:

# ceph pg dump | grep ^6\\. | awk '{ SUM+=($6/1024/1024) } END { print SUM }'
631717

But on disk, on the 3 replicas, I have:

# du -sh /var/lib/ceph/osd/ceph-50/current/6.31f_head/
1,3G /var/lib/ceph/osd/ceph-50/current/6.31f_head/

Since I suspected a snapshot problem, I tried to count only head files:

# find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f -name '*head*' -print0 | xargs -r -0 du -hc | tail -n1
448M total

and the content of the directory: http://pastebin.com/u73mTvjs

On Tuesday, 10 September 2013 at 10:31 +0200, Olivier Bonvalet wrote:
Hi, I have a space problem on a production cluster, as if there is unused data that is not freed: ceph df and rados df report 613GB of data, but disk usage is 2640GB (with 3 replicas). It should be near 1839GB. I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush rules to put pools on SAS or on SSD. My pools:

# ceph osd dump | grep ^pool
pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68317 owner 0
pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68321 owner 0
pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 200 pgp_num 200 last_change 172933 owner 0
pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash rjenkins pg_num 800 pgp_num 800 last_change 172929 owner 0
pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 172935 owner 0

Only hdd3copies, sas3copies and ssd3copies are really used:

# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    76498G     51849G     24648G       32.22
POOLS:
    NAME           ID     USED      %USED     OBJECTS
    data           0      46753     0         72
    metadata       1      0         0         0
    rbd            2      8         0         1
    hdd3copies     3      2724G     3.56      5190954
    ssd3copies     6      613G      0.80      347668
    sas3copies     9      3692G     4.83      764394

My CRUSH rules were:

rule SASperHost {
    ruleset 4
    type replicated
    min_size 1
    max_size 10
    step take SASroot
    step chooseleaf firstn 0 type host
    step emit
}

and:

rule SSDperOSD {
    ruleset 3
    type replicated
    min_size 1
    max_size 10
    step take SSDroot
    step choose firstn 0 type osd
    step emit
}

but, since the cluster was full because of that space problem, I switched to a different rule:

rule SSDperOSDfirst {
    ruleset 7
    type replicated
    min_size 1
    max_size 10
    step take SSDroot
    step choose firstn 1 type osd
    step emit
    step take SASroot
    step chooseleaf firstn -1 type net
    step emit
}

So with that last rule, I should have only one replica on my SSD OSDs, i.e. 613GB of space used. But if I check on the OSDs I see 1212GB really used. 
I also use snapshots, maybe snapshots are ignored by ceph df and rados df ? Thanks for any help. Olivier ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] orphans rbd_data
Hello, I deleted rbd images in format 2 but it seems some rbd_data objects remain:

$ rados -p datashare ls | grep rbd_data | wc -l
1479669
$ rados -p datashare ls | grep rbd_header | wc -l
0
$ rados -p datashare ls | grep rbd_id | wc -l
0

Does anyone know when rbd_data objects are supposed to be destroyed? In this pool I keep rbd images in format 1, but I think those do not use objects called rbd_data*, and there should not be any format 2 images left. It seems these images had been used with a buggy rc kernel (corrected by this patch: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=3a96d5cd7bdce45d5dded75c3a62d4fb98050280 ) and I notice that the name suffix is only 12 characters:

rbd_data.11ae2ae8944a.d3b2
rbd_data.11292ae8944a.00090f98
rbd_data.11292ae8944a.000814bc
rbd_data.11292ae8944a.00020c81

Is it possible that these objects were not deleted when I ran rbd rm? Is there any hope that these crumbs will disappear by themselves? Regards, Laurent Barbe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Number of Monitors per OSDs
Dell - Internal Use - Confidential Hi, What's a good rule of thumb to work out the number of monitors per OSD in a cluster? Regards Ian Dell Corporation Limited is registered in England and Wales. Company Registration Number: 2081369 Registered address: Dell House, The Boulevard, Cain Road, Bracknell, Berkshire, RG12 1LF, UK. Company details for other Dell UK entities can be found on www.dell.co.uk. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3
Hey Gregory, FYI: I just attempted to upgrade a second cluster where CephFS was in use and got this: -24 2013-09-11 12:00:36.674469 7f0de1438700 1 -- 194.109.43.76:6800/8335 -- 194.109.43.73:6789/0 -- mon_subscribe({mdsmap=525898+,monmap=26+,osdmap=529384}) v2 -- ?+0 0x35b0700 con 0x35a9580 -23 2013-09-11 12:00:36.674487 7f0de1438700 1 -- 194.109.43.76:6800/8335 mark_down 194.109.43.72:6800/2439 -- pipe dne -22 2013-09-11 12:00:36.674492 7f0de1438700 5 mds.0.28 handle_mds_failure for myself; not doing anything -21 2013-09-11 12:00:36.677244 7f0de1438700 1 -- 194.109.43.76:6800/8335 == mon.2 194.109.43.73:6789/0 25 osd_map(529384..529385 src has 528815..529385) v3 592+0+0 (3556993101 0 0) 0x35d9240 con 0x35a9580 -20 2013-09-11 12:00:36.677334 7f0de1438700 2 mds.0.28 boot_start 1: opening inotable -19 2013-09-11 12:00:36.677516 7f0de1438700 1 -- 194.109.43.76:6800/8335 -- 194.109.43.75:6802/16460 -- osd_op(mds.0.28:1 mds0_inotable [read 0~0] 1.b852b893 e529385) v4 -- ?+0 0x35d96c0 con 0x35a99a0 -18 2013-09-11 12:00:36.677543 7f0de1438700 2 mds.0.28 boot_start 1: opening sessionmap -17 2013-09-11 12:00:36.677717 7f0de1438700 1 -- 194.109.43.76:6800/8335 -- 194.109.43.72:6802/25951 -- osd_op(mds.0.28:2 mds0_sessionmap [read 0~0] 1.3270c60b e529385) v4 -- ?+0 0x35d9480 con 0x35a9dc0 -16 2013-09-11 12:00:36.677734 7f0de1438700 2 mds.0.28 boot_start 1: opening anchor table -15 2013-09-11 12:00:36.677749 7f0de1438700 1 -- 194.109.43.76:6800/8335 -- 194.109.43.75:6802/16460 -- osd_op(mds.0.28:3 mds_anchortable [read 0~0] 1.a977f6a7 e529385) v4 -- ?+0 0x35d9000 con 0x35a99a0 -14 2013-09-11 12:00:36.677759 7f0de1438700 2 mds.0.28 boot_start 1: opening snap table -13 2013-09-11 12:00:36.677912 7f0de1438700 1 -- 194.109.43.76:6800/8335 -- 194.109.43.76:6804/27840 -- osd_op(mds.0.28:4 mds_snaptable [read 0~0] 1.d90270ad e529385) v4 -- ?+0 0x358ad80 con 0x35a9c60 -12 2013-09-11 12:00:36.677938 7f0de1438700 2 mds.0.28 boot_start 1: opening mds log -11 2013-09-11 12:00:36.677966 7f0de1438700 5 mds.0.log open discovering log bounds -10 2013-09-11 12:00:36.677984 7f0de1438700 1 mds.0.journaler(ro) recover start -9 2013-09-11 12:00:36.677993 7f0de1438700 1 mds.0.journaler(ro) read_head -8 2013-09-11 12:00:36.678080 7f0de1438700 1 -- 194.109.43.76:6800/8335 -- 194.109.43.71:6800/18625 -- osd_op(mds.0.28:5 200. 
[read 0~0] 1.844f3494 e529385) v4 -- ?+0 0x358a6c0 con 0x35ee420 -7 2013-09-11 12:00:36.678111 7f0de1438700 1 -- 194.109.43.76:6800/8335 == mon.2 194.109.43.73:6789/0 26 mon_subscribe_ack(300s) v1 20+0+0 (3657348766 0 0) 0x35b0540 con 0x35a9580 -6 2013-09-11 12:00:36.678122 7f0de1438700 10 monclient: handle_subscribe_ack sent 2013-09-11 12:00:36.674463 renew after 2013-09-11 12:03:06.674463 -5 2013-09-11 12:00:36.678903 7f0de1438700 5 mds.0.28 ms_handle_connect on 194.109.43.72:6802/25951 -4 2013-09-11 12:00:36.679271 7f0de1438700 5 mds.0.28 ms_handle_connect on 194.109.43.71:6800/18625 -3 2013-09-11 12:00:36.679771 7f0de1438700 5 mds.0.28 ms_handle_connect on 194.109.43.75:6802/16460 -2 2013-09-11 12:00:36.680424 7f0de1438700 5 mds.0.28 ms_handle_connect on 194.109.43.76:6804/27840 -1 2013-09-11 12:00:36.697679 7f0de1438700 1 -- 194.109.43.76:6800/8335 == osd.5 194.109.43.76:6804/27840 1 osd_op_reply(4 mds_snaptable [read 0~0] ack = -2 (No such file or directory)) v4 112+0+0 (79439328 0 0) 0x35d0400 con 0x35a9c60 0 2013-09-11 12:00:36.699682 7f0de1438700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist, Context*)' thread 7f0de1438700 time 2013-09-11 12:00:36.697748 mds/MDSTable.cc: 152: FAILED assert(r = 0) ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a) 1: (MDSTable::load_2(int, ceph::buffer::list, Context*)+0x44f) [0x77ce7f] 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b] 3: (MDS::handle_core_message(Message*)+0x987) [0x56f527] 4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef] 5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb] 6: (DispatchQueue::entry()+0x592) [0x92e432] 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd] 8: (()+0x68ca) [0x7f0de59428ca] 9: (clone()+0x6d) [0x7f0de4675b6d] NOTE: a copy of the executable, or `objdump -rdS executable` is needed to interpret this. So, this time it was missing mds_snaptable. I know it was missing this object before the upgrade and restart, because I checked those objects on this cluster yesterday, as a comparison against the cluster that we tried to debug then, and it was missing then too. So, the object must have gone missing earlier and only now, on restart, the problem shows itself. This CephFS was created on 0.67.2 and I haven't done anything noteworthy to the rest of the cluster in the mean time. There have been 2 node-failures without data-loss recently, though. So, again, restart of some or all services on several nodes has happened before the
Re: [ceph-users] Number of Monitors per OSDs
By node I mean physical servers. The number of OSDs doesn't really affect the choice of mons. One mon will always be the leader, but mons (and the traffic to them) are very lightweight, so you don't need to worry. And yes, you omit the RAID in your servers and use one OSD per drive, which means you will have more than one OSD on a server. On 09/11/2013 02:30 PM, ian_m_por...@dell.com wrote: *Dell - Internal Use - Confidential * By node do you mean physical server or a running OSD instance (so could have multiple OSDs running on a single server, each with their own drive)? Ian -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wolfgang Hennerbichler Sent: 11 September 2013 11:35 To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Number of Monitors per OSDs On 09/11/2013 11:55 AM, ian_m_por...@dell.com wrote: *Dell - Internal Use - Confidential * if this is dell internal, I probably shouldn't answer :) Hi, What's a good rule of thumb to work out the number of monitors per OSDs in a cluster AFAIK there is no rule of thumb. I would dimension like this: if you have a very small cluster (3 nodes) use 3 mon's. if you have a larger cluster (30 nodes) use 5 mon's if you have a huge cluster (500 nodes) use 7 mon's. Regards Ian -- http://www.wogri.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com Dell Corporation Limited is registered in England and Wales. Company Registration Number: 2081369 Registered address: Dell House, The Boulevard, Cain Road, Bracknell, Berkshire, RG12 1LF, UK. Company details for other Dell UK entities can be found on www.dell.co.uk. -- http://www.wogri.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3
On Wed, Sep 11, 2013 at 9:12 PM, Yan, Zheng uker...@gmail.com wrote: On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey oli...@xs4all.nl wrote: Hey Gregory, I wiped and re-created the MDS-cluster I just mailed about, starting out by making sure CephFS is not mounted anywhere, stopping all MDSs, completely cleaning the data and metadata-pools using rados --pool=pool cleanup prefix, then creating a new cluster using `ceph mds newfs 1 0 --yes-i-really-mean-it' and starting all MDSs again. Directly afterwards, I saw this: # rados --pool=metadata ls 1. 2. 200. 200.0001 600. 601. 602. 603. 605. 606. 608. 609. mds0_inotable mds0_sessionmap Note the missing objects, right from the start. I was able to mount the CephFS at this point, but after unmounting it and restarting the MDS-cluster, it failed to come up, with the same symptoms as before. I didn't place any files on CephFS at any point between newfs and failure. Naturally, I tried initializing it again, but now, even after more than 5 tries, the mds*-objects simply no longer show up in the metadata-pool at all. In fact, it remains empty. I can mount CephFS after the first start of the MDS-cluster after a newfs, but on restart, it fails because of the missing objects. Am I doing anything wrong while initializing the cluster, maybe? Is cleaning the pools and doing the newfs enough? I did the same on the other cluster yesterday and it seems to have all objects. Thank you for your default information. s/default/detail sorry for the typo. The cause of missing object is that the MDS IDs for old FS and new FS are the same (incarnations are the same). When OSD receives MDS requests for the newly created FS. It silently drops the requests, because it thinks they are duplicated. You can get around the bug by creating new pools for the newfs. Regards Yan, Zheng Regards, Oliver On di, 2013-09-10 at 16:24 -0700, Gregory Farnum wrote: Nope, a repair won't change anything if scrub doesn't detect any inconsistencies. There must be something else going on, but I can't fathom what...I'll try and look through it a bit more tomorrow. :/ -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Sep 10, 2013 at 3:49 PM, Oliver Daudey oli...@xs4all.nl wrote: Hey Gregory, Thanks for your explanation. Turns out to be 1.a7 and it seems to scrub OK. # ceph osd getmap -o osdmap # osdmaptool --test-map-object mds_anchortable --pool 1 osdmap osdmaptool: osdmap file 'osdmap' object 'mds_anchortable' - 1.a7 - [2,0] # ceph pg scrub 1.a7 osd.2 logs: 2013-09-11 00:41:15.843302 7faf56b1b700 0 log [INF] : 1.a7 scrub ok osd.0 didn't show anything in it's logs, though. Should I try a repair next? Regards, Oliver On di, 2013-09-10 at 15:01 -0700, Gregory Farnum wrote: If the problem is somewhere in RADOS/xfs/whatever, then there's a good chance that the mds_anchortable object exists in its replica OSDs, but when listing objects those aren't queried, so they won't show up in a listing. You can use the osdmaptool to map from an object name to the PG it would show up in, or if you look at your log you should see a line something like 1 -- LOCAL IP -- OTHER IP -- osd_op(mds.0.31:3 mds_anchortable [read 0~0] 1.a977f6a7 e165) v4 -- ?+0 0x1e88d80 con 0x1f189a0 In this example, metadata is pool 1 and 1.a977f6a7 is the hash of the msd_anchortable object, and depending on how many PGs are in the pool it will be in pg 1.a7, or 1.6a7, or 1.f6a7... 
-Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Sep 10, 2013 at 2:51 PM, Oliver Daudey oli...@xs4all.nl wrote: Hey Gregory, The only objects containing table I can find at all, are in the metadata-pool: # rados --pool=metadata ls | grep -i table mds0_inotable Looking at another cluster where I use CephFS, there is indeed an object named mds_anchortable, but the broken cluster is missing it. I don't see how I can scrub the PG for an object that doesn't appear to exist. Please elaborate. Regards, Oliver On di, 2013-09-10 at 14:06 -0700, Gregory Farnum wrote: Also, can you scrub the PG which contains the mds_anchortable object and see if anything comes up? You should be able to find the key from the logs (in the osd_op line that contains mds_anchortable) and convert that into the PG. Or you can just scrub all of osd 2. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Sep 10, 2013 at 1:59 PM, Gregory Farnum g...@inktank.com wrote: It's not an upgrade issue. There's an MDS object that is somehow missing. If it exists, then on restart you'll be fine. Oliver, what is your general cluster config? What filesystem are your OSDs
Re: [ceph-users] Ceph space problem, garbage collector ?
Very simple test on a new pool ssdtest, with 3 replicas, full SSD (crush rule 3):

# rbd create ssdtest/test-mysql --size 102400
# rbd map ssdtest/test-mysql
# dd if=/dev/zero of=/dev/rbd/ssdtest/test-mysql bs=4M count=500
# ceph df | grep ssdtest
ssdtest 10 2000M 0 502

host1:# du -skc /var/lib/ceph/osd/ceph-*/*/10.* | tail -n1
3135780 total
host2:# du -skc /var/lib/ceph/osd/ceph-*/*/10.* | tail -n1
3028804 total

→ so about 6020 MB on disk, which seems correct (and a find reports 739+767 files of 4MB, so that's also good).

First snapshot:

# rbd snap create ssdtest/test-mysql@s1
# dd if=/dev/zero of=/dev/rbd/ssdtest/test-mysql bs=4M count=250
# ceph df | grep ssdtest
ssdtest 10 3000M 0 752
2 * # du -skc /var/lib/ceph/osd/ceph-*/*/10.* | tail -n1

→ about 9024 MB on disk, which is correct again.

Second snapshot:

# rbd snap create ssdtest/test-mysql@s2

Here I write only 4KB, in each of 100 different rados blocks:

# for I in '' 1 2 3 4 5 6 7 8 9 ; do for J in 0 1 2 3 4 5 6 7 8 9 ; do OFFSET=$I$J ; dd if=/dev/zero of=/dev/rbd/ssdtest/test-mysql bs=1k seek=$((OFFSET*4096)) count=4 ; done ; done
# ceph df | grep ssdtest
ssdtest 10 3000M 0 852

Here the USED column of ceph df is wrong, and on disk I see about 10226 MB used. So, for me the problem comes from ceph df (and rados df), which don't correctly report the space used by partially rewritten objects. Or is it XFS-related only?

On Wednesday, 11 September 2013 at 11:00 +0200, Olivier Bonvalet wrote:
Hi, do you need more information about that? Thanks, Olivier

On Tuesday, 10 September 2013 at 11:19 -0700, Samuel Just wrote:
Can you post the rest of your crush map? -Sam

On Tue, Sep 10, 2013 at 5:52 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
I also checked that all files in that PG are still on that PG:
for IMG in `find . -type f -printf '%f\n' | awk -F '__' '{ print $1 }' | sort --unique` ; do echo -n $IMG ; ceph osd map ssd3copies $IMG | grep -v 6\\.31f ; echo ; done
And all objects are referenced in rados (compared with rados --pool ssd3copies ls rados.ssd3copies.dump).

On Tuesday, 10 September 2013 at 13:46 +0200, Olivier Bonvalet wrote:
Some additional information: if I look at one PG only, for example 6.31f, ceph pg dump reports a size of 616GB:
# ceph pg dump | grep ^6\\. | awk '{ SUM+=($6/1024/1024) } END { print SUM }'
631717
But on disk, on the 3 replicas, I have:
# du -sh /var/lib/ceph/osd/ceph-50/current/6.31f_head/
1,3G /var/lib/ceph/osd/ceph-50/current/6.31f_head/
Since I suspected a snapshot problem, I tried to count only head files:
# find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f -name '*head*' -print0 | xargs -r -0 du -hc | tail -n1
448M total
and the content of the directory: http://pastebin.com/u73mTvjs

On Tuesday, 10 September 2013 at 10:31 +0200, Olivier Bonvalet wrote:
Hi, I have a space problem on a production cluster, as if there is unused data that is not freed: ceph df and rados df report 613GB of data, but disk usage is 2640GB (with 3 replicas). It should be near 1839GB. I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush rules to put pools on SAS or on SSD. 
My pools : # ceph osd dump | grep ^pool pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45 pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68317 owner 0 pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68321 owner 0 pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 200 pgp_num 200 last_change 172933 owner 0 pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash rjenkins pg_num 800 pgp_num 800 last_change 172929 owner 0 pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 172935 owner 0 Only hdd3copies, sas3copies and ssd3copies are really used : # ceph df GLOBAL: SIZE AVAIL RAW USED %RAW USED 76498G 51849G 24648G 32.22 POOLS: NAME ID USED %USED OBJECTS data 0 46753 0 72 metadata 1 0 0 0 rbd2 8 0 1 hdd3copies 3 2724G 3.56 5190954 ssd3copies 6 613G 0.80 347668 sas3copies 9 3692G 4.83 764394 My CRUSH rules was : rule SASperHost { ruleset 4 type replicated min_size 1
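A possible reading of the numbers in the test above, offered as an interpretation rather than a confirmed explanation: the first write to an object after a snapshot makes the OSD keep a full copy of the old 4 MB object for the snapshot, so the 100 small writes touch 100 objects and add roughly 100 x 4 MB x 3 replicas ≈ 1200 MB on disk (which matches the observed jump from about 9024 MB to about 10226 MB), while ceph df only accounts for the few hundred kilobytes of new logical data. That would explain why ceph df and rados df stay at 3000M while the raw usage keeps growing on a snapshotted, partially rewritten pool.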
[ceph-users] ocfs2 for OSDs?
Hi, I wonder whether ocfs2 is suitable for hosting OSD data? In the Ceph documentation only XFS, ext4 and btrfs are discussed, but looking at the ocfs2 feature list it could theoretically also host OSDs. Some of the notable features of the file system are:

Optimized Allocations (extents, reservations, sparse, unwritten extents, punch holes)
REFLINKs (inode-based writeable snapshots)
Indexed Directories
Metadata Checksums
Extended Attributes (unlimited number of attributes per inode)
Advanced Security (POSIX ACLs and SELinux)
User and Group Quotas
Variable Block and Cluster sizes
Journaling (Ordered and Writeback data journaling modes)
Endian and Architecture Neutral (x86, x86_64, ia64 and ppc64)
Buffered, Direct, Asynchronous, Splice and Memory Mapped I/Os
In-built Clusterstack with a Distributed Lock Manager
Cluster-aware Tools (mkfs, fsck, tunefs, etc.)

ocfs2 can work in cluster mode but it can also work on a single node. Just wondering whether an OSD would work on ocfs2 and what the performance characteristics would be. Any thoughts/experience? BR, Ugis Racko ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Number of Monitors per OSDs
On 09/11/2013 10:55 AM, ian_m_por...@dell.com wrote: *Dell - Internal Use - Confidential * Hi, What's a good rule of thumb to work out the number of monitors per OSDs in a cluster This question makes little sense. The number of monitors and the number of OSDs (or any other component in the cluster, for that matter) are not correlated. The number of monitors heavily depends on the availability and resiliency you want. A higher number of monitors means your monitor cluster is less prone to data loss in the event of hardware or network failure; it also has the drawback of putting extra pressure on the monitors to get updates out -- more monitors means more acks are needed until a given update (on monitor-managed data) is considered committed. Sure, having more monitors should help balance the client load, but that will only happen for reads (on which clients, including OSDs, depend to get their maps updated). But then we go back to the drawbacks previously mentioned. As a rule of thumb, stick with 3 or 5 monitors. You'll want odd numbers. You shouldn't need more than that, and I don't recall any deployment to date that required more than that. If you have a big deployment, especially with a big number of OSDs and clients, feel free to experiment with different numbers of monitors and report back your findings. I'm sure I speak for pretty much everybody when I tell you those would be much appreciated :-) -Joao Regards Ian Dell Corporation Limited is registered in England and Wales. Company Registration Number: 2081369 Registered address: Dell House, The Boulevard, Cain Road, Bracknell, Berkshire, RG12 1LF, UK. Company details for other Dell UK entities can be found on www.dell.co.uk. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Joao Eduardo Luis Software Engineer | http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
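To make the quorum arithmetic behind the odd-number advice concrete: a majority of floor(n/2) + 1 monitors must be up for the monitor cluster to make progress, so 3 monitors need 2 for quorum and tolerate 1 down, 5 need 3 and tolerate 2 down, and 7 need 4 and tolerate 3 down. An even count buys nothing extra: 4 monitors still need 3 for quorum, so they also only tolerate 1 failure.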
Re: [ceph-users] ocfs2 for OSDs?
On 09/11/2013 08:58 AM, Ugis wrote: Hi, I wonder is ocfs2 suitable for hosting OSD data? In ceph documentation only XFS, ext4 and btrfs are discussed, but looking at ocfs2 feature list it theoretically also could host OSDs: Some of the notable features of the file system are: Optimized Allocations (extents, reservations, sparse, unwritten extents, punch holes) REFLINKs (inode-based writeable snapshots) Indexed Directories Metadata Checksums Extended Attributes (unlimited number of attributes per inode) Advanced Security (POSIX ACLs and SELinux) User and Group Quotas Variable Block and Cluster sizes Journaling (Ordered and Writeback data journaling modes) Endian and Architecture Neutral (x86, x86_64, ia64 and ppc64) Buffered, Direct, Asynchronous, Splice and Memory Mapped I/Os In-built Clusterstack with a Distributed Lock Manager Cluster-aware Tools (mkfs, fsck, tunefs, etc.) ocfs2 can work in cluster mode but it can also work for single node. Just wondering would OSD work on ocfs2 and what would performance characteristics be. Any thoughts/experience? Technically it may work, but I'm not sure why you would want to use OCFS2 under the OSD. Do you have a particular use case in mind? FWIW, OSDs also can work on ZFS, though we don't do a whole lot of testing yet. Mark BR, Ugis Racko ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Number of Monitors per OSDs
On 09/11/2013 11:55 AM, ian_m_por...@dell.com wrote: *Dell - Internal Use - Confidential * if this is dell internal, I probably shouldn't answer :) Hi, What's a good rule of thumb to work out the number of monitors per OSDs in a cluster AFAIK there is no rule of thumb. I would dimension like this: if you have a very small cluster (3 nodes) use 3 mon's. if you have a larger cluster (30 nodes) use 5 mon's if you have a huge cluster (500 nodes) use 7 mon's. Regards Ian ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Module rbd not found on Ubuntu 13.04
Hi, I am new to Ceph. I set up a Ceph cluster for block storage (I referred to http://www.youtube.com/watch?v=R3gnLrsZSno). The cluster machines are CentOS 6.3 with kernel 2.6.32, Ceph version 0.67.3. I am trying to configure an Ubuntu 13.04 client for the same. I am getting an rbd module not found error:

ubuntu@ubuntu:~$ sudo modprobe rbd
FATAL: Module rbd not found.
ubuntu@ubuntu:~$ uname -r
3.8.0-19-generic
ubuntu@ubuntu:~$ ceph --version
ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)

Please let me know if there is some mistake in my setup. ~Prasanna ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph space problem, garbage collector ?
I removed some garbage about hosts faude / rurkh / murmillia (they was temporarily added because cluster was full). So the clean CRUSH map : # begin crush map tunable choose_local_tries 0 tunable choose_local_fallback_tries 0 tunable choose_total_tries 50 # devices device 0 device0 device 1 device1 device 2 device2 device 3 device3 device 4 device4 device 5 device5 device 6 device6 device 7 device7 device 8 device8 device 9 device9 device 10 device10 device 11 device11 device 12 device12 device 13 device13 device 14 device14 device 15 device15 device 16 device16 device 17 device17 device 18 device18 device 19 device19 device 20 device20 device 21 device21 device 22 device22 device 23 device23 device 24 device24 device 25 device25 device 26 device26 device 27 device27 device 28 device28 device 29 device29 device 30 device30 device 31 device31 device 32 device32 device 33 device33 device 34 device34 device 35 device35 device 36 device36 device 37 device37 device 38 device38 device 39 device39 device 40 osd.40 device 41 osd.41 device 42 osd.42 device 43 osd.43 device 44 osd.44 device 45 osd.45 device 46 osd.46 device 47 osd.47 device 48 osd.48 device 49 osd.49 device 50 osd.50 device 51 osd.51 device 52 osd.52 device 53 osd.53 device 54 osd.54 device 55 osd.55 device 56 osd.56 device 57 osd.57 device 58 osd.58 device 59 osd.59 device 60 osd.60 device 61 osd.61 device 62 osd.62 device 63 osd.63 device 64 osd.64 device 65 osd.65 device 66 osd.66 device 67 osd.67 device 68 osd.68 device 69 osd.69 device 70 osd.70 device 71 osd.71 device 72 osd.72 device 73 osd.73 device 74 osd.74 device 75 osd.75 device 76 osd.76 device 77 osd.77 device 78 osd.78 # types type 0 osd type 1 host type 2 rack type 3 net type 4 room type 5 datacenter type 6 root # buckets host dragan { id -17 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item osd.70 weight 2.720 item osd.71 weight 2.720 item osd.72 weight 2.720 item osd.73 weight 2.720 item osd.74 weight 2.720 item osd.75 weight 2.720 item osd.76 weight 2.720 item osd.77 weight 2.720 item osd.78 weight 2.720 } rack SAS15B01 { id -40 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item dragan weight 24.480 } net SAS188-165-15 { id -72 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item SAS15B01 weight 24.480 } room SASs15 { id -90 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item SAS188-165-15 weight 24.480 } datacenter SASrbx1 { id -100 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item SASs15 weight 24.480 } host taman { id -16 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item osd.49 weight 2.720 item osd.62 weight 2.720 item osd.63 weight 2.720 item osd.64 weight 2.720 item osd.65 weight 2.720 item osd.66 weight 2.720 item osd.67 weight 2.720 item osd.68 weight 2.720 item osd.69 weight 2.720 } rack SAS31A10 { id -15 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item taman weight 24.480 } net SAS178-33-62 { id -14 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item SAS31A10 weight 24.480 } room SASs31 { id -13 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item SAS178-33-62 weight 24.480 } host kaino { id -9 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item osd.40 weight 2.720 item osd.41 weight 2.720 item osd.42 weight 2.720 item osd.43 weight 2.720 item osd.44 weight 2.720 item 
osd.45 weight 2.720 item osd.46 weight 2.720 item osd.47 weight 2.720 item osd.48 weight 2.720 } rack SAS34A14 { id -10 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item kaino weight 24.480 } net SAS5-135-135 { id -11 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item SAS34A14 weight 24.480 } room SASs34 { id -12 # do not change unnecessarily # weight 24.480 alg straw hash 0 # rjenkins1 item SAS5-135-135 weight 24.480 } datacenter SASrbx2 { id -101 # do not change unnecessarily # weight 48.960 alg straw
Re: [ceph-users] ocfs2 for OSDs?
No particular use case yet. Ocfs2 could be considered as option if it outperforms xfs/ext4 under osd - therefore question. It just seemed suitable as it can be considered stable(as xfs,ext4) and has unlimited xattrs. Regarding zfs - at times it seems that it is closer to production on linux and matures faster than btrfs. Ugis 2013/9/11 Mark Nelson mark.nel...@inktank.com: On 09/11/2013 08:58 AM, Ugis wrote: Hi, I wonder is ocfs2 suitable for hosting OSD data? In ceph documentation only XFS, ext4 and btrfs are discussed, but looking at ocfs2 feature list it theoretically also could host OSDs: Some of the notable features of the file system are: Optimized Allocations (extents, reservations, sparse, unwritten extents, punch holes) REFLINKs (inode-based writeable snapshots) Indexed Directories Metadata Checksums Extended Attributes (unlimited number of attributes per inode) Advanced Security (POSIX ACLs and SELinux) User and Group Quotas Variable Block and Cluster sizes Journaling (Ordered and Writeback data journaling modes) Endian and Architecture Neutral (x86, x86_64, ia64 and ppc64) Buffered, Direct, Asynchronous, Splice and Memory Mapped I/Os In-built Clusterstack with a Distributed Lock Manager Cluster-aware Tools (mkfs, fsck, tunefs, etc.) ocfs2 can work in cluster mode but it can also work for single node. Just wondering would OSD work on ocfs2 and what would performance characteristics be. Any thoughts/experience? Technically it may work, but I'm not sure why you would want to use OCF2 under the OSD. Do you have a particular use case in mind? FWIW, OSDs also can work on ZFS, though we don't do a whole lot of testing yet. Mark BR, Ugis Racko ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3
On Wed, Sep 11, 2013 at 10:06 PM, Oliver Daudey oli...@xs4all.nl wrote: Hey Yan, On 11-09-13 15:12, Yan, Zheng wrote: On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey oli...@xs4all.nl wrote: Hey Gregory, I wiped and re-created the MDS-cluster I just mailed about, starting out by making sure CephFS is not mounted anywhere, stopping all MDSs, completely cleaning the data and metadata-pools using rados --pool=pool cleanup prefix, then creating a new cluster using `ceph mds newfs 1 0 --yes-i-really-mean-it' and starting all MDSs again. Directly afterwards, I saw this: # rados --pool=metadata ls 1. 2. 200. 200.0001 600. 601. 602. 603. 605. 606. 608. 609. mds0_inotable mds0_sessionmap Note the missing objects, right from the start. I was able to mount the CephFS at this point, but after unmounting it and restarting the MDS-cluster, it failed to come up, with the same symptoms as before. I didn't place any files on CephFS at any point between newfs and failure. Naturally, I tried initializing it again, but now, even after more than 5 tries, the mds*-objects simply no longer show up in the metadata-pool at all. In fact, it remains empty. I can mount CephFS after the first start of the MDS-cluster after a newfs, but on restart, it fails because of the missing objects. Am I doing anything wrong while initializing the cluster, maybe? Is cleaning the pools and doing the newfs enough? I did the same on the other cluster yesterday and it seems to have all objects. Thank you for your default information. The cause of missing object is that the MDS IDs for old FS and new FS are the same (incarnations are the same). When OSD receives MDS requests for the newly created FS. It silently drops the requests, because it thinks they are duplicated. You can get around the bug by creating new pools for the newfs. Thanks for this very useful info, I think this solves the mystery! Could I get around it any other way? I'd rather not have to re-create the pools and switch to new pool-ID's every time I have to do this. Does the OSD store this info in it's meta-data, or might restarting the OSDs be enough? I'm quite sure that I re-created MDS-clusters on the same pools many times, without all the objects going missing. This was usually as part of tests, where I also restarted other cluster-components, like OSDs. This could explain why only some files went missing. If some OSDs are restarted and processed the requests, while others dropped the requests, it would appear as if some, but not all objects are missing. The problem then persists until the active MDS in the MDS-cluster is restarted, after which the missing objects get noticed, because things fail to restart. IMHO, this is a bug. Why Yes, it's a bug. Fixing it should be easy. would the OSD ignore these requests, if the objects the MDS tries to write don't even exist at that time? OSD uses informartion in PG log to check duplicated requests, so restarting OSD does not work. Another way to get around the bug is generate lots of writes to the data/metadata pools, make sure each PG trim old entries in its log. Regards Yan, Zheng ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
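A rough sketch of that write-generating workaround, in case it helps someone hitting the same bug: rados bench simply writes throwaway benchmark objects into a pool, which adds entries to the PG logs. Note that a PG only trims old log entries once it has accumulated enough new ones (on the order of the osd_min_pg_log_entries setting), so a substantial number of small writes per PG is needed, and the exact bench flags vary a bit between versions (check rados bench --help):

rados -p metadata bench 60 write -b 4096
rados -p data bench 60 write -b 4096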
Re: [ceph-users] Module rbd not found on Ubuntu 13.04
Hello, Is it a custom kernel ? You can verify that rbd module is enable in your kernel config : $ grep CONFIG_BLK_DEV_RBD /boot/config-3.8.0-19-generic CONFIG_BLK_DEV_RBD=m Laurent Barbe Le 11/09/2013 10:51, Prasanna Gholap a écrit : Hi, I am new to ceph. I setup a ceph cluster for block storage (I refered : http://www.youtube.com/watch?v=R3gnLrsZSno) Cluster Machines are Centos 6.3 with kernel 2.6.32. Ceph version 0.67.3 I am trying to configure Ubuntu Client 113.04 for the same. I am getting rbd module not found error: ubuntu@ubuntu:~$ sudo modprobe rbd FATAL: Module rbd not found. ubuntu@ubuntu:~$ uname -r 3.8.0-19-generic ubuntu@ubuntu:~$ ceph --version ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a) Please let me know if there is some mistake in my setup. ~Prasanna ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] orphans rbd_data
On Wed, 11 Sep 2013, Laurent Barbe wrote: Hello, I deleted rbd images in format 2 but it seems to remain rbd_data objects : $ rados -p datashare ls | grep rbd_data | wc -l 1479669 $ rados -p datashare ls | grep rbd_header | wc -l 0 $ rados -p datashare ls | grep rbd_id | wc -l 0 Does anyone know when rbd_data object must be destroyed ? In this pool I keep rbd image in format 1 but I think they do not use objects called rbd_data* and there should not be any format 2 images. It's seems these images have been used with a buggy rc kernel (corrected by this patch : https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=3a96d5cd7bdce45d5dded75c3a62d4fb98050280 ) and I notice that the name suffix size is only 12 characters : rbd_data.11ae2ae8944a.d3b2 rbd_data.11292ae8944a.00090f98 rbd_data.11292ae8944a.000814bc rbd_data.11292ae8944a.00020c81 Is it possible that these objects are not deleted when I make rbd rm ? I guess there any hope for these crumbs disappear by themselves ? That explains it. They won't go away by themselves. You can clean them up manually by grepping them out of rados ls and deleting them with rados rm. Try to pass several names to each instantiation of the command or the startup/shutdown will make things really slow. sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] orphans rbd_data
Hello Sage, Thank you, I will do something like this : $ rados -p rbd ls | grep '^rbd_data.24bb2ae8944a.' | xargs -n 100 rados -p rbd rm Laurent Le 11/09/2013 17:37, Sage Weil a écrit : On Wed, 11 Sep 2013, Laurent Barbe wrote: Hello, I deleted rbd images in format 2 but it seems to remain rbd_data objects : $ rados -p datashare ls | grep rbd_data | wc -l 1479669 $ rados -p datashare ls | grep rbd_header | wc -l 0 $ rados -p datashare ls | grep rbd_id | wc -l 0 Does anyone know when rbd_data object must be destroyed ? In this pool I keep rbd image in format 1 but I think they do not use objects called rbd_data* and there should not be any format 2 images. It's seems these images have been used with a buggy rc kernel (corrected by this patch : https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=3a96d5cd7bdce45d5dded75c3a62d4fb98050280 ) and I notice that the name suffix size is only 12 characters : rbd_data.11ae2ae8944a.d3b2 rbd_data.11292ae8944a.00090f98 rbd_data.11292ae8944a.000814bc rbd_data.11292ae8944a.00020c81 Is it possible that these objects are not deleted when I make rbd rm ? I guess there any hope for these crumbs disappear by themselves ? That explains it. They won't go away by themselves. You can clean them up manually by grepping them out of rados ls and deleting them with rados rm. Try to pass several names to each instantiation of the command or the startup/shutdown will make things really slow. sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
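Before removing objects in bulk like this, it may be worth double-checking that the prefix really does not belong to a surviving image. A small sketch, with the pool name as a placeholder: rbd info prints the block_name_prefix of each image (rbd_data.<id> for format 2, rb.0.<id> for format 1), so any rbd_data prefix that does not appear in this list should belong to an already-deleted image:

for IMG in $(rbd ls -p datashare); do rbd info -p datashare $IMG | grep -E 'block_name_prefix|^rbd image'; done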
Re: [ceph-users] Module rbd not found on Ubuntu 13.04
Hi Laurent, Thanks for the reply. I checked on my machine:

ubuntu@ubuntu:~$ grep CONFIG_BLK_DEV_RBD /boot/config-3.8.0-19-generic
CONFIG_BLK_DEV_RBD=m

It is not a custom kernel. It is an AWS instance. ~Prasanna On Wed, Sep 11, 2013 at 8:57 PM, Laurent Barbe laur...@ksperis.com wrote: Hello, Is it a custom kernel ? You can verify that rbd module is enable in your kernel config : $ grep CONFIG_BLK_DEV_RBD /boot/config-3.8.0-19-generic CONFIG_BLK_DEV_RBD=m Laurent Barbe On 11/09/2013 10:51, Prasanna Gholap wrote: Hi, I am new to ceph. I setup a ceph cluster for block storage (I refered : http://www.youtube.com/watch?v=R3gnLrsZSno) Cluster Machines are Centos 6.3 with kernel 2.6.32. Ceph version 0.67.3 I am trying to configure Ubuntu Client 13.04 for the same. I am getting rbd module not found error: ubuntu@ubuntu:~$ sudo modprobe rbd FATAL: Module rbd not found. ubuntu@ubuntu:~$ uname -r 3.8.0-19-generic ubuntu@ubuntu:~$ ceph --version ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a) Please let me know if there is some mistake in my setup. ~Prasanna ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ocfs2 for OSDs?
On Wed, 11 Sep 2013, Ugis wrote: Hi, I wonder is ocfs2 suitable for hosting OSD data? In ceph documentation only XFS, ext4 and btrfs are discussed, but looking at ocfs2 feature list it theoretically also could host OSDs: Some of the notable features of the file system are: Optimized Allocations (extents, reservations, sparse, unwritten extents, punch holes) REFLINKs (inode-based writeable snapshots) This is the one item on this list I see that the ceph-osds could take real advantage of; it would make object clones triggered by things like RBD snapshots faster. What is missing from this list that would be similarly (or more) useful is a volume/fs snapshot feature. Of course, ocfs2 only makes sense when run in a single-node mode underneath ceph-osds. sage Indexed Directories Metadata Checksums Extended Attributes (unlimited number of attributes per inode) Advanced Security (POSIX ACLs and SELinux) User and Group Quotas Variable Block and Cluster sizes Journaling (Ordered and Writeback data journaling modes) Endian and Architecture Neutral (x86, x86_64, ia64 and ppc64) Buffered, Direct, Asynchronous, Splice and Memory Mapped I/Os In-built Clusterstack with a Distributed Lock Manager Cluster-aware Tools (mkfs, fsck, tunefs, etc.) ocfs2 can work in cluster mode but it can also work for single node. Just wondering would OSD work on ocfs2 and what would performance characteristics be. Any thoughts/experience? BR, Ugis Racko -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph at the OpenNebulaConf
Hello everyone, As you may be aware, we are holding the first OpenNebula Conference [1] in Berlin, this 24-26 September. The conference is the perfect place to learn about practical Cloud Computing, aimed at cloud users, developers, executives and IT managers to help them tackle their computational and business challenges. The goal is to foster fruitful and educational discussions around Cloud Computing and OpenNebula. We think this is of special interest to Ceph users and developers, since many OpenNebula users are also part of the Ceph userbase. There will be a talk about OpenNebula and Ceph among other things by Joel Merrick from the BBC [2] with the name Adventures in Research. Registration closing date is getting closer, so seize the moment and register now [3]! The conferences attendees will be in for a treat: * Keynotes speakers include Daniel Concepción from Produban – Bank Santander Group, Thomas Higon from Akamai, Steven Timm from FermiLab, André von Deetzen from Deutsche Post E-Post, Jordi Farrés from European Space Agency, Karanbir Singh from CentOS Project, and Ignacio M. Llorente and Rubén S. Montero from the OpenNebula Project. * The talks are organized in three tracks about user experiences and case studies, integration with other cloud tools, and interoperability and HPC clouds and include speakers from leading organizations like CloudWeavers, Terradue, NetWays, INRIA, BBC, inovex, AGS Group, Hedera, NetOpenServices, KTH, CESNET or CESCA. * The Hands-on Tutorial will show how to build, configure and operate your own OpenNebula cloud. * The Hacking and Open Space Sessions will provide an opportunity to discuss burning ideas, and meet face to face to discuss development. * The Lightning Talks will provide an opportunity to present new projects, products, features, integrations, experiences, use cases, collaboration invitations, quick tips or demonstrations. This session is an opportunity for ideas to get the attention they deserve. What's not to like? See you all at Berlin! The OpenNebula Team [1] http://opennebulaconf.com/ [2] http://opennebulaconf.com/schedule/track-i-talk-8/ [3] http://opennebulaconf.com/registration/ -- Join us at OpenNebulaConf2013 http://opennebulaconf.com/ in Berlin, 24-26 September, 2013 -- Jaime Melis Project Engineer OpenNebula - The Open Source Toolkit for Cloud Computing www.OpenNebula.org | jme...@opennebula.org ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Module rbd not found on Ubuntu 13.04
Hi, I don't have the rbd module in the given location:

ubuntu@ubuntu:~$ ls /lib/modules/3.8.0-19-generic/kernel/drivers/block
cryptoloop.ko floppy.ko nbd.ko
ubuntu@ubuntu:~$ modinfo rbd
ERROR: Module rbd not found.

According to the link about AWS, rbd.ko isn't included in the AWS Linux kernel yet. I'll try to build the kernel manually and proceed with rbd. Thanks for your help. ~Prasanna

On Wed, Sep 11, 2013 at 9:33 PM, Laurent Barbe laur...@ksperis.com wrote:
Remove on AWS images ? For me, the module is here (ubuntu 13.04) :
$ ls /lib/modules/$(uname -r)/kernel/drivers/block/
$ modinfo rbd
filename: /lib/modules/3.8.0-30-generic/kernel/drivers/block/rbd.ko
license: GPL
author: Jeff Garzik j...@garzik.org
description: rados block device
author: Yehuda Sadeh yeh...@hq.newdream.net
author: Sage Weil s...@newdream.net
srcversion: DA4E3C524752162C9266077
depends: libceph
intree: Y
vermagic: 3.8.0-30-generic SMP mod_unload modversions
Also : https://forums.aws.amazon.com/thread.jspa?threadID=120198
Laurent
On 11/09/2013 17:55, Prasanna Gholap wrote:
Hi Laurent, Thanks for the reply. I checked on my machine:
ubuntu@ubuntu:~$ grep CONFIG_BLK_DEV_RBD /boot/config-3.8.0-19-generic
CONFIG_BLK_DEV_RBD=m
It is not a custom kernel. It is an AWS instance. ~Prasanna
On Wed, Sep 11, 2013 at 8:57 PM, Laurent Barbe laur...@ksperis.com wrote:
Hello, Is it a custom kernel ? You can verify that rbd module is enable in your kernel config :
$ grep CONFIG_BLK_DEV_RBD /boot/config-3.8.0-19-generic
CONFIG_BLK_DEV_RBD=m
Laurent Barbe
On 11/09/2013 10:51, Prasanna Gholap wrote:
Hi, I am new to ceph. I setup a ceph cluster for block storage (I refered : http://www.youtube.com/watch?v=R3gnLrsZSno) Cluster Machines are Centos 6.3 with kernel 2.6.32. Ceph version 0.67.3 I am trying to configure Ubuntu Client 13.04 for the same. I am getting rbd module not found error:
ubuntu@ubuntu:~$ sudo modprobe rbd
FATAL: Module rbd not found.
ubuntu@ubuntu:~$ uname -r
3.8.0-19-generic
ubuntu@ubuntu:~$ ceph --version
ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
Please let me know if there is some mistake in my setup. ~Prasanna
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
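Building a kernel by hand may not be necessary: on stock Ubuntu 13.04, less common drivers such as rbd.ko are normally shipped in the separate linux-image-extra package for the running kernel. A sketch of that route, assuming the matching extra-modules package is actually available in the archive for this AWS kernel:

sudo apt-get install linux-image-extra-$(uname -r)
sudo modprobe rbd

If no such package exists for the kernel in use, librbd-based access (for example through qemu) avoids the kernel module entirely.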
Re: [ceph-users] Module rbd not found on Ubuntu 13.04
Remove on AWS images ? For me, the module is here (ubuntu 13.04) : $ ls /lib/modules/$(uname -r)/kernel/drivers/block/ $ modinfo rbd filename: /lib/modules/3.8.0-30-generic/kernel/drivers/block/rbd.ko license:GPL author: Jeff Garzik j...@garzik.org description:rados block device author: Yehuda Sadeh yeh...@hq.newdream.net author: Sage Weil s...@newdream.net srcversion: DA4E3C524752162C9266077 depends:libceph intree: Y vermagic: 3.8.0-30-generic SMP mod_unload modversions Also : https://forums.aws.amazon.com/thread.jspa?threadID=120198 Laurent Le 11/09/2013 17:55, Prasanna Gholap a écrit : Hi Laurent, Thanks for reply. I checked in my machine: ubuntu@ubuntu:~$ grep CONFIG_BLK_DEV_RBD /boot/config-3.8.0-19-generic CONFIG_BLK_DEV_RBD=m It is not a custom kernel. It is an AWS instance. ~Prasanna On Wed, Sep 11, 2013 at 8:57 PM, Laurent Barbe laur...@ksperis.com mailto:laur...@ksperis.com wrote: Hello, Is it a custom kernel ? You can verify that rbd module is enable in your kernel config : $ grep CONFIG_BLK_DEV_RBD /boot/config-3.8.0-19-generic CONFIG_BLK_DEV_RBD=m Laurent Barbe Le 11/09/2013 10:51, Prasanna Gholap a écrit : Hi, I am new to ceph. I setup a ceph cluster for block storage (I refered : http://www.youtube.com/watch?__v=R3gnLrsZSno http://www.youtube.com/watch?v=R3gnLrsZSno) Cluster Machines are Centos 6.3 with kernel 2.6.32. Ceph version 0.67.3 I am trying to configure Ubuntu Client 113.04 for the same. I am getting rbd module not found error: ubuntu@ubuntu:~$ sudo modprobe rbd FATAL: Module rbd not found. ubuntu@ubuntu:~$ uname -r 3.8.0-19-generic ubuntu@ubuntu:~$ ceph --version ceph version 0.67.3 (__408cd61584c72c0d97b774b3d8f95c__6b1b06341a) Please let me know if there is some mistake in my setup. ~Prasanna _ ceph-users mailing list ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3
On Wed, Sep 11, 2013 at 7:48 AM, Yan, Zheng uker...@gmail.com wrote: On Wed, Sep 11, 2013 at 10:06 PM, Oliver Daudey oli...@xs4all.nl wrote: Hey Yan, On 11-09-13 15:12, Yan, Zheng wrote: On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey oli...@xs4all.nl wrote: Hey Gregory,

I wiped and re-created the MDS cluster I just mailed about, starting out by making sure CephFS is not mounted anywhere, stopping all MDSs, completely cleaning the data and metadata pools using rados --pool=<pool> cleanup <prefix>, then creating a new cluster using `ceph mds newfs 1 0 --yes-i-really-mean-it' and starting all MDSs again. Directly afterwards, I saw this:

# rados --pool=metadata ls
1.
2.
200.
200.0001
600.
601.
602.
603.
605.
606.
608.
609.
mds0_inotable
mds0_sessionmap

Note the missing objects, right from the start. I was able to mount the CephFS at this point, but after unmounting it and restarting the MDS cluster, it failed to come up, with the same symptoms as before. I didn't place any files on CephFS at any point between the newfs and the failure. Naturally, I tried initializing it again, but now, even after more than 5 tries, the mds*-objects simply no longer show up in the metadata pool at all. In fact, it remains empty. I can mount CephFS after the first start of the MDS cluster after a newfs, but on restart, it fails because of the missing objects. Am I doing anything wrong while initializing the cluster, maybe? Is cleaning the pools and doing the newfs enough? I did the same on the other cluster yesterday and it seems to have all objects.

Thank you for your detailed information. The cause of the missing objects is that the MDS IDs for the old FS and the new FS are the same (the incarnations are the same). When an OSD receives MDS requests for the newly created FS, it silently drops them, because it thinks they are duplicates. You can get around the bug by creating new pools for the newfs.

Thanks for this very useful info, I think this solves the mystery! Could I get around it any other way? I'd rather not have to re-create the pools and switch to new pool IDs every time I have to do this. Does the OSD store this info in its metadata, or might restarting the OSDs be enough? I'm quite sure that I re-created MDS clusters on the same pools many times, without all the objects going missing. This was usually as part of tests, where I also restarted other cluster components, like OSDs. This could explain why only some files went missing. If some OSDs were restarted and processed the requests, while others dropped them, it would appear as if some, but not all, objects are missing. The problem then persists until the active MDS in the MDS cluster is restarted, after which the missing objects get noticed, because things fail to restart. IMHO, this is a bug. Why

Yes, it's a bug. Fixing it should be easy.

would the OSD ignore these requests, if the objects the MDS tries to write don't even exist at that time?

The OSD uses information in the PG log to check for duplicated requests, so restarting the OSDs does not work. Another way to get around the bug is to generate lots of writes to the data/metadata pools, to make sure each PG trims the old entries in its log.

Regards
Yan, Zheng

This definitely explains the symptoms seen here on a not-very-busy/long-lived cluster; I wish I had the notes to figure out if it could have caused the problem for other users as well. I'm not sure of the best way to work around the problem in the code, though. We could add an fs generation number to every object or every mds incarnation, but that seems a bit icky.
Did you have other ideas, Zheng?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
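A rough sketch of the "generate lots of writes" workaround Zheng describes above, assuming rados bench is an acceptable way to produce the traffic; whether every PG trims its log far enough depends on the pool's PG count, so treat this as an illustration only:

# Push a burst of small writes through the pools so each PG's log moves
# past the entries left by the old MDS incarnation
rados bench -p metadata 120 write -b 4096 -t 16 --no-cleanup
rados bench -p data 120 write -b 4096 -t 16 --no-cleanup

# Remove the benchmark objects afterwards (use the object prefix printed by rados bench)
rados --pool=metadata cleanup benchmark_data
rados --pool=data cleanup benchmark_data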
Re: [ceph-users] [ceph-deploy] problem creating mds after a full cluster wipe
On 09/04/13 23:56, Sage Weil wrote: On Wed, 4 Sep 2013, Alphe Salas Michels wrote: Hi again, as I was doomed to fully wipe my cluster once again. I upgraded to ceph-deploy 1.2.3 and all went smoothly along my ceph-deploy process, until I created the mds. ceph-deploy mds create myhost first provoked a

File "/usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py", line 645, in __handle raise e
pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory: '/var/lib/ceph/bootstrap-mds'

Doing a mkdir -p /var/lib/ceph/bootstrap-mds solved that one. Then I got a:

pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory: '/var/lib/ceph/mds/ceph-mds01'

Doing a mkdir -p /var/lib/ceph/mds/ceph-mds01 solved that one too.

What distro was this? And what version of ceph did you install? Thanks!
sage

Sorry Sage and all for the late reply, I missed your comments.

distro: Ubuntu 13.04 main, as up to date as it could be
ceph: 0.67.2-1 raring
ceph-deploy: 1.2.3

After that, all was running nicely:

 health HEALTH_OK
 etc ../..
 mdsmap e4: 1/1/1 up {0=mds01=up:active}

Hope that can help.

--
Alphe Salas
IT Engineer
Kepler Data Recovery
Asturias 97, Las Condes, Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
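For anyone hitting the same traceback, a minimal sketch of the workaround described above, assuming the MDS host is called mds01 (so the daemon directory is ceph-mds01); adjust the hostname to your own:

# On the MDS host, pre-create the directories ceph-deploy 1.2.3 expected to exist
ssh mds01 'sudo mkdir -p /var/lib/ceph/bootstrap-mds /var/lib/ceph/mds/ceph-mds01'

# Then re-run the failing step from the admin node
ceph-deploy mds create mds01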
Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3
Hey Yan, Just confirming that creating fresh pools and doing the newfs on those fixed the problem, while restarting the OSDs didn't, thanks again! If you come up with a permanent fix, let me know and I'll test it for you. Regards, Oliver On wo, 2013-09-11 at 22:48 +0800, Yan, Zheng wrote: On Wed, Sep 11, 2013 at 10:06 PM, Oliver Daudey oli...@xs4all.nl wrote: Hey Yan, On 11-09-13 15:12, Yan, Zheng wrote: On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey oli...@xs4all.nl wrote: Hey Gregory, I wiped and re-created the MDS-cluster I just mailed about, starting out by making sure CephFS is not mounted anywhere, stopping all MDSs, completely cleaning the data and metadata-pools using rados --pool=pool cleanup prefix, then creating a new cluster using `ceph mds newfs 1 0 --yes-i-really-mean-it' and starting all MDSs again. Directly afterwards, I saw this: # rados --pool=metadata ls 1. 2. 200. 200.0001 600. 601. 602. 603. 605. 606. 608. 609. mds0_inotable mds0_sessionmap Note the missing objects, right from the start. I was able to mount the CephFS at this point, but after unmounting it and restarting the MDS-cluster, it failed to come up, with the same symptoms as before. I didn't place any files on CephFS at any point between newfs and failure. Naturally, I tried initializing it again, but now, even after more than 5 tries, the mds*-objects simply no longer show up in the metadata-pool at all. In fact, it remains empty. I can mount CephFS after the first start of the MDS-cluster after a newfs, but on restart, it fails because of the missing objects. Am I doing anything wrong while initializing the cluster, maybe? Is cleaning the pools and doing the newfs enough? I did the same on the other cluster yesterday and it seems to have all objects. Thank you for your default information. The cause of missing object is that the MDS IDs for old FS and new FS are the same (incarnations are the same). When OSD receives MDS requests for the newly created FS. It silently drops the requests, because it thinks they are duplicated. You can get around the bug by creating new pools for the newfs. Thanks for this very useful info, I think this solves the mystery! Could I get around it any other way? I'd rather not have to re-create the pools and switch to new pool-ID's every time I have to do this. Does the OSD store this info in it's meta-data, or might restarting the OSDs be enough? I'm quite sure that I re-created MDS-clusters on the same pools many times, without all the objects going missing. This was usually as part of tests, where I also restarted other cluster-components, like OSDs. This could explain why only some files went missing. If some OSDs are restarted and processed the requests, while others dropped the requests, it would appear as if some, but not all objects are missing. The problem then persists until the active MDS in the MDS-cluster is restarted, after which the missing objects get noticed, because things fail to restart. IMHO, this is a bug. Why Yes, it's a bug. Fixing it should be easy. would the OSD ignore these requests, if the objects the MDS tries to write don't even exist at that time? OSD uses informartion in PG log to check duplicated requests, so restarting OSD does not work. Another way to get around the bug is generate lots of writes to the data/metadata pools, make sure each PG trim old entries in its log. Regards Yan, Zheng ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
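A sketch of the "fresh pools" workaround confirmed above, assuming new pools named metadata2 and data2; the pg_num of 64 and the pool IDs are placeholders you would read off your own cluster:

ceph osd pool create metadata2 64
ceph osd pool create data2 64

# Look up the numeric IDs of the new pools
ceph osd lspools

# newfs takes the metadata pool ID first, then the data pool ID
ceph mds newfs <new-metadata-pool-id> <new-data-pool-id> --yes-i-really-mean-it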
Re: [ceph-users] ocfs2 for OSDs?
On Wed, Sep 11, 2013 at 12:55 PM, David Disseldorp dd...@suse.de wrote: Hi Sage, On Wed, 11 Sep 2013 09:18:13 -0700 (PDT) Sage Weil s...@inktank.com wrote: REFLINKs (inode-based writeable snapshots) This is the one item on this list I see that the ceph-osds could take real advantage of; it would make object clones triggered by things like RBD snapshots faster. What is missing from this list that would be similarly (or more) useful is a volume/fs snapshot feature. I must be missing something here, but I don't see how this would offer any advantage over Btrfs, which provides the same feature via BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE. Yep, we do that on btrfs. Sage was just looking at things we could use which aren't part of the standard POSIX spec. This could give ocfs2 an advantage over xfs/ext4 if we implemented that awareness. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
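For context, a small illustration of the per-file clone being discussed: on btrfs, the same BTRFS_IOC_CLONE path is what cp exposes as --reflink. This is a generic btrfs example, not OSD code:

# Create a writable, extent-sharing copy of a file on a btrfs mount;
# only extents that are later modified get duplicated
cp --reflink=always image.raw image-clone.raw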
Re: [ceph-users] xfsprogs not found in RHEL
Thank you. That worked. I am trying to install the object storage based on the steps here - http://ceph.com/docs/master/install/rpm/#installing-ceph-packages Where can I get these RPMs? rpm -ivh fcgi-2.4.0-10.el6.x86_64.rpm rpm -ivh mod_fastcgi-2.4.6-2.el6.rf.x86_64.rpm On Tue, Sep 10, 2013 at 5:05 PM, Gagandeep Arora aroragaga...@gmail.comwrote: Hello, I think you downloaded source rpm. Download this package from the following link http://mirror.centos.org/centos/6/updates/x86_64/Packages/xfsprogs-3.1.1-10.el6_4.1.x86_64.rpm Regards, Gagan On Wed, Sep 11, 2013 at 8:54 AM, sriram sriram@gmail.com wrote: I installed xfsprogs from http://rpm.pbone.net/index.php3/stat/26/dist/74/size/1400502/name/xfsprogs-3.1.1-4.el6.src.rpm . I then ran sudo yum install ceph and I still get the same error. Any ideas? On Wed, Aug 28, 2013 at 3:47 PM, sriram sriram@gmail.com wrote: Can anyone point me to which xfsprogs RPM to use for RHEL 6 On Wed, Aug 28, 2013 at 5:46 AM, Sriram sriram@gmail.com wrote: Yes I read that but I was not sure if installing from Centos 6 repository can cause issues. On Aug 27, 2013, at 11:46 PM, Stroppa Daniele (strp) s...@zhaw.ch wrote: Check this issue: http://tracker.ceph.com/issues/5193 You might need the RHEL Scalable File System add-on. Cheers, -- Daniele Stroppa Researcher Institute of Information Technology Zürich University of Applied Sciences http://www.cloudcomp.ch From: sriram sriram@gmail.com Date: Tue, 27 Aug 2013 22:50:41 -0700 To: Lincoln Bryant linco...@uchicago.edu Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] xfsprogs not found in RHEL Tried yum clean all followed by yum install ceph and the same result. On Tue, Aug 27, 2013 at 7:44 PM, Lincoln Bryant linco...@uchicago.eduwrote: Hi, xfsprogs should be included in the EL6 base. Perhaps run yum clean all and try again? Cheers, Lincoln On Aug 27, 2013, at 9:16 PM, sriram wrote: I am trying to install CEPH and I get the following error - --- Package ceph.x86_64 0:0.67.2-0.el6 will be installed -- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64 --- Package python-babel.noarch 0:0.9.4-5.1.el6 will be installed --- Package python-backports-ssl_match_hostname.noarch 0:3.2-0.3.a3.el6 will be installed --- Package python-docutils.noarch 0:0.6-1.el6 will be installed -- Processing Dependency: python-imaging for package: python-docutils-0.6-1.el6.noarch --- Package python-jinja2.x86_64 0:2.2.1-1.el6 will be installed --- Package python-pygments.noarch 0:1.1.1-1.el6 will be installed --- Package python-six.noarch 0:1.1.0-2.el6 will be installed -- Running transaction check --- Package ceph.x86_64 0:0.67.2-0.el6 will be installed -- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64 --- Package python-imaging.x86_64 0:1.1.6-19.el6 will be installed -- Finished Dependency Resolution Error: Package: ceph-0.67.2-0.el6.x86_64 (ceph) Requires: xfsprogs Machine Info - Linux version 2.6.32-131.4.1.el6.x86_64 ( mockbu...@x86-003.build.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Fri Jun 10 10:54:26 EDT 2011 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
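A short sketch of the fix that worked above, assuming a RHEL 6 box without the Scalable File System add-on and using the binary CentOS package Gagandeep linked (not the .src.rpm):

# Install the binary xfsprogs package from the CentOS 6 updates repo
wget http://mirror.centos.org/centos/6/updates/x86_64/Packages/xfsprogs-3.1.1-10.el6_4.1.x86_64.rpm
sudo rpm -ivh xfsprogs-3.1.1-10.el6_4.1.x86_64.rpm

# Ceph's xfsprogs dependency should now resolve
sudo yum install ceph

As for the fcgi and mod_fastcgi RPMs asked about above, the .rf suffix suggests they come from the RepoForge (rpmforge) repository, with fcgi also available in EPEL, but that is an assumption rather than something confirmed in this thread.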
Re: [ceph-users] librados vs libcephfs performance for database broker
On Fri, Sep 6, 2013 at 2:08 AM, Serge Slipchenko serge.slipche...@gmail.com wrote: Hi, I am setting up a cluster that is using Hypertable as one of the key components. This had required some fixes of CephBroker, which I hope would be integrated to the main Hypertable branch soon. However, it seems to me that CephBroker doesn't need full fledged filesystem support. I wonder if raw librados could give me extra performance or metadata management isn't really time consuming. It's unlikely to be much of a performance impact since IIRC Hypertable uses pretty large files/chunks in order to minimize the metadata, and CephFS is pretty efficient about managing metadata in that scenario. Raw RADOS would be a simpler overall stack (and maybe more stable, right now) but would require implementing whatever Hypertable requires (and I think it basically wants HDFS) independently of the existing FS stuff, and that would be a non-trivial project. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ocfs2 for OSDs?
Hi Sage, On Wed, 11 Sep 2013 09:18:13 -0700 (PDT) Sage Weil s...@inktank.com wrote: REFLINKs (inode-based writeable snapshots) This is the one item on this list I see that the ceph-osds could take real advantage of; it would make object clones triggered by things like RBD snapshots faster. What is missing from this list that would be similarly (or more) useful is a volume/fs snapshot feature. I must be missing something here, but I don't see how this would offer any advantage over Btrfs, which provides the same feature via BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE. Cheers, David ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] xfsprogs not found in RHEL
On Wed, Aug 28, 2013 at 4:46 PM, Stroppa Daniele (strp) s...@zhaw.ch wrote: You might need the RHEL Scalable File System add-on. Exactly. I understand this needs to be purchased from Red Hat in order to get access to it if you are using the Red Hat subscription management system. I expect you could drag over the CentOS RPM but you would then need to track updates/patches yourself (or minimally reconcile differences between Red Hat and CentOS). In summary: XFS on Red Hat is a paid-for-option. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] how to upload file by the ceph S3 php api
To whom it may concern: I run ceph radosgw as an S3 service, and the AWS SDK for PHP supports uploading files from the browser via the S3BrowserUpload class. How can the ceph S3 API be used with the S3BrowserUpload class, and does the ceph S3 API support this kind of upload?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] how to upload file by the ceph S3 php api
To whom it may concern: I run ceph radosgw as an S3 service, and the AWS SDK for PHP supports uploading files from the browser via the S3BrowserUpload class. How can the ceph S3 API be used with the S3BrowserUpload class, and does the ceph S3 API support this kind of upload?

The AWS example:

<?php
require_once 'sdk.class.php';
require_once 'extensions/s3browserupload.class.php';

$upload = new S3BrowserUpload();

// Generate the parameters for the upload.
$html_parameters = $upload->generate_upload_parameters('my-upload-bucket', '15 minutes', array(

    // Set permissions to private.
    'acl' => AmazonS3::ACL_PRIVATE,

    // Set various HTTP headers on the uploaded file.
    'Content-Disposition' => 'attachment; filename=information.txt',
    'Content-Encoding' => 'gzip',
    'Content-Type' => '^text/',
    'Expires' => gmdate(DATE_RFC1123, strtotime('January 1, 1970, midnight GMT')), // Format for the HTTP Expires header

    // The S3 Key to upload to. ${filename} is an S3 variable that equals the name of the file being uploaded.
    // We're also using PHP's built-in Filter extension in this example.
    'key' => '^user/' . filter_var($_POST['user_id'], FILTER_VALIDATE_INT) . '/${filename}',

    // Where should S3 redirect to after the upload completes? The current page.
    'success_action_redirect' => S3BrowserUpload::current_uri(),

    // Status code to send back on success. This is primarily to work around issues in Adobe® Flash®.
    'success_action_status' => 201,

    // Use reduced redundancy storage.
    'x-amz-storage-class' => AmazonS3::STORAGE_REDUCED
));
?>
<form action="<?= $html_parameters['form']['action'] ?>" method="<?= $html_parameters['form']['method'] ?>" enctype="<?= $html_parameters['form']['enctype'] ?>">
    <?php foreach ($html_parameters['inputs'] as $name => $value): ?>
    <input type="hidden" name="<?= $name; ?>" value="<?= $value; ?>">
    <?php endforeach; ?>
    <input type="file" name="file">
    <input type="submit" name="upload" value="Upload">
</form>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
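The SDK example above still needs S3 credentials and an endpoint that points at radosgw rather than s3.amazonaws.com. A minimal sketch of the gateway side, assuming access to radosgw-admin; the uid and display name are placeholders:

# Create an S3-style user on the gateway; the output includes the
# access_key and secret_key to plug into the PHP SDK configuration
radosgw-admin user create --uid=phpupload --display-name="PHP upload test"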
Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3
On Thu, Sep 12, 2013 at 3:26 AM, Oliver Daudey oli...@xs4all.nl wrote: Hey Yan, Just confirming that creating fresh pools and doing the newfs on those fixed the problem, while restarting the OSDs didn't, thanks again! If you come up with a permanent fix, let me know and I'll test it for you.

Here is the patch, thanks for testing.

---
commit e42b371cc83aa0398d2c288d7a25a3e8f3494afb
Author: Yan, Zheng zheng.z@intel.com
Date:   Thu Sep 12 09:50:51 2013 +0800

    mon/MDSMonitor: don't reset incarnation when creating newfs

    Signed-off-by: Yan, Zheng zheng.z@intel.com

diff --git a/src/mon/MDSMonitor.cc b/src/mon/MDSMonitor.cc
index 9988d8c..b227327 100644
--- a/src/mon/MDSMonitor.cc
+++ b/src/mon/MDSMonitor.cc
@@ -947,6 +947,7 @@ bool MDSMonitor::prepare_command(MMonCommand *m)
       ss << "this is DANGEROUS and will wipe out the mdsmap's fs, and may clobber data in the new pools you specify.  add --yes-i-really-mean-it if you do.";
       r = -EPERM;
     } else {
+      newmap.inc = pending_mdsmap.inc;
       pending_mdsmap = newmap;
       pending_mdsmap.epoch = mdsmap.epoch + 1;
       create_new_fs(pending_mdsmap, metadata, data);

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [RadosGW] Performance for Concurrency Connections
Hi Yehuda, Here's my ceph.conf root@p01:/tmp# cat /etc/ceph/ceph.conf [global] fsid = 6e05675c-f545-4d88-9784-ea56ceda750e mon_initial_members = s01, s02, s03 mon_host = 192.168.2.61,192.168.2.62,192.168.2.63 auth_supported = cephx osd_journal_size = 1024 filestore_xattr_use_omap = true [client.radosgw.gateway] host = p01 keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path = /tmp/radosgw.sock log_file = /var/log/ceph/radosgw.log rgw_thread_pool_size = 200 Depends on my conf, the /tmp/radosgw.sock was created while starting radosgw service. So that I tried to show up config by : root@p01:/tmp# ceph --admin-daemon /tmp/radosgw.sock config show read only got 0 bytes of 4 expected for response length; invalid command? Is it a bug or operation mistake ? root@p01:/tmp# radosgw-admin -v ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b) Appreciate ~ +Hugo Kuo+ (+886) 935004793 2013/9/11 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 7:57 AM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, I tried ... a question for modifying param. How to make it effect to the RadosGW ? is it by restarting radosgw ? The value was set to 200. I'm not sure if it's applied to RadosGW or not. Is there a way to check the runtime value of rgw thread pool size ? You can do it through the admin socket interface. Try running something like: $ ceph --admin-daemon /var/run/ceph/radosgw.asok config show $ ceph --admin-daemon /var/run/ceph/radosgw.asok config set rgw_thread_pool_size 200 The path to the admin socket may be different, and in any case can be set through the 'admin socket' variable in ceph.conf. Yehuda 2013/9/11 Yehuda Sadeh yeh...@inktank.com Try modifing the 'rgw thread pool size' param in your ceph.conf. By default it's 100, so try increasing it and see if it affects anything. Yehuda On Wed, Sep 11, 2013 at 3:14 AM, Kuo Hugo tonyt...@gmail.com wrote: For ref : Benchmark result Could someone help me to improve the performance of high concurrency use case ? Any suggestion would be excellent.! +Hugo Kuo+ (+886) 935004793 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [RadosGW] Performance for Concurrency Connections
On Wed, Sep 11, 2013 at 9:34 PM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, Here's my ceph.conf root@p01:/tmp# cat /etc/ceph/ceph.conf [global] fsid = 6e05675c-f545-4d88-9784-ea56ceda750e mon_initial_members = s01, s02, s03 mon_host = 192.168.2.61,192.168.2.62,192.168.2.63 auth_supported = cephx osd_journal_size = 1024 filestore_xattr_use_omap = true [client.radosgw.gateway] host = p01 keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path = /tmp/radosgw.sock log_file = /var/log/ceph/radosgw.log rgw_thread_pool_size = 200 Depends on my conf, the /tmp/radosgw.sock was created while starting radosgw service. So that I tried to show up config by : root@p01:/tmp# ceph --admin-daemon /tmp/radosgw.sock config show read only got 0 bytes of 4 expected for response length; invalid command? Is it a bug or operation mistake ? You're connecting to the wrong socket. You need to connect to the admin socket, not to the socket that used for web server - gateway communication. That socket by default should reside in /var/run/ceph. root@p01:/tmp# radosgw-admin -v ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b) Appreciate ~ +Hugo Kuo+ (+886) 935004793 2013/9/11 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 7:57 AM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, I tried ... a question for modifying param. How to make it effect to the RadosGW ? is it by restarting radosgw ? The value was set to 200. I'm not sure if it's applied to RadosGW or not. Is there a way to check the runtime value of rgw thread pool size ? You can do it through the admin socket interface. Try running something like: $ ceph --admin-daemon /var/run/ceph/radosgw.asok config show $ ceph --admin-daemon /var/run/ceph/radosgw.asok config set rgw_thread_pool_size 200 The path to the admin socket may be different, and in any case can be set through the 'admin socket' variable in ceph.conf. Yehuda 2013/9/11 Yehuda Sadeh yeh...@inktank.com Try modifing the 'rgw thread pool size' param in your ceph.conf. By default it's 100, so try increasing it and see if it affects anything. Yehuda On Wed, Sep 11, 2013 at 3:14 AM, Kuo Hugo tonyt...@gmail.com wrote: For ref : Benchmark result Could someone help me to improve the performance of high concurrency use case ? Any suggestion would be excellent.! +Hugo Kuo+ (+886) 935004793 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [RadosGW] Performance for Concurrency Connections
Hmm Interesting now. I have no admin socket opened around. root@p01:/var/run/ceph# ls /var/run/ceph -al total 0 drwxr-xr-x 2 root root 40 Sep 9 07:47 . drwxr-xr-x 17 root root 600 Sep 11 21:23 .. root@p01:/var/run/ceph# lsof | grep radosgw.asok root@p01:/var/run/ceph# I review the on-line doc for radosgw : http://ceph.com/docs/next/radosgw/config-ref/ There's no configuration for rgw admin socket tho. root@s01:~# ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config show | grep rgw_thread rgw_thread_pool_size: 100, 1. I found that the OSD config information includes rgw_thread_pool_size , is this what you mentioned ? 2. Why that the value is on OSD? 3. Where is the value of rgw_thread_pool_size that OSDs referenced from ? +Hugo Kuo+ (+886) 935004793 2013/9/12 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 9:34 PM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, Here's my ceph.conf root@p01:/tmp# cat /etc/ceph/ceph.conf [global] fsid = 6e05675c-f545-4d88-9784-ea56ceda750e mon_initial_members = s01, s02, s03 mon_host = 192.168.2.61,192.168.2.62,192.168.2.63 auth_supported = cephx osd_journal_size = 1024 filestore_xattr_use_omap = true [client.radosgw.gateway] host = p01 keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path = /tmp/radosgw.sock log_file = /var/log/ceph/radosgw.log rgw_thread_pool_size = 200 Depends on my conf, the /tmp/radosgw.sock was created while starting radosgw service. So that I tried to show up config by : root@p01:/tmp# ceph --admin-daemon /tmp/radosgw.sock config show read only got 0 bytes of 4 expected for response length; invalid command? Is it a bug or operation mistake ? You're connecting to the wrong socket. You need to connect to the admin socket, not to the socket that used for web server - gateway communication. That socket by default should reside in /var/run/ceph. root@p01:/tmp# radosgw-admin -v ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b) Appreciate ~ +Hugo Kuo+ (+886) 935004793 2013/9/11 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 7:57 AM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, I tried ... a question for modifying param. How to make it effect to the RadosGW ? is it by restarting radosgw ? The value was set to 200. I'm not sure if it's applied to RadosGW or not. Is there a way to check the runtime value of rgw thread pool size ? You can do it through the admin socket interface. Try running something like: $ ceph --admin-daemon /var/run/ceph/radosgw.asok config show $ ceph --admin-daemon /var/run/ceph/radosgw.asok config set rgw_thread_pool_size 200 The path to the admin socket may be different, and in any case can be set through the 'admin socket' variable in ceph.conf. Yehuda 2013/9/11 Yehuda Sadeh yeh...@inktank.com Try modifing the 'rgw thread pool size' param in your ceph.conf. By default it's 100, so try increasing it and see if it affects anything. Yehuda On Wed, Sep 11, 2013 at 3:14 AM, Kuo Hugo tonyt...@gmail.com wrote: For ref : Benchmark result Could someone help me to improve the performance of high concurrency use case ? Any suggestion would be excellent.! +Hugo Kuo+ (+886) 935004793 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [RadosGW] Performance for Concurrency Connections
On Wed, Sep 11, 2013 at 9:57 PM, Kuo Hugo tonyt...@gmail.com wrote: Hmm Interesting now. I have no admin socket opened around. Maybe your radosgw process doesn't have permissions to write into /var/run/ceph? root@p01:/var/run/ceph# ls /var/run/ceph -al total 0 drwxr-xr-x 2 root root 40 Sep 9 07:47 . drwxr-xr-x 17 root root 600 Sep 11 21:23 .. root@p01:/var/run/ceph# lsof | grep radosgw.asok root@p01:/var/run/ceph# I review the on-line doc for radosgw : http://ceph.com/docs/next/radosgw/config-ref/ There's no configuration for rgw admin socket tho. It's a generic ceph configurable. It's 'admin socket'. root@s01:~# ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config show | grep rgw_thread rgw_thread_pool_size: 100, I found that the OSD config information includes rgw_thread_pool_size , is this what you mentioned ? Why that the value is on OSD? Where is the value of rgw_thread_pool_size that OSDs referenced from ? The ceph global config holds that variable, the osd just gets all the defaults but it has no use for it. Yehuda +Hugo Kuo+ (+886) 935004793 2013/9/12 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 9:34 PM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, Here's my ceph.conf root@p01:/tmp# cat /etc/ceph/ceph.conf [global] fsid = 6e05675c-f545-4d88-9784-ea56ceda750e mon_initial_members = s01, s02, s03 mon_host = 192.168.2.61,192.168.2.62,192.168.2.63 auth_supported = cephx osd_journal_size = 1024 filestore_xattr_use_omap = true [client.radosgw.gateway] host = p01 keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path = /tmp/radosgw.sock log_file = /var/log/ceph/radosgw.log rgw_thread_pool_size = 200 Depends on my conf, the /tmp/radosgw.sock was created while starting radosgw service. So that I tried to show up config by : root@p01:/tmp# ceph --admin-daemon /tmp/radosgw.sock config show read only got 0 bytes of 4 expected for response length; invalid command? Is it a bug or operation mistake ? You're connecting to the wrong socket. You need to connect to the admin socket, not to the socket that used for web server - gateway communication. That socket by default should reside in /var/run/ceph. root@p01:/tmp# radosgw-admin -v ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b) Appreciate ~ +Hugo Kuo+ (+886) 935004793 2013/9/11 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 7:57 AM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, I tried ... a question for modifying param. How to make it effect to the RadosGW ? is it by restarting radosgw ? The value was set to 200. I'm not sure if it's applied to RadosGW or not. Is there a way to check the runtime value of rgw thread pool size ? You can do it through the admin socket interface. Try running something like: $ ceph --admin-daemon /var/run/ceph/radosgw.asok config show $ ceph --admin-daemon /var/run/ceph/radosgw.asok config set rgw_thread_pool_size 200 The path to the admin socket may be different, and in any case can be set through the 'admin socket' variable in ceph.conf. Yehuda 2013/9/11 Yehuda Sadeh yeh...@inktank.com Try modifing the 'rgw thread pool size' param in your ceph.conf. By default it's 100, so try increasing it and see if it affects anything. Yehuda On Wed, Sep 11, 2013 at 3:14 AM, Kuo Hugo tonyt...@gmail.com wrote: For ref : Benchmark result Could someone help me to improve the performance of high concurrency use case ? Any suggestion would be excellent.! +Hugo Kuo+ (+886) 935004793 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
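A minimal sketch of wiring up the admin socket Yehuda refers to, assuming the gateway section is [client.radosgw.gateway] as shown earlier in the thread and that /var/run/ceph is writable by the radosgw process. In /etc/ceph/ceph.conf on the gateway host:

[client.radosgw.gateway]
    ...
    admin socket = /var/run/ceph/radosgw.asok

Then restart the gateway and query the running daemon:

/etc/init.d/radosgw restart
ceph --admin-daemon /var/run/ceph/radosgw.asok config show | grep rgw_thread_pool_size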
Re: [ceph-users] [RadosGW] Performance for Concurrency Connections
thanks 1) I'm sure there's no asok socket filer for the radosgw in my RadosGW host. 2) The rgw_thread_pool_size was set to 200 in my ceph.conf. So that the radosgw is using the value now generally. 3) If so, the tweaking of rgw_thread_pool_size value from 100-200 was not help for improve the performance of concurrency connection. 4) I'm considering to put some research on apache's configurations. 5) Do ya have a similar benchmark run for high concurrency connections before? Cheers +Hugo Kuo+ (+886) 935004793 2013/9/12 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 9:57 PM, Kuo Hugo tonyt...@gmail.com wrote: Hmm Interesting now. I have no admin socket opened around. Maybe your radosgw process doesn't have permissions to write into /var/run/ceph? root@p01:/var/run/ceph# ls /var/run/ceph -al total 0 drwxr-xr-x 2 root root 40 Sep 9 07:47 . drwxr-xr-x 17 root root 600 Sep 11 21:23 .. root@p01:/var/run/ceph# lsof | grep radosgw.asok root@p01:/var/run/ceph# I review the on-line doc for radosgw : http://ceph.com/docs/next/radosgw/config-ref/ There's no configuration for rgw admin socket tho. It's a generic ceph configurable. It's 'admin socket'. root@s01:~# ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config show | grep rgw_thread rgw_thread_pool_size: 100, I found that the OSD config information includes rgw_thread_pool_size , is this what you mentioned ? Why that the value is on OSD? Where is the value of rgw_thread_pool_size that OSDs referenced from ? The ceph global config holds that variable, the osd just gets all the defaults but it has no use for it. Yehuda +Hugo Kuo+ (+886) 935004793 2013/9/12 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 9:34 PM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, Here's my ceph.conf root@p01:/tmp# cat /etc/ceph/ceph.conf [global] fsid = 6e05675c-f545-4d88-9784-ea56ceda750e mon_initial_members = s01, s02, s03 mon_host = 192.168.2.61,192.168.2.62,192.168.2.63 auth_supported = cephx osd_journal_size = 1024 filestore_xattr_use_omap = true [client.radosgw.gateway] host = p01 keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path = /tmp/radosgw.sock log_file = /var/log/ceph/radosgw.log rgw_thread_pool_size = 200 Depends on my conf, the /tmp/radosgw.sock was created while starting radosgw service. So that I tried to show up config by : root@p01:/tmp# ceph --admin-daemon /tmp/radosgw.sock config show read only got 0 bytes of 4 expected for response length; invalid command? Is it a bug or operation mistake ? You're connecting to the wrong socket. You need to connect to the admin socket, not to the socket that used for web server - gateway communication. That socket by default should reside in /var/run/ceph. root@p01:/tmp# radosgw-admin -v ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b) Appreciate ~ +Hugo Kuo+ (+886) 935004793 2013/9/11 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 7:57 AM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, I tried ... a question for modifying param. How to make it effect to the RadosGW ? is it by restarting radosgw ? The value was set to 200. I'm not sure if it's applied to RadosGW or not. Is there a way to check the runtime value of rgw thread pool size ? You can do it through the admin socket interface. 
Try running something like: $ ceph --admin-daemon /var/run/ceph/radosgw.asok config show $ ceph --admin-daemon /var/run/ceph/radosgw.asok config set rgw_thread_pool_size 200 The path to the admin socket may be different, and in any case can be set through the 'admin socket' variable in ceph.conf. Yehuda 2013/9/11 Yehuda Sadeh yeh...@inktank.com Try modifing the 'rgw thread pool size' param in your ceph.conf. By default it's 100, so try increasing it and see if it affects anything. Yehuda On Wed, Sep 11, 2013 at 3:14 AM, Kuo Hugo tonyt...@gmail.com wrote: For ref : Benchmark result Could someone help me to improve the performance of high concurrency use case ? Any suggestion would be excellent.! +Hugo Kuo+ (+886) 935004793 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [RadosGW] Performance for Concurrency Connections
On Wed, Sep 11, 2013 at 10:25 PM, Kuo Hugo tonyt...@gmail.com wrote: thanks 1) I'm sure there's no asok socket filer for the radosgw in my RadosGW host. 2) The rgw_thread_pool_size was set to 200 in my ceph.conf. So that the radosgw is using the value now generally. 3) If so, the tweaking of rgw_thread_pool_size value from 100-200 was not Did you restart your gateway afterwards? help for improve the performance of concurrency connection. 4) I'm considering to put some research on apache's configurations. 5) Do ya have a similar benchmark run for high concurrency connections before? Cheers +Hugo Kuo+ (+886) 935004793 2013/9/12 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 9:57 PM, Kuo Hugo tonyt...@gmail.com wrote: Hmm Interesting now. I have no admin socket opened around. Maybe your radosgw process doesn't have permissions to write into /var/run/ceph? root@p01:/var/run/ceph# ls /var/run/ceph -al total 0 drwxr-xr-x 2 root root 40 Sep 9 07:47 . drwxr-xr-x 17 root root 600 Sep 11 21:23 .. root@p01:/var/run/ceph# lsof | grep radosgw.asok root@p01:/var/run/ceph# I review the on-line doc for radosgw : http://ceph.com/docs/next/radosgw/config-ref/ There's no configuration for rgw admin socket tho. It's a generic ceph configurable. It's 'admin socket'. root@s01:~# ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config show | grep rgw_thread rgw_thread_pool_size: 100, I found that the OSD config information includes rgw_thread_pool_size , is this what you mentioned ? Why that the value is on OSD? Where is the value of rgw_thread_pool_size that OSDs referenced from ? The ceph global config holds that variable, the osd just gets all the defaults but it has no use for it. Yehuda +Hugo Kuo+ (+886) 935004793 2013/9/12 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 9:34 PM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, Here's my ceph.conf root@p01:/tmp# cat /etc/ceph/ceph.conf [global] fsid = 6e05675c-f545-4d88-9784-ea56ceda750e mon_initial_members = s01, s02, s03 mon_host = 192.168.2.61,192.168.2.62,192.168.2.63 auth_supported = cephx osd_journal_size = 1024 filestore_xattr_use_omap = true [client.radosgw.gateway] host = p01 keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path = /tmp/radosgw.sock log_file = /var/log/ceph/radosgw.log rgw_thread_pool_size = 200 Depends on my conf, the /tmp/radosgw.sock was created while starting radosgw service. So that I tried to show up config by : root@p01:/tmp# ceph --admin-daemon /tmp/radosgw.sock config show read only got 0 bytes of 4 expected for response length; invalid command? Is it a bug or operation mistake ? You're connecting to the wrong socket. You need to connect to the admin socket, not to the socket that used for web server - gateway communication. That socket by default should reside in /var/run/ceph. root@p01:/tmp# radosgw-admin -v ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b) Appreciate ~ +Hugo Kuo+ (+886) 935004793 2013/9/11 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 7:57 AM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, I tried ... a question for modifying param. How to make it effect to the RadosGW ? is it by restarting radosgw ? The value was set to 200. I'm not sure if it's applied to RadosGW or not. Is there a way to check the runtime value of rgw thread pool size ? You can do it through the admin socket interface. 
Try running something like: $ ceph --admin-daemon /var/run/ceph/radosgw.asok config show $ ceph --admin-daemon /var/run/ceph/radosgw.asok config set rgw_thread_pool_size 200 The path to the admin socket may be different, and in any case can be set through the 'admin socket' variable in ceph.conf. Yehuda 2013/9/11 Yehuda Sadeh yeh...@inktank.com Try modifing the 'rgw thread pool size' param in your ceph.conf. By default it's 100, so try increasing it and see if it affects anything. Yehuda On Wed, Sep 11, 2013 at 3:14 AM, Kuo Hugo tonyt...@gmail.com wrote: For ref : Benchmark result Could someone help me to improve the performance of high concurrency use case ? Any suggestion would be excellent.! +Hugo Kuo+ (+886) 935004793 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [RadosGW] Performance for Concurrency Connections
Yes, I restarted it with /etc/init.d/radosgw a few times before. :D By the way, I checked several things here to rule out any permission issue.

root@p01:/var/run# /etc/init.d/radosgw start
Starting client.radosgw.gateway...
root@p01:/var/run# ps aux | grep rados
root 25823 1.8 0.0 16436340 7096 ? Ssl 22:30 0:00 /usr/bin/radosgw -n client.radosgw.gateway
root 26055 0.0 0.0 9384 920 pts/0 S+ 22:30 0:00 grep --color=auto rados
root@p01:/var/run# ls ceph/
root@p01:/var/run# ls -ald ceph/
drwxrwxrwx 2 root root 40 Sep 9 07:47 ceph/

Apparently the radosgw is running as root. To be safe, I changed the mode of /var/run/ceph to be fully open. Still no luck with the radosgw admin socket.

One more thing about performance: it drops to 20~30% of the original once the host runs out of memory for caching inodes. A quick reference number is a 1KB object upload test: performance went from 1200 reqs/sec to 300 reqs/sec. That's a potential issue which I observed. Any way to work around it would be great.

+Hugo Kuo+ (+886) 935004793

2013/9/12 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 10:25 PM, Kuo Hugo tonyt...@gmail.com wrote: thanks 1) I'm sure there's no asok socket file for the radosgw in my RadosGW host. 2) The rgw_thread_pool_size was set to 200 in my ceph.conf. So that the radosgw is using the value now generally. 3) If so, the tweaking of rgw_thread_pool_size value from 100-200 was not

Did you restart your gateway afterwards?

help for improve the performance of concurrency connection. 4) I'm considering to put some research on apache's configurations. 5) Do ya have a similar benchmark run for high concurrency connections before? Cheers +Hugo Kuo+ (+886) 935004793

2013/9/12 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 9:57 PM, Kuo Hugo tonyt...@gmail.com wrote: Hmm Interesting now. I have no admin socket opened around.

Maybe your radosgw process doesn't have permissions to write into /var/run/ceph?

root@p01:/var/run/ceph# ls /var/run/ceph -al total 0 drwxr-xr-x 2 root root 40 Sep 9 07:47 . drwxr-xr-x 17 root root 600 Sep 11 21:23 .. root@p01:/var/run/ceph# lsof | grep radosgw.asok root@p01:/var/run/ceph# I review the on-line doc for radosgw : http://ceph.com/docs/next/radosgw/config-ref/ There's no configuration for rgw admin socket tho.

It's a generic ceph configurable. It's 'admin socket'.

root@s01:~# ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config show | grep rgw_thread rgw_thread_pool_size: 100, I found that the OSD config information includes rgw_thread_pool_size , is this what you mentioned ? Why that the value is on OSD? Where is the value of rgw_thread_pool_size that OSDs referenced from ?

The ceph global config holds that variable, the osd just gets all the defaults but it has no use for it.

Yehuda

+Hugo Kuo+ (+886) 935004793 2013/9/12 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 9:34 PM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, Here's my ceph.conf root@p01:/tmp# cat /etc/ceph/ceph.conf [global] fsid = 6e05675c-f545-4d88-9784-ea56ceda750e mon_initial_members = s01, s02, s03 mon_host = 192.168.2.61,192.168.2.62,192.168.2.63 auth_supported = cephx osd_journal_size = 1024 filestore_xattr_use_omap = true [client.radosgw.gateway] host = p01 keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path = /tmp/radosgw.sock log_file = /var/log/ceph/radosgw.log rgw_thread_pool_size = 200 Depends on my conf, the /tmp/radosgw.sock was created while starting radosgw service.
So that I tried to show up config by : root@p01:/tmp# ceph --admin-daemon /tmp/radosgw.sock config show read only got 0 bytes of 4 expected for response length; invalid command? Is it a bug or operation mistake ? You're connecting to the wrong socket. You need to connect to the admin socket, not to the socket that used for web server - gateway communication. That socket by default should reside in /var/run/ceph. root@p01:/tmp# radosgw-admin -v ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b) Appreciate ~ +Hugo Kuo+ (+886) 935004793 2013/9/11 Yehuda Sadeh yeh...@inktank.com On Wed, Sep 11, 2013 at 7:57 AM, Kuo Hugo tonyt...@gmail.com wrote: Hi Yehuda, I tried ... a question for modifying param. How to make it effect to the RadosGW ? is it by restarting radosgw ? The value was set to 200. I'm not sure if it's applied to RadosGW or not. Is there a way to check the runtime value of rgw thread pool size ? You can do it through the admin socket interface. Try running something like: $ ceph
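One general Linux-side knob that is sometimes tried when a workload is sensitive to inode/dentry cache eviction, as in the slowdown Hugo describes once memory runs out; this is a generic tuning assumption, not something verified against radosgw in this thread:

# Bias the kernel toward keeping inode/dentry caches when memory is tight
sudo sysctl vm.vfs_cache_pressure=50

# Make the setting persistent across reboots
echo 'vm.vfs_cache_pressure = 50' | sudo tee -a /etc/sysctl.conf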
Re: [ceph-users] [RadosGW] Performance for Concurrency Connections
On Wed, Sep 11, 2013 at 10:37 PM, Kuo Hugo tonyt...@gmail.com wrote: Yes. I restart it by /etc/init.d/radosgw for times before. :D btw, I check several things here to prevent any permission issue. root@p01:/var/run# /etc/init.d/radosgw start Starting client.radosgw.gateway... root@p01:/var/run# ps aux | grep rados root 25823 1.8 0.0 16436340 7096 ? Ssl 22:30 0:00 /usr/bin/radosgw -n client.radosgw.gateway root 26055 0.0 0.0 9384 920 pts/0S+ 22:30 0:00 grep --color=auto rados root@p01:/var/run# ls ceph/ root@p01:/var/run# ls -ald ceph/ drwxrwxrwx 2 root root 40 Sep 9 07:47 ceph/ apparently, the radosgw is running by root. For safety, I change the mode of /var/run/ceph to fully opened. Still no luck with radosgw admin socket stuff. One more thing about the performance is that performance down to 20~30% once run out of memory to cache inode. A quick reference number is 1KB object uploading test. The performance from 1200reqs/sec -- 300reqs/sec. That's a potential issue which I observed. Any way to work around would be great. You can try disabling the rgw cache. We just found out an issue with it so it may be interesting to see how the system behaves without it: rgw cache enabled = false Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
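A sketch of how that suggestion would look in practice, assuming the same [client.radosgw.gateway] section from earlier in the thread. In /etc/ceph/ceph.conf on the gateway host:

[client.radosgw.gateway]
    ...
    rgw cache enabled = false

Then restart the gateway and re-run the benchmark to compare:

/etc/init.d/radosgw restart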