Re: [ceph-users] About the data movement in Ceph
Thanks, Sage, for helping me understand Ceph much more deeply! I have a few more questions:

1. As we know, 'ceph -s' gives a summary of the cluster state; is there any tool to monitor the details of data movement when the CRUSH map is changed?
2. In my understanding, the mapping between an object and its PG is fixed, so when the CRUSH map changes we only need to change the mapping between PGs and OSDs, right?
3. Suppose there are two pools in Ceph, and each root has its own OSDs as leaves. If I move an OSD from one root to the other, will the PGs on that OSD be migrated within the original pool, or rebalanced into the target pool?
4. crushtool is a very cool tool for understanding CRUSH, but I don't know how to use --show-utilization (show OSD usage). What arguments or actions do I need to add on the command line? Is there any CLI command that can query each OSD's usage and statistics? (See the sketch after this message.)
5. I see that librados offers rados_ioctx_pool_stat(rados_ioctx_t io, struct rados_pool_stat_t *stats). If I want to query statistics for several pools, do I need to declare a separate rados_ioctx_t or cluster handle for each pool? I get a segmentation fault when rados_ioctx_pool_stat returns. (See the sketch after this message.)

Looking forward to your reply!

2013/9/11 Sage Weil:
> On Tue, 10 Sep 2013, atrmat wrote:
> > Hi all,
> > recently i read the source code and paper, and i have some questions about
> > the data movement:
> > 1. when OSD's add or removal, how Ceph do this data migration and rebalance
> > the crush map? is it the rados modify the crush map or cluster map, and the
> > primary OSD does the data movement according to the cluster map? how to
> > found the data migration in the source code?
>
> The OSDMap changes when the osd is added or removed (or some other event
> or administrator action happens). In response, the OSDs recalculate where
> the PGs should be stored, and move data in response to that.
>
> > 2. when OSD's down or failed, how Ceph recover the data in other OSDs? is it
> > the primary OSD copy the PG to the new located OSD?
>
> The (new) primary figures out where data is/was (peering) and then
> coordinates any data migration (recovery) to where the data should now be
> (according to the latest OSDMap and its embedded CRUSH map).
>
> > 3. the OSD has 4 status bits: up,down,in,out. But i can't found the defined
> > status-- CEPH_OSD_DOWN, is it the OSD call the function mark_osd_down() to
> > modify the OSD status in OSDMap?
>
> See OSDMap.h: is_up() and is_down(). For in/out, it is either binary
> (is_in() and is_out()) or can be somewhere in between; see get_weight().
>
> Hope that helps!
>
> sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
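For question 4 above, a minimal sketch of a crushtool run that prints per-OSD utilization for a simulated mapping (the map file name and the replica count are placeholders, not from this thread):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --test --show-utilization --num-rep 2

For live usage rather than a simulation, 'ceph pg dump' also ends with a per-OSD stats section (kb used / kb avail), if I remember the output correctly.

For question 5: one cluster handle is enough for any number of pools, but you need a separate rados_ioctx_t per pool, and the rados_pool_stat_t must be allocated by the caller. A segfault at that call usually points at an ioctx that was never successfully created or an invalid stats pointer, though that is only a guess without seeing the code. A minimal sketch (pool names and the ceph.conf path are assumptions):

    /* pool_stat.c -- sketch only; build with: gcc pool_stat.c -o pool_stat -lrados */
    #include <stdio.h>
    #include <rados/librados.h>

    int main(void)
    {
        const char *pools[] = { "data", "rbd" };   /* placeholder pool names */
        rados_t cluster;
        unsigned i;
        int err;

        /* One cluster handle for the whole process. */
        if (rados_create(&cluster, NULL) < 0)
            return 1;
        if (rados_conf_read_file(cluster, "/etc/ceph/ceph.conf") < 0 ||
            rados_connect(cluster) < 0) {
            rados_shutdown(cluster);
            return 1;
        }

        for (i = 0; i < sizeof(pools) / sizeof(pools[0]); i++) {
            rados_ioctx_t io;
            struct rados_pool_stat_t st;           /* caller-allocated */

            /* One ioctx per pool. */
            err = rados_ioctx_create(cluster, pools[i], &io);
            if (err < 0) {
                fprintf(stderr, "cannot open pool %s (%d)\n", pools[i], err);
                continue;                          /* never use an unopened ioctx */
            }
            err = rados_ioctx_pool_stat(io, &st);
            if (err == 0)
                printf("%s: %llu objects, %llu KB\n", pools[i],
                       (unsigned long long)st.num_objects,
                       (unsigned long long)st.num_kb);
            rados_ioctx_destroy(io);
        }
        rados_shutdown(cluster);
        return 0;
    }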
[ceph-users] OpenStack Grizzly Authentication (Keystone PKI) with RADOS Gateway
Hello! Does RADOS Gateway support or integrate with OpenStack (Grizzly) authentication (Keystone PKI)? Can RADOS Gateway use PKI tokens to verify user tokens without explicit calls to Keystone?

Thanks!
Amit

Amit Vijairania | 978.319.3684
--*--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
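I can't say for certain how far PKI-token verification goes in the Dumpling-era gateway, but the Keystone integration options documented for radosgw look roughly like the following; the values and paths are placeholders, and as I understand it the NSS database is populated from Keystone's signing certificates so that tokens can be checked locally:

    [client.radosgw.gateway]
        rgw keystone url = http://keystone-host:35357
        rgw keystone admin token = KEYSTONE_ADMIN_TOKEN
        rgw keystone accepted roles = Member, admin
        nss db path = /var/ceph/nss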
Re: [ceph-users] rdosgw swift subuser creation
Thanks that worked - you were close: This is another document issue on http://ceph.com/docs/next/radosgw/config/ , --gen-secret parameter requirement isn't mentioned. Enabling Swift Access Allowing access to the object store with Swift (OpenStack Object Storage) compatible clients requires an additional step; namely, the creation of a subuser and a Swift access key. sudo radosgw-admin subuser create --uid=johndoe --subuser=johndoe:swift --access=full sudo radosgw-admin key create --subuser=johndoe:swift --key-type=swift radosgw-admin key create --subuser=rados:swift --key-type=swift --gen-secret 2013-09-27 14:46:40.202708 7f25c5d70780 0 WARNING: cannot read region map { "user_id": "rados", "display_name": "rados", "email": "n...@none.com", "suspended": 0, "max_buckets": 1000, "auid": 0, "subusers": [ { "id": "rados:swift", "permissions": "full-control"}], "keys": [ { "user": "rados", "access_key": "R5F0D2UCSK3618DJ829A", "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}], "swift_keys": [ { "user": "rados:swift", "secret_key": "77iJvemrxWvYk47HW7pxsL+eHdA53AtLl2T0OyuG"}], "caps": [], "op_mask": "read, write, delete", "default_placement": "", "placement_tags": []} -Original Message- From: Matt McNulty [mailto:ma...@codero.com] Sent: Friday, September 27, 2013 4:40 PM To: Snider, Tim; ceph-users@lists.ceph.com Subject: Re: [ceph-users] rdosgw swift subuser creation Hi Tim, Try adding --gen-key to your create command (you should be able to create a key for the subuser you already created). Thanks, Matt On 9/27/13 4:35 PM, "Snider, Tim" wrote: >I created an rdosgw user and swift subuser and attempted to generate a >key for the swift user. Using the commands below. However the swift key >was empty when the command completed. What did I miss? > >root@controller21:/etc# radosgw-admin user create --uid=rados >--display-name=rados --email=n...@none.com >2013-09-27 13:34:08.155162 7f984f0a5780 0 WARNING: cannot read region >map { "user_id": "rados", > "display_name": "rados", > "email": "n...@none.com", > "suspended": 0, > "max_buckets": 1000, > "auid": 0, > "subusers": [], > "keys": [ >{ "user": "rados", > "access_key": "R5F0D2UCSK3618DJ829A", > "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}], > "swift_keys": [], > "caps": [], > "op_mask": "read, write, delete", > "default_placement": "", > "placement_tags": []} > >root@controller21:/etc# radosgw-admin subuser create --uid=rados >--subuser=rados:swift --key-type=swift --access=full >2013-09-27 13:34:58.761911 7f5307c04780 0 WARNING: cannot read region >map { "user_id": "rados", > "display_name": "rados", > "email": "n...@none.com", > "suspended": 0, > "max_buckets": 1000, > "auid": 0, > "subusers": [ >{ "id": "rados:swift", > "permissions": "full-control"}], > "keys": [ >{ "user": "rados", > "access_key": "R5F0D2UCSK3618DJ829A", > "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}], > "swift_keys": [], > "caps": [], > "op_mask": "read, write, delete", > "default_placement": "", > "placement_tags": []} > >root@controller21:/etc# radosgw-admin key create --subuser=rados:swift > --key-type=swift >2013-09-27 13:35:43.544005 7f599e672780 0 WARNING: cannot read region >map { "user_id": "rados", > "display_name": "rados", > "email": "n...@none.com", > "suspended": 0, > "max_buckets": 1000, > "auid": 0, > "subusers": [ >{ "id": "rados:swift", > "permissions": "full-control"}], > "keys": [ >{ "user": "rados", > "access_key": "R5F0D2UCSK3618DJ829A", > "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}], > "swift_keys": [ >{ 
"user": "rados:swift", > "secret_key": ""}], > "caps": [], > "op_mask": "read, write, delete", > "default_placement": "", > "placement_tags": []} > >Thanks, >Tim > >Timothy Snider >Strategic Planning & Architecture - Advanced Development NetApp >316-636-8736 Direct Phone >316-213-0223 Mobile Phone >tim.sni...@netapp.com >netapp.com > > > >___ >ceph-users mailing list >ceph-users@lists.ceph.com >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Can't mount CephFS - where to start troubleshooting?
On Fri, Sep 27, 2013 at 2:44 PM, Gregory Farnum wrote:
> What is the output of ceph -s? It could be something underneath the
> filesystem.

root@chekov:~# ceph -s
  cluster 18b7cba7-ccc3-4945-bb39-99450be81c98
   health HEALTH_OK
   monmap e3: 3 mons at {chekov=10.42.6.29:6789/0,laforge=10.42.5.30:6789/0,picard=10.42.6.21:6789/0}, election epoch 30, quorum 0,1,2 chekov,laforge,picard
   osdmap e387: 4 osds: 4 up, 4 in
   pgmap v1100: 320 pgs: 320 active+clean; 7568 MB data, 15445 MB used, 14873 GB / 14888 GB avail
   mdsmap e28: 1/1/1 up {0=1=up:active}, 2 up:standby

> What kernel version are you using?

I have two configurations I've tested:

root@chekov:~# uname -a
Linux chekov 3.5.0-40-generic #62~precise1-Ubuntu SMP Fri Aug 23 17:38:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

aaron@seven ~ $ uname -a
Linux seven 3.10.7-gentoo-r1 #1 SMP PREEMPT Thu Sep 26 07:23:03 PDT 2013 x86_64 Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz GenuineIntel GNU/Linux

> Did you enable crush tunables?

I haven't edited the CRUSH map for this cluster after creating it... so I don't think I've edited any of the tunables either.

> It could be that your kernel doesn't support all the options you enabled.

Well, on both my Ubuntu and Gentoo systems, I can mount the other cluster just fine (the one that started as 0.61 and got upgraded):

seven ~ # mount -t ceph 10.42.100.20:/ /mnt/ceph -o name=admin,secret=...
seven ~ #

I should have mentioned in my initial email that with or without -o name=,secret= mounting the new cluster fails with the same error 95 = Operation not supported.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Can't mount CephFS - where to start troubleshooting?
On Fri, Sep 27, 2013 at 2:12 PM, Aaron Ten Clay wrote: > Hi, > > I probably did something wrong setting up my cluster with 0.67.3. I > previously built a cluster with 0.61 and everything went well, even after an > upgrade to 0.67.3. Now I built a fresh 0.67.3 cluster and when I try to > mount CephFS: > > aaron@seven ~ $ sudo mount -t ceph 10.42.6.21:/ /mnt/ceph > mount error 95 = Operation not supported > > Nothing new shows in dmesg / syslog after this attempt, and I don't see > anything telling in the mds or mon logs. Any pointers on where to look? > > I have three mons, three mds (2 standby), and four osds. I'm just doing more > testing/learning Ceph, so if I did something wrong data loss is not a > problem. What is the output of ceph -s? It could be something underneath the filesystem. What kernel version are you using? Did you enable crush tunables? It could be that your kernel doesn't support all the options you enabled. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
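A general aside, not from this thread: if the kernel client is older than the CRUSH tunables or feature bits a newly created cluster uses, mounting can fail even though the cluster itself is healthy. Assuming a recent enough ceph CLI, the tunables can be inspected and, if necessary, relaxed with something like:

    ceph osd crush show-tunables
    ceph osd crush tunables legacy

Note this trades the newer placement behaviour for kernel compatibility, so it is worth checking the documentation for your exact versions first.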
Re: [ceph-users] rdosgw swift subuser creation
Hi Tim, Try adding --gen-key to your create command (you should be able to create a key for the subuser you already created). Thanks, Matt On 9/27/13 4:35 PM, "Snider, Tim" wrote: >I created an rdosgw user and swift subuser and attempted to generate a >key for the swift user. Using the commands below. However the swift key >was empty when the command completed. What did I miss? > >root@controller21:/etc# radosgw-admin user create --uid=rados >--display-name=rados --email=n...@none.com >2013-09-27 13:34:08.155162 7f984f0a5780 0 WARNING: cannot read region map >{ "user_id": "rados", > "display_name": "rados", > "email": "n...@none.com", > "suspended": 0, > "max_buckets": 1000, > "auid": 0, > "subusers": [], > "keys": [ >{ "user": "rados", > "access_key": "R5F0D2UCSK3618DJ829A", > "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}], > "swift_keys": [], > "caps": [], > "op_mask": "read, write, delete", > "default_placement": "", > "placement_tags": []} > >root@controller21:/etc# radosgw-admin subuser create --uid=rados >--subuser=rados:swift --key-type=swift --access=full >2013-09-27 13:34:58.761911 7f5307c04780 0 WARNING: cannot read region map >{ "user_id": "rados", > "display_name": "rados", > "email": "n...@none.com", > "suspended": 0, > "max_buckets": 1000, > "auid": 0, > "subusers": [ >{ "id": "rados:swift", > "permissions": "full-control"}], > "keys": [ >{ "user": "rados", > "access_key": "R5F0D2UCSK3618DJ829A", > "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}], > "swift_keys": [], > "caps": [], > "op_mask": "read, write, delete", > "default_placement": "", > "placement_tags": []} > >root@controller21:/etc# radosgw-admin key create --subuser=rados:swift > --key-type=swift >2013-09-27 13:35:43.544005 7f599e672780 0 WARNING: cannot read region map >{ "user_id": "rados", > "display_name": "rados", > "email": "n...@none.com", > "suspended": 0, > "max_buckets": 1000, > "auid": 0, > "subusers": [ >{ "id": "rados:swift", > "permissions": "full-control"}], > "keys": [ >{ "user": "rados", > "access_key": "R5F0D2UCSK3618DJ829A", > "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}], > "swift_keys": [ >{ "user": "rados:swift", > "secret_key": ""}], > "caps": [], > "op_mask": "read, write, delete", > "default_placement": "", > "placement_tags": []} > >Thanks, >Tim > >Timothy Snider >Strategic Planning & Architecture - Advanced Development >NetApp >316-636-8736 Direct Phone >316-213-0223 Mobile Phone >tim.sni...@netapp.com >netapp.com > > > >___ >ceph-users mailing list >ceph-users@lists.ceph.com >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] rdosgw swift subuser creation
I created an rdosgw user and swift subuser and attempted to generate a key for the swift user. Using the commands below. However the swift key was empty when the command completed. What did I miss? root@controller21:/etc# radosgw-admin user create --uid=rados --display-name=rados --email=n...@none.com 2013-09-27 13:34:08.155162 7f984f0a5780 0 WARNING: cannot read region map { "user_id": "rados", "display_name": "rados", "email": "n...@none.com", "suspended": 0, "max_buckets": 1000, "auid": 0, "subusers": [], "keys": [ { "user": "rados", "access_key": "R5F0D2UCSK3618DJ829A", "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}], "swift_keys": [], "caps": [], "op_mask": "read, write, delete", "default_placement": "", "placement_tags": []} root@controller21:/etc# radosgw-admin subuser create --uid=rados --subuser=rados:swift --key-type=swift --access=full 2013-09-27 13:34:58.761911 7f5307c04780 0 WARNING: cannot read region map { "user_id": "rados", "display_name": "rados", "email": "n...@none.com", "suspended": 0, "max_buckets": 1000, "auid": 0, "subusers": [ { "id": "rados:swift", "permissions": "full-control"}], "keys": [ { "user": "rados", "access_key": "R5F0D2UCSK3618DJ829A", "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}], "swift_keys": [], "caps": [], "op_mask": "read, write, delete", "default_placement": "", "placement_tags": []} root@controller21:/etc# radosgw-admin key create --subuser=rados:swift --key-type=swift 2013-09-27 13:35:43.544005 7f599e672780 0 WARNING: cannot read region map { "user_id": "rados", "display_name": "rados", "email": "n...@none.com", "suspended": 0, "max_buckets": 1000, "auid": 0, "subusers": [ { "id": "rados:swift", "permissions": "full-control"}], "keys": [ { "user": "rados", "access_key": "R5F0D2UCSK3618DJ829A", "secret_key": "PJR1rvV2+Xrzlwo+AZZKXextsDl45EaLljzopgjD"}], "swift_keys": [ { "user": "rados:swift", "secret_key": ""}], "caps": [], "op_mask": "read, write, delete", "default_placement": "", "placement_tags": []} Thanks, Tim Timothy Snider Strategic Planning & Architecture - Advanced Development NetApp 316-636-8736 Direct Phone 316-213-0223 Mobile Phone tim.sni...@netapp.com netapp.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Can't mount CephFS - where to start troubleshooting?
Hi, I probably did something wrong setting up my cluster with 0.67.3. I previously built a cluster with 0.61 and everything went well, even after an upgrade to 0.67.3. Now I built a fresh 0.67.3 cluster and when I try to mount CephFS: aaron@seven ~ $ sudo mount -t ceph 10.42.6.21:/ /mnt/ceph mount error 95 = Operation not supported Nothing new shows in dmesg / syslog after this attempt, and I don't see anything telling in the mds or mon logs. Any pointers on where to look? I have three mons, three mds (2 standby), and four osds. I'm just doing more testing/learning Ceph, so if I did something wrong data loss is not a problem. Thanks! -Aaron ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] OSD: Newbie question regarding ceph-deploy odd create
Hi, I'm trying to setup my first cluster, (have never manually bootstrapped a cluster) Is ceph-deploy odd activate/prepare supposed to write to the master ceph.conf file, specific entries for each OSD along the lines of http://ceph.com/docs/master/rados/configuration/osd-config-ref/ ? I appear to have the OSDs prepared without error, but then.. no OSD entries in master cepf.conf nor node /etc/cepf.conf Am I missing something? Thanks in advance, Piers Dawson-Damer Tasmania 2013-09-28 06:47:00,471 [ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo 2013-09-28 06:47:01,205 [ceph_deploy.osd][INFO ] Distro info: Ubuntu 12.04 precise 2013-09-28 06:47:01,205 [ceph_deploy.osd][DEBUG ] Preparing host storage03-vs-e2 disk /dev/sdm journal /dev/mapper/ceph_journal-osd_12 activate True 2013-09-28 06:47:01,206 [storage03-vs-e2][INFO ] Running command: ceph-disk-prepare --cluster ceph -- /dev/sdm /dev/mapper/ceph_journal-osd_12 2013-09-28 06:47:20,247 [storage03-vs-e2][INFO ] Information: Moved requested sector from 4194338 to 4196352 in 2013-09-28 06:47:20,248 [storage03-vs-e2][INFO ] order to align on 2048-sector boundaries. 2013-09-28 06:47:20,248 [storage03-vs-e2][INFO ] Warning: The kernel is still using the old partition table. 2013-09-28 06:47:20,248 [storage03-vs-e2][INFO ] The new table will be used at the next reboot. 2013-09-28 06:47:20,248 [storage03-vs-e2][INFO ] The operation has completed successfully. 2013-09-28 06:47:20,248 [storage03-vs-e2][INFO ] Information: Moved requested sector from 34 to 2048 in 2013-09-28 06:47:20,249 [storage03-vs-e2][INFO ] order to align on 2048-sector boundaries. 2013-09-28 06:47:20,249 [storage03-vs-e2][INFO ] The operation has completed successfully. 2013-09-28 06:47:20,249 [storage03-vs-e2][INFO ] meta-data=/dev/sdm1 isize=2048 agcount=4, agsize=183105343 blks 2013-09-28 06:47:20,250 [storage03-vs-e2][INFO ] = sectsz=512 attr=2, projid32bit=0 2013-09-28 06:47:20,250 [storage03-vs-e2][INFO ] data = bsize=4096 blocks=732421371, imaxpct=5 2013-09-28 06:47:20,250 [storage03-vs-e2][INFO ] = sunit=0 swidth=0 blks 2013-09-28 06:47:20,250 [storage03-vs-e2][INFO ] naming =version 2 bsize=4096 ascii-ci=0 2013-09-28 06:47:20,251 [storage03-vs-e2][INFO ] log =internal log bsize=4096 blocks=357627, version=2 2013-09-28 06:47:20,251 [storage03-vs-e2][INFO ] = sectsz=512 sunit=0 blks, lazy-count=1 2013-09-28 06:47:20,251 [storage03-vs-e2][INFO ] realtime =none extsz=4096 blocks=0, rtextents=0 2013-09-28 06:47:20,251 [storage03-vs-e2][INFO ] The operation has completed successfully. 2013-09-28 06:47:20,252 [storage03-vs-e2][ERROR ] WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data 2013-09-28 06:47:20,266 [storage03-vs-e2][INFO ] Running command: udevadm trigger --subsystem-match=block --action=add 2013-09-28 06:47:20,413 [ceph_deploy.osd][DEBUG ] Host storage03-vs-e2 is now ready for osd use. 
2013-09-27 10:13:25,349 [storage03-vs-e2][DEBUG ] status for monitor: mon.storage03-vs-e2 2013-09-27 10:13:25,349 [storage03-vs-e2][DEBUG ] { "name": "storage03-vs-e2", 2013-09-27 10:13:25,350 [storage03-vs-e2][DEBUG ] "rank": 2, 2013-09-27 10:13:25,350 [storage03-vs-e2][DEBUG ] "state": "electing", 2013-09-27 10:13:25,350 [storage03-vs-e2][DEBUG ] "election_epoch": 1, 2013-09-27 10:13:25,351 [storage03-vs-e2][DEBUG ] "quorum": [], 2013-09-27 10:13:25,351 [storage03-vs-e2][DEBUG ] "outside_quorum": [], 2013-09-27 10:13:25,351 [storage03-vs-e2][DEBUG ] "extra_probe_peers": [ 2013-09-27 10:13:25,351 [storage03-vs-e2][DEBUG ] "172.17.181.47:6789\/0", 2013-09-27 10:13:25,352 [storage03-vs-e2][DEBUG ] "172.17.181.48:6789\/0"], 2013-09-27 10:13:25,352 [storage03-vs-e2][DEBUG ] "sync_provider": [], 2013-09-27 10:13:25,352 [storage03-vs-e2][DEBUG ] "monmap": { "epoch": 0, 2013-09-27 10:13:25,352 [storage03-vs-e2][DEBUG ] "fsid": "28626c0a-0266-4b80-8c06-0562bf48b793", 2013-09-27 10:13:25,353 [storage03-vs-e2][DEBUG ] "modified": "0.00", 2013-09-27 10:13:25,353 [storage03-vs-e2][DEBUG ] "created": "0.00", 2013-09-27 10:13:25,353 [storage03-vs-e2][DEBUG ] "mons": [ 2013-09-27 10:13:25,353 [storage03-vs-e2][DEBUG ] { "rank": 0, 2013-09-27 10:13:25,354 [storage03-vs-e2][DEBUG ] "name": "storage01-vs-e2", 2013-09-27 10:13:25,354 [storage03-vs-e2][DEBUG ] "addr": "172.17.181.47:6789\/0"}, 2013-09-27 10:13:25,354 [storage03-vs-e2][DEBUG ] { "rank": 1, 2013-09-27 10:13:25,354 [storage03-vs-e2][DEBUG ] "name": "storage02-vs-e2", 2013-09-27 10:13:25,355 [storage03-vs-e2][DEBUG ] "addr": "172.17.181.48:6789\/0"}, 2013-09-27 10:13:25,355 [storag
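A general note rather than a reply from the list: with the ceph-disk/udev-based flow that ceph-deploy uses, OSDs are discovered and activated from their GPT partition type, so per-OSD [osd.N] sections in ceph.conf are not written; as far as I know their absence is normal. Assuming the prepare/activate steps succeeded, the OSDs should still show up in the cluster:

    ceph osd tree
    ceph osd stat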
Re: [ceph-users] RBD Snap removal priority
Hi Mike, Thanks for the info. I had seem some of the previous reports of reduced performance during various recovery tasks (and certainly experienced them) but you summarized them all quite nicely. Yes, I'm running XFS on the OSDs. I checked fragmentation on a few of my OSDs -- all came back ~38% (better than I thought!). - Travis On Fri, Sep 27, 2013 at 2:05 PM, Mike Dawson wrote: > [cc ceph-devel] > > Travis, > > RBD doesn't behave well when Ceph maintainance operations create spindle > contention (i.e. 100% util from iostat). More about that below. > > Do you run XFS under your OSDs? If so, can you check for extent > fragmentation? Should be something like: > > xfs_db -c frag -r /dev/sdb1 > > We recently saw a fragmentation factors of over 80%, with lots of ino's > having hundreds of extents. After 24 hours+ of defrag'ing, we got it under > control, but we're seeing the fragmentation factor grow by ~1.5% daily. We > experienced spindle contention issues even after the defrag. > > > > Sage, Sam, etc, > > I think the real issue is Ceph has several states where it performs what I > would call "maintanance operations" that saturate the underlying storage > without properly yielding to client i/o (which should have a higher > priority). > > I have experienced or seen reports of Ceph maintainance affecting rbd client > i/o in many ways: > > - QEMU/RBD Client I/O Stalls or Halts Due to Spindle Contention from Ceph > Maintainance [1] > - Recovery and/or Backfill Cause QEMU/RBD Reads to Hang [2] > - rbd snap rm (Travis' report below) > > [1] http://tracker.ceph.com/issues/6278 > [2] http://tracker.ceph.com/issues/6333 > > I think this family of issues speak to the need for Ceph to have more > visibility into the underlying storage's limitations (especially spindle > contention) when performing known expensive maintainance operations. > > Thanks, > Mike Dawson > > > On 9/27/2013 12:25 PM, Travis Rhoden wrote: >> >> Hello everyone, >> >> I'm running a Cuttlefish cluster that hosts a lot of RBDs. I recently >> removed a snapshot of a large one (rbd snap rm -- 12TB), and I noticed >> that all of the clients had markedly decreased performance. Looking >> at iostat on the OSD nodes had most disks pegged at 100% util. >> >> I know there are thread priorities that can be set for clients vs >> recovery, but I'm not sure what deleting a snapshot falls under. I >> couldn't really find anything relevant. Is there anything I can tweak >> to lower the priority of such an operation? I didn't need it to >> complete fast, as "rbd snap rm" returns immediately and the actual >> deletion is done asynchronously. I'd be fine with it taking longer at >> a lower priority, but as it stands now it brings my cluster to a crawl >> and is causing issues with several VMs. >> >> I see an "osd snap trim thread timeout" option in the docs -- Is the >> operation occuring here what you would call snap trimming? If so, any >> chance of adding an option for "osd snap trim priority" just like >> there is for osd client op and osd recovery op? >> >> Hope what I am saying makes sense... >> >> - Travis >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Scaling radosgw module
Likely on the radosgw side you are going to see the top consumers be malloc/free/memcpy/memcmp. If you have kernel 3.9 or newer compiled with libunwind, you might get better callgraphs in perf which could be helpful. Mark On 09/27/2013 01:56 PM, Somnath Roy wrote: Yes, I understand that.. I tried with thread pool size of 300 (default 100, I believe). I am in process of running perf on radosgw as well as on osds for profiling. BTW, let me know if any particular ceph component you want me to focus. Thanks & Regards Somnath -Original Message- From: Mark Nelson [mailto:mark.nel...@inktank.com] Sent: Friday, September 27, 2013 11:50 AM To: Somnath Roy Cc: Yehuda Sadeh; ceph-users@lists.ceph.com; Anirban Ray; ceph-de...@vger.kernel.org Subject: Re: [ceph-users] Scaling radosgw module Hi Somnath, With SSDs, you almost certainly are going to be running into bottlenecks on the RGW side... Maybe even fastcgi or apache depending on the machine and how things are configured. Unfortunately this is probably one of the more complex performance optimization scenarios in the Ceph world and is going to require figuring out exactly where things are slowing down. I don't remember if you've done this already, but you could try increasing the number of radosgw threads and try to throw more concurrency at the problem, but other than that it's probably going to come down to profiling, and lots of it. :) Mark On 09/26/2013 07:04 PM, Somnath Roy wrote: Hi Yehuda, With my 3 node cluster (30 OSDs in total, all in ssds), I am getting avg of ~3000 Gets/s from a single swift-bench client hitting single radosgw instance. Put is ~1000/s. BTW, I am not able to generate very big load yet and as the server has ~140G RAM, all the GET requests are served from memory , no disk utilization here. Thanks & Regards Somnath -Original Message- From: Yehuda Sadeh [mailto:yeh...@inktank.com] Sent: Thursday, September 26, 2013 4:48 PM To: Somnath Roy Cc: Mark Nelson; ceph-users@lists.ceph.com; Anirban Ray; ceph-de...@vger.kernel.org Subject: Re: [ceph-users] Scaling radosgw module You specify the relative performance, but what the actual numbers that you're seeing? How many GETs per second, and how many PUTs per second do you see? On Thu, Sep 26, 2013 at 4:00 PM, Somnath Roy wrote: Mark, One more thing, all my test is with rgw cache enabled , disabling the cache the performance is around 3x slower. Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy Sent: Thursday, September 26, 2013 3:59 PM To: Mark Nelson Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban Ray Subject: RE: [ceph-users] Scaling radosgw module Nope...With one client hitting the radaosgw , the daemon cpu usage is going up till 400-450% i.e taking in avg 4 core..In one client scenario, the server node (having radosgw + osds) cpu usage is ~80% idle and out of the 20% usage bulk is consumed by radosgw. Thanks & Regards Somnath -Original Message- From: Mark Nelson [mailto:mark.nel...@inktank.com] Sent: Thursday, September 26, 2013 3:50 PM To: Somnath Roy Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban Ray Subject: Re: [ceph-users] Scaling radosgw module Ah, that's very good to know! And RGW CPU usage you said was low? Mark On 09/26/2013 05:40 PM, Somnath Roy wrote: Mark, I did set up 3 radosgw servers in 3 server nodes and the tested with 3 swift-bench client hitting 3 radosgw in the same time. 
I saw the aggregated throughput is linearly scaling. But, as an individual radosgw performance is very low we need to put lots of radosgw/apache server combination to get very high throughput. I guess that will be a problem. I will try to do some profiling and share the data. Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mark Nelson Sent: Thursday, September 26, 2013 3:33 PM To: Somnath Roy Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban Ray Subject: Re: [ceph-users] Scaling radosgw module It's kind of annoying, but it may be worth setting up a 2nd RGW server and seeing if having two copies of the benchmark going at the same time on two separate RGW servers increases aggregate throughput. Also, it may be worth tracking down latencies with messenger debugging enabled, but I'm afraid I'm pretty bogged down right now and probably wouldn't be able to look at it for a while. :( Mark On 09/26/2013 05:15 PM, Somnath Roy wrote: Hi Mark, FYI, I tried with wip-6286-dumpling release and the results are the same for me. The radosgw throughput is around ~6x slower than the single rados bench output! Any other suggestion ? Thanks & Regards Somnath -Original Message- From: Somnath Roy Sent: Friday, September 20, 2013 4:08 PM To: 'Mark Nelson' Cc: ceph-users@
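For reference, a typical way to capture the call graphs Mark mentions, assuming perf is installed and radosgw is running (the sample window is arbitrary):

    sudo perf record -g -p $(pidof radosgw) -- sleep 60
    sudo perf report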
Re: [ceph-users] Scaling radosgw module
Yes, I understand that.. I tried with thread pool size of 300 (default 100, I believe). I am in process of running perf on radosgw as well as on osds for profiling. BTW, let me know if any particular ceph component you want me to focus. Thanks & Regards Somnath -Original Message- From: Mark Nelson [mailto:mark.nel...@inktank.com] Sent: Friday, September 27, 2013 11:50 AM To: Somnath Roy Cc: Yehuda Sadeh; ceph-users@lists.ceph.com; Anirban Ray; ceph-de...@vger.kernel.org Subject: Re: [ceph-users] Scaling radosgw module Hi Somnath, With SSDs, you almost certainly are going to be running into bottlenecks on the RGW side... Maybe even fastcgi or apache depending on the machine and how things are configured. Unfortunately this is probably one of the more complex performance optimization scenarios in the Ceph world and is going to require figuring out exactly where things are slowing down. I don't remember if you've done this already, but you could try increasing the number of radosgw threads and try to throw more concurrency at the problem, but other than that it's probably going to come down to profiling, and lots of it. :) Mark On 09/26/2013 07:04 PM, Somnath Roy wrote: > Hi Yehuda, > With my 3 node cluster (30 OSDs in total, all in ssds), I am getting avg of > ~3000 Gets/s from a single swift-bench client hitting single radosgw > instance. Put is ~1000/s. BTW, I am not able to generate very big load yet > and as the server has ~140G RAM, all the GET requests are served from memory > , no disk utilization here. > > Thanks & Regards > Somnath > > -Original Message- > From: Yehuda Sadeh [mailto:yeh...@inktank.com] > Sent: Thursday, September 26, 2013 4:48 PM > To: Somnath Roy > Cc: Mark Nelson; ceph-users@lists.ceph.com; Anirban Ray; > ceph-de...@vger.kernel.org > Subject: Re: [ceph-users] Scaling radosgw module > > You specify the relative performance, but what the actual numbers that you're > seeing? How many GETs per second, and how many PUTs per second do you see? > > On Thu, Sep 26, 2013 at 4:00 PM, Somnath Roy wrote: >> Mark, >> One more thing, all my test is with rgw cache enabled , disabling the cache >> the performance is around 3x slower. >> >> Thanks & Regards >> Somnath >> >> -Original Message- >> From: ceph-devel-ow...@vger.kernel.org >> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy >> Sent: Thursday, September 26, 2013 3:59 PM >> To: Mark Nelson >> Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban >> Ray >> Subject: RE: [ceph-users] Scaling radosgw module >> >> Nope...With one client hitting the radaosgw , the daemon cpu usage is going >> up till 400-450% i.e taking in avg 4 core..In one client scenario, the >> server node (having radosgw + osds) cpu usage is ~80% idle and out of the >> 20% usage bulk is consumed by radosgw. >> >> Thanks & Regards >> Somnath >> >> -Original Message- >> From: Mark Nelson [mailto:mark.nel...@inktank.com] >> Sent: Thursday, September 26, 2013 3:50 PM >> To: Somnath Roy >> Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban >> Ray >> Subject: Re: [ceph-users] Scaling radosgw module >> >> Ah, that's very good to know! >> >> And RGW CPU usage you said was low? >> >> Mark >> >> On 09/26/2013 05:40 PM, Somnath Roy wrote: >>> Mark, >>> I did set up 3 radosgw servers in 3 server nodes and the tested with 3 >>> swift-bench client hitting 3 radosgw in the same time. I saw the aggregated >>> throughput is linearly scaling. 
But, as an individual radosgw performance >>> is very low we need to put lots of radosgw/apache server combination to get >>> very high throughput. I guess that will be a problem. >>> I will try to do some profiling and share the data. >>> >>> Thanks & Regards >>> Somnath >>> >>> -Original Message- >>> From: ceph-devel-ow...@vger.kernel.org >>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mark Nelson >>> Sent: Thursday, September 26, 2013 3:33 PM >>> To: Somnath Roy >>> Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban >>> Ray >>> Subject: Re: [ceph-users] Scaling radosgw module >>> >>> It's kind of annoying, but it may be worth setting up a 2nd RGW server and >>> seeing if having two copies of the benchmark going at the same time on two >>> separate RGW servers increases aggregate throughput. >>> >>> Also, it may be worth tracking down latencies with messenger >>> debugging enabled, but I'm afraid I'm pretty bogged down right now >>> and probably wouldn't be able to look at it for a while. :( >>> >>> Mark >>> >>> On 09/26/2013 05:15 PM, Somnath Roy wrote: Hi Mark, FYI, I tried with wip-6286-dumpling release and the results are the same for me. The radosgw throughput is around ~6x slower than the single rados bench output! Any other suggestion ? Thanks & Regards Somnath -Original Message- From: Somnath Roy Sent: Friday, September
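For anyone following along, the thread-pool knob being discussed is a ceph.conf option on the gateway; the value below is just an example, not a recommendation:

    [client.radosgw.gateway]
        rgw thread pool size = 300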
Re: [ceph-users] Scaling radosgw module
Hi Somnath, With SSDs, you almost certainly are going to be running into bottlenecks on the RGW side... Maybe even fastcgi or apache depending on the machine and how things are configured. Unfortunately this is probably one of the more complex performance optimization scenarios in the Ceph world and is going to require figuring out exactly where things are slowing down. I don't remember if you've done this already, but you could try increasing the number of radosgw threads and try to throw more concurrency at the problem, but other than that it's probably going to come down to profiling, and lots of it. :) Mark On 09/26/2013 07:04 PM, Somnath Roy wrote: Hi Yehuda, With my 3 node cluster (30 OSDs in total, all in ssds), I am getting avg of ~3000 Gets/s from a single swift-bench client hitting single radosgw instance. Put is ~1000/s. BTW, I am not able to generate very big load yet and as the server has ~140G RAM, all the GET requests are served from memory , no disk utilization here. Thanks & Regards Somnath -Original Message- From: Yehuda Sadeh [mailto:yeh...@inktank.com] Sent: Thursday, September 26, 2013 4:48 PM To: Somnath Roy Cc: Mark Nelson; ceph-users@lists.ceph.com; Anirban Ray; ceph-de...@vger.kernel.org Subject: Re: [ceph-users] Scaling radosgw module You specify the relative performance, but what the actual numbers that you're seeing? How many GETs per second, and how many PUTs per second do you see? On Thu, Sep 26, 2013 at 4:00 PM, Somnath Roy wrote: Mark, One more thing, all my test is with rgw cache enabled , disabling the cache the performance is around 3x slower. Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy Sent: Thursday, September 26, 2013 3:59 PM To: Mark Nelson Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban Ray Subject: RE: [ceph-users] Scaling radosgw module Nope...With one client hitting the radaosgw , the daemon cpu usage is going up till 400-450% i.e taking in avg 4 core..In one client scenario, the server node (having radosgw + osds) cpu usage is ~80% idle and out of the 20% usage bulk is consumed by radosgw. Thanks & Regards Somnath -Original Message- From: Mark Nelson [mailto:mark.nel...@inktank.com] Sent: Thursday, September 26, 2013 3:50 PM To: Somnath Roy Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban Ray Subject: Re: [ceph-users] Scaling radosgw module Ah, that's very good to know! And RGW CPU usage you said was low? Mark On 09/26/2013 05:40 PM, Somnath Roy wrote: Mark, I did set up 3 radosgw servers in 3 server nodes and the tested with 3 swift-bench client hitting 3 radosgw in the same time. I saw the aggregated throughput is linearly scaling. But, as an individual radosgw performance is very low we need to put lots of radosgw/apache server combination to get very high throughput. I guess that will be a problem. I will try to do some profiling and share the data. Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mark Nelson Sent: Thursday, September 26, 2013 3:33 PM To: Somnath Roy Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org; Anirban Ray Subject: Re: [ceph-users] Scaling radosgw module It's kind of annoying, but it may be worth setting up a 2nd RGW server and seeing if having two copies of the benchmark going at the same time on two separate RGW servers increases aggregate throughput. 
Also, it may be worth tracking down latencies with messenger debugging enabled, but I'm afraid I'm pretty bogged down right now and probably wouldn't be able to look at it for a while. :( Mark On 09/26/2013 05:15 PM, Somnath Roy wrote: Hi Mark, FYI, I tried with wip-6286-dumpling release and the results are the same for me. The radosgw throughput is around ~6x slower than the single rados bench output! Any other suggestion ? Thanks & Regards Somnath -Original Message- From: Somnath Roy Sent: Friday, September 20, 2013 4:08 PM To: 'Mark Nelson' Cc: ceph-users@lists.ceph.com Subject: RE: [ceph-users] Scaling radosgw module Hi Mark, It's a test cluster and I will try with the new release. As I mentioned in the mail, I think number of rados client instance is the limitation. Could you please let me know how many rados client instance the radosgw daemon is instantiating ? Is it configurable somehow ? Thanks & Regards Somnath -Original Message- From: Mark Nelson [mailto:mark.nel...@inktank.com] Sent: Friday, September 20, 2013 4:02 PM To: Somnath Roy Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Scaling radosgw module On 09/20/2013 05:49 PM, Somnath Roy wrote: Hi Mark, Thanks for your quick response. I tried adding the 'num_container = 100' in the job file and found that the performance actually decreasing with that option. I am get
Re: [ceph-users] RBD Snap removal priority
[cc ceph-devel] Travis, RBD doesn't behave well when Ceph maintainance operations create spindle contention (i.e. 100% util from iostat). More about that below. Do you run XFS under your OSDs? If so, can you check for extent fragmentation? Should be something like: xfs_db -c frag -r /dev/sdb1 We recently saw a fragmentation factors of over 80%, with lots of ino's having hundreds of extents. After 24 hours+ of defrag'ing, we got it under control, but we're seeing the fragmentation factor grow by ~1.5% daily. We experienced spindle contention issues even after the defrag. Sage, Sam, etc, I think the real issue is Ceph has several states where it performs what I would call "maintanance operations" that saturate the underlying storage without properly yielding to client i/o (which should have a higher priority). I have experienced or seen reports of Ceph maintainance affecting rbd client i/o in many ways: - QEMU/RBD Client I/O Stalls or Halts Due to Spindle Contention from Ceph Maintainance [1] - Recovery and/or Backfill Cause QEMU/RBD Reads to Hang [2] - rbd snap rm (Travis' report below) [1] http://tracker.ceph.com/issues/6278 [2] http://tracker.ceph.com/issues/6333 I think this family of issues speak to the need for Ceph to have more visibility into the underlying storage's limitations (especially spindle contention) when performing known expensive maintainance operations. Thanks, Mike Dawson On 9/27/2013 12:25 PM, Travis Rhoden wrote: Hello everyone, I'm running a Cuttlefish cluster that hosts a lot of RBDs. I recently removed a snapshot of a large one (rbd snap rm -- 12TB), and I noticed that all of the clients had markedly decreased performance. Looking at iostat on the OSD nodes had most disks pegged at 100% util. I know there are thread priorities that can be set for clients vs recovery, but I'm not sure what deleting a snapshot falls under. I couldn't really find anything relevant. Is there anything I can tweak to lower the priority of such an operation? I didn't need it to complete fast, as "rbd snap rm" returns immediately and the actual deletion is done asynchronously. I'd be fine with it taking longer at a lower priority, but as it stands now it brings my cluster to a crawl and is causing issues with several VMs. I see an "osd snap trim thread timeout" option in the docs -- Is the operation occuring here what you would call snap trimming? If so, any chance of adding an option for "osd snap trim priority" just like there is for osd client op and osd recovery op? Hope what I am saying makes sense... - Travis ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
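For reference, the fragmentation check above can be followed by an online defrag with xfs_fsr (shipped in xfsprogs or xfsdump depending on the distro). A minimal sketch, assuming the OSD data partition is /dev/sdb1 mounted at /var/lib/ceph/osd/ceph-0:

    xfs_db -c frag -r /dev/sdb1              # report the fragmentation factor
    xfs_fsr -v /var/lib/ceph/osd/ceph-0      # defragment the mounted filesystem

Expect the defrag itself to generate significant disk I/O, so it is best run during a quiet period.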
Re: [ceph-users] gateway instance
On Fri, Sep 27, 2013 at 1:10 AM, lixuehui wrote:
> Hi all,
> Do "gateway instances" mean multiple gateway processes for a single Ceph
> cluster? Since they are configured independently in the configuration
> file, can they be configured with zones in different regions?

Not sure I follow your question, but basically each gateway process (or 'instance') can be set to control a different zone. Moreover, a single instance cannot control more than one zone.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] performance and disk usage of snapshots
Hi Corin! On 09/24/2013 11:37 AM, Corin Langosch wrote: Hi there, do snapshots have an impact on write performance? I assume on each write all snapshots have to get updated (cow) so the more snapshots exist the worse write performance will get? I'll be honest, I haven't tested it so I'm not sure how much impact there actually is. If you are really interested and wouldn't mind doing some testing, I would love to see the results! Is there any way to see how much disk space a snapshot occupies? I assume because of cow snapshots start with 0 real disk usage and grow over time as the underlying object changes? I'm not an expert here, but does rbd ls -l help? See: http://linux.die.net/man/8/rbd Corin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
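One rough way to estimate how much data an image (or the change since a snapshot) actually occupies is to sum the extents reported by rbd diff; this assumes a Cuttlefish-or-newer client, the pool/image/snapshot names are placeholders, and it is an approximation rather than an exact accounting of copy-on-write overhead:

    rbd diff rbd/myimage | awk '{ sum += $2 } END { print sum/1024/1024 " MB allocated" }'
    rbd diff --from-snap snap1 rbd/myimage | awk '{ sum += $2 } END { print sum/1024/1024 " MB changed since snap1" }'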
Re: [ceph-users] failure starting radosgw after setting up object storage
On Wed, Sep 25, 2013 at 2:07 PM, Gruher, Joseph R wrote: > Hi all- > > > > I am following the object storage quick start guide. I have a cluster with > two OSDs and have followed the steps on both. Both are failing to start > radosgw but each in a different manner. All the previous steps in the quick > start guide appeared to complete successfully. Any tips on how to debug > from here? Thanks! > > > > > > OSD1: > > > > ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw start > > ceph@cephtest05:/etc/ceph$ > > > > ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw status > > /usr/bin/radosgw is not running. > > ceph@cephtest05:/etc/ceph$ > > > > ceph@cephtest05:/etc/ceph$ cat /var/log/ceph/radosgw.log > > ceph@cephtest05:/etc/ceph$ > > > > > > OSD2: > > > > ceph@cephtest06:/etc/ceph$ sudo /etc/init.d/radosgw start > > Starting client.radosgw.gateway... > > 2013-09-25 14:03:01.235789 7f713d79d780 -1 WARNING: libcurl doesn't support > curl_multi_wait() > > 2013-09-25 14:03:01.235797 7f713d79d780 -1 WARNING: cross zone / region > transfer performance may be affected > > ceph@cephtest06:/etc/ceph$ > > > > ceph@cephtest06:/etc/ceph$ sudo /etc/init.d/radosgw status > > /usr/bin/radosgw is not running. > > ceph@cephtest06:/etc/ceph$ > > > > ceph@cephtest06:/etc/ceph$ cat /var/log/ceph/radosgw.log > > 2013-09-25 14:03:01.235760 7f713d79d780 0 ceph version 0.67.3 > (408cd61584c72c0d97b774b3d8f95c6b1b06341a), process radosgw, pid 13187 > > 2013-09-25 14:03:01.235789 7f713d79d780 -1 WARNING: libcurl doesn't support > curl_multi_wait() > > 2013-09-25 14:03:01.235797 7f713d79d780 -1 WARNING: cross zone / region > transfer performance may be affected > > 2013-09-25 14:03:01.245786 7f713d79d780 0 librados: client.radosgw.gateway > authentication error (1) Operation not permitted > > 2013-09-25 14:03:01.246526 7f713d79d780 -1 Couldn't init storage provider > (RADOS) This means that the radosgw process cannot connect to the cluster due to user / key set up. Make sure that the user for radosgw exists, and that the ceph keyring file (on the radosgw side) has the correct credentials set. Yehuda > > ceph@cephtest06:/etc/ceph$ > > > > > > For reference, I think cluster health is OK: > > > > ceph@cephtest06:/etc/ceph$ sudo ceph status > > cluster a45e6e54-70ef-4470-91db-2152965deec5 > >health HEALTH_WARN clock skew detected on mon.cephtest03, mon.cephtest04 > >monmap e1: 3 mons at > {cephtest02=10.0.0.2:6789/0,cephtest03=10.0.0.3:6789/0,cephtest04=10.0.0.4:6789/0}, > election epoch 6, quorum 0,1,2 cephtest02,cephtest03,cephtest04 > >osdmap e9: 2 osds: 2 up, 2 in > > pgmap v439: 192 pgs: 192 active+clean; 0 bytes data, 72548 KB used, 1998 > GB / 1999 GB avail > >mdsmap e1: 0/0/1 up > > > > ceph@cephtest06:/etc/ceph$ sudo ceph health > > HEALTH_WARN clock skew detected on mon.cephtest03, mon.cephtest04 > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
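For reference, a minimal sketch of the key setup being described; the user name and keyring path follow the quick-start conventions and are assumptions here:

    sudo ceph auth get-or-create client.radosgw.gateway mon 'allow rw' osd 'allow rwx' \
        -o /etc/ceph/keyring.radosgw.gateway

and in ceph.conf on the gateway host:

    [client.radosgw.gateway]
        keyring = /etc/ceph/keyring.radosgw.gateway

'ceph auth list' should then show the client.radosgw.gateway entry with those caps.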
[ceph-users] RBD Snap removal priority
Hello everyone, I'm running a Cuttlefish cluster that hosts a lot of RBDs. I recently removed a snapshot of a large one (rbd snap rm -- 12TB), and I noticed that all of the clients had markedly decreased performance. Looking at iostat on the OSD nodes had most disks pegged at 100% util. I know there are thread priorities that can be set for clients vs recovery, but I'm not sure what deleting a snapshot falls under. I couldn't really find anything relevant. Is there anything I can tweak to lower the priority of such an operation? I didn't need it to complete fast, as "rbd snap rm" returns immediately and the actual deletion is done asynchronously. I'd be fine with it taking longer at a lower priority, but as it stands now it brings my cluster to a crawl and is causing issues with several VMs. I see an "osd snap trim thread timeout" option in the docs -- Is the operation occuring here what you would call snap trimming? If so, any chance of adding an option for "osd snap trim priority" just like there is for osd client op and osd recovery op? Hope what I am saying makes sense... - Travis ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS Pool Specification?
I see it's the undocumented "ceph.dir.layout.pool". Something like:

setfattr -n ceph.dir.layout.pool -v mynewpool <empty-dir>

on an empty directory should work (a fuller sketch follows below). I'd like one directory to be more heavily mirrored so that a) objects are more likely to be on a less busy server and b) availability increases (at the expense of size/write speed).

-Original Message-
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Friday, September 27, 2013 11:14 AM
To: Aronesty, Erik
Cc: Aaron Ten Clay; Sage Weil; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS Pool Specification?

On Fri, Sep 27, 2013 at 7:10 AM, Aronesty, Erik wrote:
>> You can also create additional data pools and map directories to them, but
>> this probably isn't what you need (yet).
>
> Is there a link to a web page where you can read how to map a directory to a
> pool? (I googled "ceph map directory to pool" and got this post)

Nothing official at this point. Sébastien wrote a short blog about it earlier this year that will give you the basics:
http://www.sebastien-han.fr/blog/2013/02/11/mount-a-specific-pool-with-cephfs/
But at this point it's easier to use the virtual xattrs at "ceph.dir.layout", as shown in this ticket:
http://tracker.ceph.com/issues/4215
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
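Putting the pieces of this thread together, the whole flow might look like the sketch below. The pool name, PG count, and mount point are made up, the pool has to be added as an MDS data pool before the layout points at it, and depending on the version the xattr may want the pool ID rather than its name:

    ceph osd pool create mynewpool 256 256
    ceph osd pool set mynewpool size 3                 # "more heavily mirrored"
    ceph mds add_data_pool mynewpool
    setfattr -n ceph.dir.layout.pool -v mynewpool /mnt/ceph/important-dir   # empty dir
    getfattr -n ceph.dir.layout /mnt/ceph/important-dir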
Re: [ceph-users] CephFS Pool Specification?
On Fri, Sep 27, 2013 at 7:10 AM, Aronesty, Erik wrote: > Ø You can also create additional data pools and map directories to them, > but > > > this probably isn't what you need (yet). > > Is there a link to a web page where you can read how to map a directory to a > pool? (I googled ceph map directory to pool … and got this post) Nothing official at this point. Sébastien wrote a short blog about it earlier this year that will give you the basics: http://www.sebastien-han.fr/blog/2013/02/11/mount-a-specific-pool-with-cephfs/ But at this point it's easier to use the virtual xattrs at "ceph.dir.layout", as shown in this ticket: http://tracker.ceph.com/issues/4215 -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS Pool Specification?
Ø You can also create additional data pools and map directories to them, but this probably isn't what you need (yet). Is there a link to a web page where you can read how to map a directory to a pool? (I googled ceph map directory to pool ... and got this post) From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Aaron Ten Clay Sent: Thursday, September 26, 2013 5:15 PM To: Sage Weil Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] CephFS Pool Specification? On Wed, Sep 25, 2013 at 8:44 PM, Sage Weil mailto:s...@inktank.com>> wrote: On Wed, 25 Sep 2013, Aaron Ten Clay wrote: > Hi all, > > Does anyone know how to specify which pool the mds and CephFS data will be > stored in? > > After creating a new cluster, the pools "data", "metadata", and "rbd" all > exist but with pg count too small to be useful. The documentation indicates > the pg count can be set only at pool creation time, This is no longer true. Can you tell us where you read it so we can fix the documentation? ceph osd pool set data pg_num 1234 ceph osd pool set data pgp_num 1234 Repeat for metadata and/or rbd with an appropriate pg count. Thanks! Maybe I just misinterpreted the documentation. The page http://ceph.com/docs/master/rados/operations/placement-groups/ implies (to me, anyway) that the number of placement groups can't be changed once a pool is created. Under the "Set Pool Values" heading, pg_num isn't listed as an option. > so I am working under the assumption I must create a new pool with a > larger pg count and use that for CephFS and the mds storage. You can also create additional data pools and map directories to them, but this probably isn't what you need (yet). sage -Aaron ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy issues on RHEL6.4
On Fri, Sep 27, 2013 at 3:30 AM, Guang wrote: > Hi ceph-users, > I recently deployed a ceph cluster with use of *ceph-deploy* utility, on > RHEL6.4, during the time, I came across a couple of issues / questions which > I would like to ask for your help. > > 1. ceph-deploy does not help to install dependencies (snappy leveldb gdisk > python-argparse gperftools-libs) on the target host, so I will need to > manually install those dependencies before performing 'ceph-deploy install > {host_name}'. I am investigate the way to deploy ceph onto a hundred nodes > and it is time-consuming to manually install those dependencies manually. Am > I missing something here? I am thinking the dependency installation should > be handled by *ceph-deploy* itself. > > 2. When performing 'ceph-deploy -v disk zap ceph.host.name:/dev/sdb', I have > the following errors: > [ceph_deploy.osd][DEBUG ] zapping /dev/sdc on ceph.host.name > [ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo > Traceback (most recent call last): > File "/usr/bin/ceph-deploy", line 21, in >sys.exit(main()) > File "/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py", > line 83, in newfunc >return f(*a, **kw) > File "/usr/lib/python2.6/site-packages/ceph_deploy/cli.py", line 147, in > main >return args.func(args) > File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 381, in > disk >disk_zap(args) > File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 317, in > disk_zap >zap_r(disk) > File "/usr/lib/python2.6/site-packages/pushy/protocol/proxy.py", line 255, > in >(conn.operator(type_, self, args, kwargs)) > File "/usr/lib/python2.6/site-packages/pushy/protocol/connection.py", line > 66, in operator >return self.send_request(type_, (object, args, kwargs)) > File "/usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py", > line 329, in send_request >return self.__handle(m) > File "/usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py", > line 645, in __handle >raise e > pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory ceph-deploy should handle better this specific problem, you've hit the issue where $PATH is not enabled for a certain user over ssh, so it will not be able to execute commands because it can't find the executables. A temporary workaround is to set the $PATH for all users explicitly until we fix this issue (I opened: http://tracker.ceph.com/issues/6428 to track this). > > And then I logon to the host to perform 'ceph-disk zap /dev/sdb' and it can > be successful without any issues. > > 3. 
When performing 'ceph-deploy -v disk activate ceph.host.name:/dev/sdb', > I have the following errors: > ceph_deploy.osd][DEBUG ] Activating cluster ceph disks > ceph.host.name:/dev/sdb: > [ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo > [ceph_deploy.osd][DEBUG ] Activating host ceph.host.name disk /dev/sdb > [ceph_deploy.osd][DEBUG ] Distro RedHatEnterpriseServer codename Santiago, > will use sysvinit > Traceback (most recent call last): > File "/usr/bin/ceph-deploy", line 21, in >sys.exit(main()) > File "/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py", > line 83, in newfunc >return f(*a, **kw) > File "/usr/lib/python2.6/site-packages/ceph_deploy/cli.py", line 147, in > main >return args.func(args) > File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 379, in > disk >activate(args, cfg) > File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 271, in > activate >cmd=cmd, ret=ret, out=out, err=err) > NameError: global name 'ret' is not defined Ah good find, somehow this error went pass our checks, I just opened an issue to fix this asap: http://tracker.ceph.com/issues/6427 > > Also, I logon to the host to perform 'ceph-disk activate /dev/sdb' and it is > good. > > Any help is appreciated. > > Thanks, > Guang > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
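One concrete way to apply the workaround Alfredo mentions, assuming the problem is that the sbin directories are missing from the non-interactive and sudo PATH on the target node (an assumption, not something confirmed in the thread):

    # in the deploy user's ~/.bashrc on the target node
    export PATH=$PATH:/usr/sbin:/sbin:/usr/local/sbin

    # and/or in /etc/sudoers (edit with visudo)
    Defaults    secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin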
Re: [ceph-users] PG distribution scattered
Sorry for replying only now, I did not get to try it earlier…

On Thu, 19 Sep 2013 08:43:11 -0500, Mark Nelson wrote:
On 09/19/2013 08:36 AM, Niklas Goerke wrote:
[…]

My Setup:
* Two Hosts with 45 Disks each --> 90 OSDs
* Only one newly created pool with 4500 PGs and a Replica Size of 2 --> should be about 100 PGs per OSD

What I found was that one OSD only had 72 PGs, while another had 123 PGs [1]. That means that - if I did the math correctly - I can only fill the cluster to about 81%, because that's when the first OSD is completely full [2].

Does distribution improve if you make a pool with significantly more PGs?

Yes it does. I tried 45000 PGs and got a range of minimum 922 to a maximum of 1066 PGs per OSD (average is 1000). This is better, I can now fill my cluster up to 93.8% (theoretically), but I still don't get why I would want to limit myself to that. Also, 1000 PGs is way too many for one OSD (I think 100 is suggested). What should I do about this?

I did some experimenting and found that if I add another pool with 4500 PGs, each OSD will have exactly double the number of PGs it has with one pool. So this is not an accident (tried it multiple times). On another test cluster with 4 hosts and 15 disks each, the distribution was similarly bad.

This is a bug that causes each pool to more or less be distributed the same way on the same hosts. We have a fix, but it impacts backwards compatibility so it's off by default. If you set:

osd pool default flag hashpspool = true

Theoretically that will cause different pools to be distributed more randomly.

I did not try this, because in my production scenario we will probably only have one or two very large pools, so it does not matter all that much to me.

[…]
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy issues on RHEL6.4
On 2013-09-27, at 15:30:21, Guang wrote:
> Hi ceph-users,
> I recently deployed a ceph cluster with use of *ceph-deploy* utility,
> on RHEL6.4, during the time, I came across a couple of issues /
> questions which I would like to ask for your help.
>
> 1. ceph-deploy does not help to install dependencies (snappy leveldb
> gdisk python-argparse gperftools-libs) on the target host, so I will
> need to manually install those dependencies before performing
> 'ceph-deploy install {host_name}'. I am investigating the way to deploy
> ceph onto a hundred nodes and it is time-consuming to install
> those dependencies manually. Am I missing something here? I
> am thinking the dependency installation should be handled by
> *ceph-deploy* itself.

You might want to use some kind of configuration management system, like Puppet, for that. It is not Ceph specific (so you can use it for everything) and there are modules to manage Ceph. It *is* harder to start with than just doing ceph-deploy, but if you want to install anything more than only Ceph on the nodes it is very useful. Of course, nothing stops you from just using Puppet to install the dependencies and ceph-deploy for all the Ceph-related stuff.

Cheers
XANi
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
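If full configuration management is overkill for now, even a small shell loop over the node list covers pushing the dependencies Guang listed before running 'ceph-deploy install' (the hostnames are placeholders; the package list is taken from the original mail):

    for host in node01 node02 node03; do
        ssh "$host" sudo yum install -y snappy leveldb gdisk python-argparse gperftools-libs
    done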
[ceph-users] gateway instance
Hi all,

Do "gateway instances" mean multiple gateway processes for a single Ceph cluster? Since they are configured independently in the configuration file, can they be configured with zones in different regions?

lixuehui
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph-deploy issues on RHEL6.4
Hi ceph-users, I recently deployed a ceph cluster with use of *ceph-deploy* utility, on RHEL6.4, during the time, I came across a couple of issues / questions which I would like to ask for your help. 1. ceph-deploy does not help to install dependencies (snappy leveldb gdisk python-argparse gperftools-libs) on the target host, so I will need to manually install those dependencies before performing 'ceph-deploy install {host_name}'. I am investigate the way to deploy ceph onto a hundred nodes and it is time-consuming to manually install those dependencies manually. Am I missing something here? I am thinking the dependency installation should be handled by *ceph-deploy* itself. 2. When performing 'ceph-deploy -v disk zap ceph.host.name:/dev/sdb', I have the following errors: [ceph_deploy.osd][DEBUG ] zapping /dev/sdc on ceph.host.name [ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo Traceback (most recent call last): File "/usr/bin/ceph-deploy", line 21, in sys.exit(main()) File "/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py", line 83, in newfunc return f(*a, **kw) File "/usr/lib/python2.6/site-packages/ceph_deploy/cli.py", line 147, in main return args.func(args) File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 381, in disk disk_zap(args) File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 317, in disk_zap zap_r(disk) File "/usr/lib/python2.6/site-packages/pushy/protocol/proxy.py", line 255, in (conn.operator(type_, self, args, kwargs)) File "/usr/lib/python2.6/site-packages/pushy/protocol/connection.py", line 66, in operator return self.send_request(type_, (object, args, kwargs)) File "/usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py", line 329, in send_request return self.__handle(m) File "/usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py", line 645, in __handle raise e pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory And then I logon to the host to perform 'ceph-disk zap /dev/sdb' and it can be successful without any issues. 3. When performing 'ceph-deploy -v disk activate ceph.host.name:/dev/sdb', I have the following errors: ceph_deploy.osd][DEBUG ] Activating cluster ceph disks ceph.host.name:/dev/sdb: [ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo [ceph_deploy.osd][DEBUG ] Activating host ceph.host.name disk /dev/sdb [ceph_deploy.osd][DEBUG ] Distro RedHatEnterpriseServer codename Santiago, will use sysvinit Traceback (most recent call last): File "/usr/bin/ceph-deploy", line 21, in sys.exit(main()) File "/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py", line 83, in newfunc return f(*a, **kw) File "/usr/lib/python2.6/site-packages/ceph_deploy/cli.py", line 147, in main return args.func(args) File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 379, in disk activate(args, cfg) File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 271, in activate cmd=cmd, ret=ret, out=out, err=err) NameError: global name 'ret' is not defined Also, I logon to the host to perform 'ceph-disk activate /dev/sdb' and it is good. Any help is appreciated. Thanks, Guang___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com