Re: [ceph-users] osds fails to start with mismatch in id
Hi,

It appears that in the case of pre-created partitions, ceph-deploy create is unable to change the partition GUIDs; the parted GUID remains as it is. I ran sgdisk manually on each partition:

sgdisk --change-name=2:"ceph data" --partition-guid=2:${osd_uuid} --typecode=2:${ptype2} /dev/${i}

The typecodes for journal and data were picked up from ceph-disk-udev. Udev is working fine now after reboot, no changes in fstab are required, and all OSDs are up:

ceph -s
    cluster 9c6cd1ae-66bf-45ce-b7ba-0256b572a8b7
     health HEALTH_OK
     osdmap e358: 60 osds: 60 up, 60 in
      pgmap v1258: 4096 pgs, 1 pools, 0 bytes data, 0 objects
            2802 MB used, 217 TB / 217 TB avail
                4096 active+clean

Thanks to all who responded.

Regards,
Rama

From: Daniel Schwager [mailto:daniel.schwa...@dtnet.de]
Sent: Monday, November 10, 2014 10:39 PM
To: 'Irek Fasikhov'; Ramakrishna Nishtala (rnishtal); 'Gregory Farnum'
Cc: 'ceph-us...@ceph.com'
Subject: RE: [ceph-users] osds fails to start with mismatch in id

Hi Ramakrishna,

we use the physical path (containing the serial number) to a disk to prevent complexity and wrong mapping. This path will never change:

/etc/ceph/ceph.conf
[osd.16]
devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z0SDCY-part1
osd_journal = /dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
...

regards
Danny

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Irek Fasikhov
Sent: Tuesday, November 11, 2014 6:36 AM
To: Ramakrishna Nishtala (rnishtal); Gregory Farnum
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osds fails to start with mismatch in id

Hi, Ramakrishna.

I think you understand what the problem is:

[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-56/whoami
56
[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-57/whoami
57

On Tue Nov 11 2014 at 6:01:40, Ramakrishna Nishtala (rnishtal) <rnish...@cisco.com> wrote:

Hi Greg,

Thanks for the pointer. I think you are right. The full story is like this.
After installation, everything works fine until I reboot. I do observe udevadm getting triggered in the logs, but the devices do not come up after reboot. Exact issue as http://tracker.ceph.com/issues/5194, but that has been fixed a while back per the case details. As a workaround, I copied the contents from /proc/mounts to fstab, and that's where I landed in the issue. After your suggestion I defined them as UUIDs in fstab, but hit a similar problem. blkid.tab is now moved to tmpfs and isn't consistent even after issuing blkid explicitly to get the UUIDs, which goes in line with the ceph-disk comments. I decided to reinstall, dd the partitions, zap the disks etc. It did not help. Very weird that the links below change in /dev/disk/by-uuid and /dev/disk/by-partuuid:

Before reboot
lrwxrwxrwx 1 root root 10 Nov 10 06:31 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdb2

After reboot
lrwxrwxrwx 1 root root 10 Nov 10 09:50 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdh2

Essentially, the transformation here is sdb2 -> sdh2 and sdc2 -> sdb2. In fact I hadn't partitioned my sdh at all before the test. The only difference from the standard procedure is probably that I pre-created the partitions for the journal and data with parted.
The osd rules in /lib/udev/rules.d have four different partition type GUID codes:

45b0969e-9b03-4f30-b4c6-5ec00ceff106
45b0969e-9b03-4f30-b4c6-b4b80ceff106
4fbd7e29-9d25-41b8-afd0-062c0ceff05d
4fbd7e29-9d25-41b8-afd0-5ec00ceff05d

But all my journal/data partitions have ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 as the partition type GUID. Appreciate any help.

Regards,
Rama

=
-----Original Message-----
From: Gregory Farnum [mailto:g...@gregs42.com]
Sent: Sunday, November 09, 2014 3:36 PM
To: Ramakrishna Nishtala (rnishtal)
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osds fails to start with mismatch in id

On Sun, Nov 9, 2014 at 3:21 PM, Ramakrishna Nishtala (rnishtal) <rnish...@cisco.com> wrote:

Hi,
I am on ceph 0.87, RHEL 7. Out of 60, a few OSDs start and the rest complain about a mismatch in IDs
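Putting the two observations in this thread together: parted leaves pre-created partitions with the generic GPT "Linux filesystem" type GUID, so none of the four Ceph udev rules ever fire, and sgdisk has to retag them. A minimal sketch of both pieces — the journal/data role labels are my reading of the GUID pairs above (the plain-vs-dmcrypt pairing is an assumption), and nothing here touches a real disk:

```python
import uuid

# The four partition type GUIDs matched by the Ceph udev rules (listed in
# the message above); 45b0969e-* mark journal partitions and 4fbd7e29-*
# mark data partitions, each with two variants.
CEPH_PTYPES = {
    "45b0969e-9b03-4f30-b4c6-5ec00ceff106": "journal",
    "45b0969e-9b03-4f30-b4c6-b4b80ceff106": "journal",
    "4fbd7e29-9d25-41b8-afd0-062c0ceff05d": "data",
    "4fbd7e29-9d25-41b8-afd0-5ec00ceff05d": "data",
}
# What parted leaves behind: the generic GPT "Linux filesystem" type.
GENERIC_LINUX_FS = "ebd0a0a2-b9e5-4433-87c0-68b6b72699c7"

def classify(ptype_guid):
    """Role the udev rules would assign, or None (no rule fires)."""
    return CEPH_PTYPES.get(ptype_guid.lower())

def sgdisk_retag(dev, partnum, ptype, osd_uuid=None):
    """Assemble the sgdisk argv that retags one pre-created partition,
    mirroring the command Rama ran; "ceph data" is one argv element."""
    osd_uuid = osd_uuid or str(uuid.uuid4())
    return ["sgdisk",
            "--change-name=%d:ceph data" % partnum,
            "--partition-guid=%d:%s" % (partnum, osd_uuid),
            "--typecode=%d:%s" % (partnum, ptype),
            "/dev/%s" % dev]

# Why udev ignored the pre-created OSD partitions:
assert classify(GENERIC_LINUX_FS) is None
assert classify("4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D") == "data"
cmd = sgdisk_retag("sdb", 2, "4fbd7e29-9d25-41b8-afd0-062c0ceff05d")
assert cmd[0] == "sgdisk" and cmd[-1] == "/dev/sdb"
```

Running the retag for each data and journal partition (with the typecodes taken from ceph-disk-udev, as Rama did) lets the stock udev rules activate the OSDs on the next boot.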
Re: [ceph-users] PG's incomplete after OSD failure
Thanks for your reply Sage! I've tested with 8.6ae and no luck I'm afraid. Steps taken were:

- Stop osd.117
- Export 8.6ae from osd.117
- Remove 8.6ae from osd.117
- Start osd.117
- Restart osd.190 (PG still showing incomplete afterwards)

After this the PG was still showing incomplete, and ceph pg dump_stuck inactive shows:

pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
8.6ae 0 0 0 0 0 0 0 0 incomplete 2014-11-11 17:34:27.168078 0'0 161425:40 [117,190] 117 [117,190] 117 86424'389748 2013-09-09 16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650

I then tried an export from OSD 190 to OSD 117 by doing:

- Stop osd.190 and osd.117
- Export pg 8.6ae from osd.190
- Import the file generated in the previous step into osd.117
- Boot both osd.190 and osd.117

When osd.117 attempts to start it generates a failed assert; the full log is here: http://pastebin.com/S4CXrTAL

-1 2014-11-11 17:25:15.130509 7f9f44512900 0 osd.117 161404 load_pgs
0 2014-11-11 17:25:18.604696 7f9f44512900 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9f44512900 time 2014-11-11 17:25:18.602626
osd/OSD.h: 715: FAILED assert(ret)
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb8231b]
2: (OSDService::get_map(unsigned int)+0x3f) [0x6eea2f]
3: (OSD::load_pgs()+0x1b78) [0x6aae18]
4: (OSD::init()+0x71f) [0x6abf5f]
5: (main()+0x252c) [0x638cfc]
6: (__libc_start_main()+0xf5) [0x7f9f41650ec5]
7: /usr/bin/ceph-osd() [0x651027]

I also attempted the same steps with 8.ca and got the same results.
The below is the current state of the pg with it removed from osd.111:

pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11 17:39:28.570675 160435'959618 161425:6071759 [190,111] 190 [190,111] 190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02 12:57:58.162789

Any idea of where I can go from here? One thought I had was setting osd.111 and osd.117 out of the cluster; once the data has moved I can shut them down and mark them as lost, which would make osd.190 the only replica available for those PGs.

Thanks again

On Tue, Nov 11, 2014 at 1:10 PM, Sage Weil <sw...@redhat.com> wrote:

On Tue, 11 Nov 2014, Matthew Anderson wrote:
> Just an update, it appears that no data actually exists for those PGs on osd.117 and osd.111, but they're showing as incomplete anyway. So for the 8.ca PG, osd.111 has only an empty directory but osd.190 is filled with data. For 8.6ae, osd.117 has no data in the pg directory and osd.190 is filled with data as before. Since all of the required data is on osd.190, would there be a way to make osd.111 and osd.117 forget they have ever seen the two incomplete PGs and therefore restart backfilling?

Ah, that's good news. You should know that the copy on osd.190 is slightly out of date, but it is much better than losing the entire contents of the PG. More specifically, for 8.6ae the latest version was 1935986 but osd.190 is at 1935747, about 200 writes in the past. You'll need to fsck the RBD images after this is all done.

I don't think we've tested this recovery scenario, but I think you'll be able to recover with ceph_objectstore_tool, which has an import/export function and a delete function. First, try removing the newer version of the pg on osd.117.
First export it for good measure (even though it's empty). Stop the osd, then:

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
  --journal-path /var/lib/ceph/osd/ceph-117/journal \
  --op export --pgid 8.6ae --file osd.117.8.7ae

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
  --journal-path /var/lib/ceph/osd/ceph-117/journal \
  --op remove --pgid 8.6ae

and restart. If that doesn't peer, you can also try exporting the pg from osd.190 and importing it into osd.117. I think just removing the newer empty pg on osd.117 will do the trick, though...

sage

On Tue, Nov 11, 2014 at 10:37 AM, Matthew Anderson <manderson8...@gmail.com> wrote:

Hi All,

We've had a string of very unfortunate failures and need a hand fixing the incomplete PGs that we're now left with. We're configured with 3 replicas over different hosts, with 5 in total. The timeline goes:

-1 week :: A full server goes offline with a failed backplane. Still not working.
-1 day :: OSD 190 fails.
-1 day + 3 minutes :: OSD 121 in a different server fails, taking out several PGs and blocking IO.
Today :: The first failed osd (osd.190) was cloned to a good drive with xfs_dump | xfs_restore and now boots fine. The last failed osd (osd.121) is completely unrecoverable and was marked as lost.

What we're left with
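The export-then-remove sequence Sage describes can be wrapped in a small helper. A sketch that only assembles the ceph_objectstore_tool invocations shown above (paths and option names are taken verbatim from the commands; nothing here runs against a real OSD, which must be stopped first):

```python
def objectstore_tool_cmds(osd_id, pgid, export_file):
    """Build the ceph_objectstore_tool invocations (export, then remove)
    for one PG on one stopped OSD, mirroring the sequence above."""
    base = [
        "ceph_objectstore_tool",
        "--data-path", "/var/lib/ceph/osd/ceph-%d" % osd_id,
        "--journal-path", "/var/lib/ceph/osd/ceph-%d/journal" % osd_id,
    ]
    export = base + ["--op", "export", "--pgid", pgid, "--file", export_file]
    remove = base + ["--op", "remove", "--pgid", pgid]
    return export, remove

export, remove = objectstore_tool_cmds(117, "8.6ae", "osd.117.8.6ae")
assert export[-1] == "osd.117.8.6ae"
assert remove[-2:] == ["--pgid", "8.6ae"]
```

Running the export first, as Sage advises, means the removed PG can still be re-imported if the cleanup turns out to be the wrong call.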
[ceph-users] Stackforge Puppet Module
Hi,

I'm just looking through the different methods of deploying Ceph, and I particularly liked the idea, advertised by the stackforge puppet module, of using discovery to automatically add new disks. I understand the principle of how it should work (using ceph-disk list to find unknown disks), but I would like to see in a little more detail how it's been implemented. I've been looking through the puppet module on GitHub, but I can't see anywhere where this discovery is carried out.

Could anyone confirm whether this puppet module currently supports auto discovery, and where in the code it's carried out?

Many Thanks,
Nick

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Weight field in osd dump osd tree
Hi all,

When I issue ceph osd dump it displays the weight for that osd as 1, but when I issue ceph osd tree it displays 0.35.

Output from osd dump:

{ "osd": 20,
  "uuid": "b2a97a29-1b8a-43e4-a4b0-fd9ee351086e",
  "up": 1,
  "in": 1,
  "weight": 1.00,
  "primary_affinity": 1.00,
  "last_clean_begin": 0,
  "last_clean_end": 0,
  "up_from": 103,
  "up_thru": 106,
  "down_at": 0,
  "lost_at": 0,
  "public_addr": "10.242.43.116:6820\/27623",
  "cluster_addr": "10.242.43.116:6821\/27623",
  "heartbeat_back_addr": "10.242.43.116:6822\/27623",
  "heartbeat_front_addr": "10.242.43.116:6823\/27623",
  "state": ["exists", "up"]}],

Output from osd tree:

# id    weight  type name               up/down reweight
-1      7.35    root default
-2      2.8             host rack6-storage-5
0       0.35                    osd.0   up      1
1       0.35                    osd.1   up      1
2       0.35                    osd.2   up      1
3       0.35                    osd.3   up      1
4       0.35                    osd.4   up      1
5       0.35                    osd.5   up      1
6       0.35                    osd.6   up      1
7       0.35                    osd.7   up      1
-3      2.8             host rack6-storage-4
8       0.35                    osd.8   up      1
9       0.35                    osd.9   up      1
10      0.35                    osd.10  up      1
11      0.35                    osd.11  up      1
12      0.35                    osd.12  up      1
13      0.35                    osd.13  up      1
14      0.35                    osd.14  up      1
15      0.35                    osd.15  up      1
-4      1.75            host rack6-storage-6
16      0.35                    osd.16  up      1
17      0.35                    osd.17  up      1
18      0.35                    osd.18  up      1
19      0.35                    osd.19  up      1
20      0.35                    osd.20  up      1

Please help me to understand this.

-regards,
Mallikarjun Biradar
Re: [ceph-users] Stackforge Puppet Module
Hi Nick,

The great thing about puppet-ceph's implementation on Stackforge is that it is both unit and integration tested. You can see the integration tests here: https://github.com/ceph/puppet-ceph/tree/master/spec/system

What I'm getting at is that the tests let you see, to a certain extent, how you can use the module. For example, in the OSD integration tests:
- https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb#L24
and then:
- https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb#L82-L110

There's no auto discovery mechanism built into the module right now. It's kind of dangerous; you don't want to format the wrong disks. Now, this doesn't mean you can't discover the disks yourself and pass them to the module from your site.pp or from a composition layer.

Here's something I have for my CI environment that uses the $::blockdevices fact to discover all devices, splits that fact into a list of devices, and then rejects the drives I don't want (such as the OS disk):

# Assume OS is installed on xvda/sda/vda.
# On an Openstack VM, vdb is ephemeral, we don't want to use vdc.
# WARNING: ALL OTHER DISKS WILL BE FORMATTED/PARTITIONED BY CEPH!
$block_devices = reject(split($::blockdevices, ','), '(xvda|sda|vda|vdc|sr0)')
$devices = prefix($block_devices, '/dev/')

And then you can pass $devices to the module. Let me know if you have any questions!

--
David Moreau Simard

On Nov 11, 2014, at 6:23 AM, Nick Fisk <n...@fisk.me.uk> wrote:

Hi,

I'm just looking through the different methods of deploying Ceph and I particularly liked the idea that the stackforge puppet module advertises of using discovery to automatically add new disks. I understand the principle of how it should work; using ceph-disk list to find unknown disks, but I would like to see in a little more detail how it's been implemented. I've been looking through the puppet module on GitHub, but I can't see anywhere where this discovery is carried out.
Could anyone confirm if this puppet module does currently support the auto discovery and where in the code it's carried out?

Many Thanks,
Nick
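The fact-filtering David shows translates directly into any language; a Python sketch of the same discovery logic (the example fact value is made up; the reject pattern mirrors the Puppet manifest above):

```python
import re

def discover_devices(blockdevices_fact, reject_pattern=r"(xvda|sda|vda|vdc|sr0)"):
    """Split a comma-separated blockdevices fact and drop OS/ephemeral
    disks, mirroring reject(split(...)) + prefix(...) in the manifest."""
    names = blockdevices_fact.split(",")
    kept = [n for n in names if not re.search(reject_pattern, n)]
    return ["/dev/" + n for n in kept]

# Hypothetical host with an OS disk (sda), a CD drive (sr0) and two data disks:
assert discover_devices("sda,sdb,sdc,sr0") == ["/dev/sdb", "/dev/sdc"]
```

As David notes, the dangerous part is not the splitting but the reject list: every device that survives the filter will be formatted, so the pattern must cover every disk you want to keep.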
Re: [ceph-users] Weight field in osd dump osd tree
On Tue, 11 Nov 2014 17:14:49 +0530 Mallikarjun Biradar wrote:

> Hi all
> When I issue ceph osd dump it displays the weight for that osd as 1, but when I issue osd tree it displays 0.35.

There are many threads about this; google is your friend. For example:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11010.html

In short, one is the CRUSH weight (usually based on the capacity of the OSD), the other is the OSD weight (or "reweight" in the tree display).

For example, think about a cluster with 100 2TB OSDs where you're planning to replace them (bit by bit) with 4TB OSDs. The hard disks are the same speed, so if you just replaced them, more and more data would migrate to your bigger OSDs, making the whole cluster actually slower. Setting the OSD weight (reweight) to 0.5 for the 4TB OSDs (until the replacement is complete) will result in them getting the same allocation as the 2TB ones, keeping things even.

Christian

> [quoted osd dump and osd tree output trimmed]

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
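Christian's example can be checked with a little arithmetic. A sketch under a simplifying assumption: data placement is modeled as proportional to crush_weight * reweight, which illustrates the intent but is not a full model of CRUSH's selection process.

```python
def effective_weight(crush_weight, reweight):
    """Approximate share of data an OSD attracts: CRUSH weight (capacity)
    scaled by the reweight value (simplification of CRUSH placement)."""
    return crush_weight * reweight

old = effective_weight(2.0, 1.0)   # 2TB OSD, full weight
new = effective_weight(4.0, 0.5)   # 4TB OSD, reweighted down
assert old == new == 2.0           # both attract the same share of data
```

This is why the reweight trick keeps a mixed 2TB/4TB cluster balanced on IO until the slow disks are all replaced, at which point the reweight can be returned to 1.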
Re: [ceph-users] Weight field in osd dump osd tree
Hi Christian,

On 11/11/2014 13:09, Christian Balzer wrote:
> [explanation of CRUSH weight vs. OSD reweight, trimmed]
> Setting the OSD weight (reweight) to 0.5 for the 4TB OSDs (until the replacement is complete) will result in them getting the same allocation as the 2TB ones, keeping things even.

It is a great example. Would you like to add it to http://ceph.com/docs/giant/rados/operations/control/#osd-subsystem ?

If you do not have time, I volunteer to do it :-)

Cheers
--
Loïc Dachary, Artisan Logiciel Libre
[ceph-users] Configuring swift user for ceph Rados Gateway - 403 Access Denied
Hi,

I am having problems accessing the rados gateway using the swift interface. I am using the ceph firefly version and have configured a "us" region as explained in the docs. There are two zones, us-east and us-west; the us-east gateway is running on host ceph-node-1 and the us-west gateway on host ceph-node-2. Here is the output when I try to connect with the swift interface:

user1@ceph-node-4:~$ swift -A http://ceph-node-1/auth -U useast:swift -K FmQYYbzly4RH+PmNlrWA3ynN+eJrayYXzeISGDSw --debug stat
INFO:urllib3.connectionpool:Starting new HTTP connection (1): ceph-node-1
DEBUG:urllib3.connectionpool:Setting read timeout to <object object at 0x7f45834a7090>
DEBUG:urllib3.connectionpool:"GET /auth HTTP/1.1" 403 23
INFO:swiftclient:REQ: curl -i http://ceph-node-1/auth -X GET
INFO:swiftclient:RESP STATUS: 403 Forbidden
INFO:swiftclient:RESP HEADERS: [('date', 'Tue, 11 Nov 2014 12:30:58 GMT'), ('accept-ranges', 'bytes'), ('content-type', 'application/json'), ('content-length', '23'), ('server', 'Apache/2.2.22 (Ubuntu)')]
INFO:swiftclient:RESP BODY: {"Code":"AccessDenied"}
ERROR:swiftclient:Auth GET failed: http://ceph-node-1/auth 403 Forbidden
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1181, in _retry
    self.url, self.token = self.get_auth()
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1155, in get_auth
    insecure=self.insecure)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 318, in get_auth
    insecure=insecure)
  File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 241, in get_auth_1_0
    http_reason=resp.reason)
ClientException: Auth GET failed: http://ceph-node-1/auth 403 Forbidden
Auth GET failed: http://ceph-node-1/auth 403 Forbidden

The region map is as follows.
vinod@ceph-node-1:~$ radosgw-admin region get --name=client.radosgw.us-east-1
{ "name": "us",
  "api_name": "us",
  "is_master": true,
  "endpoints": [],
  "master_zone": "us-east",
  "zones": [
        { "name": "us-east",
          "endpoints": ["http:\/\/ceph-node-1:80\/"],
          "log_meta": true,
          "log_data": true},
        { "name": "us-west",
          "endpoints": ["http:\/\/ceph-node-2:80\/"],
          "log_meta": true,
          "log_data": true}],
  "placement_targets": [
        { "name": "default-placement",
          "tags": []}],
  "default_placement": "default-placement"}

The user info is as follows.

vinod@ceph-node-1:~$ radosgw-admin user info --uid=useast --name=client.radosgw.us-east-1
{ "user_id": "useast",
  "display_name": "Region-US Zone-East",
  "email": "",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [
        { "id": "useast:swift",
          "permissions": "full-control"}],
  "keys": [
        { "user": "useast",
          "access_key": "45BEF1XQ3Z94B0LIBTLX",
          "secret_key": "123"},
        { "user": "useast:swift",
          "access_key": "WF2QYTY0LDN66CHJ8JSE",
          "secret_key": ""}],
  "swift_keys": [
        { "user": "useast:swift",
          "secret_key": "FmQYYbzly4RH+PmNlrWA3ynN+eJrayYXzeISGDSw"}],
  "caps": [],
  "op_mask": "read, write, delete",
  "system": true,
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "user_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "temp_url_keys": []}

Contents of the rgw-us-east.conf file are as follows.
vinod@ceph-node-1:~$ cat /etc/apache2/sites-enabled/rgw-us-east.conf
FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/client.radosgw.us-east-1.sock

<VirtualHost *:80>
        ServerName ceph-node-1
        ServerAdmin vinvi...@gmail.com
        DocumentRoot /var/www
        RewriteEngine On
        RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

        <IfModule mod_fastcgi.c>
                <Directory /var/www>
                        Options +ExecCGI
                        AllowOverride All
                        SetHandler fastcgi-script
                        Order allow,deny
                        Allow from all
                        AuthBasicAuthoritative Off
                </Directory>
        </IfModule>

        AllowEncodedSlashes On
        ErrorLog /var/log/apache2/error.log
        CustomLog /var/log/apache2/access.log combined
        ServerSignature Off
</VirtualHost>

Can someone point out to me where I am going wrong?

--
Vinod
Re: [ceph-users] mds isn't working anymore after osd's running full
No problem, thanks for helping. I don't want to disable the deep scrubbing process itself because it's very useful, but one placement group (3.30) is continuously deep scrubbing; it should finish after some time, but it won't.

Jasper

From: Gregory Farnum [g...@gregs42.com]
Sent: Monday, 10 November 2014 18:24
To: Jasper Siero
CC: ceph-users; John Spray
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

It's supposed to do that; deep scrubbing is an ongoing consistency-check mechanism. If you really want to disable it you can set an osdmap flag to prevent it, but you'll have to check the docs for exactly what that is as I can't recall. Glad things are working for you; sorry it took so long!
-Greg

On Mon, Nov 10, 2014 at 8:49 AM, Jasper Siero <jasper.si...@target-holding.nl> wrote:

Hello John and Greg,

I used the new patch and now the undump succeeded and the mds is working fine and I can mount cephfs again! I still have one placement group which keeps deep scrubbing even after restarting the ceph cluster:

dumped all in format plain
3.30 0 0 0 0 0 0 0 active+clean+scrubbing+deep 2014-11-10 17:21:15.866965 0'0 2414:418 [1,9] 1 [1,9] 1 631'3463 2014-08-21 15:14:45.430926 602'3131 2014-08-18 15:14:37.494913

Is there a way to solve this?

Kind regards,
Jasper

From: Gregory Farnum [g...@gregs42.com]
Sent: Friday, 7 November 2014 22:42
To: Jasper Siero
CC: ceph-users; John Spray
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

On Thu, Nov 6, 2014 at 11:49 AM, John Spray john.sp...@redhat.com wrote:

This is still an issue on master, so a fix will be coming soon. Follow the ticket for updates: http://tracker.ceph.com/issues/10025

Thanks for finding the bug! John is off for a vacation, but he pushed a branch wip-10025-firefly; if you install that (similar address to the other one), it should work for you. You'll need to reset and undump again (I presume you still have the journal-as-a-file).
I'll be merging them into the stable branches pretty shortly as well.
-Greg

John

On Thu, Nov 6, 2014 at 6:21 PM, John Spray <john.sp...@redhat.com> wrote:

Jasper,

Thanks for this -- I've reproduced this issue in a development environment. We'll see if this is also an issue on giant, and backport a fix if appropriate. I'll update this thread soon.

Cheers,
John

On Mon, Nov 3, 2014 at 8:49 AM, Jasper Siero <jasper.si...@target-holding.nl> wrote:

Hello Greg,

I saw that the site of the previous link to the logs uses a very short expiry time, so I uploaded it to another one:
http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz

Thanks,
Jasper

From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of Gregory Farnum [gfar...@redhat.com]
Sent: Thursday, 30 October 2014 1:03
To: Jasper Siero
CC: John Spray; ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero <jasper.si...@target-holding.nl> wrote:

Hello Greg,

I added the debug options which you mentioned and started the process again:

[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --reset-journal 0
old journal was 9483323613~134233517
new journal start will be 9621733376 (4176246 bytes past old end)
writing journal head
writing EResetJournal entry
done

[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 journaldumptgho-mon001
undump journaldumptgho-mon001
start 9483323613 len 134213311
writing header 200.
writing 9483323613~1048576
writing 9484372189~1048576
writing 9485420765~1048576
writing 9486469341~1048576
writing 9487517917~1048576
writing 9488566493~1048576
writing 9489615069~1048576
writing 9490663645~1048576
writing 9491712221~1048576
writing 9492760797~1048576
writing 9493809373~1048576
writing 9494857949~1048576
writing 9495906525~1048576
writing 9496955101~1048576
writing 9498003677~1048576
writing 9499052253~1048576
writing 9500100829~1048576
writing 9501149405~1048576
writing 9502197981~1048576
writing 9503246557~1048576
writing 9504295133~1048576
writing 9505343709~1048576
writing 9506392285~1048576
writing 9507440861~1048576
writing 9508489437~1048576
writing 9509538013~1048576
writing 9510586589~1048576
writing 9511635165~1048576
writing 9512683741~1048576
writing 9513732317~1048576
writing 9514780893~1048576
writing 9515829469~1048576
writing 9516878045~1048576
writing 9517926621~1048576
writing 9518975197~1048576
writing
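The offsets in the undump output advance by a fixed 1 MiB (1048576-byte) chunk size from the journal start; a quick arithmetic check against the values above:

```python
CHUNK = 1048576          # 1 MiB write size seen in each "writing" line
start = 9483323613       # journal start from "start 9483323613 len 134213311"
length = 134213311       # journal length in bytes

# Offset of each chunk write the undump performs:
offsets = [start + i * CHUNK for i in range(length // CHUNK + 1)]
assert offsets[0] == 9483323613   # first "writing" line
assert offsets[1] == 9484372189   # second "writing" line
assert offsets[2] == 9485420765   # third "writing" line
```

So the long run of "writing" lines is just the journal being replayed back into RADOS one megabyte at a time, ending with a short tail chunk for the final length % CHUNK bytes.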
Re: [ceph-users] Configuring swift user for ceph Rados Gateway - 403 Access Denied
On 2014-11-11 13:12:32 +0000, ವಿನೋದ್ Vinod H I said:

> Hi,
> I am having problems accessing the rados gateway using the swift interface. I am using the ceph firefly version and have configured a us region as explained in the docs. There are two zones, us-east and us-west. The us-east gateway is running on host ceph-node-1 and the us-west gateway is running on host ceph-node-2.
[...]
> Auth GET failed: http://ceph-node-1/auth 403 Forbidden
[...]
> "swift_keys": [
>       { "user": "useast:swift",
>         "secret_key": "FmQYYbzly4RH+PmNlrWA3ynN+eJrayYXzeISGDSw"}],

We have seen problems when the secret_key has special characters. I am not sure if "+" is one of them, but the manual states this somewhere. Try setting the key explicitly, or keep re-generating until you get one without any special chars. Drove me nuts.

Daniel
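Following Daniel's suggestion, a quick way to screen a swift secret key before re-generating. Treating anything outside plain letters and digits as suspect is my own heuristic (stricter than whatever the real restriction is), not an official rule:

```python
import re

def key_is_plain(secret_key):
    """Heuristic: flag swift secret keys containing characters other than
    letters and digits (e.g. '+' or '/'), which users have reported to
    cause 403s with some radosgw/swift client combinations."""
    return re.fullmatch(r"[A-Za-z0-9]+", secret_key) is not None

assert not key_is_plain("FmQYYbzly4RH+PmNlrWA3ynN+eJrayYXzeISGDSw")  # '+' present
assert key_is_plain("WF2QYTY0LDN66CHJ8JSE")                          # safe
```

With this in hand, one can loop `radosgw-admin key create --gen-secret` until the generated key passes the check, rather than eyeballing each key.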
Re: [ceph-users] Weight field in osd dump osd tree
Thanks Christian, I'm now clear about the concept. Thanks very much :)

On Tue, Nov 11, 2014 at 5:47 PM, Loic Dachary <l...@dachary.org> wrote:

> Hi Christian,
>
> On 11/11/2014 13:09, Christian Balzer wrote:
> > [explanation of CRUSH weight vs. OSD reweight, trimmed]
>
> It is a great example. Would you like to add it to http://ceph.com/docs/giant/rados/operations/control/#osd-subsystem ?
>
> If you do not have time, I volunteer to do it :-)
>
> Cheers
> --
> Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Federated gateways
Ok I believe I’ve made some progress here. I have everything syncing *except* data. The data is getting 500s when it tries to sync to the backup zone. I have a log from the radosgw with debug cranked up to 20:

2014-11-11 14:37:06.688331 7f54447f0700 1 == starting new request req=0x7f546800f3b0 =
2014-11-11 14:37:06.688978 7f54447f0700 0 WARNING: couldn't find acl header for bucket, generating default
2014-11-11 14:37:06.689358 7f54447f0700 1 -- 172.16.10.103:0/1007381 -- 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381 submit_message osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote, 172.16.10.103:6934/14875, have pipe.
2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer encoding 48 features 17592186044415 0x7f534800d770 osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4
2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 48): sig = 206599450695048354
2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sending 48 0x7f534800d770
2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ACK
2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 48
2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695079 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got MSG
2014-11-11 14:37:06.695093 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190 data=0 off 0
2014-11-11 14:37:06.695108 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader wants 190 from dispatch throttler 0/104857600
2014-11-11 14:37:06.695135 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got front 190
2014-11-11 14:37:06.695150 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).aborted = 0
2014-11-11 14:37:06.695158 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
2014-11-11 14:37:06.695284 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got message 48 0x7f51b4001950 osd_op_reply(1783 statelog.obj_opstate.97 [call] v47531'13 uv13 ondisk = 0) v6
2014-11-11 14:37:06.695313 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 queue 0x7f51b4001950 prio 127
2014-11-11 14:37:06.695374 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695384 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.695426 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61
[ceph-users] long term support version?
Hi all, Did I notice correctly that firefly is going to be supported long term whereas Giant is not going to be supported as long? http://ceph.com/releases/v0-80-firefly-released/ This release will form the basis for our long-term supported release Firefly, v0.80.x. http://ceph.com/uncategorized/v0-87-giant-released/ This release will form the basis for the stable release Giant, v0.87.x. Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
Thanks Craig, I'll jiggle the OSDs around to see if that helps. Otherwise, I'm almost certain removing the pool will work. :/ Have a good one, Chad.

I had the same experience with force_create_pg too. I ran it, and the PGs sat there in creating state. I left the cluster overnight, and sometime in the middle of the night, they created. The actual transition from creating to active+clean happened during the recovery after a single OSD was kicked out. I don't recall if that single OSD was responsible for the creating PGs. I really can't say what un-jammed my creating PGs.
[ceph-users] Installing ceph on a single machine with ceph-deploy, Ubuntu 14.04 64 bit
Hi, I am unable to figure out how to install and deploy ceph on a single machine with ceph-deploy. I have Ubuntu 14.04 64-bit installed in a virtual machine (on Windows 8.1 through VMware Player) and have installed devstack on Ubuntu. I am trying to install ceph on the same machine (Ubuntu) and interface with OpenStack. I tried the following steps, but mkcephfs does not exist; I read that it is deprecated and that ceph-deploy replaces it. The documentation, however, talks about multiple nodes, and I am lost as to how to use ceph-deploy to install and set up ceph on a single machine. Please guide me. The steps I tried earlier were given for mkcephfs (reference: http://eu.ceph.com/docs/wip-6919/start/quick-start/):

sudo apt-get update
sudo apt-get install ceph

Execute hostname -s on the command line to retrieve the name of your host. Then, replace {hostname} in the sample configuration file with your host name. Execute ifconfig on the command line to retrieve the IP address of your host. Then, replace {ip-address} with the IP address of your host. Finally, copy the contents of the modified configuration file and save it to /etc/ceph/ceph.conf. This file will configure Ceph to operate a monitor, two OSD daemons and one metadata server on your local machine.

[osd]
osd journal size = 1000
filestore xattr use omap = true
# Execute $ hostname to retrieve the name of your host,
# and replace {hostname} with the name of your host.
# For the monitor, replace {ip-address} with the IP
# address of your host.
[mon.a]
host = {hostname}
mon addr = {ip-address}:6789

[osd.0]
host = {hostname}

[osd.1]
host = {hostname}

[mds.a]
host = {hostname}

sudo mkdir /var/lib/ceph/osd/ceph-0
sudo mkdir /var/lib/ceph/osd/ceph-1
sudo mkdir /var/lib/ceph/mon/ceph-a
sudo mkdir /var/lib/ceph/mds/ceph-a
cd /etc/ceph
sudo mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring
sudo service ceph start
ceph health

Regards
Sent from Windows Mail
Re: [ceph-users] long term support version?
Yep! Every other stable release gets the LTS treatment. We're still fixing bugs and backporting some minor features to Dumpling, but haven't done any serious updates to Emperor since Firefly came out. Giant will be superseded by Hammer in the February timeframe, if I have my dates right. -Greg

On Tue, Nov 11, 2014 at 8:54 AM Chad Seys cws...@physics.wisc.edu wrote: Hi all, Did I notice correctly that firefly is going to be supported long term whereas Giant is not going to be supported as long? http://ceph.com/releases/v0-80-firefly-released/ This release will form the basis for our long-term supported release Firefly, v0.80.x. http://ceph.com/uncategorized/v0-87-giant-released/ This release will form the basis for the stable release Giant, v0.87.x. Thanks! Chad.
Re: [ceph-users] Deep scrub, cache pools, replica 1
On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer ch...@gol.com wrote: Hello, One of my clusters has become busy enough (I'm looking at you, evil Window VMs that I shall banish elsewhere soon) to experience client noticeable performance impacts during deep scrub. Before this I instructed all OSDs to deep scrub in parallel at Saturday night and that finished before Sunday morning. So for now I'll fire them off one by one to reduce the load. Looking forward, that cluster doesn't need more space so instead of adding more hosts and OSDs I was thinking of a cache pool instead. I suppose that will keep the clients happy while the slow pool gets scrubbed. Is there anybody who tested cache pools with Firefly and compared the performance to Giant? For testing I'm currently playing with a single storage node and 8 SSD backed OSDs. Now what very much blew my mind is that a pool with a replication of 1 still does quite the impressive read orgy, clearly reading all the data in the PGs. Why? And what is it comparing that data with, the cosmic background radiation?

Yeah, cache pools currently do full-object promotions whenever an object is accessed. There are some ideas and projects to improve this or reduce its effects, but they're mostly just getting started. At least, I assume that's what you mean by a read orgy; perhaps you are seeing something else entirely? Also, even on cache pools you don't really want to run with 1x replication as they hold the only copy of whatever data is dirty... -Greg
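For anyone trying this at home, the cache-tier setup being discussed looks roughly like the plan below. The pool names are examples, and the commands reflect the firefly/giant-era `ceph osd tier` CLI; the sketch only prints the commands so they can be reviewed against your release before anything is run:

```shell
# Print (not run) the usual sequence for putting a cache pool in front of a
# slow pool. Pool names are examples; verify the subcommands against your
# release's 'ceph osd tier' help before applying.
cache_tier_plan() {
  slow=$1; cache=$2
  cat <<EOF
ceph osd tier add $slow $cache
ceph osd tier cache-mode $cache writeback
ceph osd tier set-overlay $slow $cache
ceph osd pool set $cache hit_set_type bloom
EOF
}

# Example: front the 'rbd' pool with a hypothetical 'ssd-cache' pool:
cache_tier_plan rbd ssd-cache
```

Each printed line can then be run by hand once the pool names and cache sizing have been checked.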
Re: [ceph-users] mds isn't working anymore after osd's running full
On Tue, Nov 11, 2014 at 5:06 AM, Jasper Siero jasper.si...@target-holding.nl wrote: No problem, thanks for helping. I don't want to disable the deep scrubbing process itself because it's very useful, but one placement group (3.30) is continuously deep scrubbing and it should finish after some time but it won't.

Hmm, how are you determining that this one PG won't stop scrubbing? This doesn't sound like any issues familiar to me. -Greg
[ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull
Hi Guys, We ran into this issue after we nearly maxed out the OSDs. Since then, we have cleaned up a lot of data on the OSDs, but the pg's seem to have been stuck for the last 4 to 5 days. I have run "ceph osd reweight-by-utilization" and that did not seem to work. Any suggestions?

ceph -s
cluster 909c7fe9-0012-4c27-8087-01497c661511
health HEALTH_WARN 224 pgs backfill; 130 pgs backfill_toofull; 86 pgs backfilling; 4 pgs degraded; 14 pgs recovery_wait; 324 pgs stuck unclean; recovery -11922/573322 objects degraded (-2.079%)
monmap e5: 5 mons at {Lab-mon001=x.x.96.12:6789/0,Lab-mon002=x.x.96.13:6789/0,Lab-mon003=x.x.96.14:6789/0,Lab-mon004=x.x.96.15:6789/0,Lab-mon005=x.x.96.16:6789/0}, election epoch 28, quorum 0,1,2,3,4 Lab-mon001,Lab-mon002,Lab-mon003,Lab-mon004,Lab-mon005
mdsmap e6: 1/1/1 up {0=Lab-mon001=up:active}
osdmap e10598: 495 osds: 492 up, 492 in
pgmap v1827231: 21568 pgs, 3 pools, 221 GB data, 184 kobjects
4142 GB used, 4982 GB / 9624 GB avail
-11922/573322 objects degraded (-2.079%)
9 active+recovery_wait
21244 active+clean
90 active+remapped+wait_backfill
5 active+recovery_wait+remapped
4 active+degraded+remapped+wait_backfill
130 active+remapped+wait_backfill+backfill_toofull
86 active+remapped+backfilling
client io 0 B/s rd, 0 op/s
Re: [ceph-users] Typical 10GbE latency
Don't have 10GbE yet, but here is my result with simple LACP on 2 gigabit links with a Cisco 6500: rtt min/avg/max/mdev = 0.179/0.202/0.221/0.019 ms (Seems to be lower than your 10GbE Nexus)

----- Original Mail ----- From: Wido den Hollander w...@42on.com To: ceph-users@lists.ceph.com Sent: Monday, 10 November 2014 17:22:04 Subject: Re: [ceph-users] Typical 10GbE latency

On 08-11-14 02:42, Gary M wrote: Wido, Take the switch out of the path between nodes and remeasure. ICMP-echo requests are very low priority traffic for switches and network stacks.

I tried with a direct TwinAx and fiber cable. No difference.

If you really want to know, place a network analyzer between the nodes to measure the request packet to response packet latency. The ICMP traffic to the ping application is not accurate in the sub-millisecond range and should only be used as a rough estimate.

True, I fully agree with you. But why is everybody showing a lower latency here? My latencies are about 40% higher than what I see in this setup and other setups.

You also may want to install the high resolution timer patch, sometimes called HRT, to the kernel, which may give you different results. ICMP traffic takes a different path than the TCP traffic and should not be considered an indicator of defect.

Yes, I'm aware. But it still doesn't explain why the latency on other systems, which are in production, is lower than on this idle system.

I believe the ping app calls the sendto system call (sorry, it's been a while since I last looked). System calls can take between .1us and .2us each. However, the ping application makes several of these calls and waits for a signal from the kernel. The wait for a signal means the ping application must wait to be rescheduled to report the time. Rescheduling will depend on a lot of other factors in the os, e.g. timers, card interrupts, other tasks with higher priorities. Reporting the time must add a few more system calls for this to happen.
As the ping application loops to post the next ping request, it again makes a few system calls, each of which may cause a task switch. For the above factors, the ping application is not a good representation of network performance, due to factors in the application and network traffic shaping performed at the switch and the tcp stacks.

I think that netperf is probably a better tool, but that also does TCP latencies. I want the real IP latency, so I assumed that ICMP would be the most simple one. The other setups I have access to are in production and do not have any special tuning, yet their latency is still lower than on this new deployment. That's what gets me confused. Wido

cheers, gary

On Fri, Nov 7, 2014 at 4:32 PM, Łukasz Jagiełło jagiello.luk...@gmail.com wrote: Hi, rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms 04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01) at both hosts and an Arista 7050S-64 between. Both hosts were part of an active ceph cluster.

On Thu, Nov 6, 2014 at 5:18 AM, Wido den Hollander w...@42on.com wrote: Hello, While working at a customer I've run into a 10GbE latency which seems high to me. I have access to a couple of Ceph clusters and I ran a simple ping test:

$ ping -s 8192 -c 100 -n ip

Two results I got:
rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms

Both these environments are running with Intel 82599ES 10Gbit cards in LACP, one with Extreme Networks switches, the other with Arista. Now, on an environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing:

rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms

As you can see, the Cisco Nexus network has high latency compared to the other setup. You would say the switches are to blame, but we also tried with a direct TwinAx connection, but that didn't help.
This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering, others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results.

-- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on

-- Łukasz Jagiełło lukaszatjagiellodotorg
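For comparing runs, the summary line ping prints can also be reproduced from raw samples, which makes it easy to feed in timings collected by other tools. A minimal awk sketch (mdev here is the population standard deviation of the samples, which matches ping's definition):

```shell
# Compute ping-style rtt statistics from one sample (in ms) per input line.
# mdev is the population standard deviation, which is what ping reports.
rtt_stats() {
  awk '{ s += $1; ss += $1 * $1; n++
         if (n == 1 || $1 < min) min = $1
         if ($1 > max) max = $1 }
       END { avg = s / n; mdev = sqrt(ss / n - avg * avg)
             printf "rtt min/avg/max/mdev = %.3f/%.3f/%.3f/%.3f ms\n", \
                    min, avg, max, mdev }'
}

# Example with three made-up samples:
printf '0.180\n0.200\n0.220\n' | rtt_stats
# prints: rtt min/avg/max/mdev = 0.180/0.200/0.220/0.016 ms
```

In practice the samples would come from ping itself, e.g. something like `ping -s 8192 -c 100 -n <ip> | sed -n 's/.*time=\([0-9.]*\) ms.*/\1/p' | rtt_stats` to get the same summary over jumbo-frame-sized probes.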
Re: [ceph-users] PG's incomplete after OSD failure
I've done a bit more work tonight and managed to get some more data back. Osd.121, which was previously completely dead, has made it through an XFS repair with a more fault tolerant HBA firmware and I was able to export both of the placement groups required using ceph_objectstore_tool. The osd would probably boot if I hadn't already marked it as lost :(

I've basically got it down to two options. I can import the exported data from osd.121 into osd.190, which would complete the PG, but this fails with a filestore feature mismatch because the sharded objects feature is missing on the target osd:

Export has incompatible features set compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded objects,12=transaction hints}

The second one would be to run a ceph pg force_create_pg on each of the problem PG's to reset them back to empty and then import the data using ceph_objectstore_tool import-rados. Unfortunately this has failed as well when I tested ceph pg force_create_pg on an incomplete PG in another pool. The PG gets set to creating but then goes back to incomplete after a few minutes. I've trawled the mailing list for solutions but have come up empty; neither problem appears to have been resolved before.

On Tue, Nov 11, 2014 at 5:54 PM, Matthew Anderson manderson8...@gmail.com wrote: Thanks for your reply Sage! I've tested with 8.6ae and no luck I'm afraid.
Steps taken were -
Stop osd.117
Export 8.6ae from osd.117
Remove 8.6ae from osd.117
start osd.117
restart osd.190 after still showing incomplete

After this the PG was still showing incomplete and ceph pg dump_stuck inactive shows -

pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
8.6ae 0 0 0 0 0 0 0 0 incomplete 2014-11-11 17:34:27.168078 0'0 161425:40 [117,190] 117 [117,190] 117 86424'389748 2013-09-09 16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650

I then tried an export from OSD 190 to OSD 117 by doing -
Stop osd.190 and osd.117
Export pg 8.6ae from osd.190
Import from file generated in previous step into osd.117
Boot both osd.190 and osd.117

When osd.117 attempts to start it generates a failed assert; the full log is here: http://pastebin.com/S4CXrTAL

-1 2014-11-11 17:25:15.130509 7f9f44512900 0 osd.117 161404 load_pgs
0 2014-11-11 17:25:18.604696 7f9f44512900 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9f44512900 time 2014-11-11 17:25:18.602626
osd/OSD.h: 715: FAILED assert(ret)
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb8231b]
2: (OSDService::get_map(unsigned int)+0x3f) [0x6eea2f]
3: (OSD::load_pgs()+0x1b78) [0x6aae18]
4: (OSD::init()+0x71f) [0x6abf5f]
5: (main()+0x252c) [0x638cfc]
6: (__libc_start_main()+0xf5) [0x7f9f41650ec5]
7: /usr/bin/ceph-osd() [0x651027]

I also attempted the same steps with 8.ca and got the same results.
The below is the current state of the pg with it removed from osd.111 -

pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11 17:39:28.570675 160435'959618 161425:6071759 [190,111] 190 [190,111] 190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02 12:57:58.162789

Any idea of where I can go from here? One thought I had was setting osd.111 and osd.117 out of the cluster and once the data is moved I can shut them down and mark them as lost which would make osd.190 the only replica available for those PG's. Thanks again

On Tue, Nov 11, 2014 at 1:10 PM, Sage Weil sw...@redhat.com wrote: On Tue, 11 Nov 2014, Matthew Anderson wrote: Just an update, it appears that no data actually exists for those PG's on osd.117 and osd.111 but it's showing as incomplete anyway. So for the 8.ca PG, osd.111 has only an empty directory but osd 190 is filled with data. For 8.6ae, osd.117 has no data in the pg directory and osd.190 is filled with data as before. Since all of the required data is on OSD.190, would there be a way to make osd.111 and osd.117 forget they have ever seen the two incomplete PG's and therefore restart backfilling?

Ah, that's good news. You should know that the copy on osd.190 is slightly out of date, but it is much better than losing the entire contents of the PG. More specifically, for 8.6ae the latest version was 1935986 but the osd.190 is 1935747, about 200 writes in the past. You'll need to fsck the RBD images after this is all done. I don't think we've tested this recovery scenario, but I think you'll be able to recovery with
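For reference, the export/import sequence this thread keeps cycling through can be written down as a plan first. The sketch below only prints the commands; it assumes the 0.87-era ceph_objectstore_tool flags (--data-path, --journal-path, --op export/import, --pgid, --file), so verify them against your build, and keep both OSDs stopped while the tool runs:

```shell
# Print (not run) the command sequence for moving one PG between OSDs with
# ceph_objectstore_tool. Flag names are from the 0.87-era tool; double-check
# against 'ceph_objectstore_tool --help' before running anything.
pg_move_plan() {
  pg=$1; src=$2; dst=$3
  cat <<EOF
service ceph stop osd.$src osd.$dst
ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-$src \\
  --journal-path /var/lib/ceph/osd/ceph-$src/journal \\
  --op export --pgid $pg --file /tmp/$pg.export
ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-$dst \\
  --journal-path /var/lib/ceph/osd/ceph-$dst/journal \\
  --op import --file /tmp/$pg.export
service ceph start osd.$src osd.$dst
EOF
}

# The PG and OSD ids from the thread:
pg_move_plan 8.6ae 117 190
```

Each printed line is meant to be run by hand, checking the result at every step.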
[ceph-users] Not finding systemd files in Giant CentOS7 packages
I was trying to get systemd to bring up the monitor using the new systemd files in Giant. However, I'm not finding the systemd files included in the CentOS 7 packages. Are they missing or am I confused about how it should work?

ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

Installed Packages
ceph.x86_64 1:0.87-0.el7.centos @Ceph
ceph-common.x86_64 1:0.87-0.el7.centos @Ceph
ceph-deploy.noarch 1.5.19-0 @Ceph-noarch
ceph-release.noarch 1-0.el7 installed
libcephfs1.x86_64 1:0.87-0.el7.centos @Ceph
python-ceph.x86_64 1:0.87-0.el7.centos @Ceph

Thanks, Robert LeBlanc
Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull
Find out which OSD it is:
ceph health detail

Squeeze blocks off the affected OSD:
ceph osd reweight OSDNUM 0.8

Repeat with any OSD which becomes toofull. Your cluster is only about 50% used, so I think this will be enough. Then when it finishes, allow data back on the OSD:
ceph osd reweight OSDNUM 1

Hopefully ceph will someday be taught to move PGs in a better order! Chad.
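That procedure can also be scripted. Assuming `ceph health detail` reports the affected OSDs in lines like `osd.17 is backfill full at 91%` (the exact wording varies by release; the sample below is hypothetical), the ids can be extracted and turned into the reweight commands, with a leading echo making it a dry run:

```shell
# Hypothetical sample of 'ceph health detail' output; in real use, replace
# the heredoc with: health=$(ceph health detail)
health=$(cat <<'EOF'
HEALTH_WARN 130 pgs backfill_toofull; 2 near full osd(s)
osd.17 is backfill full at 91%
osd.42 is backfill full at 93%
EOF
)

# Pull out the toofull OSD ids and print the squeeze commands.
# Drop the leading 'echo' to actually apply them.
printf '%s\n' "$health" |
  sed -n 's/^osd\.\([0-9][0-9]*\) is backfill full.*/\1/p' |
  while read -r id; do
    echo ceph osd reweight "$id" 0.8
  done
```

Check the printed list before applying, since the health output format differs between releases.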
Re: [ceph-users] Federated gateways
Is that radosgw log from the primary or the secondary zone? Nothing in that log jumps out at me.

I see you're running 0.80.5. Are you using Apache 2.4? There is a known issue with Apache 2.4 on the primary and replication. It's fixed, just waiting for the next firefly release. Although, that causes 40x errors with Apache 2.4, not 500 errors.

Have you verified that both system users can read and write to both clusters? (Just make sure you clean up the writes to the slave cluster).

On Tue, Nov 11, 2014 at 6:51 AM, Aaron Bassett aa...@five3genomics.com wrote: Ok I believe I’ve made some progress here. I have everything syncing *except* data. The data is getting 500s when it tries to sync to the backup zone. I have a log from the radosgw with debug cranked up to 20: [...]
Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull
Thanks Chad. It seems to be working. —Jiten

On Nov 11, 2014, at 12:47 PM, Chad Seys cws...@physics.wisc.edu wrote: Find out which OSD it is: ceph health detail [...]
Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull
How many OSDs are nearfull? I've seen Ceph want two toofull OSDs to swap PGs. In that case, I dynamically raised mon_osd_nearfull_ratio and osd_backfill_full_ratio a bit, then put it back to normal once the scheduling deadlock finished.

Keep in mind that ceph osd reweight is temporary. If you mark an osd OUT then IN, the weight will be set to 1.0. If you need something that's persistent, you can use ceph osd crush reweight osd.NUM crush_weight. Look at ceph osd tree to get the current weight.

I also recommend stepping towards your goal. Changing either weight can cause a lot of unrelated migrations, and the crush weight seems to cause more than the osd weight. I step osd weight by 0.125, and crush weight by 0.05.

On Tue, Nov 11, 2014 at 12:47 PM, Chad Seys cws...@physics.wisc.edu wrote: Find out which OSD it is: ceph health detail [...]
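That stepping advice can be sketched as a small helper that prints one `ceph osd reweight` command per step, so each can be applied and allowed to settle before the next. The osd id, weights and step size below are examples:

```shell
# Print stepped reweight commands from the current weight down to a target.
# Nothing is executed; each printed line is meant to be run after the
# previous step's migrations have settled.
stepped_reweight() {
  # args: osd-id current-weight target-weight step
  awk -v id="$1" -v c="$2" -v t="$3" -v s="$4" \
    'BEGIN { for (w = c - s; w >= t - 1e-9; w -= s)
               printf "ceph osd reweight %d %.3f\n", id, w }'
}

# Example: walk osd.14 (hypothetical) from weight 1.0 down to 0.5 in 0.125 steps:
stepped_reweight 14 1.0 0.5 0.125
# prints:
#   ceph osd reweight 14 0.875
#   ceph osd reweight 14 0.750
#   ceph osd reweight 14 0.625
#   ceph osd reweight 14 0.500
```

The same shape works for `ceph osd crush reweight` with a 0.05 step; only the printed command and step size change.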
Re: [ceph-users] Federated gateways
On Nov 11, 2014, at 4:21 PM, Craig Lewis cle...@centraldesktop.com wrote: Is that radosgw log from the primary or the secondary zone? Nothing in that log jumps out at me.

This is the log from the secondary zone. That HTTP 500 response code coming back is the only problem I can find. There are a bunch of 404s from other requests to logs and stuff, but I assume those are normal because there’s no activity going on. I guess it’s just that cryptic WARNING: set_req_state_err err_no=5 resorting to 500 line that’s the problem. I think I need to get a stack trace from that somehow.

I see you're running 0.80.5. Are you using Apache 2.4? There is a known issue with Apache 2.4 on the primary and replication. It's fixed, just waiting for the next firefly release. Although, that causes 40x errors with Apache 2.4, not 500 errors.

It is apache 2.4, but I’m actually running 0.80.7 so I probably have that bug fix?

Have you verified that both system users can read and write to both clusters? (Just make sure you clean up the writes to the slave cluster).

Yes I can write everywhere and radosgw-agent isn’t getting any 403s like it was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is syncing properly, as are the users. It seems like really the only thing that isn’t syncing is the .zone.rgw.buckets pool.

Thanks, Aaron

On Tue, Nov 11, 2014 at 6:51 AM, Aaron Bassett aa...@five3genomics.com wrote: Ok I believe I’ve made some progress here. I have everything syncing *except* data. The data is getting 500s when it tries to sync to the backup zone. I have a log from the radosgw with debug cranked up to 20: [...]
Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull
Actually there were 100s that were too full. We manually set the OSD weights to 0.5 and it seems to be recovering. Thanks for the tips on crush reweight. I will look into it. —Jiten

On Nov 11, 2014, at 1:37 PM, Craig Lewis cle...@centraldesktop.com wrote:

How many OSDs are nearfull? I've seen Ceph want two toofull OSDs to swap PGs. In that case, I dynamically raised mon_osd_nearfull_ratio and osd_backfill_full_ratio a bit, then put them back to normal once the scheduling deadlock finished.

Keep in mind that ceph osd reweight is temporary. If you mark an osd OUT then IN, the weight will be set to 1.0. If you need something that's persistent, you can use ceph osd crush reweight osd.NUM crush_weight. Look at ceph osd tree to get the current weight.

I also recommend stepping towards your goal. Changing either weight can cause a lot of unrelated migrations, and the crush weight seems to cause more than the osd weight. I step osd weight by 0.125, and crush weight by 0.05.

On Tue, Nov 11, 2014 at 12:47 PM, Chad Seys cws...@physics.wisc.edu wrote:

Find out which OSD it is: ceph health detail
Squeeze blocks off the affected OSD: ceph osd reweight OSDNUM 0.8
Repeat with any OSD which becomes toofull. Your cluster is only about 50% used, so I think this will be enough. Then when it finishes, allow data back onto the OSD: ceph osd reweight OSDNUM 1
Hopefully ceph will someday be taught to move PGs in a better order!
Chad.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
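Craig's stepping advice is easy to script so you don't fat-finger one big jump. A minimal sketch (the `reweight_steps` helper and OSD id 12 are made up for illustration; it only prints the commands, it doesn't run them):

```python
def reweight_steps(current: float, target: float, step: float):
    """Yield intermediate weights from current toward target, moving at
    most `step` per change, per Craig's 'step towards your goal' advice."""
    w = current
    while abs(target - w) > 1e-9:
        delta = max(-step, min(step, target - w))
        w = round(w + delta, 3)
        yield w

# e.g. easing osd.12 from 1.0 down to 0.5 in osd-weight steps of 0.125
cmds = [f"ceph osd reweight 12 {w}" for w in reweight_steps(1.0, 0.5, 0.125)]
print("\n".join(cmds))
```

Run each printed command, wait for backfill to settle, then issue the next; the same helper works for crush weights with a 0.05 step.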
Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull
0.5 might be too much. All the PGs squeezed off of one OSD will need to be stored on another. The fewer you move, the less likely a different OSD will become toofull. Better to adjust in small increments, as Craig suggested. Chad.
Re: [ceph-users] Typical 10GbE latency
Is this with an 8192 byte payload? The theoretical transfer time at 1 Gbps (you are only sending one packet, so LACP won't help) is 0.061 ms one direction; double that and you are at 0.122 ms of bits in flight. Then there is context switching, switch latency (store and forward assumed for 1 Gbps), etc., which I'm not sure would fit in the remaining 0.057 ms of your min time. If it is an 8192 byte payload, then I'm really impressed!

On Tue, Nov 11, 2014 at 11:56 AM, Alexandre DERUMIER aderum...@odiso.com wrote:
Don't have 10GbE yet, but here is my result with simple LACP on 2 gigabit links with a Cisco 6500:
rtt min/avg/max/mdev = 0.179/0.202/0.221/0.019 ms
(Seems to be lower than your 10GbE Nexus)

- Original Message -
From: Wido den Hollander w...@42on.com
To: ceph-users@lists.ceph.com
Sent: Monday, November 10, 2014 17:22:04
Subject: Re: [ceph-users] Typical 10GbE latency

On 08-11-14 02:42, Gary M wrote:

> Wido, take the switch out of the path between nodes and remeasure. ICMP echo requests are very low priority traffic for switches and network stacks.

I tried with a direct TwinAx and fiber cable. No difference.

> If you really want to know, place a network analyzer between the nodes to measure the request-packet-to-response-packet latency. The ICMP traffic to the ping application is not accurate in the sub-millisecond range and should only be used as a rough estimate.

True, I fully agree with you. But why is everybody showing a lower latency here? My latencies are about 40% higher than what I see in this setup and other setups.

> You also may want to install the high resolution timer patch, sometimes called HRT, to the kernel, which may give you different results. ICMP traffic takes a different path than the TCP traffic and should not be considered an indicator of defect.

Yes, I'm aware. But it still doesn't explain why the latency on other systems, which are in production, is lower than on this idle system.
I believe the ping app calls the sendto system call (sorry, it's been a while since I last looked). System calls can take between 0.1 us and 0.2 us each. However, the ping application makes several of these calls and waits for a signal from the kernel. The wait for a signal means the ping application must wait to be rescheduled to report the time. Rescheduling will depend on a lot of other factors in the OS, e.g. timers, card interrupts, other tasks with higher priorities. Reporting the time must add a few more system calls, and as the ping application loops to post the next ping request, it again makes a few system calls, each of which may cause a task switch. For these reasons, the ping application is not a good representation of network performance, due to factors in the application itself and network traffic shaping performed at the switch and the TCP stacks.

I think that netperf is probably a better tool, but that also does TCP latencies. I want the real IP latency, so I assumed that ICMP would be the most simple one. The other setups I have access to are in production and do not have any special tuning, yet their latency is still lower than on this new deployment. That's what gets me confused.

Wido

cheers, gary

On Fri, Nov 7, 2014 at 4:32 PM, Łukasz Jagiełło jagiello.luk...@gmail.com wrote:
Hi,
rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms
04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01) at both hosts, and an Arista 7050S-64 between them. Both hosts were part of an active Ceph cluster.

On Thu, Nov 6, 2014 at 5:18 AM, Wido den Hollander w...@42on.com wrote:
Hello, while working at a customer I've run into a 10GbE latency which seems high to me.
I have access to a couple of Ceph clusters and I ran a simple ping test:

$ ping -s 8192 -c 100 -n ip

Two results I got:
rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms

Both these environments are running with Intel 82599ES 10Gbit cards in LACP, one with Extreme Networks switches, the other with Arista. Now, on an environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing:

rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms

As you can see, the Cisco Nexus network has high latency compared to the other setups. You would say the switches are to blame, but we also tried a direct TwinAx connection, and that didn't help. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering: others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results. -- Wido den Hollander 42on B.V.
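Gary's back-of-the-envelope number is just the wire serialization delay for the payload. A minimal sketch of that arithmetic (the ~50 bytes of ICMP/IP/Ethernet framing overhead is my assumption, which is why the result lands slightly above his 0.061 ms):

```python
def serialization_delay_ms(payload_bytes: int, link_bps: float,
                           overhead_bytes: int = 50) -> float:
    """One-way time to clock the frame onto the wire. Ignores propagation,
    switch forwarding, and host/stack latency entirely."""
    total_bits = (payload_bytes + overhead_bytes) * 8
    return total_bits / link_bps * 1e3

# 8192-byte ping payload, as in Wido's test
for gbps in (1, 10):
    d = serialization_delay_ms(8192, gbps * 1e9)
    print(f"{gbps:>2} Gbps: {d:.4f} ms one way, {2 * d:.4f} ms round trip")
```

At 10 Gbps the wire time is only a few microseconds, which is why the measured 0.1-0.2 ms RTTs are dominated by switches and host stacks rather than the link itself.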
Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull
I agree. This was just our brute-force method on our test cluster. We won't do this on the production cluster. --Jiten

On Nov 11, 2014, at 2:11 PM, cwseys cws...@physics.wisc.edu wrote:

0.5 might be too much. All the PGs squeezed off of one OSD will need to be stored on another. The fewer you move, the less likely a different OSD will become toofull. Better to adjust in small increments, as Craig suggested. Chad.
Re: [ceph-users] Deep scrub, cache pools, replica 1
On Tue, 11 Nov 2014 10:21:49 -0800 Gregory Farnum wrote: On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer ch...@gol.com wrote: Hello, One of my clusters has become busy enough (I'm looking at you, evil Window VMs that I shall banish elsewhere soon) to experience client noticeable performance impacts during deep scrub. Before this I instructed all OSDs to deep scrub in parallel at Saturday night and that finished before Sunday morning. So for now I'll fire them off one by one to reduce the load. Looking forward, that cluster doesn't need more space so instead of adding more hosts and OSDs I was thinking of a cache pool instead. I suppose that will keep the clients happy while the slow pool gets scrubbed. Is there anybody who tested cache pools with Firefly and compared the performance to Giant? For testing I'm currently playing with a single storage node and 8 SSD backed OSDs. Now what very much blew my mind is that a pool with a replication of 1 still does quite the impressive read orgy, clearly reading all the data in the PGs. Why? And what is it comparing that data with, the cosmic background radiation? Yeah, cache pools currently do full-object promotions whenever an object is accessed. There are some ideas and projects to improve this or reduce its effects, but they're mostly just getting started. Thanks for confirming that, so probably not much better than Firefly _aside_ from the fact that SSD pools should be quite a bit faster in and by themselves in Giant. Guess there is no other way to find out than to test things, I have a feeling that determining the hot working set otherwise will be rather difficult. At least, I assume that's what you mean by a read orgy; perhaps you are seeing something else entirely? Indeed I did, this was just an observation that any pool with a replica of 1 will still read ALL the data during a deep-scrub. What good would that do? 
Also, even on cache pools you don't really want to run with 1x replication, as they hold the only copy of whatever data is dirty... Oh, I agree, this is for testing only. Also, a replica of 1 doesn't have to mean that the data is unsafe (the OSDs could be RAIDed). But even so, in production the loss of a single node shouldn't impact things, and once you go there, a replica of 2 comes naturally. Christian -- Christian Balzer Network/Systems Engineer ch...@gol.com Global OnLine Japan/Fusion Communications http://www.gol.com/
Re: [ceph-users] Federated gateways
> I see you're running 0.80.5. Are you using Apache 2.4? There is a known issue with Apache 2.4 on the primary and replication. It's fixed, just waiting for the next firefly release. Although, that causes 40x errors with Apache 2.4, not 500 errors.
>
> It is apache 2.4, but I’m actually running 0.80.7 so I probably have that bug fix?

No, the unreleased 0.80.8 has the fix.

> Have you verified that both system users can read and write to both clusters? (Just make sure you clean up the writes to the slave cluster).
>
> Yes I can write everywhere and radosgw-agent isn’t getting any 403s like it was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is syncing properly, as are the users. It seems like really the only thing that isn’t syncing is the .zone.rgw.buckets pool.

That's pretty much the same behavior I was seeing with Apache 2.4. Try downgrading the primary cluster to Apache 2.2. In my testing, the secondary cluster could run 2.2 or 2.4.
[ceph-users] v0.88 released
This is the first development release after Giant. The two main features merged this round are the new AsyncMessenger (an alternative implementation of the network layer) from Haomai Wang at UnitedStack, and support for POSIX file locks in ceph-fuse and libcephfs from Yan, Zheng. There is also a big pile of smaller items that were merged while we were stabilizing Giant, including a range of smaller performance and bug fixes and some new tracepoints for LTTNG.

Notable Changes
---
* ceph-disk: Scientific Linux support (Dan van der Ster)
* ceph-disk: respect --statedir for keyring (Loic Dachary)
* ceph-fuse, libcephfs: POSIX file lock support (Yan, Zheng)
* ceph-fuse, libcephfs: fix cap flush overflow (Greg Farnum, Yan, Zheng)
* ceph-fuse, libcephfs: fix root inode xattrs (Yan, Zheng)
* ceph-fuse, libcephfs: preserve dir ordering (#9178 Yan, Zheng)
* ceph-fuse, libcephfs: trim inodes before reconnecting to MDS (Yan, Zheng)
* ceph: do not parse injectargs twice (Loic Dachary)
* ceph: make 'ceph -s' output more readable (Sage Weil)
* ceph: new 'ceph tell mds.$name_or_rank_or_gid' (John Spray)
* ceph: test robustness (Joao Eduardo Luis)
* ceph_objectstore_tool: behave with sharded flag (#9661 David Zafman)
* cephfs-journal-tool: fix journal import (#10025 John Spray)
* cephfs-journal-tool: skip up to expire_pos (#9977 John Spray)
* cleanup rados.h definitions with macros (Ilya Dryomov)
* common: shared_cache unit tests (Cheng Cheng)
* config: add $cctid meta variable (Adam Crume)
* crush: fix buffer overrun for poorly formed rules (#9492 Johnu George)
* crush: improve constness (Loic Dachary)
* crushtool: add --location id command (Sage Weil, Loic Dachary)
* default to libnss instead of crypto++ (Federico Gimenez)
* doc: ceph osd reweight vs crush weight (Laurent Guerby)
* doc: document the LRC per-layer plugin configuration (Yuan Zhou)
* doc: erasure code doc updates (Loic Dachary)
* doc: misc updates (Alfredo Deza, VRan Liu)
* doc: preflight doc fixes (John Wilkins)
* doc: update PG count guide (Gerben Meijer, Laurent Guerby, Loic Dachary)
* keyvaluestore: misc fixes (Haomai Wang)
* keyvaluestore: performance improvements (Haomai Wang)
* librados: add rados_pool_get_base_tier() call (Adam Crume)
* librados: cap buffer length (Loic Dachary)
* librados: fix objecter races (#9617 Josh Durgin)
* libradosstriper: misc fixes (Sebastien Ponce)
* librbd: add missing python docstrings (Jason Dillaman)
* librbd: add readahead (Adam Crume)
* librbd: fix cache tiers in list_children and snap_unprotect (Adam Crume)
* librbd: fix performance regression in ObjectCacher (#9513 Adam Crume)
* librbd: lttng tracepoints (Adam Crume)
* librbd: misc fixes (Xinxin Shu, Jason Dillaman)
* mds: fix sessionmap lifecycle bugs (Yan, Zheng)
* mds: initialize root inode xattr version (Yan, Zheng)
* mds: introduce auth caps (John Spray)
* mds: misc bugs (Greg Farnum, John Spray, Yan, Zheng, Henry Change)
* misc coverity fixes (Danny Al-Gaaf)
* mon: add 'ceph osd rename-bucket ...' command (Loic Dachary)
* mon: clean up auth list output (Loic Dachary)
* mon: fix 'osd crush link' id resolution (John Spray)
* mon: fix misc error paths (Joao Eduardo Luis)
* mon: fix paxos off-by-one corner case (#9301 Sage Weil)
* mon: new 'ceph pool ls [detail]' command (Sage Weil)
* mon: wait for writeable before cross-proposing (#9794 Joao Eduardo Luis)
* msgr: avoid useless new/delete (Haomai Wang)
* msgr: fix delay injection bug (#9910 Sage Weil, Greg Farnum)
* msgr: new AsyncMessenger alternative implementation (Haomai Wang)
* msgr: prefetch data when doing recv (Yehuda Sadeh)
* osd: add erasure code corpus (Loic Dachary)
* osd: add misc tests (Loic Dachary, Danny Al-Gaaf)
* osd: cleanup boost optionals (William Kennington)
* osd: expose non-journal backends via ceph-osd CLI (Haomai Wang)
* osd: fix JSON output for stray OSDs (Loic Dachary)
* osd: fix ioprio options (Loic Dachary)
* osd: fix transaction accounting (Jianpeng Ma)
* osd: misc optimizations (Xinxin Shu, Zhiqiang Wang, Xinze Chi)
* osd: use FIEMAP_FLAGS_SYNC instead of fsync (Jianpeng Ma)
* rados: fix put of /dev/null (Loic Dachary)
* rados: parse command-line arguments more strictly (#8983 Adam Crume)
* rbd-fuse: fix memory leak (Adam Crume)
* rbd-replay-many (Adam Crume)
* rbd-replay: --anonymize flag to rbd-replay-prep (Adam Crume)
* rbd: fix 'rbd diff' for non-existent objects (Adam Crume)
* rbd: fix error when striping with format 1 (Sebastien Han)
* rbd: fix export for image sizes over 2GB (Vicente Cheng)
* rbd: use rolling average for rbd bench-write throughput (Jason Dillaman)
* rgw: send explicit HTTP status string (Yehuda Sadeh)
* rgw: set length for keystone token validation request (#7796 Yehuda Sadeh, Mark Kirkwood)
* udev: fix rules for CentOS7/RHEL7 (Loic Dachary)
* use clock_gettime instead of gettimeofday (Jianpeng Ma)
* vstart.sh: set up environment for s3-tests (Luis Pabon)

Getting Ceph
* Git at git://github.com/ceph/ceph.git
* Tarball at
[ceph-users] Log reading/how do I tell what an OSD is trying to connect to
I'm having a problem with my cluster. It's running 0.87 right now, but I saw the same behavior with 0.80.5 and 0.80.7. The problem is that my logs are filling up with "replacing existing (lossy) channel" log lines (see below), to the point where I'm filling drives to 100% almost daily just with logs. It doesn't appear to be network related, because it happens even when talking to other OSDs on the same host. The logs pretty much all point to port 0 on the remote end. Is this an indicator that it's failing to resolve port numbers somehow, or is this normal at this point in connection setup? The systems that are causing this problem are somewhat unusual: they're running OSDs in Docker containers, but they *should* be configured to run as root and have full access to the host's network stack. They manage to work, mostly, but things are still really flaky. Also, is there documentation on what the various fields mean, short of digging through the source? And how does Ceph resolve OSD numbers into host/port addresses?
2014-11-12 01:50:40.802604 7f7828db8700 0 -- 10.2.0.36:6819/1 10.2.0.36:0/1 pipe(0x1ce31c80 sd=135 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070580).accept replacing existing (lossy) channel (new one lossy=1)
2014-11-12 01:50:40.802708 7f7816538700 0 -- 10.2.0.36:6830/1 10.2.0.36:0/1 pipe(0x1ff61080 sd=120 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3db2e0).accept replacing existing (lossy) channel (new one lossy=1)
2014-11-12 01:50:40.803346 7f781ba8d700 0 -- 10.2.0.36:6819/1 10.2.0.36:0/1 pipe(0x1ce31180 sd=125 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070420).accept replacing existing (lossy) channel (new one lossy=1)
2014-11-12 01:50:40.803944 7f781996c700 0 -- 10.2.0.36:6830/1 10.2.0.36:0/1 pipe(0x1ff618c0 sd=107 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3d8420).accept replacing existing (lossy) channel (new one lossy=1)
2014-11-12 01:50:40.804185 7f7816538700 0 -- 10.2.0.36:6819/1 10.2.0.36:0/1 pipe(0x1ffd1e40 sd=20 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070840).accept replacing existing (lossy) channel (new one lossy=1)
2014-11-12 01:50:40.805235 7f7813407700 0 -- 10.2.0.36:6819/1 10.2.0.36:0/1 pipe(0x1ffd1340 sd=60 :6819 s=0 pgs=0 cs=0 l=1 c=0x1b2d6260).accept replacing existing (lossy) channel (new one lossy=1)
2014-11-12 01:50:40.806364 7f781bc8f700 0 -- 10.2.0.36:6819/1 10.2.0.36:0/1 pipe(0x1ffd0b00 sd=162 :6819 s=0 pgs=0 cs=0 l=1 c=0x675c580).accept replacing existing (lossy) channel (new one lossy=1)
2014-11-12 01:50:40.806425 7f781aa7d700 0 -- 10.2.0.36:6830/1 10.2.0.36:0/1 pipe(0x1db29600 sd=143 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3d9600).accept replacing existing (lossy) channel (new one lossy=1)
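As far as I can decode these lines, the fields are: timestamp, thread id, debug level, local address, remote address (both as ip:port/nonce), then the pipe state. A connecting (non-listening) peer doesn't bind a port, so I believe the :0 on the remote end is normal for an incoming client connection, not a lookup failure. A hedged parsing sketch (the field names are my own guesses, not official):

```python
import re

# Assumed layout: "<date> <time> <thread> <level> -- <local> <remote> pipe(<state>)..."
LINE_RE = re.compile(
    r"(?P<ts>\S+ \S+) (?P<thread>\S+) (?P<level>\d+) -- "
    r"(?P<local>\S+) (?P<remote>\S+) pipe\((?P<pipe>[^)]*)\)"
)

def parse_msgr_line(line: str) -> dict:
    """Split one messenger log line into its (guessed) fields."""
    m = LINE_RE.match(line)
    if not m:
        raise ValueError("not a messenger log line")
    d = m.groupdict()
    for side in ("local", "remote"):
        addr, nonce = d[side].split("/")
        ip, port = addr.rsplit(":", 1)
        d[side] = {"ip": ip, "port": int(port), "nonce": int(nonce)}
    return d

sample = ("2014-11-12 01:50:40.802604 7f7828db8700 0 -- 10.2.0.36:6819/1 "
          "10.2.0.36:0/1 pipe(0x1ce31c80 sd=135 :6819 s=0 pgs=0 cs=0 l=1 "
          "c=0x1e070580).accept replacing existing (lossy) channel (new one lossy=1)")
print(parse_msgr_line(sample))
```

Within the pipe(...) part, s=, pgs=, cs=, and l= appear to be connection-state counters, with l=1 marking a lossy connection; I'd treat those readings as guesses until confirmed against the messenger source.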
Re: [ceph-users] Triggering shallow scrub on OSD where scrub is already in progress
Hi Greg, I am using 0.86, referring to the OSD logs to check scrub behaviour. Please have a look at this log snippet from the OSD log.

## Triggered scrub on osd.10 ##
2014-11-12 16:24:21.393135 7f5026f31700 0 log_channel(default) log [INF] : 0.4 scrub ok
2014-11-12 16:24:24.393586 7f5026f31700 0 log_channel(default) log [INF] : 0.20 scrub ok
2014-11-12 16:24:30.393989 7f5026f31700 0 log_channel(default) log [INF] : 0.21 scrub ok
2014-11-12 16:24:33.394764 7f5026f31700 0 log_channel(default) log [INF] : 0.23 scrub ok
2014-11-12 16:24:34.395293 7f5026f31700 0 log_channel(default) log [INF] : 0.36 scrub ok
2014-11-12 16:24:35.941704 7f5026f31700 0 log_channel(default) log [INF] : 1.1 scrub ok
2014-11-12 16:24:39.533780 7f5026f31700 0 log_channel(default) log [INF] : 1.d scrub ok
2014-11-12 16:24:41.811185 7f5026f31700 0 log_channel(default) log [INF] : 1.44 scrub ok
2014-11-12 16:24:54.257384 7f5026f31700 0 log_channel(default) log [INF] : 1.5b scrub ok
2014-11-12 16:25:02.973101 7f5026f31700 0 log_channel(default) log [INF] : 1.67 scrub ok
2014-11-12 16:25:17.597546 7f5026f31700 0 log_channel(default) log [INF] : 1.6b scrub ok

## Previous scrub still in progress; triggered scrub on osd.10 again -- Ceph re-started the scrub operation ##
2014-11-12 16:25:19.394029 7f5026f31700 0 log_channel(default) log [INF] : 0.4 scrub ok
2014-11-12 16:25:22.402630 7f5026f31700 0 log_channel(default) log [INF] : 0.20 scrub ok
2014-11-12 16:25:24.695565 7f5026f31700 0 log_channel(default) log [INF] : 0.21 scrub ok
2014-11-12 16:25:25.408821 7f5026f31700 0 log_channel(default) log [INF] : 0.23 scrub ok
2014-11-12 16:25:29.467527 7f5026f31700 0 log_channel(default) log [INF] : 0.36 scrub ok
2014-11-12 16:25:32.558838 7f5026f31700 0 log_channel(default) log [INF] : 1.1 scrub ok
2014-11-12 16:25:35.763056 7f5026f31700 0 log_channel(default) log [INF] : 1.d scrub ok
2014-11-12 16:25:38.166853 7f5026f31700 0 log_channel(default) log [INF] : 1.44 scrub ok
2014-11-12 16:25:40.602758 7f5026f31700 0 log_channel(default) log [INF] : 1.5b scrub ok
2014-11-12 16:25:42.169788 7f5026f31700 0 log_channel(default) log [INF] : 1.67 scrub ok
2014-11-12 16:25:45.851419 7f5026f31700 0 log_channel(default) log [INF] : 1.6b scrub ok
2014-11-12 16:25:51.259453 7f5026f31700 0 log_channel(default) log [INF] : 1.a8 scrub ok
2014-11-12 16:25:53.012220 7f5026f31700 0 log_channel(default) log [INF] : 1.a9 scrub ok
2014-11-12 16:25:54.009265 7f5026f31700 0 log_channel(default) log [INF] : 1.cb scrub ok
2014-11-12 16:25:56.516569 7f5026f31700 0 log_channel(default) log [INF] : 1.e2 scrub ok

Thanks & regards, Mallikarjun Biradar

On Tue, Nov 11, 2014 at 12:18 PM, Gregory Farnum g...@gregs42.com wrote:
On Sun, Nov 9, 2014 at 9:29 PM, Mallikarjun Biradar mallikarjuna.bira...@gmail.com wrote:

> Hi all, triggering shallow scrub on an OSD where a scrub is already in progress restarts the scrub from the beginning on that OSD. Steps: triggered shallow scrub on an OSD (cluster is running heavy IO). While the scrub was in progress, triggered shallow scrub again on that OSD. Observed behaviour: the scrub restarted from the beginning on that OSD. Please let me know whether this is expected behaviour.

What version of Ceph are you seeing this on? How are you identifying that scrub is restarting from the beginning? It sounds sort of familiar to me, but I thought this was fixed so it was a no-op if you issue another scrub. (That's not authoritative though; I might just be missing a reason we want to restart it.) -Greg
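The giveaway in the log is that the same PGs (0.4, 0.20, 0.21, ...) are reported "scrub ok" a second time, in the same order, right after the second trigger. That check can be automated with a small sketch (the `scrub_passes` helper is a name I made up; it only assumes the "log [INF] : <pg> scrub ok" format shown above):

```python
import re

SCRUB_RE = re.compile(r"log \[INF\] : (\S+) scrub ok")

def scrub_passes(log_lines):
    """Group 'scrub ok' messages into passes: a new pass starts whenever a
    PG already seen in the current pass shows up again (i.e. the scrub
    restarted from the beginning)."""
    passes, seen = [[]], set()
    for line in log_lines:
        m = SCRUB_RE.search(line)
        if not m:
            continue
        pg = m.group(1)
        if pg in seen:          # PG repeated -> scrub started over
            passes.append([])
            seen = set()
        seen.add(pg)
        passes[-1].append(pg)
    return passes

log = [
    "... log [INF] : 0.4 scrub ok",
    "... log [INF] : 0.20 scrub ok",
    "... log [INF] : 1.1 scrub ok",
    "... log [INF] : 0.4 scrub ok",   # second trigger: restarted at 0.4
    "... log [INF] : 0.20 scrub ok",
]
print(scrub_passes(log))
```

Two (or more) entries in the result means the OSD scrubbed the same PGs more than once, which is exactly the restart behaviour described above.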
[ceph-users] rados mkpool fails, but not ceph osd pool create
Hi all, I'm facing a problem on a ceph deployment. rados mkpool always fails:

# rados -n client.admin mkpool test
error creating pool test: (2) No such file or directory

The rados lspool and rmpool commands work just fine, and the following also works:

# ceph osd pool create test 128 128
pool 'test' created

I've enabled rados debug but it really didn't help much. Should I look at the mon or osd debug logs? Any idea about what could be happening? Thanks, Gauvain Pocentek

Objectif Libre - Infrastructure et Formations Linux http://www.objectif-libre.com