Re: [ceph-users] Stackforge Puppet Module
Hi David,

Yes, it's cloned to the ceph folder. It's only that module which seems to complain, which is a bit odd. I might try and pop onto IRC at some point.

Many Thanks,
Nick

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of David Moreau Simard
Sent: 12 November 2014 14:25
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Stackforge Puppet Module

What comes to mind is that you need to make sure that you've cloned the git repository to /etc/puppet/modules/ceph and not /etc/puppet/modules/puppet-ceph.

Feel free to hop on IRC to discuss puppet-ceph on freenode in #puppet-openstack. You can find me there as dmsimard.

--
David Moreau Simard

On Nov 12, 2014, at 8:58 AM, Nick Fisk n...@fisk.me.uk wrote:

> Hi David,
>
> Many thanks for your reply. I must admit I have only just started looking
> at Puppet, but a lot of what you said makes sense to me and I understand
> the reason for not having the module auto-discover disks.
>
> I'm currently having a problem with the ceph::repo class when trying to
> push this out to a test server:
>
>     Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class ceph::repo for ceph-puppet-test on node ceph-puppet-test
>     Warning: Not using cache on failed catalog
>     Error: Could not retrieve catalog; skipping run
>
> I'm a bit stuck, but will hopefully work out why it's not working soon,
> and then I can attempt your idea of using a script to dynamically pass
> disks to the puppet module.
>
> Thanks,
> Nick

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of David Moreau Simard
Sent: 11 November 2014 12:05
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Stackforge Puppet Module

Hi Nick,

The great thing about puppet-ceph's implementation on Stackforge is that it is both unit and integration tested. You can see the integration tests here:
https://github.com/ceph/puppet-ceph/tree/master/spec/system

What I'm getting at is that the tests show, to a certain extent, how you can use the module. For example, in the OSD integration tests:
- https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb#L24
and then:
- https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb#L82-L110

There's no auto-discovery mechanism built into the module right now. It's kind of dangerous: you don't want to format the wrong disks.

Now, this doesn't mean you can't discover the disks yourself and pass them to the module from your site.pp or from a composition layer. Here's something I have for my CI environment that uses the $::blockdevices fact to discover all devices, splits that fact into a list of the devices and then rejects the drives I don't want (such as the OS disk):

    # Assume OS is installed on xvda/sda/vda.
    # On an OpenStack VM, vdb is ephemeral; we don't want to use vdc.
    # WARNING: ALL OTHER DISKS WILL BE FORMATTED/PARTITIONED BY CEPH!
    $block_devices = reject(split($::blockdevices, ','), '(xvda|sda|vda|vdc|sr0)')
    $devices = prefix($block_devices, '/dev/')

And then you can pass $devices to the module. Let me know if you have any questions!

--
David Moreau Simard

On Nov 11, 2014, at 6:23 AM, Nick Fisk n...@fisk.me.uk wrote:

> Hi,
>
> I'm just looking through the different methods of deploying Ceph, and I
> particularly liked the idea the Stackforge puppet module advertises of
> using discovery to automatically add new disks.
>
> I understand the principle of how it should work (using "ceph-disk list"
> to find unknown disks), but I would like to see in a little more detail
> how it's been implemented. I've been looking through the puppet module on
> GitHub, but I can't see where this discovery is carried out. Could anyone
> confirm whether this puppet module currently supports auto discovery, and
> where in the code it's carried out?
>
> Many Thanks,
> Nick
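[For reference, a minimal sketch of the layout David describes, assuming modules live under /etc/puppet/modules; the "Could not find class ceph::repo" error above is exactly what a mis-named module directory produces:]

    # The directory must be named "ceph", not "puppet-ceph", or the
    # autoloader will not find classes such as ceph::repo.
    git clone https://github.com/ceph/puppet-ceph.git /etc/puppet/modules/ceph
    # quick smoke test on one node
    puppet apply -e 'include ceph::repo'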
Re: [ceph-users] Typical 10GbE latency
On 12-11-14 21:12, Udo Lembke wrote:
> Hi Wido,
> On 12.11.2014 12:55, Wido den Hollander wrote:
> > (back to list)
> > Indeed, there must be something! But I can't figure it out yet. Same
> > controllers, tried the same OS, direct cables, but the latency is 40%
> > higher.
> Perhaps something with PCIe order / interrupts? Have you checked the
> BIOS settings or used another PCIe slot?
> Udo

That's indeed a good suggestion. I haven't tried it, but that is something I should try. It will take me a while to get that tested, but I will give it a try.

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
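[One quick way to sample round-trip latency between two hosts when comparing setups like this; the address is hypothetical, and netperf's TCP_RR test is a common companion to plain ping for this kind of comparison:]

    # flood-style ping with a small interval, summary output only
    sudo ping -c 10000 -i 0.001 -q 10.0.0.2
    # request/response round trips, reported as transactions/sec
    netperf -H 10.0.0.2 -t TCP_RR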
[ceph-users] Reusing old journal block device w/ data causes FAILED assert(0)
When I create a new OSD with a block device as journal that has existing data on it, ceph is causing a FAILED assert. The block device is a journal from a previous experiment. It can safely be overwritten.

If I zero the block device with

    dd if=/dev/zero bs=512 count=1000 of=MyJournalDev

then the assert doesn't happen. Is there a way to tell mkfs to ignore data on the journal device and just go ahead and clobber it?

2014-11-13 21:22:26.463359 7f8383486880 -1 journal Unable to read past sequence 2 but header indicates the journal has committed up through 5202, journal is corrupt
os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f8383486880 time 2014-11-13 21:22:26.463363
os/FileJournal.cc: 1693: FAILED assert(0)
 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xb7ac55]
 2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xb04) [0xa339a4]
 3: (JournalingObjectStore::journal_replay(unsigned long)+0x237) [0x910787]
 4: (FileStore::mount()+0x3f8b) [0x8e482b]
 5: (OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int)+0xf0) [0x65d940]
 6: (main()+0xbf6) [0x620d76]
 7: (__libc_start_main()+0xf5) [0x7f8380823af5]
 8: ceph-osd() [0x63a969]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Re: [ceph-users] Reusing old journal block device w/ data causes FAILED assert(0)
Hi,

Did you mkjournal the reused journal?

    ceph-osd -i $ID --mkjournal

Cheers, Dan

On Thu Nov 13 2014 at 2:34:51 PM Anthony Alba ascanio.al...@gmail.com wrote:
> When I create a new OSD with a block device as journal that has existing
> data on it, ceph is causing a FAILED assert. The block device is a journal
> from a previous experiment. It can safely be overwritten.
> [rest of quoted message and stack trace snipped; see above]
Re: [ceph-users] Reusing old journal block device w/ data causes FAILED assert(0)
Ah no.

On 13 Nov 2014 21:49, Dan van der Ster daniel.vanders...@cern.ch wrote:
> Hi,
> Did you mkjournal the reused journal?
>     ceph-osd -i $ID --mkjournal
> Cheers, Dan

No - however the man page states that --mkjournal is for:

"Create a new journal file to match an existing object repository. This is useful if the journal device or file is wiped out due to a disk or file system failure."

I thought mkfs would create a new OSD and new journal in one shot (the journal device is specified in ceph.conf). In other words, I do not have an existing object repository.

My steps:

ceph.conf:
    osd journal = /dev/sdb1   # This was used in a previous experiment so has garbage on it

    # /dev/sdc1 is mounted on /var/lib/ceph/osd/ceph-0
    ceph-osd -i 0 --mkfs --mkkey --osd-uuid 123456

At this point it crashes with the FAILED assert.

Do you mean I should run ceph-osd -i $ID --mkjournal before the mkfs?
Re: [ceph-users] Reusing old journal block device w/ data causes FAILED assert(0)
Hi,

On Thu Nov 13 2014 at 3:35:55 PM Anthony Alba ascanio.al...@gmail.com wrote:
> [quoted message snipped; see above]
> Do you mean I should run ceph-osd -i $ID --mkjournal before the mkfs?

I believe that if you now run

    ceph-osd -i 0 --mkjournal

it will set up /dev/sdb1 correctly to be used as the journal. I'm not sure if mkfs is supposed to do this. (BTW, using these commands manually is sort of deprecated now anyway -- you can read through ceph-disk to see how to use them correctly.)

Cheers, Dan
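[Putting the thread together, a sketch of the two workarounds discussed, using the OSD id and devices from Anthony's message; whether --mkfs alone should handle a dirty journal is left open in the thread:]

    # after the failed mkfs, reinitialize the journal header on the reused device
    ceph-osd -i 0 --mkjournal
    # alternatively, zeroing the start of the device beforehand also
    # avoids the assert, as noted at the top of the thread
    dd if=/dev/zero bs=512 count=1000 of=/dev/sdb1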
[ceph-users] Negative number of objects degraded for extended period of time
Hi,

The Ceph cluster we are running had a few OSDs approaching 95% full a week or so ago, so I ran a reweight to balance it out and, in the meantime, instructed the application to purge data that was not required. But after a large amount of data was purged from the application side (all OSDs' usage dropped below 20%), the cluster fell into this weird state for days: the number of objects degraded has remained negative for more than 7 days. I'm seeing some IO going on on the OSDs consistently, but the (negative) number of objects degraded does not change much:

2014-11-13 10:43:07.237292 mon.0 [INF] pgmap v5935301: 44816 pgs: 44713 active+clean, 1 active+backfilling, 20 active+remapped+wait_backfill, 27 active+remapped+wait_backfill+backfill_toofull, 11 active+recovery_wait, 33 active+remapped+backfilling, 11 active+wait_backfill+backfill_toofull; 1473 GB data, 2985 GB used, 17123 GB / 20109 GB avail; 30172 kB/s wr, 58 op/s; -13582/1468299 objects degraded (-0.925%)
2014-11-13 10:43:08.248232 mon.0 [INF] pgmap v5935302: 44816 pgs: 44713 active+clean, 1 active+backfilling, 20 active+remapped+wait_backfill, 27 active+remapped+wait_backfill+backfill_toofull, 11 active+recovery_wait, 33 active+remapped+backfilling, 11 active+wait_backfill+backfill_toofull; 1473 GB data, 2985 GB used, 17123 GB / 20109 GB avail; 26459 kB/s wr, 51 op/s; -13582/1468303 objects degraded (-0.925%)

Any idea what might be happening here? It seems the active+remapped+wait_backfill+backfill_toofull PGs are stuck?

osdmap e43029: 36 osds: 36 up, 36 in
pgmap v5935658: 44816 pgs, 32 pools, 1488 GB data, 714 kobjects
      3017 GB used, 17092 GB / 20109 GB avail
      -13438/1475773 objects degraded (-0.911%)
          44713 active+clean
              1 active+backfilling
             20 active+remapped+wait_backfill
             27 active+remapped+wait_backfill+backfill_toofull
             11 active+recovery_wait
             33 active+remapped+backfilling
             11 active+wait_backfill+backfill_toofull
client io 478 B/s rd, 40170 kB/s wr, 80 op/s

The cluster is running v0.72.2. We are planning to upgrade the cluster to Firefly, but I would like to get the cluster state clean first before the upgrade.

Thanks,
Fred
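[A few standard commands that can help narrow down where backfills like these are stuck; the ratio value below is an assumption chosen to illustrate the knob, not advice from the thread:]

    ceph health detail            # lists the PGs stuck in backfill_toofull and why
    ceph pg dump_stuck unclean    # shows PGs stuck in remapped/backfill states
    # backfill_toofull is governed by the OSD backfill full ratio
    # (default 0.85); raising it slightly can let stuck backfills drain
    ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'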
[ceph-users] CephFS, file layouts pools and rados df
Hi Ceph users,

I need to have different filesystem trees in different pools, mainly for security reasons, so I have ceph users (cephx) with specific access to specific pools. I have one metadata pool ('metadata') and three data pools ('data', 'wimi-files', 'wimi-recette-files'). I used file layouts (http://ceph.com/docs/master/cephfs/file-layouts/) to associate directories with pools.

My filesystem looks like this (path -> associated pool):

    /            -> data
    /prod        -> wimi-files
    /prod/...    -> wimi-files
    /recette     -> wimi-recette-files
    /recette/... -> wimi-recette-files

Is this the best way to achieve what I need, since it's not possible to have multiple CephFS filesystems on a Ceph cluster?

I ask this because my 'rados df' output seems strange to me:

    pool name          category   KB        objects  clones degraded unfound rd      rd KB   wr      wr KB
    data               -          0         9045499  0      0        0       434686  434686  9294004 0
    metadata           -          58591526810 0 0 2168219 2403048804 16461385180433628
    wimi-files         -          9006435331 101692140 0 0 296284 2747513 19225407 9064999231
    wimi-recette-files -          1036224   309167   0      0        0       345223  1401472 658388  1170762
    total used      27404544372   19576561
    total avail     78033398196
    total space    105437942568

As you can see, there are 9045499 objects in the 'data' pool, while there are only two directories ('prod', 'recette') and not a single file in this pool. Does anyone know how this works?

Thanks in advance!

Regards
--
Thomas Lemarchand
Cloud Solutions SAS - Responsable des systèmes d'information
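[The directory-to-pool association described above is normally done with CephFS virtual extended attributes; a sketch, assuming the filesystem is mounted at /mnt/cephfs and the pools already exist (the add_data_pool step makes a pool usable for file layouts):]

    ceph mds add_data_pool wimi-files
    ceph mds add_data_pool wimi-recette-files
    # new files created under each directory inherit the directory's layout pool
    setfattr -n ceph.dir.layout.pool -v wimi-files /mnt/cephfs/prod
    setfattr -n ceph.dir.layout.pool -v wimi-recette-files /mnt/cephfs/recette
    getfattr -n ceph.dir.layout /mnt/cephfs/prod   # verify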
Re: [ceph-users] Poor RBD performance as LIO iSCSI target
Running into weird issues here as well in a test environment. I don't have a solution either, but perhaps we can find some things in common.

Setup in a nutshell:
- Ceph cluster: Ubuntu 14.04, Kernel 3.16.7, Ceph 0.87-1 (OSDs with separate public/cluster network in 10 Gbps)
- iSCSI Proxy node (targetcli/LIO): Ubuntu 14.04, Kernel 3.16.7, Ceph 0.87-1 (10 Gbps)
- Client node: Ubuntu 12.04, Kernel 3.11 (10 Gbps)

Relevant cluster config: writeback cache tiering with NVME PCI-E cards (2 replicas) in front of an erasure coded pool (k=3, m=2) backed by spindles.

I'm following the instructions here:
http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices

No issues with creating and mapping a 100GB RBD image and then creating the target. I'm interested in finding out the overhead/performance impact of re-exporting through iSCSI, so the idea is to run benchmarks. Here's a fio test I'm trying to run on the client node on the mounted iscsi device:

    fio --name=writefile --size=100G --filesize=100G --filename=/dev/sdu --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio

The benchmark will eventually hang towards the end of the test for some long seconds before completing. On the proxy node, the kernel complains with iscsi portal login timeout (http://pastebin.com/Q49UnTPr) and I also see irqbalance errors in syslog (http://pastebin.com/AiRTWDwR).

Doing the same test on the machines directly (raw, rbd, on the osd filesystem) doesn't yield any issues.

I've tried a couple of things to see if I could get things to work:
- Set irqbalance --hintpolicy=ignore (http://sourceforge.net/p/e1000/bugs/394/ and https://bugs.launchpad.net/ubuntu/+source/irqbalance/+bug/1321425)
- Changed size on the cache pool to 1 (for the sake of testing; improved performance but still hangs)
- Set crush tunables to legacy (and back to optimal)
- Various package and kernel versions, and putting the proxy node on Ubuntu precise
- Formatting and mounting the iscsi block device and running the test on the formatted filesystem

I don't think it's related, but I don't remember running into issues before I swapped out SSDs for the NVME cards in the cache pool. I don't have time *right now*, but I definitely want to test whether I am able to reproduce the issue on the SSDs.

Let me know if this gives you any ideas, I'm all ears.

--
David Moreau Simard

On Oct 28, 2014, at 4:07 PM, Christopher Spearman neromaver...@gmail.com wrote:

> Sage: That'd be my assumption; performance looked pretty fantastic over
> loop until it started being used heavily.
>
> Mike: The configs you asked for are at the end of this message. I've
> redacted/changed some info (iqn/wwn/portal) for security purposes. The raw
> and loop target configs are all in one, since I'm running both types of
> configs currently. I also included the running config (ls /) of targetcli
> for anyone interested in what it looks like from the console.
>
> The tool I used was dd. I ran through various options using dd but didn't
> really see much difference. The one on top is my go-to command for my
> first test:
>
>     time dd if=/dev/zero of=test bs=32M count=32 oflag=direct,sync
>     time dd if=/dev/zero of=test bs=32M count=128 oflag=direct,sync
>     time dd if=/dev/zero of=test bs=8M count=512 oflag=direct,sync
>     time dd if=/dev/zero of=test bs=4M count=1024 oflag=direct,sync
>
> ---ls / from current targetcli (no mounted ext4 - image file config)---
>
>     /iscsi ls /
>     o- / ............. [...]
>       o- backstores ............. [...]
>       | o- block ............. [Storage Objects: 2]
>       | | o- ceph_lun0 ............. [/dev/loop0 (2.0TiB) write-thru activated]
>       | | o- ceph_noloop00 ............. [/dev/rbd/vmiscsi/noloop00 (1.0TiB) write-thru activated]
>       | o- fileio ............. [Storage Objects: 0]
>       | o- pscsi ............. [Storage Objects: 0]
>       | o- ramdisk ............. [Storage Objects: 0]
>       o- iscsi ............. [Targets: 2]
>       | o- iqn.gateway2_01 ............. [TPGs: 1]
>       | | o- tpg1 ...
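[Regarding the irqbalance hintpolicy workaround in the list above: on Ubuntu that is typically set in the daemon's defaults file; a sketch following Ubuntu convention, an assumption rather than something quoted from the thread:]

    # /etc/default/irqbalance
    OPTIONS="--hintpolicy=ignore"

    sudo service irqbalance restart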
Re: [ceph-users] CephFS, file layouts pools and rados df
Hi Sage,

Thank you for your answer. So, there is no anticipated problem with how I did it?

Does the 'data' pool performance directly affect my filesystem performance, even if there is no file in it? Do I need to have the same performance policy on the 'data' pool as on the other pools? Can I use the fact that my base data pool ('data') is different from my real data pools to improve filesystem performance (something like putting the 'data' pool on SSDs)?

Regards
--
Thomas Lemarchand
Cloud Solutions SAS - Responsable des systèmes d'information

On jeu., 2014-11-13 at 08:33 -0800, Sage Weil wrote:
> On Thu, 13 Nov 2014, Thomas Lemarchand wrote:
> > [original question and 'rados df' output snipped; see above]
>
> The MDS puts backtrace objects in the base data pool in order to
> facilitate fsck and lookup by ino even when the data is stored elsewhere.
> Other strategies that don't do this are possible, but they're more
> complicated, and we opted to keep it as simple as possible for now.
>
> sage
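[If the base pool were to be moved to faster media, one way is pointing it at a different CRUSH ruleset; a sketch, assuming an SSD-backed ruleset with id 4 already exists in the crushmap (the id is hypothetical):]

    ceph osd pool set data crush_ruleset 4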
[ceph-users] mds continuously crashing on Firefly
Hi Cephers,

Overnight, our MDS crashed, failing over to the standby, which also crashed! Upon trying to restart them this morning, I find that they no longer start and always seem to crash on the same file in the logs. I've pasted part of a run with "ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'" below [1].

Can anyone help me interpret this error?

Thanks for your time,
Lincoln Bryant

[1]
 -7 2014-11-13 10:52:15.064784 7fc49d8ab700 7 mds.0.locker rdlock_start on (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
 -6 2014-11-13 10:52:15.064794 7fc49d8ab700 7 mds.0.locker rdlock_start waiting on (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
 -5 2014-11-13 10:52:15.064805 7fc49d8ab700 10 mds.0.cache.ino(1000258c3c8) add_waiter tag 4000 0xbf71920 !ambig 1 !frozen 1 !freezing 1
 -4 2014-11-13 10:52:15.064808 7fc49d8ab700 15 mds.0.cache.ino(1000258c3c8) taking waiter here
 -3 2014-11-13 10:52:15.064810 7fc49d8ab700 10 mds.0.locker nudge_log (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
 -2 2014-11-13 10:52:15.064827 7fc49d8ab700 1 -- 192.170.227.116:6800/6489 <== osd.104 192.170.227.122:6812/1084 911 osd_op_reply(82611 100022a4e3a. [tmapget 0~0] v0'0 uv78780 ondisk = 0) v6 187+0+1410 (1370366691 0 1858920835) 0x298ffd00 con 0x5b606e0
 -1 2014-11-13 10:52:15.064843 7fc49d8ab700 10 mds.0.cache.dir(100022a4e3a) _tmap_fetched 1410 bytes for [dir 100022a4e3a /stash/user/daveminh/data/DUD/ampc/AlGDock/dock/DUDE.decoy.CHB-1l2sA.0-0/ [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0 | waiter=1 authpin=1 0x3b0a040] want_dn=
  0 2014-11-13 10:52:15.066789 7fc49d8ab700 -1 *** Caught signal (Aborted) **
 in thread 7fc49d8ab700

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: /usr/bin/ceph-mds() [0x82f741]
 2: /lib64/libpthread.so.0() [0x371c40f710]
 3: (gsignal()+0x35) [0x371bc32635]
 4: (abort()+0x175) [0x371bc33e15]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x371e0bea5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Re: [ceph-users] Very Basic question
Hi,

On 11/13/2014 06:05 PM, Artem Silenkov wrote:
> Hello!
> Only 1 monitor instance? It won't work in most cases. Make more and
> ensure quorum to reach survivability.

No, three monitor instances, one for each ceph-node, as designed in the quick-ceph-deploy guide. I tried to kill one of them (the initial monitor) to see what happens, and that happened. :-(

Ciao
Luca

> Regards, Silenkov Artem
> ---
> artem.silen...@gmail.com
>
> 2014-11-13 20:02 GMT+03:00 Luca Mazzaferro luca.mazzafe...@rzg.mpg.de:
> > Dear Users,
> > I followed the instructions of the storage cluster quick start here:
> > http://ceph.com/docs/master/start/quick-ceph-deploy/
> >
> > I simulated a little storage with 4 VMs, ceph-node[1,2,3] and an
> > admin-node. Everything worked fine until I shut down the initial monitor
> > node (ceph-node1), even with the other monitors on. I restarted
> > ceph-node1, but the ceph command (running from ceph-admin) fails after
> > hanging for 5 minutes, with this exit code:
> >
> >     2014-11-13 17:33:31.711410 7f6a5b1af700 0 monclient(hunting): authenticate timed out after 300
> >     2014-11-13 17:33:31.711522 7f6a5b1af700 0 librados: client.admin authentication error (110) Connection timed out
> >
> > If I go to ceph-node1 and restart the services:
> >
> >     [root@ceph-node1 ~]# service ceph status
> >     === mon.ceph-node1 ===
> >     mon.ceph-node1: running {"version":"0.80.7"}
> >     === osd.2 ===
> >     osd.2: not running.
> >     === mds.ceph-node1 ===
> >     mds.ceph-node1: running {"version":"0.80.7"}
> >
> > [second, identical status output snipped]
> >
> > I don't understand how to properly restart a node. Can anyone help me?
> > Thank you. Cheers.
> > Luca
Re: [ceph-users] Very Basic question
What does ceph -s output when things are working? Does the ceph.conf on your admin node contain the address of each monitor? (Paste the relevant lines.) It will need to, or the ceph tool won't be able to find the monitors even though the system is working.
-Greg

On Thu, Nov 13, 2014 at 9:11 AM Luca Mazzaferro luca.mazzafe...@rzg.mpg.de wrote:
> [quoted thread snipped; see the previous message]
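[A minimal sketch of the monitor entries Greg is asking about, as they would appear in the admin node's ceph.conf; the hostnames match the thread, the addresses are hypothetical:]

    [global]
    mon_initial_members = ceph-node1, ceph-node2, ceph-node3
    mon_host = 192.168.56.101,192.168.56.102,192.168.56.103

With all three monitors listed, the ceph CLI can still reach a quorum while ceph-node1 is down.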
Re: [ceph-users] mds continuously crashing on Firefly
Hi all,

Just providing an update to this -- I started the mds daemon on a new server and rebooted a box with a hung CephFS mount (from the first crash), and the problem seems to have gone away. I'm still not sure why the mds was shutting down with a "Caught signal", though.

Cheers,
Lincoln

On Nov 13, 2014, at 11:01 AM, Lincoln Bryant wrote:
> Hi Cephers,
> Over night, our MDS crashed, failing over to the standby which also
> crashed! Upon trying to restart them this morning, I find that they no
> longer start and always seem to crash on the same file in the logs.
> [debug log and stack trace snipped; see the original message above]
Re: [ceph-users] Typical 10GbE latency
> Indeed, there must be something! But I can't figure it out yet. Same
> controllers, tried the same OS, direct cables, but the latency is 40%
> higher.

Wido, just an educated guess: did you check the offload settings of your NIC (ethtool -k IFNAME)? Could you provide that?

- Stephan

--
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-44
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin
Re: [ceph-users] Typical 10GbE latency
Any special parameters (or best practices) regarding the offload settings for the NICs? I have two ports: p4p1 (public net) and p4p2 (cluster internal). The cluster-internal port has MTU 9000 across all the OSD servers, and of course on the switch ports:

ceph@cephosd01:~$ ethtool -k p4p1
Features for p4p1:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: on
	tx-checksum-fcoe-crc: on [fixed]
	tx-checksum-sctp: on
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
busy-poll: on [fixed]

ceph@cephosd01:~$ ethtool -k p4p2
Features for p4p2:
[identical to p4p1 above]

German Anders

--- Original message ---
Asunto: Re: [ceph-users] Typical 10GbE latency
De: Stephan Seitz s.se...@heinlein-support.de
Para: Wido den Hollander w...@42on.com
Cc: ceph-users@lists.ceph.com
Fecha: Thursday, 13/11/2014 15:39

> > Indeed, there must be something! But I can't figure it out yet. Same
> > controllers, tried the same OS, direct cables, but the latency is 40%
> > higher.
>
> Wido, just an educated guess: did you check the offload settings of your
> NIC (ethtool -k IFNAME)? Could you provide that?
>
> - Stephan
> [signature snipped]
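[To rule the offloads in or out, a common experiment is toggling the big ones and re-running the latency test; which flags matter varies by NIC and driver, so this is illustrative only:]

    # disable the offloads most often implicated in latency anomalies
    sudo ethtool -K p4p2 lro off gro off tso off gso off
    # re-measure, then re-enable each with "on" to isolate the culprit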
Re: [ceph-users] Poor RBD performance as LIO iSCSI target
That's interesting. Although I'm running 3.16.7, and I'd expect the patch to be in already, I'll downgrade to the working 3.16.0 kernel and report back if this fixes the issue.

Thanks for the pointer.

--
David Moreau Simard

On Nov 13, 2014, at 1:15 PM, German Anders gand...@despegar.com wrote:
> Is it possible that you hit bug #8818?
>
> German Anders
>
> --- Original message ---
> Asunto: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
> De: David Moreau Simard dmsim...@iweb.com
> Para: Christopher Spearman neromaver...@gmail.com
> Cc: ceph-users@lists.ceph.com
> Fecha: Thursday, 13/11/2014 13:17
>
> [quoted test setup, fio command, dd commands and targetcli output
> snipped; see the earlier messages in this thread]
Re: [ceph-users] Poor RBD performance as LIO iSCSI target
On 11/13/2014 10:17 AM, David Moreau Simard wrote:
> Running into weird issues here as well in a test environment. I don't
> have a solution either but perhaps we can find some things in common..
> [setup details, fio command and pastebin links snipped; see the earlier
> message in this thread]

You are hitting a different issue. German Anders is most likely correct: you hit the rbd hang. That then caused the iscsi/scsi command to time out, which caused the scsi error handler to run. In your logs we see that the LIO error handler received a task abort from the initiator, and that timed out, which caused the escalation (the iscsi portal login related messages).
Re: [ceph-users] ceph-osd mkfs mkkey hangs on ARM
Hi Sage,

Here you go: http://paste.openstack.org/show/132936/

Harm

Op 13-11-14 om 00:44 schreef Sage Weil:
> On Wed, 12 Nov 2014, Harm Weites wrote:
> > Hi,
> > When trying to add a new OSD to my cluster the ceph-osd process hangs:
> >
> >     # ceph-osd -i $id --mkfs --mkkey
> >     <nothing>
> >
> > At this point I have to explicitly kill -9 the ceph-osd, since it
> > doesn't respond to anything. It also didn't adhere to my foreground
> > debug log request; the logs are empty. Stracing the ceph-osd [1] shows
> > it's very busy with this:
> >
> >     nanosleep({0, 201}, NULL) = 0
> >     gettimeofday({1415741192, 862216}, NULL) = 0
> >     nanosleep({0, 201}, NULL) = 0
> >     gettimeofday({1415741192, 864563}, NULL) = 0
>
> Can you gdb attach to the ceph-osd process while it is in this state and
> see what 'bt' says?
>
> sage
>
> > I've rebuilt python to undo a threading regression [2], though that's
> > unrelated to this issue. It did fix ceph not returning properly after
> > commands like 'ceph osd tree' though, so it is useful. This machine is
> > Fedora 21 on ARM with ceph-0.80.7-1.fc21.armv7hl. The mon/mds/osd are
> > all x86, CentOS 7. Could this be a configuration issue on my end or is
> > something just broken on my platform?
> >
> >     # lscpu
> >     Architecture:          armv7l
> >     Byte Order:            Little Endian
> >     CPU(s):                2
> >     On-line CPU(s) list:   0,1
> >     Thread(s) per core:    1
> >     Core(s) per socket:    2
> >     Socket(s):             1
> >     Model name:            ARMv7 Processor rev 4 (v7l)
> >
> > [1] http://paste.openstack.org/show/132555/
> > [2] http://bugs.python.org/issue21963
> >
> > Regards, Harm
Re: [ceph-users] ceph-osd mkfs mkkey hangs on ARM
This appears to be a buggy libtcmalloc. Ceph hasn't gotten to main() yet from the looks of things; tcmalloc is still initializing. Hopefully Fedora has a newer version of the package?

sage

On Thu, 13 Nov 2014, Harm Weites wrote:
> Hi Sage,
> Here you go: http://paste.openstack.org/show/132936/
> Harm
> [rest of quoted thread snipped; see the previous messages]
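[A quick way to confirm which tcmalloc the binary picked up and to try a newer build; the package name follows Fedora convention and is an assumption, not a confirmed fix from the thread:]

    # which libtcmalloc is ceph-osd actually linked against?
    ldd /usr/bin/ceph-osd | grep tcmalloc
    # pull in a newer gperftools build if one is available
    sudo yum update gperftools-libs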
[ceph-users] Multiple rules in a ruleset: any examples? Which rule wins?
Hi list,

When there are multiple rules in a ruleset, is it the case that the first one wins? When a rule fails, does it fall through to the next rule? Are min_size and max_size the only determinants?

Are there any examples? The only examples I've seen put one rule per ruleset (e.g. the docs have an ssd/platter example, but that shows one rule per ruleset).

Regards,
-Anthony
Re: [ceph-users] Solaris 10 VMs extremely slow in KVM on Ceph RBD Devices
Hi Christoph,

Am 12.11.2014 17:29, schrieb Christoph Adomeit:
> Hi,
> i installed a Ceph cluster with 50 OSDs on 4 hosts and finally I am
> really happy with it. Linux and Windows VMs run really fast in KVM on
> the Ceph storage. Only my Solaris 10 guests are terribly slow on ceph
> rbd storage. A Solaris guest on Ceph storage needs 15 minutes to boot.
> When I move the Solaris image to the old Nexenta NFS storage and start
> it on the same KVM host, it will fly and boot in 1.5 minutes.
>
> I have tested ceph firefly and giant, and the problem is with both ceph
> versions. The performance problem is not only with booting; the problem
> continues when the server is up. Everything is terribly slow. So the
> only difference here is ceph vs. Nexenta NFS storage that causes the big
> performance problems. The Solaris guests have a standard ZFS root
> installation.
>
> Does anybody have an idea or a hint what might be going on here, and
> what I should try to make Solaris 10 guests faster on ceph storage?

ZFS uses copy-on-write, which I think is the reason why it is so slow on rbd. Have you tried UFS?

--
Mit freundlichen Grüßen,
Florian Wiessner

Smart Weblications GmbH
Martinsberger Str. 1
D-95119 Naila

fon.: +49 9282 9638 200
fax.: +49 9282 9638 205
24/7: +49 900 144 000 00 - 0,99 EUR/Min*
http://www.smart-weblications.de

--
Sitz der Gesellschaft: Naila
Geschäftsführer: Florian Wiessner
HRB-Nr.: HRB 3840 Amtsgericht Hof
*aus dem dt. Festnetz, ggf. abweichende Preise aus dem Mobilfunknetz
Re: [ceph-users] Multiple rules in a ruleset: any examples? Which rule wins?
On Thu, Nov 13, 2014 at 2:58 PM, Anthony Alba ascanio.al...@gmail.com wrote:
> Hi list,
> When there are multiple rules in a ruleset, is it the case that the first
> one wins? When a rule fails, does it fall through to the next rule? Are
> min_size and max_size the only determinants?
> Are there any examples? The only examples I've seen put one rule per
> ruleset (e.g. the docs have an ssd/platter example, but that shows one
> rule per ruleset)

The intention of rulesets is that they are used only for pools of different sizes, so the behavior when you have multiple rules which match a given size is probably not well-defined. That said, even using multiple rules in a single ruleset is not well tested, and I believe the functionality is being removed in the next release. I would recommend against using it.
-Greg
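[The safe pattern, then, is one rule per ruleset, selected per pool. A sketch in crushmap syntax; the "ssd" root and the ruleset id are hypothetical:]

    rule ssd {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
    }

    # then point a pool at it:
    # ceph osd pool set <pool> crush_ruleset 1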
Re: [ceph-users] Multiple rules in a ruleset: any examples? Which rule wins?
Thanks!

What happens when the lone rule fails? Is there a fallback rule that will place the blob in a random PG? Say I misconfigure, and my choose/chooseleaf steps don't add up to the pool's min size.

(This also explains why all the examples in the wild use only one rule per ruleset.)

On Fri, Nov 14, 2014 at 7:03 AM, Gregory Farnum g...@gregs42.com wrote:
> [quoted thread snipped; see the previous message]
Re: [ceph-users] Solaris 10 VMs extremely slow in KVM on Ceph RBD Devices
Hello,

On Wed, 12 Nov 2014 17:29:43 +0100 Christoph Adomeit wrote:
> [original message snipped; see Florian's reply above for the full text]

What actual disk driver are you using with the Solaris KVMs? This sounds a lot like Windows VMs with IDE emulation, before installing the virtio drivers.

Regards,
Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
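[If the guests are indeed on emulated IDE, switching the disk bus to virtio is the usual first experiment; a qemu-style sketch with hypothetical pool/image names, and note that the Solaris guest needs working virtio drivers for this to boot:]

    qemu-system-x86_64 ... \
      -drive file=rbd:rbd/solaris10,if=virtio,format=raw,cache=writeback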
[ceph-users] calamari build failure
Hello,

I'm trying to set up calamari with reference to http://ceph.com/category/ceph-gui/. I could create the package of the calamari server, but the creation of the calamari client failed. The following is the procedure; the build process failed. How can I fix this?

    # git clone https://github.com/ceph/calamari-clients.git
    # cd calamari-clients/vagrant/precise-build/
    # vagrant up
    # vagrant ssh
    # sudo apt-get install ruby1.9.1 ruby1.9.1-dev python-software-properties g++ make git debhelper build-essential devscripts
    # sudo apt-add-repository http://ppa.launchpad.net/chris-lea/node.js/ubuntu
    # sudo apt-get update
    # sudo apt-get install nodejs
    # sudo npm install -g bower@1.3.8
    # sudo npm install -g grunt-cli
    # sudo gem install compass
    # sudo salt-call state.highstate

----------
[INFO ] Completed state [make build-product] at time 11:27:42.011180
[INFO ] Running state [cp calamari-clients*tar.gz /git/] at time 11:27:42.014689
[INFO ] Executing state cmd.run for cp calamari-clients*tar.gz /git/
[INFO ] Executing command 'cp calamari-clients*tar.gz /git/' as user 'vagrant' in directory '/home/vagrant/clients'
[INFO ] {'pid': 30841, 'retcode': 0, 'stderr': '', 'stdout': ''}
[INFO ] Completed state [cp calamari-clients*tar.gz /git/] at time 11:27:42.290320
[ERROR ] An un-handled exception was caught by salt's global exception handler:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1914-1916: ordinal not in range(256)
Traceback (most recent call last):
  File "/usr/bin/salt-call", line 11, in <module>
    salt_call()
  File "/usr/lib/pymodules/python2.7/salt/scripts.py", line 82, in salt_call
    client.run()
  File "/usr/lib/pymodules/python2.7/salt/cli/__init__.py", line 319, in run
    caller.run()
  File "/usr/lib/pymodules/python2.7/salt/cli/caller.py", line 148, in run
    self.opts)
  File "/usr/lib/pymodules/python2.7/salt/output/__init__.py", line 49, in display_output
    print(display_data)
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1914-1916: ordinal not in range(256)
[the same traceback is printed a second time]
----------

Thank you.
--
idzzy
Re: [ceph-users] calamari build failure
Hi,

Which version are you currently running?

    # salt-call --version

On 11/14/14 9:34 AM, idzzy wrote:
> Hello,
> I'm trying to set up calamari with reference to
> http://ceph.com/category/ceph-gui/. I could create the package of the
> calamari server, but the creation of the calamari client failed.
> [procedure and traceback snipped; see the previous message]
Re: [ceph-users] calamari build failure
Hi,

vagrant@precise64:/git$ salt-call --version
salt-call 2014.1.13 (Hydrogen)

I can now see the calamari-clients packages in the /git directory inside the VM:

vagrant@precise64:/git$ ls -l
total 3340
drwxr-xr-x 1 vagrant vagrant     748 Nov 13 10:21 calamari-clients
-rw-r--r-- 1 vagrant vagrant 1705604 Nov 13 11:27 calamari-clients_1.2.1.1-36-g535e2d9_all.deb
-rw-r--r-- 1 vagrant vagrant 1711393 Nov 13 11:27 calamari-clients-build-output.tar.gz

Does this mean the package build actually succeeded? If so, what was the error message in my previous mail about?

Thank you.
--
idzzy

On November 14, 2014 at 10:53:22 AM, Mark Loza (ml...@morphlabs.com) wrote:
> [earlier messages quoted in full; trimmed -- see above]
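(Since the cp state above returned retcode 0 and the .deb is present, the build itself likely succeeded and only salt's output display crashed. A quick, hedged way to double-check the package before installing it, assuming dpkg-deb is available in the VM:)

# Sketch: inspect the built package; the filename matches the ls output above.
dpkg-deb --info calamari-clients_1.2.1.1-36-g535e2d9_all.deb       # control metadata
dpkg-deb --contents calamari-clients_1.2.1.1-36-g535e2d9_all.deb   # payload file list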
[ceph-users] Upgrade to 0.80.7-0.el6 from 0.80.1-0.el6, OSD crashes on startup
I upgraded my mons to the latest version and they appear to work. I then upgraded my mds and it seems fine. I then upgraded one OSD node, and the OSD fails to start with the following dump; any help is appreciated:

--- begin dump of recent events ---
     0> 2014-11-13 18:20:15.625793 7fbd973ce7a0 -1 *** Caught signal (Aborted) **
 in thread 7fbd973ce7a0

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: /usr/bin/ceph-osd() [0x9bd2a1]
 2: (()+0xf710) [0x7fbd96373710]
 3: (gsignal()+0x35) [0x7fbd95245925]
 4: (abort()+0x175) [0x7fbd95247105]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fbd95affa5d]
 6: (()+0xbcbe6) [0x7fbd95afdbe6]
 7: (()+0xbcc13) [0x7fbd95afdc13]
 8: (()+0xbcd0e) [0x7fbd95afdd0e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f2) [0xafbe22]
 10: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, ceph::buffer::list*)+0x4ea) [0x7f729a]
 11: (OSD::load_pgs()+0x18f1) [0x64f2b1]
 12: (OSD::init()+0x22c0) [0x6536f0]
 13: (main()+0x35bc) [0x5fe39c]
 14: (__libc_start_main()+0xfd) [0x7fbd95231d1d]
 15: /usr/bin/ceph-osd() [0x5f9e49]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/us-west01-osd.0.log
--- end dump of recent events ---
Re: [ceph-users] Upgrade to 0.80.7-0.el6 from 0.80.1-0.el6, OSD crashes on startup
On Thu, 13 Nov 2014, Joshua McClintock wrote:
> [crash dump quoted in full; trimmed -- see the message above]

Hey, this looks like a different report we saw recently off-list! In that case, they were upgrading from 0.80.4 to 0.80.7. Opening #10105.

Were you only restarting a single OSD? I would hold off on restarting any more for the time being. Can you attach the output from

 ls /var/lib/ceph/osd/ceph-NNN/current

and

 ceph-kvstore-tool /var/lib/ceph/osd/ceph-NNN/current/omap list

(you may need to install the ceph-tests rpm to get ceph-kvstore-tool). Thanks!

sage
Re: [ceph-users] Multiple rules in a ruleset: any examples? Which rule wins?
On Thu, Nov 13, 2014 at 3:11 PM, Anthony Alba <ascanio.al...@gmail.com> wrote:
> Thanks! What happens when the lone rule fails? Is there a fallback rule
> that will place the blob in a random PG? Say I misconfigure, and my
> choose/chooseleaf steps don't add up to the pool's min size.

There's no built-in fallback rule or anything like that. If the rule fails to map for new PGs, they will just not be created. If you have existing PGs and something changes so they don't map, then they will generally continue to run on the OSDs they were previously stored on. No data is ever removed until it's already fully replicated, so breaking your rules can cause various liveness issues with your cluster, but it will never compromise data durability.
-Greg
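(A hedged way to catch a non-mapping rule before it bites: crushtool can replay a rule against the compiled map offline and report inputs that map to fewer OSDs than requested. The rule id 0 and replica count 3 below are placeholders; substitute your own:)

# Sketch: dry-run a CRUSH rule offline and flag inputs that fail to map fully.
ceph osd getcrushmap -o /tmp/crushmap                 # grab the compiled map
crushtool -i /tmp/crushmap --test --rule 0 --num-rep 3 --show-bad-mappings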
Re: [ceph-users] Upgrade to 0.80.7-0.el6 from 0.80.1-0.el6, OSD crashes on startup
[root@ceph-node20 ~]# ls /var/lib/ceph/osd/us-west01-0/current
0.10_head 0.1a_head 0.23_head 0.2c_head 0.37_head 0.3_head 0.b_head 1.10_head 1.1b_head 1.24_head 1.2c_head 1.3a_head 1.b_head 2.16_head 2.1_head 2.2a_head 2.32_head 2.3a_head 2.a_head omap
0.11_head 0.1d_head 0.25_head 0.2e_head 0.38_head 0.4_head 0.c_head 1.13_head 1.1d_head 1.26_head 1.2f_head 1.3b_head 1.e_head 2.1a_head 2.22_head 2.2c_head 2.33_head 2.3e_head 2.b_head
0.13_head 0.1f_head 0.26_head 0.2f_head 0.3b_head 0.5_head 0.d_head 1.16_head 1.1f_head 1.27_head 1.31_head 1.3e_head 2.0_head 2.1b_head 2.25_head 2.2e_head 2.36_head 2.3f_head 2.c_head
0.16_head 0.20_head 0.27_head 0.30_head 0.3c_head 0.6_head 0.e_head 1.18_head 1.20_head 1.29_head 1.36_head 1.3_head 2.10_head 2.1c_head 2.26_head 2.2f_head 2.37_head 2.4_head commit_op_seq
0.18_head 0.21_head 0.28_head 0.33_head 0.3e_head 0.7_head 0.f_head 1.19_head 1.22_head 1.2a_head 1.37_head 1.4_head 2.11_head 2.1d_head 2.27_head 2.30_head 2.38_head 2.7_head meta
0.19_head 0.22_head 0.29_head 0.35_head 0.3f_head 0.9_head 1.0_head 1.1a_head 1.23_head 1.2b_head 1.39_head 1.a_head 2.12_head 2.1e_head 2.28_head 2.31_head 2.39_head 2.8_head nosnap

The output from the second command was too long to post; the full dump is at http://pastee.co/Kd1BlP. Here are the last lines:

[...]
_HOBJTOSEQ_:pglog%u2%e2e...0.none.516B9E4C
_HOBJTOSEQ_:pglog%u2%e2f...0.none.516B9F1C
_HOBJTOSEQ_:pglog%u2%e30...0.none.516BFD4B
_HOBJTOSEQ_:pglog%u2%e31...0.none.516BF21B
_HOBJTOSEQ_:pglog%u2%e32...0.none.516BF3AB
_HOBJTOSEQ_:pglog%u2%e33...0.none.516BF37B
_HOBJTOSEQ_:pglog%u2%e36...0.none.516BF16B
_HOBJTOSEQ_:pglog%u2%e37...0.none.516BF63B
_HOBJTOSEQ_:pglog%u2%e38...0.none.516BF7CB
_HOBJTOSEQ_:pglog%u2%e39...0.none.516BF49B
_HOBJTOSEQ_:pglog%u2%e3a...0.none.516B933C
_HOBJTOSEQ_:pglog%u2%e3e...0.none.516B96FC
_HOBJTOSEQ_:pglog%u2%e3f...0.none.516B978C
_HOBJTOSEQ_:pglog%u2%e4...0.none.103ABD8E
_HOBJTOSEQ_:pglog%u2%e7...0.none.103AB3BE
_HOBJTOSEQ_:pglog%u2%e8...0.none.103AB34E
_HOBJTOSEQ_:pglog%u2%ea...0.none.103A5CBF
_HOBJTOSEQ_:pglog%u2%eb...0.none.103A5C4F
_HOBJTOSEQ_:pglog%u2%ec...0.none.103A5D1F
_SYS_:HEADER
*** Caught signal (Bus error) **
 in thread 7f64e92ce760

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: ceph-kvstore-tool() [0x4bf2e1]
 2: (()+0xf710) [0x7f64e86e0710]
 3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x1cb) [0x7f64e8e9f73b]
 4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x291) [0x7f64e8ea0de1]
 5: (()+0x3a412) [0x7f64e8ea3412]
 6: (()+0x3a6f8) [0x7f64e8ea36f8]
 7: (()+0x3a78d) [0x7f64e8ea378d]
 8: (()+0x3761a) [0x7f64e8ea061a]
 9: (()+0x20fd2) [0x7f64e8e89fd2]
 10: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x47) [0x4ba417]
 11: (StoreTool::traverse(std::string const&, bool, std::ostream*)+0x1da) [0x4b65fa]
 12: (main()+0x2cc) [0x4b26fc]
 13: (__libc_start_main()+0xfd) [0x7f64e77a2d1d]
 14: ceph-kvstore-tool() [0x4b21b9]

2014-11-13 21:19:18.318941 7f64e92ce760 -1 *** Caught signal (Bus error) **
 in thread 7f64e92ce760
(the identical backtrace is printed a second time)
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -13> 2014-11-13 21:19:04.689722 7f64e92ce760  5 asok(0x1f1b5b0) register_command perfcounters_dump hook 0x1f1b510
   -12> 2014-11-13 21:19:04.689754 7f64e92ce760  5 asok(0x1f1b5b0) register_command 1 hook 0x1f1b510
   -11> 2014-11-13 21:19:04.689771 7f64e92ce760  5 asok(0x1f1b5b0) register_command perf dump hook 0x1f1b510
   -10> 2014-11-13 21:19:04.689778 7f64e92ce760  5 asok(0x1f1b5b0) register_command perfcounters_schema hook 0x1f1b510
    -9> 2014-11-13 21:19:04.689787 7f64e92ce760  5 asok(0x1f1b5b0) register_command 2 hook 0x1f1b510
    -8> 2014-11-13 21:19:04.689793 7f64e92ce760  5 asok(0x1f1b5b0) register_command perf schema hook 0x1f1b510
    -7
Re: [ceph-users] Upgrade to 0.80.7-0.el6 from 0.80.1-0.el6, OSD crashes on startup
Hmm, looks like leveldb is hitting a problem. Is there anything in the kernel log (dmesg) that suggests a disk or file system problem? Are you able to, say, tar up the current/omap directory without problems?

This is a single OSD, right? None of the others have been upgraded yet?

sage

On Thu, 13 Nov 2014, Joshua McClintock wrote:
> [directory listing and ceph-kvstore-tool crash quoted in full; trimmed -- see the previous message]
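(Concretely, the checks suggested above might look like the following, using the us-west01-0 path from the earlier listing; whether the tar completes cleanly is exactly what is being tested:)

# Sketch of the suggested sanity checks (paths taken from this thread).
dmesg | grep -iE 'ata|i/o error|medium'               # kernel-side disk errors
tar -C /var/lib/ceph/osd/us-west01-0 -czf /tmp/omap-copy.tgz current/omap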
Re: [ceph-users] Upgrade to 0.80.7-0.el6 from 0.80.1-0.el6, OSD crashes on startup
AH! Sorry for the false alarm, I clearly have a hard drive problem:

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x24
ata2.00: failed command: READ DMA
ata2.00: cmd c8/00:08:38:bd:70/00:00:00:00:00/ef tag 0 dma 4096 in
         res 51/40:00:3f:bd:70/40:00:21:00:00/ef Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
(the same READ DMA / media error sequence repeats three more times)

On Thu, Nov 13, 2014 at 9:28 PM, Sage Weil <s...@newdream.net> wrote:
> [previous message quoted in full; trimmed -- see above]
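(Given the UNC media errors, a hedged next step is to confirm the failure with SMART data before swapping the disk. This assumes smartmontools is installed and that ata2 corresponds to /dev/sda on this host; substitute the real device:)

# Sketch: confirm the failing drive via SMART (device name is an assumption).
smartctl -H /dev/sda                                          # overall health verdict
smartctl -A /dev/sda | grep -iE 'realloc|pending|uncorrect'   # key failure attributes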