Re: [ceph-users] mds isn't working anymore after osd's running full
Hello Greg, I saw that the site behind the previous link to the logs uses a very short expiry time, so I uploaded it to another one: http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz

Thanks, Jasper

From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of Gregory Farnum [gfar...@redhat.com]
Sent: Thursday, 30 October 2014 1:03
To: Jasper Siero
Cc: John Spray; ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero jasper.si...@target-holding.nl wrote:

> Hello Greg,
>
> I added the debug options which you mentioned and started the process again:
>
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --reset-journal 0
> old journal was 9483323613~134233517
> new journal start will be 9621733376 (4176246 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
>
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 journaldumptgho-mon001
> undump journaldumptgho-mon001
> start 9483323613 len 134213311
> writing header 200.
> writing 9483323613~1048576
> writing 9484372189~1048576
> writing 9485420765~1048576
> [... 123 further "writing" lines elided; the offsets advance in 1048576-byte steps ...]
> writing 9615444189~1048576
> writing 9616492765~1044159
> done.
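As an aside, the offsets in that output are internally consistent; a small Python check, using only the numbers printed by --reset-journal and --undump-journal above, shows how the new journal start and the final undump chunk line up:

import sys

# Numbers printed by --reset-journal above.
old_start, old_len = 9483323613, 134233517   # "old journal was 9483323613~134233517"
new_start = 9621733376                       # "new journal start will be 9621733376"
print(new_start - (old_start + old_len))     # -> 4176246, the "bytes past old end"

# The undump writes 1 MiB chunks; the short final chunk lands exactly on
# the dumped length ("start 9483323613 len 134213311").
dump_start, dump_len = 9483323613, 134213311
final_off, final_len = 9616492765, 1044159   # the last "writing" line
print(final_off + final_len == dump_start + dump_len)  # -> True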
Re: [ceph-users] ceph version 0.79, rbd flatten report Segmentation fault (core dumped)
On Mon, Nov 3, 2014 at 9:31 AM, duan.xuf...@zte.com.cn wrote:

root@CONTROLLER-4F:~# rbd -p volumes flatten f3e81ea3-1d5b-487a-a55e-53efff604d54_disk
*** Caught signal (Segmentation fault) **
in thread 7fe99984f700
ceph version 0.79 (4c2d73a5095f527c3a2168deb5fa54b3c8991a6e)
1: (()+0x22a4f) [0x7fe9a1745a4f]
2: (()+0x10340) [0x7fe9a00f2340]
3: (librbd::aio_read(librbd::ImageCtx*, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, ceph::buffer::list*, librbd::AioCompletion*)+0x24) [0x7fe9a125daf4]
4: (librbd::AioRequest::read_from_parent(std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > >&)+0x85) [0x7fe9a1242745]
5: (librbd::AioRead::should_complete(int)+0x352) [0x7fe9a1242ca2]
6: (librbd::rados_req_cb(void*, void*)+0x1b) [0x7fe9a124cd7b]
7: (librados::C_AioComplete::finish(int)+0x1d) [0x7fe9a04a355d]
8: (Context::complete(int)+0x9) [0x7fe9a0480579]
9: (Finisher::finisher_thread_entry()+0x1b8) [0x7fe9a0531758]
10: (()+0x8182) [0x7fe9a00ea182]
11: (clone()+0x6d) [0x7fe99f2ce30d]
2014-11-03 14:21:02.413259 7fe99984f700 -1 *** Caught signal (Segmentation fault) **
in thread 7fe99984f700
[the log then repeats the identical ceph version line and eleven-frame backtrace]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-113 2014-11-03 14:21:01.948799 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command perfcounters_dump hook 0x7fe9a3349ee0
-112 2014-11-03 14:21:01.948850 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command 1 hook 0x7fe9a3349ee0
-111 2014-11-03 14:21:01.948856 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command perf dump hook 0x7fe9a3349ee0
-110 2014-11-03 14:21:01.948894 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command perfcounters_schema hook 0x7fe9a3349ee0
-109 2014-11-03 14:21:01.948906 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command 2 hook 0x7fe9a3349ee0
-108 2014-11-03 14:21:01.948915 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command perf schema hook 0x7fe9a3349ee0
-107 2014-11-03 14:21:01.948919 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command config show hook 0x7fe9a3349ee0
-106 2014-11-03 14:21:01.948931 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command config set hook 0x7fe9a3349ee0
-105 2014-11-03 14:21:01.948936 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command config get hook 0x7fe9a3349ee0
-104 2014-11-03 14:21:01.948944 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command log flush hook 0x7fe9a3349ee0
-103 2014-11-03 14:21:01.948947 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command log dump hook 0x7fe9a3349ee0
-102 2014-11-03 14:21:01.948954 7fe9a170c7c0 5 asok(0x7fe9a33483f0) register_command log reopen hook 0x7fe9a3349ee0
-101 2014-11-03 14:21:01.955080 7fe9a170c7c0 10 monclient(hunting): build_initial_monmap
-100 2014-11-03 14:21:01.955154 7fe9a170c7c0 1 librados: starting msgr at :/0
-99 2014-11-03 14:21:01.955169 7fe9a170c7c0 1 librados: starting objecter
-98 2014-11-03 14:21:01.955227 7fe9a170c7c0 1 -- :/0 messenger.start
-97 2014-11-03 14:21:01.955271 7fe9a170c7c0 1 librados: setting wanted keys
-96 2014-11-03 14:21:01.955279 7fe9a170c7c0 1 librados: calling monclient init
-95 2014-11-03 14:21:01.955280 7fe9a170c7c0 10 monclient(hunting): init
-94 2014-11-03 14:21:01.955295 7fe9a170c7c0 5 adding auth protocol: cephx
-93 2014-11-03 14:21:01.955304 7fe9a170c7c0 10 monclient(hunting): auth_supported 2 method cephx
-92 2014-11-03 14:21:01.955521 7fe9a170c7c0 2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.admin.keyring
-91 2014-11-03 14:21:01.955627 7fe9a170c7c0 10 monclient(hunting): _reopen_session rank -1 name
-90 2014-11-03 14:21:01.955718 7fe9a170c7c0 10 monclient(hunting): picked mon.noname-a con 0x7fe9a336b660 addr 192.129.0.230:6789/0
-89 2014-11-03 14:21:01.955769 7fe9a170c7c0 10 monclient(hunting): _send_mon_message to mon.noname-a at [log excerpt truncated in the original message]
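For anyone trying to reproduce this against a newer librbd, the same flatten can be driven from the Python bindings rather than the rbd CLI; a minimal sketch, with the pool and image name taken from the report above:

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('volumes')
    try:
        image = rbd.Image(ioctx, 'f3e81ea3-1d5b-487a-a55e-53efff604d54_disk')
        try:
            # flatten() copies any blocks still shared with the parent
            # snapshot into the clone, then detaches the parent.
            image.flatten()
        finally:
            image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()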
Re: [ceph-users] rhel7 krbd backported module repo ?
On Mon, Nov 3, 2014 at 7:35 AM, Alexandre DERUMIER aderum...@odiso.com wrote:

> Hi, I would like to know if a repository is available for rhel7/centos7 with the latest krbd module backported? I know that such a module is available in the ceph enterprise repos, but is it available for non-subscribers?

Not that I know of. krbd *fixes* are getting backported to stable kernels regularly though.

Thanks,
Ilya
Re: [ceph-users] SSD MTBF
On Mon, Sep 29, 2014 at 10:31:03AM +0200, Emmanuel Lacour wrote:

> Dear ceph users, we have been managing ceph clusters for 1 year now. Our setup is typically made of Supermicro servers with OSD SATA drives and the journal on SSD. Those SSDs are all failing one after the other after one year :( We used Samsung 850 Pro (120 GB) with two setups (small nodes with 2 SSDs, 2 HDs in 1U):

s/850/840

A quick update on this: those SSDs continue to fail. We replaced each one with an Intel S3700 and are rebuilding nodes with a different partition table (RAID only for the OS, one journal on each SSD, over-provisioning).

We sent the Samsung SSDs back for warranty; it's very easy, and one week later we received SSDs with the same S/N and SMART OK, but... we tried to reuse two of those and they failed one day later.

So, sorry Samsung, but I definitely do not recommend using the 840 Pro in ceph clusters!

--
Easter-eggs
GNU/Linux specialists
44-46 rue de l'Ouest - 75014 Paris - France - Métro Gaité
Phone: +33 (0) 1 43 35 00 37 - Fax: +33 (0) 1 43 35 00 76
mailto:elac...@easter-eggs.com - http://www.easter-eggs.com
Re: [ceph-users] giant release osd down
Hello,

On Mon, 3 Nov 2014 01:01:32 -0500 (EST) Ian Colle wrote:

> Christian,
> Why are you not fond of ceph-deploy?

In short: this very thread. Ceph-deploy hides a number of things from users that are pretty vital for a working ceph cluster and that are insufficiently (or not at all) covered in the manual-deployment documentation. Specifically the GPT magic, which isn't documented at all (and no, dissecting python code or some blurb on git is not the same as documentation on the Ceph homepage), and flag files like "sysvinit". There are numerous cases on this ML where people wound up with OSDs that didn't start (at least at boot time) due to this omission and the resulting dependence on ceph-deploy.

That GPT magic also makes things a lot less flexible (you can't use a full device, you have to partition it first) and leads to hilarious things like ceph-deploy preparing an OSD and udev happily starting it up even though that wasn't requested.

So when people fail at a manual deploy, the answer tends to be "use ceph-deploy" (and go from there, in my particular reply) instead of "Did you follow the docs in section blah?".

Then there are problems with ceph-deploy itself, like correctly picking up formatting parameters from the config but NOT defaulting to the filesystem type specified there. And since its role is supposed to be helping people with quick deployment (and teardown) of test clusters, the lack of remove functionality for OSDs isn't particularly helpful either.

Christian

> Ian R. Colle
> Global Director of Software Engineering
> Red Hat (Inktank is now part of Red Hat!)
> http://www.linkedin.com/in/ircolle
> http://www.twitter.com/ircolle
> Cell: +1.303.601.7713
> Email: ico...@redhat.com

----- Original Message -----
From: Christian Balzer ch...@gol.com
To: ceph-us...@ceph.com
Cc: Shiv Raj Singh virk.s...@gmail.com
Sent: Sunday, November 2, 2014 8:37:18 AM
Subject: Re: [ceph-users] giant release osd down

Hello,

On Mon, 3 Nov 2014 00:48:20 +1300 Shiv Raj Singh wrote:

> Hi All, I am new to ceph and I have been trying to configure a 3-node ceph cluster with 1 monitor and 2 OSD nodes. I have reinstalled and recreated the cluster three times and I am stuck against the wall. My monitor is working as desired (I guess) but the status of the OSDs is down. I am following this link http://docs.ceph.com/docs/v0.80.5/install/manual-deployment/ for configuring the OSDs. The reason why I am not using ceph-deploy is because I want to understand the technology. Can someone please help me understand what I'm doing wrong !! :-) !!

a) You're using OSS. Caveat emperor and so forth. In particular you seem to be following documentation for Firefly while the 64 PGs below indicate that you're actually installing Giant.

b) Since Firefly, Ceph defaults to a replication size of 3, so 2 OSDs won't do.

c) But wait, you specified a pool size of 2 in your OSD section! Tough luck, because since Firefly there is a bug that at the very least prevents OSD and RGW parameters from being parsed outside the global section (which, incidentally, is what the documentation you cited suggests...).

d) Your OSDs are down, so all of the above is (kinda) pointless. Without further info (log files, etc.) we won't be able to help you much.

My suggestion would be to take the above to heart, try with ceph-deploy (which I'm not fond of), and if that works, try again manually and see where it fails.

Regards,

Christian

*Some useful diagnostic information*

ceph2:~$ ceph osd tree
# id  weight  type name       up/down  reweight
-1    2       root default
-3    1           host ceph2
0     1               osd.0   down     0
-2    1           host ceph3
1     1               osd.1   down     0

ceph health detail
HEALTH_WARN 64 pgs stuck inactive; 64 pgs stuck unclean
pg 0.22 is stuck inactive since forever, current state creating, last acting []
pg 0.21 is stuck inactive since forever, current state creating, last acting []
pg 0.20 is stuck inactive since forever, current state creating, last acting []

ceph -s
    cluster a04ee359-82f8-44c4-89b5-60811bef3f19
     health HEALTH_WARN 64 pgs stuck inactive; 64 pgs stuck unclean
     monmap e1: 1 mons at {ceph1=192.168.101.41:6789/0}, election epoch 1, quorum 0 ceph1
     osdmap e9: 2 osds: 0 up, 0 in
      pgmap v10: 64 pgs, 1 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                  64 creating

My configuration is as below: sudo nano /etc/ceph/ceph.conf

[global]
fsid = a04ee359-82f8-44c4-89b5-60811bef3f19
mon initial members = ceph1
mon host = 192.168.101.41
public network = 192.168.101.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx

[osd]
osd journal size = 1024
filestore xattr use omap
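Point c) above is easy to probe from the client side; a minimal sketch with the python-rados bindings, which parse ceph.conf the same way other librados clients do. Note this shows what a fresh client resolves for the option (from [global]/[client] sections), not what a running daemon loaded; for the latter, 'ceph daemon osd.N config show' is the tool:

import rados

# Parse /etc/ceph/ceph.conf exactly as a librados client would and
# report the resolved value of the pool-size default.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
print(cluster.conf_get('osd_pool_default_size'))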
Re: [ceph-users] rhel7 krbd backported module repo ?
There's this one: http://gitbuilder.ceph.com/kmod-rpm-rhel7beta-x86_64-basic/ref/rhel7/x86_64/

But that hasn't been updated since July.

Cheers, Dan

On Mon Nov 03 2014 at 5:35:23 AM Alexandre DERUMIER aderum...@odiso.com wrote:

> Hi, I would like to know if a repository is available for rhel7/centos7 with the latest krbd module backported? I know that such a module is available in the ceph enterprise repos, but is it available for non-subscribers?
>
> Regards, Alexandre
Re: [ceph-users] rhel7 krbd backported module repo ?
> Not that I know of. krbd *fixes* are getting backported to stable kernels regularly though.

Thanks. (I was thinking more about new feature support, like the discard support coming in 3.18, for example.)

----- Original Message -----
From: Ilya Dryomov ilya.dryo...@inktank.com
To: Alexandre DERUMIER aderum...@odiso.com
Cc: ceph-users ceph-users@lists.ceph.com
Sent: Monday, 3 November 2014 10:09:14
Subject: Re: [ceph-users] rhel7 krbd backported module repo ?

On Mon, Nov 3, 2014 at 7:35 AM, Alexandre DERUMIER aderum...@odiso.com wrote:

> Hi, I would like to know if a repository is available for rhel7/centos7 with the latest krbd module backported? I know that such a module is available in the ceph enterprise repos, but is it available for non-subscribers?

Not that I know of. krbd *fixes* are getting backported to stable kernels regularly though.

Thanks,
Ilya
Re: [ceph-users] rhel7 krbd backported module repo ?
> http://gitbuilder.ceph.com/kmod-rpm-rhel7beta-x86_64-basic/ref/rhel7/x86_64/
> But that hasn't been updated since July.

Great! Thanks! (I think it's built from https://github.com/ceph/ceph-client/tree/rhel7 ?)

----- Original Message -----
From: Dan van der Ster daniel.vanders...@cern.ch
To: Alexandre DERUMIER aderum...@odiso.com, ceph-users ceph-users@lists.ceph.com
Sent: Monday, 3 November 2014 10:17:51
Subject: Re: [ceph-users] rhel7 krbd backported module repo ?

There's this one: http://gitbuilder.ceph.com/kmod-rpm-rhel7beta-x86_64-basic/ref/rhel7/x86_64/

But that hasn't been updated since July.

Cheers, Dan

On Mon Nov 03 2014 at 5:35:23 AM Alexandre DERUMIER aderum...@odiso.com wrote:

> Hi, I would like to know if a repository is available for rhel7/centos7 with the latest krbd module backported? I know that such a module is available in the ceph enterprise repos, but is it available for non-subscribers?
>
> Regards, Alexandre
[ceph-users] 0.87 rados df fault
Hello all,

I upgraded my cluster to Giant. Everything is working well, but on one mon I get a strange error when I do "rados df":

root@a-mon:~# rados df
2014-11-03 10:57:15.313618 7ff2434f0700 0 -- :/1009400 >> 10.94.67.202:6789/0 pipe(0xe37890 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0xe37b20).fault
pool name           category  KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
data                -  0 88057910 0 0 434686 434686 90533620
metadata            -  63991517680 0 0 1852535 1746370585 15900570178050318
wimi-files          -  8893618079 99833970 0 0 296284 2747513 18874311 8951883370
wimi-recette-files  -  978453 235134 00 0 272389 1321262 498429 1042175
  total used      27056765864  19076090
  total avail     78381176704
  total space    105437942568

root@a-mon:~# ceph -v
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

In the same cluster, on another mon, no problem:

root@c-mon:/etc/ceph# rados df
pool name           category  KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
data                -  0 88056340 0 0 434686 434686 90532050
metadata            -  63626517680 0 0 1852535 1746370585 15900450178049886
wimi-files          -  8893618079 99833970 0 0 296284 2747513 18874311 8951883370
wimi-recette-files  -  978449 235100 00 0 272352 1321225 498232 1042138
  total used      27056761472  19075899
  total avail     78381181096
  total space    105437942568

root@c-mon:/etc/ceph# ceph -v
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

Is it a known error? I can file a formal bug report if needed. This problem is not important, but I fear implications outside of "rados df".

Regards,
--
Thomas Lemarchand
Cloud Solutions SAS - Information Systems Manager
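As an aside, the cluster-wide totals that "rados df" prints can be cross-checked without the CLI; a minimal python-rados sketch (the per-pool rows would come from ioctx.get_stats() on each pool instead):

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
stats = cluster.get_cluster_stats()  # dict with keys: kb, kb_used, kb_avail, num_objects
print('total used  %12d KB, %d objects' % (stats['kb_used'], stats['num_objects']))
print('total avail %12d KB' % stats['kb_avail'])
print('total space %12d KB' % stats['kb'])
cluster.shutdown()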
[ceph-users] Fwd: Error creating monitors
Can someone please help out? I am stuck.

Regards,
Sakhi Hadebe
Engineer: South African National Research Network (SANReN) Competency Area, Meraka, CSIR
Tel: +27 12 841 2308
Fax: +27 12 841 4223
Cell: +27 71 331 9622
Email: shad...@csir.co.za

Sakhi Hadebe, 10/31/2014 1:28 PM:

Hi Support,

I am attempting to test a ceph storage cluster on a 3-node cluster. I have installed Ubuntu 12.04 LTS on all 3 nodes. While attempting to create the monitors for node2 and node3, I am getting the error below:

[ceph-node3][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory

But mon.ceph1 gets created with no errors. What could I be doing wrong? These commands are executed on the primary node, node1.

Please help.

Regards,
Sakhi Hadebe
Re: [ceph-users] 0.87 rados df fault
Update: this error is linked to a crashed mon. It crashed during the weekend. I am trying to understand why; I never had a mon crash before Giant.

--
Thomas Lemarchand
Cloud Solutions SAS - Information Systems Manager

On Mon, 2014-11-03 at 11:08 +0100, Thomas Lemarchand wrote:

> Hello all, I upgraded my cluster to Giant. Everything is working well, but on one mon I get a strange error when I do "rados df":
> [quoted "rados df" output trimmed]
> Is it a known error? I can file a formal bug report if needed. This problem is not important, but I fear implications outside of "rados df".
Re: [ceph-users] 0.87 rados df fault
Update:

/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746084] [21787] 0 21780 492110 185044 920 240143 0 ceph-mon
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746115] [13136] 0 1313652172 1753 590 0 ceph
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746126] Out of memory: Kill process 21787 (ceph-mon) score 827 or sacrifice child
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746262] Killed process 21787 (ceph-mon) total-vm:1968440kB, anon-rss:740176kB, file-rss:0kB

An OOM kill. I have 1GB of memory on my mons, and 1GB of swap. It's the only mon that crashed. Is there a change in memory requirements from Firefly?

Regards,
--
Thomas Lemarchand
Cloud Solutions SAS - Information Systems Manager

On Mon, 2014-11-03 at 11:47 +0100, Thomas Lemarchand wrote:

> Update: this error is linked to a crashed mon. It crashed during the weekend. I am trying to understand why; I never had a mon crash before Giant.
> [earlier quoted messages trimmed]
Re: [ceph-users] where to download 0.87 RPMS?
On 11/01/2014 05:10 AM, Patrick McGarry wrote:

> As I understand it SUSE does their own builds of things. Just on cursory examination it looks like the following repo uses Firefly: https://susestudio.com/a/HVbCUu/master-ceph

This is Jan Kalcic's ceph appliance, using packages from:

http://download.opensuse.org/repositories/home:/jkalcic:/ceph/

These are built from https://build.opensuse.org/project/show/home:jkalcic:ceph (firefly 0.80.1), which is building against SLE 11 SP3.

We've got a slightly newer Firefly (0.80.5) in https://build.opensuse.org/package/show/filesystems/ceph which is building for several versions of openSUSE, but SLES is not presently enabled (I'm not sure why offhand :-/)

IIRC there was some discussion among a few of us about having a specific subproject (filesystems:ceph) on build.opensuse.org, where we could offer builds of ceph for various different SUSE Linuxen without implicitly pulling in the 100-odd non-ceph-related packages from the filesystems repo. I'll see about chasing this up.

> and there is some Calamari work going in here: https://susestudio.com/a/eEqfPk/calamari-opensuse-13-1

This is a Calamari appliance I was experimenting with, using packages from:

http://download.opensuse.org/repositories/systemsmanagement:/calamari/

These are quite current, and are built from https://build.opensuse.org/project/show/systemsmanagement:calamari (the calamari stuff is only building for openSUSE, but salt and diamond here are building for SLE 11 SP3, so as to allow SLE 11 SP3 ceph clusters to hook up to a calamari server running on openSUSE).

> My guess is that the master-ceph repo will be updated to Giant once they have a chance to get to it, but I'm guessing Tim Serong from SUSE could probably shed more light on that if he is available.

Yeah, someone needs to get filesystems/ceph on build.opensuse.org updated to Giant (or moved to filesystems:ceph/ceph, then updated to Giant), but nobody has had a chance yet.

Regards,
Tim

> Best Regards,
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com || http://community.redhat.com
> @scuttlemonkey || @ceph

On Fri, Oct 31, 2014 at 1:55 PM, Sanders, Bill bill.sand...@teradata.com wrote:

> No SLES rpms this release or for Firefly. Is there an issue with building for SLES, or is it just no longer targeted?
>
> Bill

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Patrick McGarry [patr...@inktank.com]
Sent: Friday, October 31, 2014 4:46 AM
To: Kenneth Waegeman
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] where to download 0.87 RPMS?

Might be worth looking at the new download infrastructure. If you always want the latest you can try: http://download.ceph.com/ceph/latest/

On Oct 31, 2014 6:17 AM, Kenneth Waegeman kenneth.waege...@ugent.be wrote:

> Thanks. It would be nice though to have a repo where all the packages are. We lock our packages ourselves, so we would just need to bump the version instead of adding a repo for each major version :)

----- Message from Irek Fasikhov malm...@gmail.com -----
Date: Thu, 30 Oct 2014 13:37:34 +0400
From: Irek Fasikhov malm...@gmail.com
Subject: Re: [ceph-users] where to download 0.87 RPMS?
To: Kenneth Waegeman kenneth.waege...@ugent.be
Cc: Patrick McGarry patr...@inktank.com, ceph-users ceph-users@lists.ceph.com

Hi. Use http://ceph.com/rpm-giant/

2014-10-30 12:34 GMT+03:00 Kenneth Waegeman kenneth.waege...@ugent.be:

> Hi, will http://ceph.com/rpm/ also be updated to have the giant packages?
>
> Thanks
> Kenneth

----- Message from Patrick McGarry patr...@inktank.com -----
Date: Wed, 29 Oct 2014 22:13:50 -0400
From: Patrick McGarry patr...@inktank.com
Subject: Re: [ceph-users] where to download 0.87 RPMS?
To: 廖建锋 de...@f-club.cn
Cc: ceph-users ceph-users@lists.ceph.com

I have updated the http://ceph.com/get page to reflect a more generic approach to linking. It's also worth noting that the new http://download.ceph.com/ infrastructure is available now. To get to the rpms specifically you can either crawl the download.ceph.com tree or use the symlink at http://ceph.com/rpm-giant/

Hope that (and the updated linkage on ceph.com/get) helps. Thanks!

Best Regards,
Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com || http://community.redhat.com
@scuttlemonkey || @ceph

On Wed, Oct 29, 2014 at 9:15 PM, 廖建锋 de...@f-club.cn wrote:

----- End message from Patrick McGarry patr...@inktank.com -----

--
Kind regards,
Kenneth Waegeman
Re: [ceph-users] giant release osd down
On Mon, 3 Nov 2014, Mark Kirkwood wrote:

> On 03/11/14 14:56, Christian Balzer wrote:
>> On Sun, 2 Nov 2014 14:07:23 -0800 (PST) Sage Weil wrote:
>>> On Mon, 3 Nov 2014, Christian Balzer wrote:
>>>> c) But wait, you specified a pool size of 2 in your OSD section! Tough luck, because since Firefly there is a bug that at the very least prevents OSD and RGW parameters from being parsed outside the global section (which incidentally is what the documentation you cited suggests...)
>>>
>>> It just needs to be in the [mon] or [global] section.
>>
>> While that is true for the pool default values, and even documented (not the [mon] bit, from a quick glance though), wouldn't you agree that having osd* parameters that don't work inside the [osd] section is at the very least very non-intuitive?
>> Also, as per the thread below, clearly something more systemic is going on with config parsing: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg13859.html

Ah, I missed that thread. Sounds like three separate bugs:

- pool defaults not used for initial pools
- osd_mkfs_type not respected by ceph-disk
- osd_* settings not working

The last one is a real shock; I would expect all kinds of things to break very badly if the [osd] section config behavior were not working. I take it you mean these options:

osd_op_threads = 10
osd_scrub_load_threshold = 2.5

How did you determine that they weren't taking effect? You can do 'ceph daemon osd.NNN config show | grep osd_op_threads' to see the value in the running process.

If you have a moment, can you also open tickets in the tracker for the other two? Thanks!

sage

> +1
>
> I'd like to see clear(er) descriptions (and perhaps enforcement?) of which parameters go in which section. I'm with Christian on this - osd* params that don't work inside the [osd] section are just a foot gun for new (ahem - and not so new) users!
>
> regards
> Mark
[ceph-users] cephfs survey results
In the Ceph session at the OpenStack summit someone asked what the CephFS survey results looked like. Here's the link:

https://www.surveymonkey.com/results/SM-L5JV7WXL/

In short, people want:

- fsck
- multimds
- snapshots
- quotas

sage
Re: [ceph-users] giant release osd down
On Mon, 3 Nov 2014 06:02:08 -0800 (PST) Sage Weil wrote:

> Ah, I missed that thread. Sounds like three separate bugs:
>
> - pool defaults not used for initial pools

Precisely. Not as much of a biggie with Giant, as only RBD gets created by default and that is easily deleted and re-created. But counter-intuitive nevertheless.

> - osd_mkfs_type not respected by ceph-disk

If that is what ceph-deploy uses (and doesn't overwrite internally), yes.

> - osd_* settings not working

The * I cannot be sure of, but for the options below, yes. Also read the entire thread; this at the very least also affects radosgw settings.

> The last one is a real shock; I would expect all kinds of things to break very badly if the [osd] section config behavior was not working.

Most of those not working will actually have little, immediately noticeable impact.

> I take it you mean these options:
>
> osd_op_threads = 10
> osd_scrub_load_threshold = 2.5
>
> How did you determine that they weren't taking effect? You can do 'ceph daemon osd.NNN config show | grep osd_op_threads' to see the value in the running process.

I did exactly that (back in Emperor times) and again now (see the last mail by me in that thread).

> If you have a moment, can you also open tickets in the tracker for the other two?

I will, probably not before Wednesday though. I was basically waiting for somebody from the dev team to pipe up, like in this mail. It's probably bothersome to monitor a ML for stuff like this, but on the other hand, if the official stance is "all bug reports to the tracker", then expect to comb through a lot (more than already) of brainfarts in there.

Christian

--
Christian Balzer    Network/Systems Engineer
ch...@gol.com       Global OnLine Japan/Fusion Communications
http://www.gol.com/
[ceph-users] emperor - firefly 0.80.7 upgrade problem
Hi All,

I upgraded from emperor to firefly. The initial upgrade went smoothly and all placement groups were active+clean. Next I executed 'ceph osd crush tunables optimal' to upgrade the CRUSH mapping.

Now I keep having OSDs go down or have requests blocked for long periods of time. I start the down OSDs back up and recovery eventually stops, but with 100s of incomplete and down+incomplete pgs remaining. The ceph web page says "If you see this state [incomplete], report a bug, and try to start any failed OSDs that may contain the needed information." Well, all the OSDs are up, though some have blocked requests. Also, the logs of the OSDs which go down have this message:

2014-11-02 21:46:33.615829 7ffcf0421700 0 -- 192.168.164.192:6810/31314 >> 192.168.164.186:6804/20934 pipe(0x2faa0280 sd=261 :6810 s=2 pgs=919 cs=25 l=0 c=0x2ed022c0).fault with nothing to send, going to standby
2014-11-02 21:49:11.440142 7ffce4cf3700 0 -- 192.168.164.192:6810/31314 >> 192.168.164.186:6804/20934 pipe(0xe512a00 sd=249 :6810 s=0 pgs=0 cs=0 l=0 c=0x2a308b00).accept connect_seq 26 vs existing 25 state standby
2014-11-02 21:51:20.085676 7ffcf6e3e700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7ffcf6e3e700 time 2014-11-02 21:51:20.052242
osd/PG.cc: 5424: FAILED assert(0 == "we got a bad state machine event")

ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
1: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0x12f) [0x87c6ef]
2: /usr/bin/ceph-osd() [0x8aeae9]
3: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list2<boost::statechart::custom_reaction<PG::IntervalFlush>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function> >, boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0xbf) [0x8dd3ff]
4: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::custom_reaction<PG::IntervalFlush>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function> >, boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x57) [0x8dd4e7]
5: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list5<boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::custom_reaction<PG::IntervalFlush>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function> >, boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, [backtrace truncated in the original message]
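As an aside, when experimenting with 'ceph osd crush tunables optimal' it helps to record the tunables in effect before and after the change; a minimal python-rados sketch, assuming the JSON command prefix matches the 'ceph osd crush show-tunables' CLI command:

import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
cmd = json.dumps({'prefix': 'osd crush show-tunables', 'format': 'json'})
ret, out, errs = cluster.mon_command(cmd, b'')
if ret == 0:
    # Prints e.g. choose_local_tries, chooseleaf_vary_r, and the profile in effect.
    print(json.dumps(json.loads(out), indent=2))
cluster.shutdown()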
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
P.S. The OSDs interacted with some 3.14 krbd clients before I realized that kernel version was too old for the firefly CRUSH map.

Chad.
Re: [ceph-users] Swift + radosgw: How do I find accounts/containers/objects limitation?
Thanks. I think the limit is 100 by default and it can be disabled. As far as I understand, there is no object limit on the radosgw side of things, only from the Swift end (i.e. 5GB)... right? In short, if someone tries to upload a 1TB object onto Swift + RadosGW, it has to be split at the Swift API layer using --segment-size of 5GB, but there's no hard limitation imposed by radosgw... correct?

--Narendra

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Daniel Schneller
Sent: Saturday, November 01, 2014 7:15 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Swift + radosgw: How do I find accounts/containers/objects limitation?

To remove the max_bucket limit I used:

radosgw-admin user modify --uid=username --max-buckets=0

Off the top of my head, I think:

radosgw-admin user info --uid=username

will show you the current values without changing anything. See also this thread I started about this topic a few weeks ago:

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg12840.html

Daniel
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
On Mon, Nov 3, 2014 at 7:46 AM, Chad Seys cws...@physics.wisc.edu wrote:

> Hi All, I upgraded from emperor to firefly. The initial upgrade went smoothly and all placement groups were active+clean. Next I executed 'ceph osd crush tunables optimal' to upgrade the CRUSH mapping.

Okay... you know that's a data movement command, right? So you should expect it to impact operations. (Although not the crashes you're witnessing.)

> Now I keep having OSDs go down or have requests blocked for long periods of time. I start the down OSDs back up and recovery eventually stops, but with 100s of incomplete and down+incomplete pgs remaining. The ceph web page says "If you see this state [incomplete], report a bug, and try to start any failed OSDs that may contain the needed information." Well, all the OSDs are up, though some have blocked requests. Also, the logs of the OSDs which go down have this message:
> [pipe fault messages and backtrace quoted above trimmed]
> osd/PG.cc: 5424: FAILED assert(0 == "we got a bad state machine event")

These failures are usually the result of adjusting tunables without having upgraded all the machines in the cluster — although they should also be fixed in v0.80.7. Are you still seeing crashes, or just the PG state issues?

-Greg
Re: [ceph-users] 0.87 rados df fault
On Mon, Nov 3, 2014 at 4:40 AM, Thomas Lemarchand thomas.lemarch...@cloud-solutions.fr wrote:

> Update:
> [kern.log OOM-killer lines quoted above trimmed]
> An OOM kill. I have 1GB of memory on my mons, and 1GB of swap. It's the only mon that crashed. Is there a change in memory requirements from Firefly?

There generally shouldn't be, but I don't think it's something we monitored closely. More likely your monitor was running near its memory limit already and restarting all the OSDs (and servicing the resulting changes) pushed it over the edge.

-Greg
Re: [ceph-users] giant release osd down
Thanks for the comments, guys. I'm going to deploy it from scratch, and this time I'll capture every piece of debug information. Hopefully this will give me the reasons why. Thanks.

On Mon, Nov 3, 2014 at 7:01 PM, Ian Colle ico...@redhat.com wrote:

> Christian,
> Why are you not fond of ceph-deploy?
>
> Ian R. Colle
> Global Director of Software Engineering
> Red Hat (Inktank is now part of Red Hat!)
> http://www.linkedin.com/in/ircolle
> http://www.twitter.com/ircolle
> Cell: +1.303.601.7713
> Email: ico...@redhat.com
>
> ----- Original Message -----
> From: Christian Balzer ch...@gol.com
> To: ceph-us...@ceph.com
> Cc: Shiv Raj Singh virk.s...@gmail.com
> Sent: Sunday, November 2, 2014 8:37:18 AM
> Subject: Re: [ceph-users] giant release osd down
>
> [Christian's points a) through d) and his suggestion, quoted in full earlier in this thread, trimmed]
>
> Regards,
> Christian
>
> *Some useful diagnostic information*
>
> ceph2:~$ ceph osd tree
> # id  weight  type name       up/down  reweight
> -1    2       root default
> -3    1           host ceph2
> 0     1               osd.0   down     0
> -2    1           host ceph3
> 1     1               osd.1   down     0
>
> ceph health detail
> HEALTH_WARN 64 pgs stuck inactive; 64 pgs stuck unclean
> pg 0.22 is stuck inactive since forever, current state creating, last acting []
> pg 0.21 is stuck inactive since forever, current state creating, last acting []
> pg 0.20 is stuck inactive since forever, current state creating, last acting []
>
> ceph -s
>     cluster a04ee359-82f8-44c4-89b5-60811bef3f19
>      health HEALTH_WARN 64 pgs stuck inactive; 64 pgs stuck unclean
>      monmap e1: 1 mons at {ceph1=192.168.101.41:6789/0}, election epoch 1, quorum 0 ceph1
>      osdmap e9: 2 osds: 0 up, 0 in
>       pgmap v10: 64 pgs, 1 pools, 0 bytes data, 0 objects
>             0 kB used, 0 kB / 0 kB avail
>                   64 creating
>
> My configuration is as below: sudo nano /etc/ceph/ceph.conf
>
> [global]
> fsid = a04ee359-82f8-44c4-89b5-60811bef3f19
> mon initial members = ceph1
> mon host = 192.168.101.41
> public network = 192.168.101.0/24
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
>
> [osd]
> osd journal size = 1024
> filestore xattr use omap = true
> osd pool default size = 2
> osd pool default min size = 1
> osd pool default pg num = 333
> osd pool default pgp num = 333
> osd crush chooseleaf type = 1
>
> [mon.ceph1]
> host = ceph1
> mon addr = 192.168.101.41:6789
>
> [osd.0]
> host = ceph2
> #devs = {path-to-device}
>
> [osd.1]
> host = ceph3
> #devs = {path-to-device}
>
> OSD mount locations: on ceph2
> /dev/sdb1  5.0G  1.1G  4.0G  21%  /var/lib/ceph/osd/ceph-0
> on ceph3
> /dev/sdb1  5.0G  1.1G  4.0G  21%  /var/lib/ceph/osd/ceph-1
>
> My Linux OS:
> lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description: Ubuntu 14.04 LTS
> Release: 14.04
> Codename: trusty
>
> Regards
> Shiv
>
> --
> Christian Balzer    Network/Systems Engineer
> ch...@gol.com       Global OnLine Japan/Fusion Communications
> http://www.gol.com/
Re: [ceph-users] Swift + radosgw: How do I find accounts/containers/objects limitation?
On Mon, Nov 3, 2014 at 9:37 AM, Narendra Trivedi (natrived) natri...@cisco.com wrote:

> Thanks. I think the limit is 100 by default and it can be disabled. As far as I understand, there is no object limit on the radosgw side of things, only from the Swift end (i.e. 5GB)... right? In short, if someone tries to upload a 1TB object onto Swift + RadosGW, it has to be split at the Swift API layer using --segment-size of 5GB, but there's no hard limitation imposed by radosgw... correct?

radosgw won't allow a segment that is larger than 5GB either.

Yehuda
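To make the 5GB boundary concrete: a large object is stored as a set of <=5GB segments plus a manifest that stitches them together. Below is a minimal sketch of the dynamic-large-object (DLO) scheme with python-swiftclient; the auth endpoint, credentials, and container names are made up, and a real client would stream segments rather than read each one into memory:

import swiftclient.client

SEGMENT = 5 * 1024 ** 3  # both Swift and radosgw cap a single segment at 5 GB

conn = swiftclient.client.Connection(
    authurl='http://radosgw.example.com/auth/v1.0',  # hypothetical v1 auth URL
    user='account:user',
    key='secret')

conn.put_container('vols')
conn.put_container('vols_segments')

# Upload the large file as numbered <=5 GB segment objects ...
with open('big.img', 'rb') as f:
    for i, chunk in enumerate(iter(lambda: f.read(SEGMENT), b'')):
        conn.put_object('vols_segments', 'big.img/%08d' % i, contents=chunk)

# ... then a zero-byte manifest object; GETs on it concatenate the segments.
conn.put_object('vols', 'big.img', contents=b'',
                headers={'X-Object-Manifest': 'vols_segments/big.img/'})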
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
[ Re-adding the list. ]

On Mon, Nov 3, 2014 at 10:49 AM, Chad Seys cws...@physics.wisc.edu wrote:

>>> Next I executed 'ceph osd crush tunables optimal' to upgrade the CRUSH mapping.
>>
>> Okay... you know that's a data movement command, right? So you should expect it to impact operations.
>
> Yes.
>
>> These failures are usually the result of adjusting tunables without having upgraded all the machines in the cluster — although they should also be fixed in v0.80.7. Are you still seeing crashes, or just the PG state issues?
>
> Still getting crashes. I believe all nodes are running 0.80.7. Does ceph have a command to check this? (Otherwise I'll do an ssh-many to check.)

There's a "ceph osd metadata" command, but I don't recall if it's in Firefly or only Giant. :)

> Thanks! C.
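The "ceph osd metadata" check can also be scripted; a minimal sketch over librados' mon_command, assuming OSD ids run contiguously from 0 (which isn't guaranteed) and that the command takes a single id, as on firefly:

import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
for osd_id in range(24):  # assumption: OSD ids 0..23 with no gaps
    cmd = json.dumps({'prefix': 'osd metadata', 'id': osd_id, 'format': 'json'})
    ret, out, errs = cluster.mon_command(cmd, b'')
    if ret == 0:
        print('osd.%d runs %s' % (osd_id, json.loads(out).get('ceph_version')))
cluster.shutdown()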
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
> There's a "ceph osd metadata" command, but I don't recall if it's in Firefly or only Giant. :)

It's in firefly. Thanks, very handy. All the OSDs are running 0.80.7 at the moment.

What next?

Thanks again,
Chad.
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
Okay, assuming this is semi-predictable, can you start up one of the OSDs that is going to fail with "debug osd = 20", "debug filestore = 20", and "debug ms = 1" in the config file, and then put the OSD log somewhere accessible after it's crashed?

Can you also verify that all of your monitors are running firefly, and then issue the command "ceph scrub" and report the output?

-Greg

On Mon, Nov 3, 2014 at 11:07 AM, Chad Seys cws...@physics.wisc.edu wrote:

>> There's a "ceph osd metadata" command, but I don't recall if it's in Firefly or only Giant. :)
>
> It's in firefly. Thanks, very handy. All the OSDs are running 0.80.7 at the moment. What next?
>
> Thanks again, Chad.
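For reference, the debug settings Greg asks for would look like this in ceph.conf; putting them under a daemon-specific section such as [osd.12] (a hypothetical id) limits the extra verbosity to that one OSD, and the daemon needs a restart to pick them up:

[osd.12]
debug osd = 20
debug filestore = 20
debug ms = 1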
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
On Monday, November 03, 2014 13:22:47 you wrote:

> Okay, assuming this is semi-predictable, can you start up one of the OSDs that is going to fail with debug osd = 20, debug filestore = 20, and debug ms = 1 in the config file and then put the OSD log somewhere accessible after it's crashed?

Alas, I have not yet noticed a pattern. The only thing I think is true is that they go down when I first make CRUSH changes. Then, after restarting, they run without going down again. All the OSDs are running at the moment.

What I've been doing is marking OUT the OSDs on which a request is blocked, letting the PGs recover (draining the OSD of PGs completely), then removing and re-adding the OSD. So far, OSDs treated this way no longer have blocked requests. Also, it seems as though that slowly decreases the number of incomplete and down+incomplete PGs.

> Can you also verify that all of your monitors are running firefly, and then issue the command "ceph scrub" and report the output?

Sure, should I wait until the current rebalancing is finished?

Thanks,
Chad.
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
On Mon, Nov 3, 2014 at 11:41 AM, Chad Seys cws...@physics.wisc.edu wrote:

> Alas, I have not yet noticed a pattern. The only thing I think is true is that they go down when I first make CRUSH changes. Then, after restarting, they run without going down again. All the OSDs are running at the moment.

Oh, interesting. What CRUSH changes exactly are you making that are spawning errors?

> What I've been doing is marking OUT the OSDs on which a request is blocked, letting the PGs recover (draining the OSD of PGs completely), then removing and re-adding the OSD. So far, OSDs treated this way no longer have blocked requests. Also, it seems as though that slowly decreases the number of incomplete and down+incomplete PGs.
>
>> Can you also verify that all of your monitors are running firefly, and then issue the command "ceph scrub" and report the output?
>
> Sure, should I wait until the current rebalancing is finished?

I don't think it should matter, although I confess I'm not sure how much monitor load the scrubbing adds. (It's a monitor check; doesn't hit the OSDs at all.)
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
On Monday, November 03, 2014 13:50:05 you wrote:

> Oh, interesting. What CRUSH changes exactly are you making that are spawning errors?

Maybe I miswrote: I've been marking OUT OSDs with blocked requests. Then, if an OSD becomes too_full, I use 'ceph osd reweight' to squeeze blocks off of the too_full OSD. (Maybe that is not technically a CRUSH map change?)

> I don't think it should matter, although I confess I'm not sure how much monitor load the scrubbing adds. (It's a monitor check; doesn't hit the OSDs at all.)

$ ceph scrub

No output.

Chad.
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
On Mon, Nov 3, 2014 at 12:28 PM, Chad Seys cws...@physics.wisc.edu wrote:
> Maybe I miswrote: I've been marking OUT OSDs with blocked requests. Then if
> an OSD becomes too_full I use 'ceph osd reweight' to squeeze blocks off of
> the too_full OSD. (Maybe that is not technically a CRUSH map change?)

No, it is a change; I just want to make sure I understand the scenario. So you're reducing CRUSH weights on full OSDs, and then *other* OSDs are crashing on these bad state machine events?

> $ ceph scrub
>
> No output.

Oh, yeah, I think that output goes to the central log at a later time. (It will show up in ceph -w if you're watching, or can be accessed from the monitor nodes, in their data directory I think?)
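A rough way to catch that output, assuming it is run from a monitor node:

    # in one terminal, follow the cluster log
    ceph -w
    # in another, trigger the monitor scrub; its result lines should
    # appear in the ceph -w stream shortly afterwards
    ceph scrub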
Re: [ceph-users] Ceph Giant not fixed ReplicatedPG::NotTrimming?
Can you reproduce with

    debug osd = 20
    debug filestore = 20
    debug ms = 1

in the [osd] section of that osd's ceph.conf?
-Sam

On Sun, Nov 2, 2014 at 9:10 PM, Ta Ba Tuan tua...@vccloud.vn wrote:
> Hi Sage, Samuel, all,
>
> I upgraded to Giant, but those errors still appear. I'm trying to delete the
> related objects/volumes, but it is very hard to verify the missing objects :(.
> Please guide me in resolving it! (I've attached the detailed log.)
>
> 2014-11-03 11:37:57.730820 7f28fb812700 0 osd.21 105950 do_command r=0
> 2014-11-03 11:37:57.856578 7f28fc013700 -1 *** Caught signal (Segmentation fault) **
>  in thread 7f28fc013700
>
> ceph version 0.87-6-gdba7def (dba7defc623474ad17263c9fccfec60fe7a439f0)
>  1: /usr/bin/ceph-osd() [0x9b6725]
>  2: (()+0xfcb0) [0x7f291fc2acb0]
>  3: (ReplicatedPG::trim_object(hobject_t const&)+0x395) [0x811b55]
>  4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x43e) [0x82b9be]
>  5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xc0) [0x870ce0]
>  6: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xfb) [0x85618b]
>  7: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x85633e]
>  8: (ReplicatedPG::snap_trimmer()+0x4f8) [0x7d5ef8]
>  9: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x673ab4]
>  10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xa8fade]
>  11: (ThreadPool::WorkThread::entry()+0x10) [0xa92870]
>  12: (()+0x7e9a) [0x7f291fc22e9a]
>  13: (clone()+0x6d) [0x7f291e5ed31d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> -9993 2014-11-03 11:37:47.689335 7f28fc814700 1 -- 172.30.5.2:6803/7606 --> 172.30.5.1:6886/3511 -- MOSDPGPull(6.58e 105950 [PullOp(87f82d8e/rbd_data.45e62779c99cf1.22b5/head//6, recovery_info: ObjectRecoveryInfo(87f82d8e/rbd_data.45e62779c99cf1.22b5/head//6@105938'11622009, copy_subset: [0~18446744073709551615], clone_subset: {}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x26c59000 con 0x22fbc420
>    -2 2014-11-03 11:37:57.853585 7f2902820700 5 osd.21 pg_epoch: 105950 pg[24.9e4( v 105946'113392 lc 105946'113391 (103622'109598,105946'113392] local-les=105948 n=88 ec=25000 les/c 105948/105943 105947/105947/105947) [21,112,33] r=0 lpr=105947 pi=105933-105946/4 crt=105946'113392 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded m=1 snaptrimq=[303~3,307~1]] enter Started/Primary/Active/Recovering
>    -1 2014-11-03 11:37:57.853735 7f28fc814700 1 -- 172.30.5.2:6803/7606 --> 172.30.5.9:6806/24552 -- MOSDPGPull(24.9e4 105950 [PullOp(5abb99e4/rbd_data.5dd32f2ae8944a.0165/head//24, recovery_info: ObjectRecoveryInfo(5abb99e4/rbd_data.5dd32f2ae8944a.0165/head//24@105946'113392, copy_subset: [0~18446744073709551615], clone_subset: {}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x229e7e00 con 0x22fb7000
>     0 2014-11-03 11:37:57.856578 7f28fc013700 -1 *** Caught signal (Segmentation fault) **
>
> Thanks!
> --
> Tuan
> HaNoi-VietNam
>
> On 11/01/2014 09:21 AM, Ta Ba Tuan wrote:
>> Hi Samuel and Sage,
>> I will upgrade to Giant soon. Thank you so much.
>> --
>> Tuan
>> HaNoi-VietNam
>>
>> On 11/01/2014 01:10 AM, Samuel Just wrote:
>>> You should start by upgrading to Giant; many, many bug fixes went in
>>> between 0.86 and Giant.
>>> -Sam
>>>
>>> On Fri, Oct 31, 2014 at 8:54 AM, Ta Ba Tuan tua...@vccloud.vn wrote:
>>>> Hi Sage Weil,
>>>> Thanks for your reply. Yes, I'm using Ceph v0.86. I'm reporting some
>>>> related bugs; I hope you can help me.
>>>>
>>>> 2014-10-31 15:34:52.927965 7f85efb6b700 0 osd.21 104744 do_command r=0
>>>> 2014-10-31 15:34:53.105533 7f85f036c700 -1 *** Caught signal (Segmentation fault) **
>>>>  in thread 7f85f036c700
>>>> ceph version 0.86-106-g6f8524e (6f8524ef7673ab4448de2e0ff76638deaf03cae8)
>>>>  1: /usr/bin/ceph-osd() [0x9b6655]
>>>>  2: (()+0xfcb0) [0x7f8615726cb0]
>>>>  3: (ReplicatedPG::trim_object(hobject_t const&)+0x395) [0x811c25]
>>>>  4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x43e) [0x82baae]
>>>>  5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
> No, it is a change; I just want to make sure I understand the scenario. So
> you're reducing CRUSH weights on full OSDs, and then *other* OSDs are
> crashing on these bad state machine events?

That is right. The other OSDs shut down sometime later (not immediately). I really haven't tested whether the OSDs will stay up if there are no manipulations; I'd need to let the PGs settle for a while first, which I haven't done yet.

> I don't think it should matter, although I confess I'm not sure how much
> monitor load the scrubbing adds. (It's a monitor check; it doesn't hit the
> OSDs at all.)
>
>> $ ceph scrub
>>
>> No output.
>
> Oh, yeah, I think that output goes to the central log at a later time. (It
> will show up in ceph -w if you're watching, or can be accessed from the
> monitor nodes, in their data directory I think?)

OK. Will doing ceph scrub again result in the same output? If so, I'll run it again and look for output in ceph -w when the migrations have stopped.

Thanks!
Chad.
Re: [ceph-users] cephfs survey results
On 4 November 2014 01:50, Sage Weil s...@newdream.net wrote:
> In the Ceph session at the OpenStack summit someone asked what the CephFS
> survey results looked like.

Thanks Sage, that was me!

> Here's the link: https://www.surveymonkey.com/results/SM-L5JV7WXL/
>
> In short, people want:
>   fsck
>   multimds
>   snapshots
>   quotas

TBH I'm a bit surprised by a couple of these and hope maybe you guys will apply a certain amount of filtering to this... fsck and quotas were there for me, but multimds and snapshots are what I'd consider icing features: nice to have, but not on the critical path to using cephfs instead of e.g. nfs in a production setting. I'd have thought stuff like small-file performance and gateway support was much more relevant to uptake and a positive, pain-free UX. Interested to hear others' rationale here.

--
Cheers,
~Blairo
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
If you have OSDs that are close to full, you may be hitting bug 9626. I pushed a branch based on v0.80.7 with the fix, wip-v0.80.7-9626.
-Sam

On Mon, Nov 3, 2014 at 2:09 PM, Chad Seys cws...@physics.wisc.edu wrote:
> That is right. The other OSDs shut down sometime later (not immediately). I
> really haven't tested whether the OSDs will stay up if there are no
> manipulations; I'd need to let the PGs settle for a while first, which I
> haven't done yet.
> [...]
> OK. Will doing ceph scrub again result in the same output? If so, I'll run
> it again and look for output in ceph -w when the migrations have stopped.
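A quick way to check whether any OSDs are getting close to full (a sketch; the mount-point pattern assumes the default OSD data path):

    # near-full and full warnings, if any, show up here
    ceph health detail | grep -i full
    # or check utilization directly on the OSD hosts
    df -h /var/lib/ceph/osd/ceph-*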
Re: [ceph-users] giant release osd down
On 04/11/14 03:02, Sage Weil wrote:
> On Mon, 3 Nov 2014, Mark Kirkwood wrote:
>
> Ah, I missed that thread. Sounds like three separate bugs:
> - pool defaults not used for initial pools
> - osd_mkfs_type not respected by ceph-disk
> - osd_* settings not working
>
> The last one is a real shock; I would expect all kinds of things to break
> very badly if the [osd] section config behavior was not working.

I wonder if this sort of thing has escaped notice because ceph-deploy seems to plonk stuff into [global] only. I guess this acts as an implicit encouragement to have everything in there (e.g. I note that in our production setup we have the rbd_cache* settings in [global] instead of [client]).

regards
Mark
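As an illustration of the scoping, the rbd_cache* settings mentioned would, strictly speaking, belong in [client] rather than [global] (a sketch; the values are examples, not recommendations):

    [global]
        # only genuinely cluster-wide settings here
    [client]
        rbd cache = true
        rbd cache size = 33554432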
[ceph-users] osd down question
Hello,

I have been running Ceph v0.87 for one week. This week many OSDs were marked down, but when I run ps -ef | grep osd I can still see the osd processes, so the OSDs are not really down. Checking the OSD logs, I see many lines like "osd.XX from dead osd.YY, marking down". Does 0.87 have OSDs check on other OSD processes? And if some OSD is reported down, will the mon then mark the current one down as well? This can cause a chain reaction leading to failure of the entire cluster. Is it a bug?
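As a first diagnostic step, something like the following might help (assuming default log locations); when the processes are alive but still get marked down, heartbeat timeouts from network or disk latency are often the cause rather than actual daemon failures:

    # which OSDs does the cluster map currently consider down?
    ceph osd tree | grep -w down
    # count the "marking down" reports per log file on an affected host
    grep -c "marking down" /var/log/ceph/ceph-osd.*.log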