Re: [ceph-users] Multiple journals and an OSD on one SSD doable?
Just used the method in the link you sent me to test one of the EVO 850s. With one job it reached a speed of around 2.5MB/s, and it didn't max out until around 32 jobs, at ~25MB/s:

  sudo fio --filename=/dev/sdh --direct=1 --sync=1 --rw=write --bs=4k --numjobs=32 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test

  write: io=1507.4MB, bw=25723KB/s, iops=6430, runt=60007msec

Also tested a Micron 550 we had sitting around, and it maxed out at 2.5MB/s. Both results conflict with the chart.

Regards,

Cameron Scrace
Infrastructure Engineer
Mobile +64 22 610 4629  Phone +64 4 462 5085
Email cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House, 70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140
www.solnet.co.nz

From: Christian Balzer <ch...@gol.com>
To: ceph-us...@ceph.com
Cc: cameron.scr...@solnet.co.nz
Date: 08/06/2015 02:40 p.m.
Subject: Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

On Mon, 8 Jun 2015 14:30:17 +1200 cameron.scr...@solnet.co.nz wrote:

> Thanks for all the feedback. What makes the EVOs unusable? They should
> have plenty of speed, but your link has them at 1.9MB/s. Is it just the
> way they handle O_DIRECT and O_DSYNC?

Precisely. Read that ML thread for details.

And once more, they are also not very endurable. So depending on your usage pattern and the write amplification (from Ceph itself and the underlying FS), their TBW/$ will be horrible, costing you more in the end than more expensive, but an order of magnitude more endurable, DC SSDs.

> Not sure if we will be able to spend any more; we may just have to take
> the performance hit until we can get more money for the project.

You could cheap out with 200GB DC S3700s (half the price), but they will definitely become the bottleneck, at a combined max speed of about 700MB/s, as opposed to the 400GB ones at 900MB/s combined.

Christian
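A note for anyone repeating the test above: the single fio line measures only one numjobs value. A minimal sketch of the full sweep, assuming fio is installed and that /dev/sdh is a disposable test disk (this writes to the raw device and destroys its contents):

  #!/bin/bash
  # Sweep numjobs to find where O_DIRECT/sync 4k write throughput plateaus.
  DEV=/dev/sdh   # SSD under test (assumption: a blank test disk)
  for jobs in 1 2 4 8 16 32; do
      sudo fio --filename="$DEV" --direct=1 --sync=1 --rw=write --bs=4k \
          --numjobs="$jobs" --iodepth=1 --runtime=60 --time_based \
          --group_reporting --name="journal-test-$jobs"
  done

A drive that handles O_DIRECT/O_DSYNC writes well holds its throughput even at numjobs=1; needing 32 parallel jobs to reach ~25MB/s is the behaviour that makes these drives poor journal devices.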
Re: [ceph-users] Multiple journals and an OSD on one SSD doable?
The other option we were considering was putting the journals on the OS SSDs; they are only 250GB and the rest would be for the OS. Is that a decent option?

Thanks!

Cameron Scrace

From: Somnath Roy <somnath@sandisk.com>
To: cameron.scr...@solnet.co.nz, ceph-us...@ceph.com
Date: 08/06/2015 09:34 a.m.
Subject: RE: [ceph-users] Multiple journals and an OSD on one SSD doable?

Cameron,

Generally, it's not a good idea. You want to protect the SSDs used as journals: if anything goes wrong with that disk, you will lose all of the dependent OSDs. I don't think a bigger journal will gain you much performance, so the default 5GB journal size should be good enough. If you want to reduce the fault domain and put 3 journals on an SSD, go for minimum-size, high-endurance SSDs.

Now, if you want to use the rest of the space on the 1TB SSDs, creating just OSDs there will not gain you much (you may get some burst performance, though). You may want to consider the following:

1. If your spindle OSDs are much bigger than 900GB and you don't want to make all OSDs similar sizes, a cache pool could be one of your options. But remember, a cache pool can wear out your SSDs faster, as presently I guess it is not optimizing the extra writes. Sorry, I don't have exact data, as I have yet to test that out.

2. If you want to make all the OSDs similar sizes and you will be able to create a substantial number of OSDs with your unused SSDs (depends on how big the cluster is), you may want to put all of your primary OSDs on SSD and gain a significant performance boost for reads. Also, in this case, I don't think you will be getting any burst performance.

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of cameron.scr...@solnet.co.nz
Sent: Sunday, June 07, 2015 1:49 PM
To: ceph-us...@ceph.com
Subject: [ceph-users] Multiple journals and an OSD on one SSD doable?

Setting up a Ceph cluster, and we want the journals for our spinning disks to be on SSDs, but all of our SSDs are 1TB. We were planning on putting 3 journals on each SSD, but that leaves 900+GB unused on each drive. Is it possible to use the leftover space as another OSD, or will it affect performance too much?

Thanks,

Cameron Scrace
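For reference, a minimal sketch of Somnath's minimum-size-journal suggestion above. The journal size knob goes in ceph.conf (value in MB):

  [osd]
  osd journal size = 5120

and the journal partitions can be pre-created with sgdisk. The device name /dev/sdf and the 5GB size are placeholders; the type code is the standard "ceph journal" GPT GUID that ceph-disk expects:

  # Three 5GB journal partitions on one shared SSD:
  for n in 1 2 3; do
      sudo sgdisk --new=${n}:0:+5G \
          --typecode=${n}:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdf
  done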
Re: [ceph-users] Multiple journals and an OSD on one SSD doable?
Hi Christian,

Yes, we have purchased all our hardware. It was very hard to convince management/finance to approve it, so some of the stuff we have is a bit cheap.

We have four storage nodes, each with 6 x 6TB Western Digital Red SATA drives (WD60EFRX-68M), 6 x 1TB Samsung EVO 850 SSDs, and 2 x 250GB Samsung EVO 850s (for OS RAID). CPUs are Intel Atom C2750 @ 2.40GHz (8 cores) with 32GB of RAM. We have a 10Gig network.

The two options we are considering are:

1) Use two of the 1TB SSDs for the spinning disk journals (3 each) and then use the remaining 900+GB of each drive as an OSD to be part of the cache pool.

2) Put the spinning disk journals on the OS SSDs and use the 2 1TB SSDs for the cache pool.

In both cases the other 4 1TB SSDs will be part of their own tier.

Thanks a lot!

Cameron Scrace

From: Christian Balzer <ch...@gol.com>
To: ceph-us...@ceph.com
Cc: cameron.scr...@solnet.co.nz
Date: 08/06/2015 12:18 p.m.
Subject: Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

Hello,

On Mon, 8 Jun 2015 09:55:56 +1200 cameron.scr...@solnet.co.nz wrote:

> The other option we were considering was putting the journals on the OS
> SSDs; they are only 250GB and the rest would be for the OS. Is that a
> decent option?

You'll get a LOT better advice if you tell us more details.

For starters, have you bought the hardware yet? Tell us about your design: how many initial storage nodes, how many HDDs/SSDs per node, what CPUs/RAM/network? What SSDs are we talking about -- exact models, please. (Neither of the sizes you mentioned rings a bell for any DC-level SSDs I'm aware of.)

That said, I'm using Intel DC S3700s for mixed OS and journal use with good results. In your average Ceph storage node, normal OS activity (mostly logging) is a minute drop in the bucket for any decent SSD, so nearly all of its resources are available to the journals.

You want to match the number of journals per SSD to the capabilities of your SSDs, HDDs and network. For example, 8 HDD OSDs with 2 200GB DC S3700s and a 10Gb/s network is a decent match. The two SSDs at 900MB/s would appear to be the bottleneck, but in reality I'd expect the HDDs to be it. Never mind that you'd be more likely to be IOPS-bound than bandwidth-bound.

Regards,

Christian
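As an aside, Somnath's point 2 in the earlier message (all primary OSDs on SSD) is typically done with CRUSH primary affinity. A rough sketch with placeholder OSD IDs; on Hammer-era releases the monitors also need 'mon osd allow primary affinity = true' set before they accept the command:

  # Stop an HDD-backed OSD from being chosen as primary, so reads are
  # served by an SSD-backed replica instead (osd.0 is a placeholder):
  ceph osd primary-affinity osd.0 0
  # SSD-backed OSDs keep the default affinity of 1 and become primaries.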
Re: [ceph-users] Multiple journals and an OSD on one SSD doable?
Thanks for all the feedback.

What makes the EVOs unusable? They should have plenty of speed, but your link has them at 1.9MB/s. Is it just the way they handle O_DIRECT and O_DSYNC?

Not sure if we will be able to spend any more; we may just have to take the performance hit until we can get more money for the project.

Thanks,

Cameron Scrace

From: Christian Balzer <ch...@gol.com>
To: ceph-us...@ceph.com
Cc: cameron.scr...@solnet.co.nz
Date: 08/06/2015 02:00 p.m.
Subject: Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

Cameron,

To offer at least some constructive advice here, instead of just all doom and gloom, here's what I'd do:

Replace the OS SSDs with 2 400GB Intel DC S3700s (or S3710s). They have enough bandwidth to nearly saturate your network. Put all your journals on them (3 SSD OSDs and 3 HDD OSDs per). While that's a bad move from a failure domain perspective, your budget probably won't allow for anything better, and those are VERY reliable and, just as importantly, durable SSDs.

This will give you the speed your current setup is capable of, probably limited by the CPU when it comes to SSD pool operations.

Christian

On Mon, 8 Jun 2015 10:44:06 +0900 Christian Balzer wrote:

Hello Cameron,

On Mon, 8 Jun 2015 13:13:33 +1200 cameron.scr...@solnet.co.nz wrote:

> Hi Christian,
>
> Yes, we have purchased all our hardware. It was very hard to convince
> management/finance to approve it, so some of the stuff we have is a bit
> cheap.

Unfortunate. Both the done deal and the cheapness.

> We have four storage nodes, each with 6 x 6TB Western Digital Red SATA
> drives (WD60EFRX-68M), 6 x 1TB Samsung EVO 850 SSDs, and 2 x 250GB
> Samsung EVO 850s (for OS RAID). CPUs are Intel Atom C2750 @ 2.40GHz
> (8 cores) with 32GB of RAM. We have a 10Gig network.

I wish there was a nice way to say this, but it unfortunately boils down to a "You're fooked."

There have been many discussions about which SSDs are usable with Ceph, very recently as well. Samsung EVOs (the non-DC type, for sure) are basically unusable for journals. See the recent thread "Possible improvements for a slow write speed (excluding independent SSD journals)" and:

http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

for reference.

I presume your intention for the 1TB SSDs is an SSD-backed pool? Note that the EVOs have a pretty low (guaranteed) endurance, so aside from needing journal SSDs that can actually do the job, you're looking at wearing them out rather quickly (depending on your use case, of course).

Now, with SSD-based OSDs, or even HDD-based OSDs with SSD journals, that CPU looks a bit anemic. More below:

> The two options we are considering are:
>
> 1) Use two of the 1TB SSDs for the spinning disk journals (3 each) and
> then use the remaining 900+GB of each drive as an OSD to be part of the
> cache pool.
>
> 2) Put the spinning disk journals on the OS SSDs and use the 2 1TB SSDs
> for the cache pool.

Cache pools aren't all that speedy currently (research the ML archives), even less so with the SSDs you have.

Christian

> In both cases the other 4 1TB SSDs will be part of their own tier.
>
> Thanks a lot!
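To put rough numbers on the bottleneck comparison above (per-drive figures as quoted in this thread, which are close to the vendor's sequential write specs):

  2 x 200GB DC S3700:  2 x ~350MB/s = ~700MB/s of journal bandwidth
  2 x 400GB DC S3700:  2 x ~450MB/s = ~900MB/s of journal bandwidth
  10Gb/s network:      10Gb/s / 8   = ~1250MB/s raw, minus protocol overhead

With FileStore, every client write hits a journal before the data disk, so the 200GB pair caps writes well below what the network can deliver, while the 400GB pair comes reasonably close to line rate.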
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
It most likely is the model of switch. In its settings the minimum frame size you can set is 1518, and the default MTU is 1500, so it seems the switch wants the 18-byte difference (an Ethernet frame is the IP MTU plus a 14-byte header and a 4-byte FCS; an 802.1Q VLAN tag would add another 4 bytes). We are using a pair of Netgear XS712T switches and bonded pairs of Intel 10-Gigabit X540-AT2 (rev 01) NICs, with 3 VLANs.

Cameron Scrace

From: Somnath Roy <somnath@sandisk.com>
To: cameron.scr...@solnet.co.nz, Jan Schermer <j...@schermer.cz>
Cc: ceph-users@lists.ceph.com
Date: 04/06/2015 11:13 a.m.
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

Hmm... thanks for sharing this. Any chance it depends on the switch? Could you please share what NIC card and switch you are using?

Thanks & Regards
Somnath

From: cameron.scr...@solnet.co.nz
Sent: Wednesday, June 03, 2015 4:07 PM
To: Somnath Roy; Jan Schermer
Cc: ceph-users@lists.ceph.com

The interface MTU has to be 18 or more bytes lower than the switch MTU, or it just stops working. As far as I know, the monitor communication is not being encapsulated by any SDN.

Cameron Scrace

From: Somnath Roy <somnath@sandisk.com>
To: Jan Schermer <j...@schermer.cz>, cameron.scr...@solnet.co.nz
Date: 04/06/2015 02:58 a.m.

The TCP_NODELAY issue was with kernel RBD, *not* with the OSD. The Ceph messenger code base sets it by default. BTW, I doubt TCP_NODELAY has anything to do with it.

Thanks & Regards
Somnath

From: Jan Schermer [mailto:j...@schermer.cz]
Sent: Wednesday, June 03, 2015 1:37 AM
To: cameron.scr...@solnet.co.nz

Interface and switch should have the same MTU, and that should not cause any issues (setting the switch MTU higher is always safe, though). Aren't you encapsulating the mon communication in some SDN like Open vSwitch? Is that a straight L2 connection? I think this is worth investigating.

For example, are the mons properly setting TCP_NODELAY on the sockets that are latency sensitive? (I just tried finding out, and lsof/netstat don't report that to me; I'd need to restart and strace it... I vaguely remember there was an issue with NODELAY that was fixed on the OSD side.)

Jan

On 03 Jun 2015, at 06:30, cameron.scr...@solnet.co.nz wrote:

> Seems to be something to do with our switch. If the interface MTU is too
> close to the switch MTU, it stops working.
>
> Thanks for all your help :)
[ceph-users] ceph-deploy osd prepare/activate failing with journal on raid device.
I'm trying to set up some OSDs, and if I try to use a RAID device for the journal disk, it fails: http://pastebin.com/mTw6xzNV

The main issue I see is that the symlink in /dev/disk/by-partuuid is not being made correctly. When I make it manually and try to activate, I still get errors; it seems to think that the journal and OSD are meant to be on the same drive: http://pastebin.com/CEk8Teys

Has anyone had this issue before, or any suggestions on how to fix it? When I use a non-RAIDed device it works perfectly.

Thanks,

Cameron Scrace
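In case it helps anyone hitting the same thing: udev generally does not create by-partuuid links for partitions on md devices, so one can be made by hand before activating. A hedged sketch -- device names and the UUID below are placeholders, not the actual values from the pastebins above:

  # Find the GPT partition UUID of the journal partition on the md device:
  sudo blkid -p -o udev /dev/md0p1 | grep ID_PART_ENTRY_UUID

  # Create the symlink udev did not make (UUID is a placeholder):
  sudo ln -s ../../md0p1 /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000

  # Then point ceph-deploy at the data disk and the journal partition:
  ceph-deploy osd prepare node1:sdb:/dev/md0p1
  ceph-deploy osd activate node1:/dev/sdb1:/dev/md0p1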
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
Thanks for the links. Jumbo frames are definitely working, although we had to set the MTU to 8192 because one of the components doesn't support an MTU higher than that.

Thanks for the help. Looks like we may just have to deal with jumbo frames being off.

Cameron Scrace

From: Somnath Roy <somnath@sandisk.com>
To: cameron.scr...@solnet.co.nz
Cc: ceph-users@lists.ceph.com, Joao Eduardo Luis <j...@suse.de>
Date: 03/06/2015 11:49 a.m.
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

I doubt it is anything to do with Ceph. I hope you checked that your switch supports jumbo frames and that you have set MTU 9000 on all the devices in between. It's better to ping your devices (all the devices participating in the cluster) the way it is described in the following articles, just in case you are not sure:

http://www.mylesgray.com/hardware/test-jumbo-frames-working/
http://serverfault.com/questions/234311/testing-whether-jumbo-frames-are-actually-working

Hope this helps.

Thanks & Regards
Somnath

From: cameron.scr...@solnet.co.nz
Sent: Tuesday, June 02, 2015 4:32 PM
To: Somnath Roy

Setting the MTU to 1500 worked; the monitors reach quorum right away. Unfortunately we really want jumbo frames to be on. Any ideas on how to get Ceph to work with them on?

Thanks!

Cameron Scrace
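For reference, the test in those links boils down to pinging with the don't-fragment bit set; the ICMP payload is the MTU minus 28 bytes of IP and ICMP headers. The target address below is one of the monitor IPs from later in this thread and stands in for any cluster node:

  # For a 9000-byte MTU: 9000 - 28 = 8972 bytes of payload
  ping -M do -s 8972 10.1.226.65

  # For the 8192-byte MTU mentioned above: 8192 - 28 = 8164
  ping -M do -s 8164 10.1.226.65

  # If these fail while a small payload works, jumbo frames are not
  # passing end-to-end.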
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
Setting the MTU to 1500 worked; the monitors reach quorum right away. Unfortunately we really want jumbo frames to be on. Any ideas on how to get Ceph to work with them on?

Thanks!

Cameron Scrace

From: Somnath Roy <somnath@sandisk.com>
To: cameron.scr...@solnet.co.nz
Cc: ceph-users@lists.ceph.com, Joao Eduardo Luis <j...@suse.de>
Date: 03/06/2015 10:34 a.m.
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

We have seen some communication issues with that; try making all the servers MTU 1500 and test again...

From: cameron.scr...@solnet.co.nz
Sent: Tuesday, June 02, 2015 3:31 PM
To: Somnath Roy

We are running with jumbo frames turned on. Is that likely to be the issue? Do I need to configure something in Ceph? The mon maps are fine, and after setting debug to 10 and debug ms to 1, I see probe timeouts in the logs: http://pastebin.com/44M1uJZc

I just set the probe timeout to 10 (up from 2) and it still times out.

Thanks!

Cameron Scrace

From: Somnath Roy <somnath@sandisk.com>
To: Joao Eduardo Luis <j...@suse.de>, ceph-users@lists.ceph.com
Date: 03/06/2015 03:49 a.m.
Sent by: ceph-users <ceph-users-boun...@lists.ceph.com>

By any chance are you running with jumbo frames turned on?

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Joao Eduardo Luis
Sent: Tuesday, June 02, 2015 12:52 AM
To: ceph-users@lists.ceph.com

On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote:

> I am trying to deploy a new ceph cluster and my monitors are not
> reaching quorum. SELinux is off, firewalls are off, I can see traffic
> between the nodes on port 6789, but when I use the admin socket to force
> a re-election, only the monitor I send the request to shows the new
> election in its logs. My logs are filled entirely with the following
> two lines:
>
> 2015-06-02 11:31:56.447975 7f795b17a700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
> 2015-06-02 11:31:56.448272 7f795b17a700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished

You are running on default debug levels, so you'll hardly get anything more than that. I suggest setting 'debug mon = 10' and 'debug ms = 1' for added verbosity and coming back to us with the logs.

There are many reasons for this, but the more common ones are due to the monitors not being able to communicate with each other. Given you see traffic between the monitors, I'm inclined to assume that the other two monitors do not have each other in the monmap or, if they do know each other, either 1) the monitors' auth keys do not match, or 2) the probe timeout is being triggered before they successfully manage to find enough monitors to trigger an election -- which may be due to latency. Logs will tell us more.

-Joao
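A minimal sketch of applying Joao's suggestion. Persistently, in ceph.conf on the monitor hosts (then restart the mons):

  [mon]
  debug mon = 10
  debug ms = 1

or at runtime through each monitor's local admin socket, which works even without quorum (the monitor name wcm1 is from this cluster):

  ceph daemon mon.wcm1 config set debug_mon 10
  ceph daemon mon.wcm1 config set debug_ms 1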
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
We are running with jumbo frames turned on. Is that likely to be the issue? Do I need to configure something in Ceph? The mon maps are fine, and after setting debug to 10 and debug ms to 1, I see probe timeouts in the logs: http://pastebin.com/44M1uJZc

I just set the probe timeout to 10 (up from 2) and it still times out.

Thanks!

Cameron Scrace
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
Seems to be something to do with our switch. If the interface MTU is too close to the switch MTU, it stops working.

Thanks for all your help :)

Cameron Scrace
[ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
I am trying to deploy a new Ceph cluster and my monitors are not reaching quorum. SELinux is off, firewalls are off, and I can see traffic between the nodes on port 6789, but when I use the admin socket to force a re-election, only the monitor I send the request to shows the new election in its logs.

My logs are filled entirely with the following two lines:

  2015-06-02 11:31:56.447975 7f795b17a700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
  2015-06-02 11:31:56.448272 7f795b17a700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished

Querying the admin socket with mon_status gives the following (the other two are similar, but with their own hostnames and ranks):

  {
      "name": "wcm1",
      "rank": 0,
      "state": "probing",
      "election_epoch": 1,
      "quorum": [],
      "outside_quorum": [
          "wcm1"
      ],
      "extra_probe_peers": [],
      "sync_provider": [],
      "monmap": {
          "epoch": 0,
          "fsid": "adb8c500-122e-49fd-9c1e-a99af7832307",
          "modified": "2015-06-02 10:43:41.467811",
          "created": "2015-06-02 10:43:41.467811",
          "mons": [
              { "rank": 0, "name": "wcm1", "addr": "10.1.226.64:6789\/0" },
              { "rank": 1, "name": "wcm2", "addr": "10.1.226.65:6789\/0" },
              { "rank": 2, "name": "wcm3", "addr": "10.1.226.66:6789\/0" }
          ]
      }
  }

Any suggestions on what could be the issue?

Regards,

Cameron Scrace
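For completeness, a sketch of the admin socket queries behind the output above; the socket path assumes the default location, and wcm1 is this cluster's first monitor:

  # Ask a monitor for its own status, bypassing the (absent) quorum:
  ceph daemon mon.wcm1 mon_status

  # Equivalent long form, naming the socket explicitly:
  ceph --admin-daemon /var/run/ceph/ceph-mon.wcm1.asok mon_status

  # Ask the monitor to retry joining/forming a quorum:
  ceph daemon mon.wcm1 quorum enter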