Re: [ceph-users] Ceph Cluster Failures
Hello,

On Fri, 17 Mar 2017 02:51:48 + Rich Rocque wrote:

> Hi,
>
> I talked with the person in charge about your initial feedback and
> questions. The thought is to switch to a new setup, and I was asked to
> pass it on and ask for thoughts on whether this would be sufficient or
> not.

I assume from the new setup that the current problematic one is also on
AWS, so I'd advise doing a proper analysis there before moving to
something "new".

If you search the ML archives you'll find a few others who have done
similar things, and as far as I can recall none were particularly
successful. A virtualized Ceph is going to be harder to get "right" than a
HW-based one, doubly so when dealing with AWS network vagaries.

I'm unsure whether an AWS region can consist of multiple DCs; if so, the
latencies when doing writes would be bad, but then again it seems your use
case is very read-heavy.

All that said, the specs for your proposal look good from a (virtual) HW
perspective.

Christian

> Use case:
> Overview: Need to provide shared storage/high availability for (usually)
> low-volume web server instances using a distributed, POSIX-compliant
> filesystem, running in Amazon Web Services. Database storage is not part
> of the cluster.
> Logic: We know Ceph is probably overkill for our current use (and
> probably also for my future use), so why Ceph? Its performance when
> using CephFS, and its ability to support RBD (if we ever move to a
> container approach for web servers). I’ve tried Amazon EFS
> (NFS-as-a-service) and GlusterFS (both NFS and native client), and
> because of the number of small files we’re working with, something that
> takes ~15 sec. in Ceph takes several minutes using the other NFS or
> GlusterFS solutions.
> Current Load: ~100 connected clients accessing ~20GB of data of
> e-commerce related website source software.
> Expected Future Load: ~5,000 connected clients accessing ~1TB of data
>
> Ceph Clients:
> Primary Role: Web server & load balancer w/ SSL termination
> Hardware Configuration: 1 vCPU, 512MB RAM, Ubuntu 16.04 LTS (per
> website/domain/subdomain: 2ea t2.nano instances, load balanced behind
> haproxy, rarely manually scaling up with new instances during expected
> load spikes. After initial “hits,” most of the website stays in local
> cache, resulting in generally few IOPS against the Ceph cluster.)
>
> Ceph Clusters:
> Overall: 3 co-located clusters across 9 servers, spanning 3 AWS
> Availability Zones in a single region. 3 MDS per cluster, 3 MON per
> cluster, 2 OSD per cluster.
> Hardware Configuration (MON/MDS): r4.large instance class, 2 vCPU, ~15GB
> RAM, “up to 10Gbit” network (“Enhanced Networking” enabled), EBS/SSD for
> root (not provisioned-IOPS), Ubuntu 16.04 LTS
> Hardware Configuration (OSD): i3.large instance class, 2 vCPU, ~15GB RAM,
> “up to 10Gbit” network (“Enhanced Networking” enabled), EBS/SSD for root
> (not provisioned-IOPS, but “EBS optimized” for bandwidth), ~475GB NVMe
> attached ephemeral storage for OSD (co-locating journal and data)
>
> Proposed Layout:
>
> AZ “A”:
> * Server A-MM (r4.large instance):
>   * Mon.A & MDS.A for Cluster X
>   * Mon.A & MDS.A for Cluster Y
>   * Mon.A & MDS.A for Cluster Z
> * Server A-OSD-1 (i3.large instance):
>   * OSD.0 for Cluster X
> * Server A-OSD-2 (i3.large instance):
>   * OSD.0 for Cluster Z
>
> AZ “B”:
> * Server B-MM (r4.large instance):
>   * Mon.B & MDS.B for Cluster X
>   * Mon.B & MDS.B for Cluster Y
>   * Mon.B & MDS.B for Cluster Z
> * Server B-OSD-1 (i3.large instance):
>   * OSD.1 for Cluster X
> * Server B-OSD-2 (i3.large instance):
>   * OSD.0 for Cluster Y
>
> AZ “C”:
> * Server C-MM (r4.large instance):
>   * Mon.C & MDS.C for Cluster X
>   * Mon.C & MDS.C for Cluster Y
>   * Mon.C & MDS.C for Cluster Z
> * Server C-OSD-1 (i3.large instance):
>   * OSD.1 for Cluster Y
> * Server C-OSD-2 (i3.large instance):
>   * OSD.1 for Cluster Z
>
> Alternative Layout:
> Split, by half, the NVMe storage between 2 OSDs, provide 3ea OSDs per
> cluster for higher availability at the expense of disk read/write
> performance, and increase the number of clusters to 4.
>
> Thank you for your time,
>
> Rich
>
> From: Christian Balzer
> Sent: Thursday, March 16, 2017 2:30:49 AM
> To: Ceph Users
> Cc: Robin H. Johnson; Rich Rocque
> Subject: Re: [ceph-users] Ceph Cluster Failures
>
> Hello,
>
> On Thu, 16 Mar 2017 02:44:29 + Robin H. Joh
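For concreteness, the per-cluster layout proposed above would translate into a ceph.conf roughly like the following. This is a sketch only, with hypothetical hostnames and a placeholder fsid; it is not configuration taken from the thread:

```ini
# Minimal sketch of "Cluster X" from the proposed layout: one mon per AZ on
# the A/B/C "MM" servers, plus two OSD hosts. All hostnames and the fsid
# are hypothetical placeholders.
[global]
fsid = 00000000-0000-0000-0000-000000000000   ; placeholder, generate with uuidgen
mon initial members = a, b, c
mon host = a-mm.internal, b-mm.internal, c-mm.internal
; Only two OSDs exist per cluster in this layout, so replication is
; effectively capped at size = 2:
osd pool default size = 2
osd pool default min size = 1

[mon.a]
host = a-mm.internal

[mon.b]
host = b-mm.internal

[mon.c]
host = c-mm.internal
```

Note that with only two OSDs and size = 2, the loss of a single OSD host leaves the cluster with no redundancy until that host is replaced, which is what the "Alternative Layout" with 3 OSDs per cluster is trading performance for.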
Re: [ceph-users] Ceph Cluster Failures
Hi,

I talked with the person in charge about your initial feedback and
questions. The thought is to switch to a new setup, and I was asked to
pass it on and ask for thoughts on whether this would be sufficient or
not.

Use case:
Overview: Need to provide shared storage/high availability for (usually)
low-volume web server instances using a distributed, POSIX-compliant
filesystem, running in Amazon Web Services. Database storage is not part
of the cluster.
Logic: We know Ceph is probably overkill for our current use (and probably
also for my future use), so why Ceph? Its performance when using CephFS,
and its ability to support RBD (if we ever move to a container approach
for web servers). I’ve tried Amazon EFS (NFS-as-a-service) and GlusterFS
(both NFS and native client), and because of the number of small files
we’re working with, something that takes ~15 sec. in Ceph takes several
minutes using the other NFS or GlusterFS solutions.

Current Load: ~100 connected clients accessing ~20GB of data of e-commerce
related website source software.
Expected Future Load: ~5,000 connected clients accessing ~1TB of data

Ceph Clients:
Primary Role: Web server & load balancer w/ SSL termination
Hardware Configuration: 1 vCPU, 512MB RAM, Ubuntu 16.04 LTS (per
website/domain/subdomain: 2ea t2.nano instances, load balanced behind
haproxy, rarely manually scaling up with new instances during expected
load spikes. After initial “hits,” most of the website stays in local
cache, resulting in generally few IOPS against the Ceph cluster.)

Ceph Clusters:
Overall: 3 co-located clusters across 9 servers, spanning 3 AWS
Availability Zones in a single region. 3 MDS per cluster, 3 MON per
cluster, 2 OSD per cluster.
Hardware Configuration (MON/MDS): r4.large instance class, 2 vCPU, ~15GB
RAM, “up to 10Gbit” network (“Enhanced Networking” enabled), EBS/SSD for
root (not provisioned-IOPS), Ubuntu 16.04 LTS
Hardware Configuration (OSD): i3.large instance class, 2 vCPU, ~15GB RAM,
“up to 10Gbit” network (“Enhanced Networking” enabled), EBS/SSD for root
(not provisioned-IOPS, but “EBS optimized” for bandwidth), ~475GB NVMe
attached ephemeral storage for OSD (co-locating journal and data)

Proposed Layout:

AZ “A”:
* Server A-MM (r4.large instance):
  * Mon.A & MDS.A for Cluster X
  * Mon.A & MDS.A for Cluster Y
  * Mon.A & MDS.A for Cluster Z
* Server A-OSD-1 (i3.large instance):
  * OSD.0 for Cluster X
* Server A-OSD-2 (i3.large instance):
  * OSD.0 for Cluster Z

AZ “B”:
* Server B-MM (r4.large instance):
  * Mon.B & MDS.B for Cluster X
  * Mon.B & MDS.B for Cluster Y
  * Mon.B & MDS.B for Cluster Z
* Server B-OSD-1 (i3.large instance):
  * OSD.1 for Cluster X
* Server B-OSD-2 (i3.large instance):
  * OSD.0 for Cluster Y

AZ “C”:
* Server C-MM (r4.large instance):
  * Mon.C & MDS.C for Cluster X
  * Mon.C & MDS.C for Cluster Y
  * Mon.C & MDS.C for Cluster Z
* Server C-OSD-1 (i3.large instance):
  * OSD.1 for Cluster Y
* Server C-OSD-2 (i3.large instance):
  * OSD.1 for Cluster Z

Alternative Layout:
Split, by half, the NVMe storage between 2 OSDs, provide 3ea OSDs per
cluster for higher availability at the expense of disk read/write
performance, and increase the number of clusters to 4.

Thank you for your time,

Rich

From: Christian Balzer
Sent: Thursday, March 16, 2017 2:30:49 AM
To: Ceph Users
Cc: Robin H. Johnson; Rich Rocque
Subject: Re: [ceph-users] Ceph Cluster Failures

Hello,

On Thu, 16 Mar 2017 02:44:29 + Robin H. Johnson wrote:

> On Thu, Mar 16, 2017 at 02:22:08AM +, Rich Rocque wrote:
> > Has anyone else run into this or have any suggestions on how to remedy
> > it?
> We need a LOT more info.

Indeed.
> > After a couple months of almost no issues, our Ceph cluster has
> > started to have frequent failures. Just this week it's failed about
> > three times.
> >
> > The issue appears to be that an MDS or Monitor will fail and then all
> > clients hang. After that, all clients need to be forcibly restarted.
> - Can you define monitor 'failing' in this case?
> - What do the logs contain?
> - Is it running out of memory?
> - Can you turn up the debug level?
> - Has your cluster experienced continual growth and now might be
>   undersized in some regard?

A single MON failure should not cause any problems to boot.
Please include "ceph -s", "ceph osd tree" and "ceph osd pool ls detail"
output as well.

> > The architecture for our setup is:
> Are these virtual machines? The overall specs seem rather like VM
> instances rather than hardware.

There are small servers like that, but a valid question indeed.
In particular, if it is dedicated HW, FULL specs.

> > 3 ea MON, MDS instances (co-located) o
Re: [ceph-users] Ceph Cluster Failures
Hello,

On Thu, 16 Mar 2017 02:44:29 + Robin H. Johnson wrote:

> On Thu, Mar 16, 2017 at 02:22:08AM +, Rich Rocque wrote:
> > Has anyone else run into this or have any suggestions on how to remedy
> > it?
> We need a LOT more info.

Indeed.

> > After a couple months of almost no issues, our Ceph cluster has
> > started to have frequent failures. Just this week it's failed about
> > three times.
> >
> > The issue appears to be that an MDS or Monitor will fail and then all
> > clients hang. After that, all clients need to be forcibly restarted.
> - Can you define monitor 'failing' in this case?
> - What do the logs contain?
> - Is it running out of memory?
> - Can you turn up the debug level?
> - Has your cluster experienced continual growth and now might be
>   undersized in some regard?

A single MON failure should not cause any problems to boot.
Please include "ceph -s", "ceph osd tree" and "ceph osd pool ls detail"
output as well.

> > The architecture for our setup is:
> Are these virtual machines? The overall specs seem rather like VM
> instances rather than hardware.

There are small servers like that, but a valid question indeed.
In particular, if it is dedicated HW, FULL specs.

> > 3 ea MON, MDS instances (co-located) on 2cpu, 4GB RAM servers
> What sort of SSD are the monitor datastores on? ('mon data' in the
> config)

He doesn't mention SSDs in the MON/MDS context, so we could be looking at
something even slower. FULL SPECS.
4GB RAM would be fine for a single MON, but combined with an MDS it may be
a bit tight.

> > 12 ea OSDs (ssd), on 1cpu, 1GB RAM servers
> 12 SSDs to a single server, with 1cpu/1GB RAM? That's absurdly low-spec.
> How many OSD servers, what SSDs?

I think he means 12 individual servers. Again, there are micro servers
like that around, like:
https://www.supermicro.com.tw/products/system/2U/2015/SYS-2015TA-HTRF.cfm
IF the SSDs are decent the CPU may be tight, but 1GB RAM for a combination
of OS _and_ OSD is way too little for my taste and experience.
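The "can you turn up the debug level?" question above is usually answered via ceph.conf; a sketch follows, using commonly used verbosity levels rather than values prescribed anywhere in this thread:

```ini
; Sketch: raise mon/mds logging while diagnosing the failures. 10 is
; verbose but manageable; 20 is very noisy. Revert once logs are captured.
[mon]
debug mon = 10
debug paxos = 10
debug ms = 1

[mds]
debug mds = 10
debug ms = 1
```

The same settings can be applied to running daemons without a restart, e.g. `ceph tell mon.a injectargs '--debug-mon 10'`.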
> What is the network setup & connectivity between them (hopefully
> 10Gbit).

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Cluster Failures
On Thu, Mar 16, 2017 at 02:22:08AM +, Rich Rocque wrote:
> Has anyone else run into this or have any suggestions on how to remedy
> it?

We need a LOT more info.

> After a couple months of almost no issues, our Ceph cluster has
> started to have frequent failures. Just this week it's failed about
> three times.
>
> The issue appears to be that an MDS or Monitor will fail and then all
> clients hang. After that, all clients need to be forcibly restarted.

- Can you define monitor 'failing' in this case?
- What do the logs contain?
- Is it running out of memory?
- Can you turn up the debug level?
- Has your cluster experienced continual growth and now might be
  undersized in some regard?

> The architecture for our setup is:

Are these virtual machines? The overall specs seem rather like VM
instances rather than hardware.

> 3 ea MON, MDS instances (co-located) on 2cpu, 4GB RAM servers

What sort of SSD are the monitor datastores on? ('mon data' in the
config)

> 12 ea OSDs (ssd), on 1cpu, 1GB RAM servers

12 SSDs to a single server, with 1cpu/1GB RAM? That's absurdly low-spec.
How many OSD servers, what SSDs?

What is the network setup & connectivity between them (hopefully 10Gbit).

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
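On the "what do the logs contain?" point: if the mons are flapping rather than crashing outright, the telltale sign is repeated leader elections in the mon log. The sketch below builds a tiny sample log to show what to scan for; the sample lines only illustrate the message format and are not output from this cluster:

```shell
# Create an illustrative sample mon log (not from the cluster in this
# thread) and count election events in it.
cat > /tmp/ceph-mon.sample.log <<'EOF'
2017-03-16 02:20:01.000000 mon.a calling new monitor election
2017-03-16 02:20:05.000000 mon.a lease_timeout -- calling new election
2017-03-16 02:20:09.000000 mon.a calling new monitor election
EOF

# Many elections in a short window usually point at network latency or a
# slow mon data store rather than a crashed daemon.
grep -c 'calling new monitor election' /tmp/ceph-mon.sample.log
```

On a real node the file to scan would be the mon log under /var/log/ceph/; pairing its timestamps with kernel OOM-killer messages in dmesg would also answer the "is it running out of memory?" question above.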