Re: [ceph-users] Ceph Cluster Failures

2017-03-16 Thread Christian Balzer

Hello,

On Fri, 17 Mar 2017 02:51:48 + Rich Rocque wrote:

> Hi,
> 
> 
> I talked with the person in charge about your initial feedback and questions. 
> The thought is to switch to a new setup and I was asked to pass it on and ask 
> for thoughts on whether this would be sufficient or not.
>
I assume from the new setup that the current, problematic one is also on
AWS, so I'd advise doing a proper analysis there before moving to
something "new".

If you search the ML archives you'll find a (few) others who have done
similar things, and as far as I can recall none were particularly successful.

A virtualized Ceph is going to be harder to get "right" than a HW-based
one, doubly so when dealing with AWS network vagaries.
I'm unsure whether an AWS region can consist of multiple DCs; if so, the
latencies when doing writes would be bad, but then again it seems your use
case is very read-heavy.
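If you do go ahead with this, it would be worth measuring the actual inter-AZ
round-trip times and the resulting Ceph write latency before committing to
anything; roughly along these lines (the peer address and pool name below are
placeholders):

  ping -c 20 10.0.2.15                            # raw RTT between instances in different AZs
  rados bench -p testpool 30 write -t 1 -b 4096   # single-threaded small-write latency
  rados -p testpool cleanup                       # remove the benchmark objects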

That all said, the specs for your proposal look good from a (virtual) HW
perspective. 

Christian
 

Re: [ceph-users] Ceph Cluster Failures

2017-03-16 Thread Rich Rocque
Hi,


I talked with the person in charge about your initial feedback and questions.
The thought is to switch to a new setup, and I was asked to pass it along and
ask for thoughts on whether this would be sufficient.


Use case:
Overview: We need to provide shared, highly available storage for (usually)
low-volume web server instances, using a distributed, POSIX-compliant filesystem
running in Amazon Web Services. Database storage is not part of the cluster.
Logic: We know Ceph is probably overkill for our current use (and probably also
for our future use), so why Ceph? Its performance when using CephFS, and its
ability to support RBD (if we ever move to a container approach for web
servers). I’ve tried Amazon EFS (NFS-as-a-service) and GlusterFS (both NFS and
native client), and because of the number of small files we’re working with,
something that takes ~15 sec in Ceph takes several minutes with the NFS or
GlusterFS alternatives.
Current Load: ~100 connected clients accessing ~20GB of e-commerce-related
website source code.
Expected Future Load: ~5,000 connected clients accessing ~1TB of data

Ceph Clients:
Primary Role: Web server & load balancer w/ SSL termination
Hardware Configuration: 1 vCPU, 512MB RAM, Ubuntu 16.04 LTS (per
website/domain/subdomain: two t2.nano instances, load balanced behind haproxy,
rarely scaled up manually with additional instances during expected load spikes.
After the initial “hits,” most of a website stays in local cache, resulting in
generally few IOPS against the Ceph cluster.)

Ceph Clusters:
Overall: 3 co-located clusters across 9 servers, spanning 3 AWS Availability
Zones in a single region: 3 MDS per cluster, 3 MONs per cluster, 2 OSDs per
cluster.
Hardware Configuration (MON/MDS): r4.large instance class, 2 vCPU, ~15GB RAM,
“up to 10Gbit” network (“Enhanced Networking” enabled), EBS/SSD for root (not
provisioned-IOPS), Ubuntu 16.04 LTS
Hardware Configuration (OSD): i3.large instance class, 2 vCPU, ~15GB RAM, “up to
10Gbit” network (“Enhanced Networking” enabled), EBS/SSD for root (not
provisioned-IOPS, but “EBS optimized” for bandwidth), ~475GB of attached NVMe
ephemeral storage for the OSD (co-locating journal and data)
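For illustration (the device and cluster names below are placeholders), each
OSD would be created on the NVMe ephemeral device with the ceph-disk tooling of
this release, which co-locates the journal on the same device by default:

  ceph-disk prepare --cluster x /dev/nvme0n1   # creates data + journal partitions on the NVMe
  ceph-disk activate /dev/nvme0n1p1            # registers and starts the OSD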

Proposed Layout:
AZ “A”:

  *   Server A-MM (r4.large instance):
 *   Mon.A & MDS.A for Cluster X
 *   Mon.A & MDS.A for Cluster Y
 *   Mon.A & MDS.A for Cluster Z
  *   Server A-OSD-1 (i3.large instance):
 *   OSD.0 for Cluster X
  *   Server A-OSD-2 (i3.large instance):
 *   OSD.0 for Cluster Z


AZ “B”:

  *   Server B-MM (r4.large instance):
 *   Mon.B & MDS.B for Cluster X
 *   Mon.B & MDS.B for Cluster Y
 *   Mon.B & MDS.B for Cluster Z
  *   Server B-OSD-1 (i3.large instance):
 *   OSD.1 for Cluster X
  *   Server B-OSD-2 (i3.large instance):
 *   OSD.0 for Cluster Y


AZ “C”:

  *   Server C-MM (r4.large instance):
 *   Mon.C & MDS.C for Cluster X
 *   Mon.C & MDS.C for Cluster Y
 *   Mon.C & MDS.C for Cluster Z
  *   Server C-OSD-1 (i3.large instance):
 *   OSD.1 for Cluster Y
  *   Server C-OSD-2 (i3.large instance):
 *   OSD.1 for Cluster Z


Alternative Layout:
Split the NVMe storage in half between 2 OSDs, providing 3 OSDs per cluster for
higher availability at the expense of disk read/write performance, and increase
the number of clusters to 4.
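
For illustration, a minimal ceph.conf sketch for one of the clusters (cluster
“X”); the fsid, hostnames and addresses below are placeholders:

  # /etc/ceph/x.conf (sketch only)
  [global]
  fsid = 00000000-0000-0000-0000-000000000000
  mon initial members = a-mm, b-mm, c-mm
  mon host = 10.0.1.10, 10.0.2.10, 10.0.3.10
  public network = 10.0.0.0/16
  # only 2 OSDs per cluster in this layout, so at most 2 replicas
  osd pool default size = 2
  osd pool default min size = 1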


Thank you for your time,

Rich



Re: [ceph-users] Ceph Cluster Failures

2017-03-16 Thread Christian Balzer

Hello,

On Thu, 16 Mar 2017 02:44:29 + Robin H. Johnson wrote:

> On Thu, Mar 16, 2017 at 02:22:08AM +, Rich Rocque wrote:
> > Has anyone else run into this or have any suggestions on how to remedy it?  
> We need a LOT more info.
>
Indeed.
 
> > After a couple months of almost no issues, our Ceph cluster has
> > started to have frequent failures. Just this week it's failed about
> > three times.
> >
> > The issue appears to be that an MDS or Monitor will fail and then all
> > clients hang. After that, all clients need to be forcibly restarted.  
> - Can you define monitor 'failing' in this case? 
> - What do the logs contain? 
> - Is it running out of memory?
> - Can you turn up the debug level?
> - Has your cluster experienced continual growth and now might be
>   undersized in some regard?
> 
A single MON failure should not cause any problems to boot.

Output of "ceph -s", "ceph osd tree" and "ceph osd pool ls detail" would help as well.

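Concretely, something like the following, gathered while the problem is
occurring, would be ideal (all read-only; the debug bump is optional and should
be reverted afterwards):

  ceph -s                        # overall health, quorum and client I/O
  ceph health detail             # specifics behind any HEALTH_WARN/ERR
  ceph osd tree                  # OSD topology and up/down, in/out status
  ceph osd pool ls detail        # pool replication size, min_size and flags
  # temporarily raise MON debug logging; the default is 1/5
  ceph tell mon.* injectargs '--debug_mon 10/10'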
> > The architecture for our setup is:  
> Are these virtual machines? The overall specs seem rather like VM
> instances rather than hardware.
>
There are small servers like that, but a valid question indeed.
In particular, if it is dedicated HW, FULL specs.
 
> > 3 ea MON, MDS instances (co-located) on 2cpu, 4GB RAM servers  
> What sort of SSD are the monitor datastores on? ('mon data' in the
> config)
> 
He doesn't mention SSDs in the MON/MDS context, so we could be looking at
something even slower. FULL SPECS. 

4GB RAM would be fine for a single MON, but combined with MDS it may
be a bit tight.
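
If RAM does turn out to be the limit there, the MDS cache can be capped; a
sketch (in this release the cache is sized in inodes, default 100000, and the
value below is only an assumption to tune):

  # in ceph.conf, under [mds]:
  #   mds cache size = 100000
  # or adjust a running MDS (the daemon id "a" is a placeholder):
  ceph tell mds.a injectargs '--mds_cache_size 100000'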

> > 12 ea OSDs (ssd), on 1cpu, 1GB RAM servers  
> 12 SSDs to a single server, with 1cpu/1GB RAM? That's absurdly low-spec.
> How many OSD servers, what SSDs?
> 
I think he means 12 individual servers. Again, there are micro servers
like that around, like:
https://www.supermicro.com.tw/products/system/2U/2015/SYS-2015TA-HTRF.cfm

IF the SSDs are decent, CPU may be tight, but 1GB RAM for a combination of
OS _and_ OSD is way too little for my taste and experience.

Christian

> What is the network setup & connectivity between them (hopefully
> 10Gbit).
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] Ceph Cluster Failures

2017-03-15 Thread Robin H. Johnson
On Thu, Mar 16, 2017 at 02:22:08AM +, Rich Rocque wrote:
> Has anyone else run into this or have any suggestions on how to remedy it?
We need a LOT more info.

> After a couple months of almost no issues, our Ceph cluster has
> started to have frequent failures. Just this week it's failed about
> three times.
>
> The issue appears to be that an MDS or Monitor will fail and then all
> clients hang. After that, all clients need to be forcibly restarted.
- Can you define monitor 'failing' in this case? 
- What do the logs contain? 
- Is it running out of memory?
- Can you turn up the debug level?
- Has your cluster experienced continual growth and now might be
  undersized in some regard?

> The architecture for our setup is:
Are these virtual machines? The overall specs seem rather like VM
instances rather than hardware.

> 3 ea MON, MDS instances (co-located) on 2cpu, 4GB RAM servers
What sort of SSD are the monitor datastores on? ('mon data' in the
config)
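(A quick way to check where the mon store lives and what backs it; the mon id
and device names are placeholders:)

  ceph-conf --name mon.a --show-config-value mon_data   # usually /var/lib/ceph/mon/<cluster>-<id>
  df -h /var/lib/ceph/mon/ceph-a                        # filesystem/device backing the mon store
  cat /sys/block/xvda/queue/rotational                  # 0 = SSD/flash, 1 = spinning disk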

> 12 ea OSDs (ssd), on 1cpu, 1GB RAM servers
12 SSDs to a single server, with 1cpu/1GB RAM? That's absurdly low-spec.
How many OSD servers, what SSDs?

What is the network setup & connectivity between them (hopefully
10Gbit)?
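(If in doubt, a quick bandwidth check between two of the nodes; the address is
a placeholder:)

  iperf3 -s                    # on one node
  iperf3 -c 10.0.2.15 -t 10    # from a second node, against the first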

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com