[ceph-users] Older version repo

2015-10-23 Thread Logan Barfield
I'm currently working on deploying a new VM cluster using KVM + RBD.  I've
noticed through the list that the latest "Hammer" (0.94.4) release can
cause issues with librbd and caching.

We've worked around this issue in our existing clusters by only upgrading
the OSD & MON hosts, while leaving the hypervisor/client hosts on v0.94.3
as recommended by another user on the list.

Is there an archive repo somewhere for Ceph that we can use to install
0.94.3 on Ubuntu 14.04, or is building from source our only option?
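For context, this is roughly how we're checking that the hypervisor hosts
stay on the 0.94.3 client packages after repo updates (a quick sketch; the
package list and version prefix are assumptions based on the Ubuntu 14.04
packaging):

    #!/usr/bin/env python
    # Rough check that a hypervisor host still has the 0.94.3 client libraries
    # installed (package names/version prefix assumed from the trusty packaging).
    import subprocess

    EXPECTED_PREFIX = "0.94.3"
    CLIENT_PACKAGES = ["librbd1", "librados2", "ceph-common"]

    for pkg in CLIENT_PACKAGES:
        try:
            version = subprocess.check_output(
                ["dpkg-query", "-W", "--showformat=${Version}", pkg],
                universal_newlines=True).strip()
        except subprocess.CalledProcessError:
            print("%s: not installed" % pkg)
            continue
        status = "OK" if version.startswith(EXPECTED_PREFIX) else "MISMATCH"
        print("%s: %s (%s)" % (pkg, version, status))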


[ceph-users] Latency impact on RBD performance

2015-08-19 Thread Logan Barfield
Hi,

We are currently using 2 OSD hosts with SSDs to provide RBD-backed volumes
for KVM hypervisors.  This 'cluster' is currently set up in 'Location A'.

We are looking to move our hypervisors/VMs over to a new location, and will
have a 1Gbit link between the two datacenters.  We can run Layer 2 over the
link, and it should have ~10ms of latency.  Call the new datacenter
'Location B'.

One proposed solution for the migration is to set up new RBD hosts in the
new location, set up a new pool, and move the VM volumes to it.

The potential issue with this solution is that we could end up in a
scenario where a VM is running on a hypervisor in 'Location A' but reading
from and writing to a volume in 'Location B'.

My question is: what kind of performance impact should we expect when
reading/writing over a link with ~10ms of latency?  Will it bring
I/O-intensive operations (like databases) to a halt, or will it be
'tolerable' for a short period (a few days)?  Most of the VMs are running
database-backed e-commerce sites.

My expectation is that adding ~10ms to every I/O operation will have a
significant impact, but we wanted to verify that before ruling it out as a
solution.  We will also be doing some internal testing, of course.
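As a back-of-the-envelope model of why we expect that (the numbers below
are assumptions, not measurements): with a queue depth of 1, each
synchronous I/O has to cross the WAN before the next one can start, so
per-thread throughput is roughly capped at 1 / (local service time + WAN
round trip):

    # Back-of-the-envelope latency model (assumed numbers, not measurements):
    # a queue-depth-1 workload issues one synchronous I/O at a time, so the
    # per-thread IOPS ceiling is bounded by the round-trip time of a single op.
    local_latency_ms = 0.5   # assumed SSD-backed RBD write latency inside Location B
    wan_rtt_ms = 10.0        # expected latency of the inter-datacenter link

    for label, latency_ms in [("local only", local_latency_ms),
                              ("across the WAN", local_latency_ms + wan_rtt_ms)]:
        print("%-15s ~%4.0f IOPS per outstanding request" % (label, 1000.0 / latency_ms))

    # local only      ~2000 IOPS per outstanding request
    # across the WAN  ~  95 IOPS per outstanding request

A database doing serialized commits would see something like the second
number per connection, which is why we expect it to hurt.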


I appreciate any feedback the community has.

- Logan


[ceph-users] Monitors and read/write latency

2015-01-07 Thread Logan Barfield
Do monitors have any impact on read/write latencies?  Everything I've read
says no, but since a client needs to talk to a monitor before reading from
or writing to OSDs, it seems like that would introduce some overhead.

I ask for two reasons:
1) We are currently using SSD-based OSD nodes for our RBD pools.  These
nodes are connected to our hypervisors over 10Gbit links for VM block
devices.  The rest of the cluster is on 1Gbit links, so the RBD nodes
contact the monitors over 1Gbit instead of 10Gbit.  I'm not sure if this
would degrade performance at all.

2) In a multi-datacenter cluster a client may end up contacting a monitor
located in a remote location (e.g., over a high latency WAN link).  I would
think the client would have to wait for a response from the monitor before
beginning read/write operations on the local OSDs.

I'm not sure exactly what the monitor interactions are.  Do clients only
pull the cluster map from the monitors (then check back occasionally for
updates), or do they talk to the monitors every time they write a new
object, to determine which placement group / OSDs to write to or read from?
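If the first case is how it works, my mental model is roughly the sketch
below (purely illustrative: the real mapping uses the rjenkins hash and
CRUSH rather than a plain checksum/modulo, and the names and sizes are made
up):

    # Toy illustration of "clients compute placement locally": the maps are
    # pulled from a monitor once at connect time, and every object is then
    # mapped to a PG and OSDs without any further monitor round trips.
    # Real Ceph hashes object names with rjenkins and maps PG -> OSDs via CRUSH.
    import zlib

    PG_NUM = 128        # pretend this came from the cached OSD map
    REPLICAS = 2
    NUM_OSDS = 6

    def object_to_osds(object_name):
        pg_id = zlib.crc32(object_name.encode("ascii")) % PG_NUM
        # Stand-in for CRUSH: derive a deterministic replica set from the PG id.
        osds = [(pg_id + i) % NUM_OSDS for i in range(REPLICAS)]
        return pg_id, osds

    for name in ["rbd_data.1234.0000000000000001", "rbd_data.1234.0000000000000002"]:
        pg_id, osds = object_to_osds(name)
        print("%s -> pg %d -> osds %s" % (name, pg_id, osds))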


Thank You,

Logan Barfield
Tranquil Hosting


[ceph-users] Fwd: Multi-site deployment RBD and Federated Gateways

2015-01-07 Thread Logan Barfield
Hello,

I'm re-sending this message since I didn't see it picked up in the list
archives yesterday.  My apologies if it was received previously.

We are currently running a single datacenter Ceph deployment.  Our setup is
as follows:
- 4 HDD OSD nodes (primarily used for RadosGW/Object Storage)
- 2 SSD OSD nodes (used for RBD/VM block devices)
- 3 Monitor daemons running on 3 of the HDD OSD nodes
- The CRUSH rules are set to push all data to the HDD nodes except for the
RBD pool, which uses the SSD nodes.

Our goal is to have OSD nodes in 3 datacenters (US East, US West, Europe).
I'm thinking that we would want the following setup:
- RadosGW instance in each datacenter with geo-dns to direct clients to the
closest one.
- Same OSD configuration as our current location (HDD for RadosGW, SSD for
RBD)
- Separate RBD pool in each datacenter for VM block devices.
- CRUSH rules:
  - RadosGW: 3 replicas on different OSD nodes, with at least 1 off-site
    (e.g., 2 replicas on 2 OSD nodes in one datacenter, 1 replica on 1 OSD
    node in a different datacenter); see the sketch after this list.  I
    don't know if RadosGW is geo-aware enough to do this efficiently.
  - RBD: 2 replicas across 2 OSD nodes in the same datacenter.
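To make the intended RadosGW placement concrete, here's a toy sketch of the
layout we're after (illustrative only; in practice this would be expressed
as a CRUSH rule, and the datacenter/OSD names below are made up):

    # Toy illustration of the desired RadosGW placement (not a CRUSH rule):
    # 2 replicas on different OSD nodes in the writer's local datacenter,
    # plus 1 replica on a node in some other datacenter. Names are made up.
    import random

    DATACENTERS = {
        "us-east": ["east-osd1", "east-osd2", "east-osd3"],
        "us-west": ["west-osd1", "west-osd2"],
        "eu":      ["eu-osd1", "eu-osd2"],
    }

    def place_replicas(local_dc, local_copies=2, remote_copies=1):
        chosen = random.sample(DATACENTERS[local_dc], local_copies)
        remote_dc = random.choice([dc for dc in DATACENTERS if dc != local_dc])
        chosen += random.sample(DATACENTERS[remote_dc], remote_copies)
        return chosen

    print(place_replicas("us-east"))   # e.g. ['east-osd2', 'east-osd1', 'eu-osd1']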

From the documentation it looks like the best way to accomplish this would
be to have a separate cluster in each datacenter, then use a federated
RadosGW configuration to keep geo-redundant replicas of objects.  The other
option would be to have one cluster spanning all 3 locations, but since
they would be connected over VPN/WAN links that doesn't seem ideal.

Concerns:
- With a federated configuration it looks like only one zone will be
writable, so if the master zone is on the east coast all of the west coast
clients would be uploading there as well.
- It doesn't appear that there is a way to send only 1 replica to the
secondary zone; rather, all data written to the master is replicated to the
secondary (e.g., 3 replicas in each location).  Alternatively, with multiple
regions both zones would be read/write, but only metadata would be synced.
- From the documentation I understand that there should be different pools
for each zone, and each cluster will need to have a different name.  Since
our current cluster is in production I don't know how safe it would be to
rename/move pools or rename the cluster.  We are using the default ceph
cluster name right now because different names add complexity (e.g.,
requiring '--cluster' for all commands), and we noticed in testing that
some of the init scripts don't play well with custom cluster names.

It seems to me that a federated configuration would add a lot of
complexity: it wouldn't get us exactly what we'd like for replication
(one off-site copy), and it doesn't allow for geo-aware writes.

I've seen a few examples of CRUSH maps that span multiple datacenters.
This would seem to be an easier setup, and would get us closer to what we
want with replication.  My main concerns would be the WAN latency, setting
up a site-to-site VPN (which I don't think is necessary for the federated
setup), and how well Ceph would handle losing the connection to one of the
remote sites for a few seconds or minutes.

Is there a recommended deployment for what we want to do, or any reference
guides beyond the official Ceph docs?  I know Ceph is being used for
multi-site deployments, but other than a few blog posts demonstrating
theoretical setups and some vague PowerPoint slides I haven't seen any
details on it.  Unfortunately we are a very small company, so consulting
with Inktank/Red Hat isn't financially feasible right now.

Any suggestions/insight would be much appreciated.


Thank You,

Logan Barfield
Tranquil Hosting