Re: distributed cluster

2012-05-30 Thread Tommi Virtanen
On Wed, May 30, 2012 at 4:47 AM, Jerker Nyberg jer...@update.uu.se wrote:
 I am waiting as fast as I can for it to be production ready. :-)

I feel like starting a quote of the week collection ;)

One more thing I remember is worth mentioning: Ceph doesn't place
objects near you; CRUSH placement is completely deterministic based on
the object name. Hence, your worst case may actually look like this:

sites: west, east
servers: a,b in west; c,d in east
client: x in west

A write from the client, with bad luck, will go
x -> d, with replication d -> a and d -> b.

Now you've used 3x bandwidth on the WAN.


Currently, the only way to work around this is with pools, and there's
nothing automatic about that.
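
To make the determinism concrete, here is a toy sketch in Python (not the
real CRUSH algorithm; the place() helper is made up purely for illustration):
placement falls out of a hash of the object name alone, so the client's
location never enters into it.

# Toy illustration only -- not real CRUSH. Placement is a pure function of
# the object name, so where the client sits never influences the choice.
import hashlib

OSDS = {"a": "west", "b": "west", "c": "east", "d": "east"}

def place(object_name, replicas=3):
    """Deterministically rank OSDs by a hash of (object name, OSD) and take the top N."""
    ranked = sorted(
        OSDS,
        key=lambda osd: hashlib.sha1(f"{object_name}/{osd}".encode()).hexdigest(),
    )
    return ranked[:replicas]

# The same object always maps to the same OSDs, whoever writes it; a client
# in "west" can perfectly well land its primary in "east".
for name in ("vm-disk-1", "vm-disk-2"):
    print(name, "->", [(osd, OSDS[osd]) for osd in place(name)])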


RE: distributed cluster

2012-05-29 Thread Quenten Grasso
This is also something I'm very interested in, from the perspective of a power 
outage or some other data centre issue.

I assume the main issue here would be our friend latency; however, there is a 
bloke on the mailing list who is currently running a 2-site cluster setup as 
well.

I've been thinking about a setup with a replication level of 2 (1 replica per 
site). With the sites only 2-3 km apart, latency shouldn't be much of an issue, 
but the obvious bottleneck will be the 10 GbE link between sites, and split 
brain isn't an issue if the RBD volume is only mounted at a single site anyway.

If the data is sitting on a BTRFS/ZFS RAID (or RAID6 until BTRFS is ready), 
this would be a reasonable level of risk. As for data integrity/availability 
with only 2 replicas, the likelihood of having a complete server failure and a 
link outage at the same time would be fairly minimal.
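
As a rough back-of-the-envelope sketch (the availability numbers below are
pure placeholders, and it assumes the two failures are independent):

# Back-of-the-envelope only; the figures are assumptions, not measurements.
# Under independence, the fraction of time a whole server is down while the
# inter-site link is also down is just the product of the two rates.
p_server_down = 0.001   # assumed: a given server is down 0.1% of the time
p_link_down   = 0.002   # assumed: the 10 GbE inter-site link is down 0.2% of the time

p_both = p_server_down * p_link_down
print(f"both at once: ~{p_both:.1e} of the time")   # ~2.0e-06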

Regards,
Quenten 


-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Jimmy Tang
Sent: Monday, 28 May 2012 11:48 PM
To: Jerker Nyberg
Cc: ceph-devel@vger.kernel.org
Subject: Re: distributed cluster

Hi All,

On 28 May 2012, at 12:28, Jerker Nyberg wrote:

 
 This may not really be a subject for the ceph-devel mailing list but rather a 
 potential ceph-users? I hope it is ok to write here. I would like to discuss 
 whether it sounds reasonable to run a Ceph cluster distributed over a metro 
 (city) network.
 
 Let us assume we have a couple of sites distributed over a metro network with 
 at least gigabit interconnect. The demands for storage capacity and speed at 
 our sites are increasing together with the demands for reasonably stable 
 storage. May Ceph be part of a solution?
 
 One idea is to set up Ceph distributed over this metro network. A public 
 service network is announced at all sites, anycasted from the storage 
 SMB/NFS/RGW(?)-to-Ceph gateways (for stateless connections). Stateful 
 connections (iSCSI?) have to contact the individual storage gateways, and 
 redundancy is handled at the application level (dual path). Ceph kernel 
 clients contact the storage servers directly.
 
 Hopefully this means that clients at the sites with a storage gateway will 
 contact it. Clients at a site without a local storage gateway, or when the 
 local gateway is down, will contact a storage gateway at another site.
 
 Hopefully not all power and network at the whole city will go down at once!
 
 Does this sound reasonable? It should be easy to scale up with more storage 
 nodes with Ceph. Or is it better to put all servers in the same server room?
 
                  Internet
                   |    |
                  Routers
                   |    |
 Metro network  =========================
                  |     |     |     |
 Sites            R     R     R     R
                  |     |     |     |
 Servers        Ceph1 Ceph2 Ceph3 Ceph4


I'm also interested in this type of use case; I would like to run a ceph 
cluster across a metropolitan area network. Has anyone tried running ceph in a 
WAN/MAN environment across a city/state/country?

Regards,
Jimmy Tang

--
Senior Software Engineer, Digital Repository of Ireland (DRI)
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/



Re: distributed cluster

2012-05-29 Thread Tommi Virtanen
On Mon, May 28, 2012 at 4:28 AM, Jerker Nyberg jer...@update.uu.se wrote:
 This may not really be a subject for the ceph-devel mailing list but rather a
 potential ceph-users? I hope it is ok to write here.

It's absolutely ok to talk on this mailing list about using Ceph. We
may create a separate ceph-users later on, but right now this list is
where the conversation should go.

 Let us assume we have a couple of sites distributed over a metro network
 with at least gigabit interconnect. The demands for storage capacity and
 speed at our sites are increasing together with the demands for reasonably
 stable storage. May Ceph be part of a solution?

Ceph was designed to work within a single data center. If parts of the
cluster reside in remote locations, you essentially suffer the worst
combination of their latency and bandwidth limits. A write that gets
replicated to three different data centers is not complete until the
data has been transferred to all three, and an acknowledgement has
been received.

For example: with data replicated over data centers A, B, C, connected
at 1Gb/s, the fastest all of A will ever handle writes is 0.5Gb/s --
it'll need to replicate everything to B and C, over that single pipe.
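
The arithmetic behind that 0.5Gb/s figure, as a small sketch (the helper
name is made up, and protocol overhead is ignored):

# A site's WAN uplink has to carry (replicas - 1) extra copies of every byte
# it accepts, so client ingest is roughly capped at link / (replicas - 1).
def max_ingest_gbps(wan_link_gbps, total_replicas):
    remote_copies = total_replicas - 1   # one copy stays local
    return wan_link_gbps / remote_copies

print(max_ingest_gbps(1.0, total_replicas=3))   # 0.5 Gb/s, matching the example above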

I am aware of a few people building multi-dc Ceph clusters. Some have
shared their network latency, bandwidth and availability numbers with
me (confidentially), and at first glance their wide-area network
performs better than many single-dc networks. They are far above a 1
gigabit interconnect.

I would really recommend you embark on a project like this only if you
are able to understand the Ceph replication model, and do the math for
yourself and figure out what your expected service levels for Ceph
operations would be. (Naturally, Inktank Professional Services will
help you in your endeavors, though their first response should be
"that's not a recommended setup".)
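
As one example of the kind of math to do, here is a very rough single-write
latency estimate (a sketch only; the helper and the numbers are illustrative,
not measurements of any real cluster):

# Rough model: a replicated write completes only after the slowest replica has
# received the data and acknowledged it, so the farthest site's RTT plus the
# transfer time over the WAN link dominates.
def write_latency_ms(object_mb, replica_rtts_ms, wan_link_gbps):
    transfer_ms = object_mb * 8 / wan_link_gbps   # megabits over Gb/s comes out in ms
    return max(replica_rtts_ms) + transfer_ms

# e.g. a 4 MB object, replicas at 0.2 ms (local) and 2 ms (remote) RTT, 1 Gb/s WAN
print(write_latency_ms(4, [0.2, 2.0], 1.0))   # ~34 ms, dominated by the transfer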

 One idea is to set up Ceph distributed over this metro network. A public
 service network is announced at all sites, anycasted from the storage
 SMB/NFS/RGW(?)-to-Ceph gateways (for stateless connections). Stateful
 connections (iSCSI?) have to contact the individual storage gateways, and
 redundancy is handled at the application level (dual path). Ceph kernel
 clients contact the storage servers directly.

The Ceph Distributed File System is not considered production ready yet.


Re: distributed cluster

2012-05-29 Thread Sam Zaydel
I could see a lot of use cases for BC (business continuity) and DR (disaster
recovery) tiers where performance may not be as much of an issue, but
availability is critical above all.

Most options in use today rely on some form of async replication, are in most
cases quite expensive, and still do not view performance as their primary
concern.

On Tue, May 29, 2012 at 9:44 AM, Tommi Virtanen t...@inktank.com wrote:
 On Mon, May 28, 2012 at 4:28 AM, Jerker Nyberg jer...@update.uu.se wrote:
 This may not really be a subject for the ceph-devel mailing list but rather a
 potential ceph-users? I hope it is ok to write here.

 It's absolutely ok to talk on this mailing list about using Ceph. We
 may create a separate ceph-users later on, but right now this list is
 where the conversation should go.

 Let us assume we have a couple of sites distributed over a metro network
 with at least gigabit interconnect. The demands for storage capacity and
 speed at our sites are increasing together with the demands for reasonably
 stable storage. May Ceph be part of a solution?

 Ceph was designed to work within a single data center. If parts of the
 cluster reside in remote locations, you essentially suffer the worst
 combination of their latency and bandwidth limits. A write that gets
 replicated to three different data centers is not complete until the
 data has been transferred to all three, and an acknowledgement has
 been received.

 For example: with data replicated over data centers A, B, C, connected
 at 1Gb/s, the fastest all of A will ever handle writes is 0.5Gb/s --
 it'll need to replicate everything to B and C, over that single pipe.

 I am aware of a few people building multi-dc Ceph clusters. Some have
 shared their network latency, bandwidth and availability numbers with
 me (confidentially), and at first glance their wide-area network
 performs better than many single-dc networks. They are far above a 1
 gigabit interconnect.

 I would really recommend you embark on a project like this only if you
 are able to understand the Ceph replication model, and do the math for
 yourself and figure out what your expected service levels for Ceph
 operations would be. (Naturally, Inktank Professional Services will
 help you in your endeavors, though their first response should be
 "that's not a recommended setup".)

 One idea is to set up Ceph distributed over this metro network. A public
 service network is announced at all sites, anycasted from the storage
 SMB/NFS/RGW(?)-to-Ceph gateways (for stateless connections). Stateful
 connections (iSCSI?) have to contact the individual storage gateways, and
 redundancy is handled at the application level (dual path). Ceph kernel
 clients contact the storage servers directly.

 The Ceph Distributed File System is not considered production ready yet.


distributed cluster

2012-05-28 Thread Jerker Nyberg


This may not really be a subject for the ceph-devel mailing list but rather a 
potential ceph-users? I hope it is ok to write here. I would like to 
discuss whether it sounds reasonable to run a Ceph cluster distributed over 
a metro (city) network.


Let us assume we have a couple of sites distributed over a metro network 
with at least gigabit interconnect. The demands for storage capacity and 
speed at our sites are increasing together with the demands for reasonably 
stable storage. May Ceph be part of a solution?


One idea is to set up Ceph distributed over this metro network. A public 
service network is announced at all sites, anycasted from the storage 
SMB/NFS/RGW(?)-to-Ceph gateways (for stateless connections). Stateful 
connections (iSCSI?) have to contact the individual storage gateways, and 
redundancy is handled at the application level (dual path). Ceph kernel 
clients contact the storage servers directly.


Hopefully this means that clients at the sites with a storage gateway will 
contact it. Clients at a site without a local storage gateway, or when the 
local gateway is down, will contact a storage gateway at another site.


Hopefully not all power and network at the whole city will go 
down at once!


Does this sound reasonable? It should be easy to scale up with more 
storage nodes with Ceph. Or is it better to put all servers in the same 
server room?


                 Internet
                  |    |
                 Routers
                  |    |
Metro network  =========================
                 |     |     |     |
Sites            R     R     R     R
                 |     |     |     |
Servers        Ceph1 Ceph2 Ceph3 Ceph4


--jerker