Re: multi datacenter replication

Patrick Christopher Thu, 16 May 2013 11:55:00 -0700

Hi Adron,
Thank you for the reply. Licensing is a concern, but that's a question for
later.


I think there will have to be a cluster of some sort at the remote DCs,
because most of the time the connection between an edge DC and the central
DC is a small 1-2 Mbps link shared with many other, higher-priority
applications. A remote DC spends most of its time (roughly 70-80%) with only
that small link. When the small link is all we have, we want the edge DC to
store information reliably on site until it can connect over the large
link.
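That edge behaviour is essentially store-and-forward. A minimal sketch, assuming a local SQLite file as the durable buffer and a caller-supplied `send` function for the uplink; `EdgeOutbox` is a hypothetical helper, not part of Riak:

```python
import sqlite3
from typing import Callable


class EdgeOutbox:
    """Durable store-and-forward buffer for an edge DC (hypothetical helper)."""

    def __init__(self, path: str = "outbox.db") -> None:
        # A file-backed SQLite database keeps queued writes across restarts.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, bucket TEXT, key TEXT, value BLOB)"
        )
        self.conn.commit()

    def put(self, bucket: str, key: str, value: bytes) -> None:
        # Always accept writes locally, whatever state the uplink is in.
        with self.conn:  # commits the insert on success
            self.conn.execute(
                "INSERT INTO outbox (bucket, key, value) VALUES (?, ?, ?)",
                (bucket, key, value),
            )

    def pending(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def flush(self, large_link_up: bool, send: Callable[[str, str, bytes], None]) -> int:
        """Forward queued writes in insertion order, but only over the large link."""
        if not large_link_up:
            return 0
        sent = 0
        rows = self.conn.execute(
            "SELECT id, bucket, key, value FROM outbox ORDER BY id"
        ).fetchall()
        for row_id, bucket, key, value in rows:
            send(bucket, key, value)  # if send raises, this and later rows stay queued
            with self.conn:
                self.conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
        return sent
```

While only the 1-2 Mbps link is up, `flush(False, send)` returns 0 and leaves everything in the table; once the large link appears, the same call with `True` drains it in order.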

This is probably not a great fit for Riak right now. If you have any other
suggestions, I'd love to hear them, and I'll keep an eye on how replication
develops over the coming months.

Pat


On Wed, May 15, 2013 at 12:15 PM, Adron Hall <[email protected]> wrote:

> As replication exists today, I don't believe the replication service would
> do exactly that (anyone else on the list, please correct me if I'm wrong
> here). However, that capability is on the roadmap for the coming months.
>
> However, I'm a little hesitant to suggest committing an entire Riak
> cluster at each remote point solely for replication. Multi-data center
> replication also brings licensing into play. Ideally, PS (Professional
> Services) and your team would work together to build clients that connect
> to a main Riak cluster.
>
> Since each cluster should have a minimum of 5 nodes, running a 5-node
> cluster at each remote point seems like overkill. It would replicate, but
> that many nodes and that many clusters is a lot for this volume of data. A
> client could be far more minimal: it wouldn't require 5 nodes at each
> remote site, and could potentially end up much cheaper and more efficient.
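To put rough numbers on the overkill point, here is a back-of-the-envelope sketch using the 150-200 data centers from Pat's original message and the 5-node minimum, plus an assumed single central cluster:

```python
NODES_PER_CLUSTER = 5  # minimum recommended cluster size mentioned above


def total_nodes(edge_dcs: int, central_clusters: int = 1) -> int:
    """Nodes needed if every edge DC runs its own full Riak cluster."""
    return (edge_dcs + central_clusters) * NODES_PER_CLUSTER


for dcs in (150, 200):
    print(f"{dcs} edge DCs -> {total_nodes(dcs)} nodes")
```

That comes to 755 and 1005 nodes respectively, compared with a single lightweight client at each edge site.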
>
> I'll loop you in with some others that could elaborate on this
> architecture and see which direction to aim for.
>
> Cheers,
> -Adron
>
>
> On Mon, May 13, 2013 at 5:20 PM, Patrick Christopher <
> [email protected]> wrote:
>
>> Hi Adron,
>> Thanks for the reply!
>>
>> The architecture you describe is what I'm going for, but with Riak doing
>> all of the replication rather than a new client. The model would be:
>>   - a single Riak cluster
>>   - bucket a replicates to data center remoteA
>>   - bucket b replicates to data center remoteB
>>   - remoteB will never access data from bucket a
>>   - remoteA will never access data from bucket b
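The bucket-per-spoke model in that list can be written down as a plain mapping. This sketch only expresses the intended routing and the isolation rule; `REPLICATION_PLAN`, `buckets_for`, and `check_isolation` are hypothetical names, not a Riak configuration format:

```python
# Hypothetical replication plan: bucket name -> the one remote DC it goes to.
REPLICATION_PLAN = {
    "a": "remoteA",
    "b": "remoteB",
}


def buckets_for(dc: str, plan: dict[str, str]) -> set[str]:
    """Buckets a remote DC is allowed to receive (its view of the data)."""
    return {bucket for bucket, target in plan.items() if target == dc}


def check_isolation(plan: dict[str, str]) -> None:
    """Each spoke must see exactly one, unique bucket.

    Dict keys already send each bucket to a single DC; this also rejects
    a DC that appears as the target of more than one bucket.
    """
    targets = list(plan.values())
    if len(targets) != len(set(targets)):
        raise ValueError("a remote DC is mapped to more than one bucket")


check_isolation(REPLICATION_PLAN)
print(buckets_for("remoteA", REPLICATION_PLAN))  # {'a'}
```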
>>
>> Does riak support that?  I've not seen any database that supports that
>> model on its own.
>>
>> Pat
>>
>>
>> On Mon, May 13, 2013 at 12:42 PM, Adron Hall <[email protected]> wrote:
>>
>>> Hey Pat,
>>>
>>>  A few answers, thoughts and questions.
>>>
>>> 1. Per-bucket replication is available in v1.1 and later. In 1.1 and
>>> above, a bucket has a repl property that accepts true or false; true
>>> turns on both realtime and fullsync replication. In 1.2 and above, repl
>>> also accepts the values realtime, fullsync, or both. To enable the
>>> property from the command line:
>>>
>>> curl -v -XPUT -H "Content-Type: application/json" \
>>> -d '{"props":{"repl":true}}' \
>>> http://127.0.0.1:8091/riak/my_bucket
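For scripting this across many buckets, the same HTTP request can be built with Python's standard library. This sketch only constructs the request and does not send it; the host, port, and `/riak/<bucket>` path are copied from the curl example, and the accepted values follow the description above rather than independent verification:

```python
import json
import urllib.request

# Values described above: true/false (1.1+), plus realtime/fullsync/both (1.2+).
ALLOWED_REPL = {True, False, "realtime", "fullsync", "both"}


def repl_request(bucket: str, repl, base: str = "http://127.0.0.1:8091") -> urllib.request.Request:
    """Build (but do not send) the PUT that sets a bucket's repl property."""
    if repl not in ALLOWED_REPL:
        raise ValueError(f"unsupported repl value: {repl!r}")
    body = json.dumps({"props": {"repl": repl}}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/riak/{bucket}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = repl_request("my_bucket", True)
# urllib.request.urlopen(req) would send it to a running node.
```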
>>>
>>> 2. Running both styles of replication should be fine; both are enabled
>>> by default. In the situation you describe, keeping realtime on all the
>>> time should work well, with fullsync run only when the ship docks and
>>> connects at a higher speed, using some external trigger to start it.
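The "something to trigger that" could be a small policy check run whenever the link state changes. A sketch, assuming the caller can measure current bandwidth; `FULLSYNC_MIN_KBPS` is a hypothetical threshold, and the returned labels stand in for whatever fullsync command your Riak EE version provides:

```python
from dataclasses import dataclass

# Hypothetical cut-off: at or above this, the link counts as "docked" / high speed.
FULLSYNC_MIN_KBPS = 10_000


@dataclass
class LinkState:
    bandwidth_kbps: int
    fullsync_running: bool = False


def replication_actions(link: LinkState) -> list[str]:
    """Realtime always stays on; fullsync is started only on a fast link."""
    actions = ["realtime"]
    if link.bandwidth_kbps >= FULLSYNC_MIN_KBPS and not link.fullsync_running:
        # Map this label to the actual fullsync trigger for your installation.
        actions.append("start-fullsync")
    return actions
```

Checking `fullsync_running` avoids restarting a fullsync that is already in progress each time the check runs.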
>>>
>>> A few additional questions:
>>>
>>>    - I recall we spoke about data sizes of 3-5k per ship, but there is
>>>    also all of the data that goes along with each client. Could you
>>>    elaborate on counts, sizes, connections to other elements, and related
>>>    data? What types of data would need to go to each ship, etc.?
>>>    - For the connections to each ship over the satellite link, what do
>>>    the bandwidth, latency, and other characteristics look like? Latencies
>>>    of 800, 1500, 6000, or possibly higher, 8000 or 10000?
>>>
>>> Another Architectural Thought:
>>>
>>>    - One idea that stands out would be to use Riak as the primary
>>>    cluster but implement a client that does the replication itself,
>>>    specifically for a bucket (or buckets). From my understanding so far,
>>>    the client might be the key mechanism for controlling any type of
>>>    replication, with or without MDC. Basically, this follows a standard
>>>    hub-and-spoke server and client model.
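A minimal sketch of that client-side, hub-and-spoke idea: copy one bucket from the hub to a spoke, sending only keys that changed. Plain dictionaries stand in for Riak client calls here (an assumption); a real client would compare vector clocks or content hashes instead of full values and would also need to handle deletes:

```python
from typing import Dict

# Stand-in for a Riak connection: bucket -> key -> value.
Store = Dict[str, Dict[str, bytes]]


def replicate_bucket(hub: Store, spoke: Store, bucket: str) -> int:
    """Copy one bucket from hub to spoke; return the number of keys written.

    Only keys whose value differs are copied, so a repeat run over a thin
    link sends nothing new. Other buckets on the hub are never touched.
    """
    src = hub.get(bucket, {})
    dst = spoke.setdefault(bucket, {})
    written = 0
    for key, value in src.items():
        if dst.get(key) != value:
            dst[key] = value
            written += 1
    return written
```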
>>>
>>> Hope that helps, cheers!
>>>
>>> -Adron
>>>
>>>
>>>
>>>
>>> On Fri, May 10, 2013 at 9:45 AM, Patrick Christopher <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>> I’m working on an application that will be spread across many (150-200)
>>>> data centers.  I had a great chat with Adron at the Seattle Riak
>>>> Office Hours and I think that Riak can provide the backbone of the
>>>> solution.  Adron is a great help but I have come away with (or have come up
>>>> with) two more questions.
>>>>
>>>> 1. Does Riak support bucket-level multi-data center replication? There
>>>> is a single master data center that has all of the data, and that
>>>> central cluster replicates a single, different bucket to each of the
>>>> remote data centers. It's a hub/spoke model where something at the hub
>>>> has a view of the full data set and something at a spoke end only has a
>>>> view of a single, unique bucket.
>>>>
>>>> 2. What would be the best way to set up a priority replication strategy?
>>>> There is always a link between the main DC and the spoke DCs, but
>>>> sometimes it's a big, fast link and we'd want to do a full replication,
>>>> and sometimes it's the equivalent of a 56k modem and we only want to
>>>> replicate time-critical data. I think Riak can handle this by using
>>>> realtime sync for critical data and fullsync for a full sync. Will that
>>>> work, or is running both styles asking for trouble?
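If the split between time-critical and bulk data ends up handled at the application level rather than by MDC modes, the selection could look like this sketch. The two priority classes and the per-window byte budget are assumptions for illustration; on a slow link only critical items are eligible, and on a fast link everything is, critical first:

```python
import heapq
from typing import List, Tuple

CRITICAL, BULK = 0, 1  # lower number = higher priority


def plan_batch(items: List[Tuple[int, str, int]], budget_bytes: int, fast_link: bool) -> List[str]:
    """Pick keys to send in this window.

    items: (priority, key, size_bytes). Items are taken in priority order
    (ties keep input order); any item that would exceed the budget is
    skipped and smaller later items may still fit.
    """
    heap = [(p, i, k, s) for i, (p, k, s) in enumerate(items) if fast_link or p == CRITICAL]
    heapq.heapify(heap)
    chosen, used = [], 0
    while heap:
        _, _, key, size = heapq.heappop(heap)
        if used + size <= budget_bytes:
            chosen.append(key)
            used += size
    return chosen
```

The input index in each heap entry keeps the ordering stable among equal priorities.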
>>>>
>>>>
>>>> And there was a small note in the docs
>>>> <http://docs.basho.com/riakee/latest/cookbooks/Multi-Data-Center-Replication-Architecture/>
>>>> that says "...there are two primary modes of operation...". Are there
>>>> other, secondary modes of replication, or am I over-parsing the docs?
>>>>
>>>>
>>>> Thank you,
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> [email protected]
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>
>>>>
>>>
>>>
>>> --
>>> *Adron B Hall*
>>>  Blog <http://compositecode.com/>, Adron.Me <http://adron.me/>, 
>>> @adron<http://twitter.com/adron>
>>> with Basho <http://basho.com/> @Basho <https://twitter.com/basho>
>>>
>>
>>
>
>
> --
> *Adron B Hall*
> Blog <http://compositecode.com/>, Adron.Me <http://adron.me/>, 
> @adron<http://twitter.com/adron>
> with Basho <http://basho.com/> @Basho <https://twitter.com/basho>
>

