
I appreciate you taking the time to answer my questions, since I joined this 
conversation late. I wanted to make sure I understand the underlying design 
and its assumptions before commenting in depth. My detailed follow-ups are 
in-line below. TL;DR: I am concerned that the design does not properly address 
the myriad network partition and conflict resolution scenarios that will 
inevitably arise in real-world operation. Furthermore, you did not answer my 
biggest question: why isn’t something like Percona master-master replication 
[1] sufficient for this capability?

I apologize again for missing the original proposal thread, as we could have 
addressed these issues before code was cut. I imagine you are frustrated by how 
long it has taken to get this patch into master. I would like to say your 
experience is atypical, and my goal is to help find the best design/solution 
for multi-region data sync. Finally, please don’t take my feedback (or anyone 
else’s) as a lack of appreciation for your efforts. I come at every review from 
the perspective that I will be one of the people responsible for 
supporting/maintaining it in future releases. Therefore, I want to ensure that 
new work does not incur any technical debt that would hobble the community’s 
long-term development efforts.


> 1. Out of spec. We can add this later if necessary.
> 2. Out of spec. We can add this later if necessary.
> 3. Out of spec. We can add this later if necessary.
I want to understand why these data elements were not included. To my mind, 
this feature is functionally incomplete without the inclusion of project and 
event data. One of the primary use cases for this capability will be the 
geographical distribution of applications/systems, for which sync’ed project 
data will be required. Without the synchronization of event data, it is 
impossible to gain a complete operational picture of the infrastructure being 
managed. Template data I could accept as being deferred to a later release, 
though I think this omission will disappoint many multi-region users.

Today, we sidestep these issues because we don’t sync anything: each region 
runs an independent CloudStack instance owned and operated by the same 
organization. However, once we start syncing data between regions, we need to 
ensure that the data set is logically complete. If we do not, the feature 
will, at best, be cruft that the community must maintain and that frustrates 
users.

> 4. Whenever there are changes in the records, the time stamps are logged and 
> the later change wins.
Timestamps are, perhaps, the most unreliable approach to conflict resolution. 
(As an aside, Riak defaults to last write wins, and we regret that decision on 
a daily basis; so much so that it won’t be the default in 2.0.) Last write wins 
requires tight clock synchronization across all regions, which is notoriously 
difficult to achieve; in practice, it typically requires GPS receivers in each 
data center. When clocks fall out of sync, data gets silently corrupted: no 
errors occur. Therefore, I do not believe we can accept timestamp-based 
conflict resolution, given the high likelihood of data corruption.
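
To make the silent-corruption failure concrete, here is a minimal Python sketch of a last-write-wins merge under clock skew. Everything in it is hypothetical and for illustration only: the `Write` record, the `lww_merge` function, the region names, and the 5-second skew are not taken from the patch. Region B makes the genuinely later change, but its clock runs slow, so its update is discarded without any error being raised.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Write:
    region: str
    value: str
    wall_clock: float  # seconds, as read from the writing region's local clock

def lww_merge(a: Write, b: Write) -> Write:
    """Last write wins: keep the write with the larger timestamp (region name breaks ties)."""
    return max(a, b, key=lambda w: (w.wall_clock, w.region))

# Real order of events: region A writes at t=100.0 s, region B writes at t=102.0 s.
# Region B's clock is 5 s slow, so it stamps its (later) write with 97.0.
CLOCK_SKEW_B = -5.0
write_a = Write("region-a", "account disabled", 100.0)
write_b = Write("region-b", "account re-enabled", 102.0 + CLOCK_SKEW_B)

winner = lww_merge(write_a, write_b)
print(winner.value)  # prints: account disabled (B's later update is silently lost)
```

Both regions converge on the same value, so nothing looks wrong to an operator; the converged value is simply not the one the last user actually wrote.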

I highly suggest reading Lamport’s classic paper “Time, Clocks, and the 
Ordering of Events in a Distributed System” [2] for a deeper examination of the 
issues with using wall clocks for data synchronization and of approaches to 
achieving partial event ordering. Thankfully, there are safe approaches to 
distributed conflict resolution, such as vector clocks [3][4][5], version 
vectors [6], and CRDTs [7].
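
For contrast, the following is a minimal Python sketch of vector-clock comparison, assuming each region is identified by a string id and each record carries one update counter per region. The helper names (`increment`, `merge`, `compare`) are illustrative, not part of CloudStack or any particular library. The key property is that two updates made on opposite sides of a partition are reported as `concurrent` rather than silently ordered by wall-clock time.

```python
from typing import Dict

VectorClock = Dict[str, int]  # region id -> number of updates seen from that region

def increment(clock: VectorClock, region: str) -> VectorClock:
    """Return a copy of `clock` with `region`'s counter advanced (a local update)."""
    updated = dict(clock)
    updated[region] = updated.get(region, 0) + 1
    return updated

def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Pointwise maximum: the clock of a replica that has seen both histories."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two clocks as 'equal', 'before', 'after', or 'concurrent'."""
    regions = a.keys() | b.keys()
    a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in regions)
    b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in regions)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

# Both regions start from the same replicated record.
base = increment({}, "region-a")          # {"region-a": 1}

# During a partition, each region updates the record independently.
in_a = increment(base, "region-a")        # {"region-a": 2}
in_b = increment(base, "region-b")        # {"region-a": 1, "region-b": 1}

print(compare(base, in_a))  # prints: before
print(compare(in_a, in_b))  # prints: concurrent (a genuine conflict to resolve)

# Once the conflict is resolved, the merged record dominates both branches.
resolved = increment(merge(in_a, in_b), "region-a")
print(compare(in_b, resolved))  # prints: before
```

Detecting the conflict is only half the job: deciding what to do with concurrent versions (an application-specific merge, a CRDT, or surfacing it to an operator) remains a design decision.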

> 5. It relies on the order of events, so if the order is reversed with some 
> reason, the creations will fail, but they will be covered by FullScan.
First, there is no guaranteed order of message delivery unless the 
synchronization mechanism uses a single consumer thread. While this approach 
would assure ordered processing, it would not scale sufficiently (i.e., data 
sync would sit at the very long end of eventually consistent rather than being 
near real time). How does the FullScan operation exchange data between regions? 
Also, do you intend for this feature to be a master-master or master-slave 
replication model? If it is master-master, a final reconciliation step will be 
required by the region initiating the FullScan operation, which is tricky to 
implement properly since the data may change between the time the FullScan is 
initiated and the time reconciliation begins. How does the mechanism handle an 
interruption of the FullScan operation (e.g., a management server or database 
crash, or a network partition during the sync operation)?

In terms of handling the referential integrity issue, one approach that could 
work would be to resubmit the message when a referential integrity error 
occurs, assuming that the message for the parent record is either waiting in 
the queue or being processed concurrently. Such an approach must include a 
retry count and limit to protect against scenarios where the parent-child 
relationships cannot be resolved and the management server simply needs to 
give up.
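
The resubmit-with-limit idea can be sketched in Python as follows, under loudly stated assumptions: the in-memory `deque` stands in for the message queue, the `set` stands in for the database, `MissingParentError` stands in for a foreign-key violation, and `MAX_ATTEMPTS = 5` is an arbitrary illustrative limit. None of these names come from the patch.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Set

MAX_ATTEMPTS = 5  # hypothetical retry limit

@dataclass
class SyncMessage:
    record_id: str
    parent_id: Optional[str]  # None for a root record such as an account
    attempts: int = 0

class MissingParentError(Exception):
    """Stand-in for a referential integrity (foreign key) violation."""

def apply_message(msg: SyncMessage, db: Set[str]) -> None:
    """Insert the record only if its parent already exists."""
    if msg.parent_id is not None and msg.parent_id not in db:
        raise MissingParentError(msg.record_id)
    db.add(msg.record_id)

def process(queue: Deque[SyncMessage], db: Set[str]) -> List[SyncMessage]:
    """Drain the queue, resubmitting children whose parent has not been applied yet.
    Returns the messages that hit MAX_ATTEMPTS and were given up on."""
    given_up: List[SyncMessage] = []
    while queue:
        msg = queue.popleft()
        try:
            apply_message(msg, db)
        except MissingParentError:
            msg.attempts += 1
            if msg.attempts >= MAX_ATTEMPTS:
                given_up.append(msg)   # parent never arrived: stop retrying
            else:
                queue.append(msg)      # resubmit behind the possibly pending parent
    return given_up

# Usage: users arrive before their account; user-9's account never arrives.
db: Set[str] = set()
queue = deque([
    SyncMessage("user-1", "acct-1"),
    SyncMessage("user-2", "acct-1"),
    SyncMessage("acct-1", None),
    SyncMessage("user-9", "acct-9"),
])
failed = process(queue, db)
print(sorted(db))                     # prints: ['acct-1', 'user-1', 'user-2']
print([m.record_id for m in failed])  # prints: ['user-9']
```

A real implementation would also want a delay (or exponential backoff) between attempts and a durable dead-letter store, so that abandoned messages are visible to an administrator rather than lost.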

> 6. It sounds like not related with this project.
Partition tolerance is absolutely critical to any data synchronization 
operation. In CAP terms, you are proposing an available/partition-tolerant (AP) 
mechanism. There will inevitably be network partitions when synchronizing data 
across WAN links (as there will be inside a datacenter). How does this design 
provide the partition tolerance needed to ensure correct and complete sync 
following periods of network unavailability between regions? I suggest reading 
Kyle Kingsbury’s excellent “Call Me Maybe” [8] series on this subject. To give 
away a bit of the ending, RabbitMQ does not provide proper partition tolerance 
[9][10]. Without understanding how this attribute will be fulfilled, the 
design is, in my view, incomplete.
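
One common way to survive a partition without losing changes is a per-destination outbox with sequence numbers and cumulative acknowledgements: the sender keeps every change until the receiver confirms it, and the receiver applies changes in order and ignores duplicates. The Python sketch below is hypothetical (`Outbox`, `Receiver`, and `sync` are illustrative names, not part of the patch or of RabbitMQ) and assumes a single sender per destination. It only addresses delivery across a partition; it does not resolve concurrent edits to the same record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Change = Tuple[int, str]  # (sequence number, change payload)

@dataclass
class Outbox:
    """Hypothetical per-destination log of changes kept by the sending region."""
    next_seq: int = 1
    pending: Dict[int, str] = field(default_factory=dict)

    def record(self, payload: str) -> int:
        """Append a change; it stays pending until the destination acknowledges it."""
        seq = self.next_seq
        self.pending[seq] = payload
        self.next_seq += 1
        return seq

    def unacked(self) -> List[Change]:
        return sorted(self.pending.items())

    def ack_through(self, seq: int) -> None:
        """Cumulative ack: drop every change with a sequence number <= seq."""
        for s in [s for s in self.pending if s <= seq]:
            del self.pending[s]

@dataclass
class Receiver:
    """Hypothetical destination region that applies changes strictly in order."""
    applied_through: int = 0
    applied: List[str] = field(default_factory=list)

    def deliver(self, batch: List[Change]) -> int:
        """Apply the next contiguous changes, skip duplicates, stop at a gap.
        Returns the highest sequence number applied so far (the cumulative ack)."""
        for seq, payload in batch:
            if seq <= self.applied_through:
                continue              # duplicate from a resend: already applied
            if seq != self.applied_through + 1:
                break                 # gap: wait for the missing change
            self.applied.append(payload)
            self.applied_through = seq
        return self.applied_through

def sync(outbox: Outbox, remote: Receiver, link_up: bool) -> None:
    """One sync attempt; during a partition nothing is delivered or acknowledged."""
    if link_up:
        outbox.ack_through(remote.deliver(outbox.unacked()))

# Usage
outbox, remote = Outbox(), Receiver()
outbox.record("create account acct-1")
sync(outbox, remote, link_up=False)   # partition: the change stays in the outbox
outbox.record("create user user-1")
remote.deliver(outbox.unacked())      # delivered, but suppose the ack is lost in transit
sync(outbox, remote, link_up=True)    # link heals: resend; duplicates are skipped
print(remote.applied, len(outbox.pending))
# prints: ['create account acct-1', 'create user user-1'] 0
```

For this to hold across a management server crash, the outbox entry would have to be written in the same database transaction as the change it describes; that is an assumption of this sketch.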

> 7. The interval for FullScan processing is configurable in the global setting,
The interval does not address the problem that the system may be under high 
load when the FullScan starts. Without back pressure, this mechanism could 
cause an internal denial of service, since it may perform full scans of 
potentially large tables. Therefore, there should be a check in the FullScan 
that the system is not too busy to perform the operation. If it is, the 
FullScan should be skipped and retried at the next interval. Also, how does 
the FullScan operation prevent memory explosion as data sets grow?
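
Both concerns (skipping the FullScan under load, and bounding memory) can be sketched together. The Python example below is hypothetical: `is_busy` stands in for whatever load check is chosen (for example, database connection pool usage or management server load), `fetch_page` stands in for a keyset-paginated query such as `SELECT ... WHERE id > ? ORDER BY id LIMIT ?`, and none of the names come from the patch. Because it returns a checkpoint, a skipped or interrupted scan can resume where it stopped at the next interval instead of starting over.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # (primary key, payload): stand-in for a real table row

def fetch_page(table: List[Row], after_id: int, limit: int) -> List[Row]:
    """Stand-in for a keyset query: WHERE id > after_id ORDER BY id LIMIT limit.
    (A real database would use an index; this in-memory version sorts the list.)"""
    return sorted(r for r in table if r[0] > after_id)[:limit]

def run_full_scan(
    table: List[Row],
    is_busy: Callable[[], bool],
    process_batch: Callable[[List[Row]], None],
    start_after: int = 0,
    batch_size: int = 100,
) -> Tuple[bool, int]:
    """Scan `table` in batches of at most `batch_size` rows, so memory use is bounded
    by the batch size rather than the table size.

    The load check runs before every batch. If the system is busy, the scan stops and
    returns (False, checkpoint) so the next scheduled run can resume after `checkpoint`.
    Returns (True, last_id) once every row has been processed."""
    last_id = start_after
    while True:
        if is_busy():
            return False, last_id          # back off; retry at the next interval
        page = fetch_page(table, last_id, batch_size)
        if not page:
            return True, last_id           # reached the end of the table
        process_batch(page)
        last_id = page[-1][0]              # advance the checkpoint past this batch

# Usage: the system becomes busy before the third batch, then recovers.
table = [(i, f"row-{i}") for i in range(1, 8)]
seen: List[Row] = []
load = iter([False, False, True])
done, checkpoint = run_full_scan(table, lambda: next(load), seen.extend, batch_size=3)
print(done, checkpoint, len(seen))  # prints: False 6 6

done, checkpoint = run_full_scan(table, lambda: False, seen.extend,
                                 start_after=checkpoint, batch_size=3)
print(done, checkpoint, len(seen))  # prints: True 7 7
```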

[3]: (yes, Wikipedia has a good, 
straightforward explanation)

I apologize for joining this conversation late. I understand that this patch 
was submitted back in February. Around that time, my family had a significant 
medical event, and I was disengaged from all work activities, which is why I 
missed the original conversation.

Having read through the specification and briefly reviewed the code, I would 
like to understand the following assumptions/design decisions:

   1. Why aren’t projects being sync’ed?  It seems very likely that users would 
want to have projects span data centers for redundancy/DR purposes.
   2. Why aren’t events being sync’ed? I can imagine a number of scenarios 
where I would want to examine the operation of a logical application or system 
across both regions. Without the sync of event data, I would be forced to 
either interleave the events visually in two browser tabs or dump the data 
into another datastore to be merged.
   3. Why isn’t template metadata being sync’ed?  When spanning an 
application/system across regions, it would seem to follow that I would want to 
use the same templates.
   4. How does this design deal with modifications to a record in two or more 
regions during a network partition?
   5. Given that messages can/will be processed out of order, how is 
referential integrity maintained when a parent and a set of children are 
created (e.g. creation of a new account and a set of users rapidly through the 
   6. Is RabbitMQ being relied upon to provide partition tolerance?
   7. Is there a back pressure mechanism to throttle the full sync operation 
when the database/management server is under heavy load?

Finally, I would like to understand why we are taking on multi-datacenter data 
replication in CloudStack, and not deferring to the underlying datastore. 
Speaking as someone whose $dayjob involves delivering such a system (at Basho 
for Riak), it is a very hard thing to get right (there are literally thousands 
of corner cases). The design document does not speak to this decision, and I 
would like to understand why CloudStack could not leverage existing, mature 
mechanisms at the

I apologize if some of these questions have been answered already. I attempted 
to look back through the archives, but given the span of this conversation, it 
was difficult to piece together retroactively.


> I did the logic review according to the FS, assuming that the FS (and the
> design described there) was approved at the [PROPOSAL] stage, BEFORE the
> code was put up on the review board. Was it approved at that stage?
> Alex, the feature is not small, and considering that it raised so many
> questions and arguments, I would really like to get a final design/logic
> review + “ship it” from people with expertise on the topic, and/or who
> originally participated in the review/discussion: Chiradeep, Kishan, Murali.
> Thank you,
> Alena.
>> Alex, sorry to hear that it took so long to get through the review process.
>> The question still remains: before you started working on the implementation
>> and posted your plugin’s code, was the FS approved/reviewed as a part of the
>> [PROPOSAL] discussion? You should never start development until you get
>> input from the community on the FS and confirm that the design is valid
>> and that the feature can contribute to CS. Only after the proposal is
>> accepted can you request a Review Board review. So I did assume that the
>> [PROPOSAL] phase was finished, and the FS was validated as a part of it,
>> when Daan asked me to review the Review Board ticket.
>> I’ve also looked at the history. I can see that Chiradeep contributed
>> to the design/plugin logic discussion and also pointed out the changes
>> that needed to be made to the code structure. I helped review the latter.
>> Let’s wait for the update from Kishan. Kishan, in addition to answering
>> Alex’s questions, please go over the plugin design once again.
>> Thank you,
>> Alena.
>> On Thu, Jun 26, 2014 at 12:57 PM, Alena Prokharchyk wrote:
>>> Alex,
>>> By “huge” I meant that there were a lot of repetitive hardcoded
>>> things and a lot of unnecessary changes to the CS orchestration layer. If
>>> you compare the number of changes now and originally, you can see that it
>>> was reduced by almost half.
>>> But let’s discuss the complaints about the lack of initial review, as it’s
>>> the more important question.
>>> Review of the design spec should happen before you start
>>> designing/coding. I jumped on the review much later, after you’d submitted
>>> the entire plugin code, so I didn’t participate in any “Feature Request”
>>> review discussion that might have happened earlier. I do assume that
>>> the reviews/email exchanges were done at that initial phase? You should
>>> have contacted the people participating in the initial phase, and asked
>>> them for a review as well.
>>> As a part of my review, I’ve made sure to cover the things I’m certain
>>> should have been changed. I’ve reviewed the feature logic as well,
>>> consulting the FS you’ve written. I’m not saying that there is anything
>>> wrong with your initial design, but I am asking for a second opinion from
>>> the guys who have more expertise in Regions.
>>> Kishan, please help do the final review of Alex’s plugin design.
>>> Thank you,
>>> Alena.
>>> On Wed, Jun 25, 2014 at 10:41 PM, Alena Prokharchyk wrote:
>>>> Alex,
>>>> In the beginning, the code was not very well organized, didn't match
>>>> coding standards (no use of Spring, misleading names, not segregated into
>>>> its own plugin), and the code base was unnecessarily huge.
>>>> All of the above made it very hard to review and understand the code logic
>>>> from the beginning and to engage more people in the review. Therefore,
>>>> Chiradeep pointed out in his original review that the code needed to match
>>>> CS standards first, and be better organized. I helped review the fixes, and
>>>> did a logic review as well after the code came into “reviewable” shape.
>>>> I'm asking Kishan/Murali to look at it to see if anything is missing
>>>> or incorrect in the final review, not to make you override or change
>>>> everything you've already put in.
>>>> Thank you,
>>>> Alena.
>>>> On Wed, Jun 25, 2014 at 9:45 PM, Alena Prokharchyk wrote:
>>>>> Alex,
>>>>> I did my best to review the code and made sure it came into shape with
>>>>> the CS guidelines and Java code style. There was no way to anticipate all
>>>>> the things to fix originally, as every subsequent review update added more
>>>>> things to fix because the code under review was new/refactored.
>>>>> And I don’t see anything wrong with asking for a FINAL opinion from
>>>>> other people on the mailing list, considering some of them participated in
>>>>> the review process along the way already (Kishan). Anybody can review the
>>>>> review ticket until it’s closed, and point to the items that other
>>>>> reviewers might have missed.
>>>>> Thank you,
>>>>> Alena.
>>>>> On Wed, Jun 25, 2014 at 7:44 PM, Alena Prokharchyk wrote:
>>>>>> Alex, looks fine to me. Make sure that you add the regionId
>>>>>> validation, as our built-in API validation won’t work in this case
>>>>>> because there is no UUID field support for the Region object. You can
>>>>>> check how validation is being done in the updateRegion/deleteRegion
>>>>>> scenarios.
>>>>>> Kishan/Murali, can you please spend some time doing the final
>>>>>> review of Alex’s tickets? You are the original developers of Regions,
>>>>>> and probably have the most expertise on the topic. I don’t want to commit
>>>>>> the fixes before I hear “ship it” from both of you.
>>>>>> Thanks,
>>>>>> Alena.
>>>>>> On Wed, Jun 25, 2014 at 11:03 AM, Kishan Kavala wrote:
>>>>>>> Alex,
>>>>>>> You can refer to the code from initDataSource method in
>>>>>>> The properties file can be loaded using the following:
>>>>>>> File dbPropsFile = PropertiesUtil.findConfigFile(propsFileName);
>>>>>>> On Wed, Jun 25, 2014 at 2:25 AM, Kishan Kavala wrote:
>>>>>>> Alex,
>>>>>>> As Alena mentioned, it is the admin’s responsibility to keep ids the
>>>>>>> same across Regions. Ids should be used as the unique identifier. The
>>>>>>> region name is merely a descriptive name and is mostly associated with
>>>>>>> a geographic location.
>>>>>>> Also note that the region name can be updated at any time using the
>>>>>>> updateRegion API.
>>>>>>> Unlike other internal Ids in CS, region Ids are assigned by the admin,
>>>>>>> so exposing the region Id to the admin should not be an issue.
>>>>>>> The Id of the local region cannot be guaranteed to always be “1”. The
>>>>>>> region Id has to be unique across all regions. While creating a new
>>>>>>> region, the admin will provide a unique region id to the
>>>>>>> cloud-setup-databases script. The Id of the local region is stored in
>>>>>>> To identify a local region, you can use one of the following options:
>>>>>>> 1. Look up in
>>>>>>> 2. Add a new column in region table
>>>>>>> On Tue, Jun 24, 2014 at 9:59 PM, Alex Ough wrote:
>>>>>>> I agree that the ids are unique identifiers, but they are usually for
>>>>>>> internal purposes and not exposed to users. So it is a little strange
>>>>>>> to ask users to assign ids when they add new regions. And if we do not
>>>>>>> allow duplicate names, I'm not sure why it is not good to use names as
>>>>>>> a unique identifier.
>>>>>>> It has been a long road to get this far, for several reasons, so I
>>>>>>> really want to wrap this up as soon as possible. This doesn't seem to
>>>>>>> be a major obstacle, so let me just use 'id' as a parameter if no one
>>>>>>> has a different opinion by tomorrow morning.
>>>>>>> Thanks
>>>>>>> Alex Ough
>>>>>>> On Tue, Jun 24, 2014 at 8:52 PM, Alena Prokharchyk wrote:
>>>>>>> Alex, id is used as the unique identifier for CS objects, and it is a
>>>>>>> CS requirement to refer to an object by id if the id is present. Look
>>>>>>> at all the other APIs. We never refer to a network/vpc/vm by name just
>>>>>>> because it is more human-readable. The id is used by the API layer when
>>>>>>> parameter validation is done, by lots of DAO methods (findById is one
>>>>>>> of them), etc. Even look at updateRegion/deleteRegion: we don’t refer
>>>>>>> to regions by name, but by id.
>>>>>>> The reason why Kishan added support for controlling the id by adding
>>>>>>> it to the createRegion call (and making it unique) is exactly that: the
>>>>>>> region administrator can decide what id to set on the region, and
>>>>>>> introduce the region with the same id to the other regions’ DBs.
>>>>>>> So I would still suggest using the id of the region in the API calls
>>>>>>> you are modifying, unless the developers who worked on the regions
>>>>>>> feature (Kishan/Murali) come up with a valid objection.
>>>>>>> Thanks,
>>>>>>> Alena.
>>>>>>> On Tue, Jun 24, 2014 at 8:33 PM, Alena Prokharchyk wrote:
>>>>>>> Aren’t we supposed to sync the regions across the multiple regions’
>>>>>>> DBs? Because that’s what the region FS states in the
>>>>>>> “Adding 2nd region” paragraph, bullet #4:
>>>>>>> 1. Install a 2nd CS instance.
>>>>>>> 2. While installing the database, set region_id using the -r option in
>>>>>>> the cloud-setup-databases script (make sure database_key is the same
>>>>>>> across all regions).
>>>>>>> cloud-setup-databases cloud:<dbpassword>@localhost --deploy-as=root:<password> -e <encryption_type> -m <management_server_key> -k <database_key> -r <region_id>
>>>>>>> 3. Start mgmt server
>>>>>>> 4. Using addRegion API, add region 1 to region 2 and also region 2
>>>>>>> to region 1.
>>>>>>> I assume that we expect the admin to add the region with the same name
>>>>>>> and the same id to ALL regions’ DBs (both id and name should be passed
>>>>>>> to the createRegion call), so they are all in sync. Isn’t that the
>>>>>>> requirement? If so, we should rely on the fact that all regions’ DBs
>>>>>>> will have the same set of regions, with the same ids and names across
>>>>>>> regions.
>>>>>>> Thanks,
>>>>>>> Alena.
>>>>>>> On Tue, Jun 24, 2014 at 7:35 PM, Alena Prokharchyk wrote:
>>>>>>> Alex, thank you for summarizing.
>>>>>>> I still don’t see why the id can’t be unique across regions, as you can
>>>>>>> control the id assignment: id is required when the createRegion call is
>>>>>>> made. And that’s how the region should be represented in other regions’
>>>>>>> DBs: by its id, which is unique across the regions. Kishan/Murali,
>>>>>>> please confirm.
>>>>>>> Thank you,
>>>>>>> Alena.
