Hi All,

Reading the discussions, I have the following remarks/questions:
-1- The extinction of the 'system administrator' role

In his current job, Leon is responsible for server management from hardware to software: the physical hardware and all 'basic' software that runs on it, except the actual application that runs on the server. That includes managing the database server, the web server, the OS, the (J)VM, etc. In his new position, however, what is Leon responsible for? If Amazon EC2 is used, he is probably only responsible for making sure the Amazon contract is renewed and paid each year. He does no maintenance on physical hardware, he does not manage cluster nodes, and he has no shell access to any machine. In effect, his job has ceased to exist. I'm not saying this is a bad thing, but I wonder who takes over his tasks and responsibilities in the new picture. It seems to me these tasks are taken over partially by Amazon (hardware, OS) and partially by Amdatu (provisioning, failover).

But what if something bad happens? What if a database server goes down? What if an application runs out of memory? What if the JVM crashes? Traditionally, the answer is "call Leon". In the new picture, however, his answer will be "well, that's not my responsibility anymore, is it?". In an Amazon cloud it would probably mean that a server needs to be restarted and/or re-provisioned. As long as no relevant persistent data is stored on the server, that should be OK. In any case, I think we should add a use case about all the bad stuff that can happen (OOM, crashes, BSOD, hardware failure, power outage, etc.) and describe who is responsible for recovering from each kind of failure.

-2- Schema updates

Marcel stated: "I'm actually wondering why you're stating that these rolling updates are hard. What makes them hard? I can agree that doing schema updates on relational databases as part of updates can be hard or time consuming, especially if you need to be able to roll them back too, but since we're not using relational databases anymore, and having one central database that is used by all components is a bad idea from a component perspective anyway, I don't see big problems."

I don't think we are done with relational databases. It is true that the current Amdatu implementation uses no RDBMS, but I think Amdatu should support many storage engines, and it is up to the application to decide which storage is actually used. Furthermore, schema updates are hard most of the time, not only with relational databases. Using Cassandra (NoSQL), for example, it is extremely hard; it is not supported at all. In SQL you could write an update script; in Cassandra you just can't (column families cannot be changed). Schema updates are always tricky, and much harder than software updates/rollbacks. In many cases schema rollbacks will even be impossible, assuming you don't want to lose any data.

Thinking about it, provisioning/updating/rolling back an application is much easier than updating/rolling back persistent data. We are focusing on making it extremely easy to manage distributed applications, where we should focus much more on persistent data. Reliability of persistent data is much more important than the application: software can always be reinstalled; lost data is lost forever. So my point is that the use cases should focus more on dealing with persistent storage. On the one hand we say that any storage can be used and it is up to the storage engine to support clustering, data synchronization, schema updates and schema rollbacks. On the other hand, if we require all that from a storage engine, there are not many storage engines we can support.

-3- Multiple composites

I agree that we should support running different components belonging to the same application on different servers.
I think we should not think about an application "running on some server" anymore. An application doesn't run on a server; it runs in the cloud, and each node in the cloud provides a part of it.

Regards, Ivo

From: amdatu-developers-bounces at amdatu.org [mailto:[email protected]] On Behalf Of Marcel Offermans
Sent: Saturday, 23 October 2010 17:52
To: amdatu-developers at amdatu.org
Subject: Re: [Amdatu-developers] Use Cases

Some feedback on the cloud deployment use case (http://amdatu.org/confluence/display/Amdatu/Amdatu+-+Cloud+Deployment+Use+Case):

1) I think at the lowest level, Amdatu should always be built on top of some "infrastructure as a service", in other words, some kind of cloud infrastructure. That can be a public cloud, such as Amazon's EC2, or a private cloud (running Eucalyptus on a set of your own servers).

2) The role of Leon should be the one in charge of that IaaS layer. Taking 1) into account, he should just be responsible for providing a cloud. He should not yet do any installation of OSGi + management agent, in my opinion, because that should be the responsibility of Marcel, who manages the Educa deployment and can make the trade-off between involving more hardware and running more stuff on one node. I'm assuming there is some relationship between managing the Educa deployment and actually getting paid by customers who want a certain level of service: in my opinion that is more Marcel's role than Leon's.

Consideration: do we also want to support a fixed, unmanageable cloud, in other words just a fixed set of machines that do not run any cloud-supporting software at all? This would be just a bunch of machines that run OSGi with a management agent directly. I'm hesitant about supporting this scenario. On the other hand, if Leon's role is only to provide a cloud infrastructure, an unmanageable, fixed cloud simply means he's mostly out of work in this case.

In my view, initial deployment would look like this:
- Leon has set up (or rented) a cloud.
- Marcel installs a provisioning server in the cloud.
- Marcel creates a number of targets, which automatically triggers the activation of the same number of cloud nodes that already have OSGi and a management agent running.
- The interface of the provisioning server shows an overview of all connected targets, and shows that every target is ready but none of them is running a distribution.
- In the provisioning server an overview of distributions is shown.
- Dion has uploaded a set of OSGi bundles and other resources. Together those define the application version Educa 1.0.
- Marcel chooses a target and selects the distribution 'Educa 1.0' to install on that node.
- The agent on the target retrieves all needed bundles and bootstraps itself.
- Educa 1.0 is now running.

Adding or removing cluster nodes:

- After setting up a single Educa 1.0 target, Marcel wants to add a second target with the same application in the cloud, clustered automatically with the first target.
- Marcel navigates back to the target overview of the provisioning server.
- Next to the already created Educa target a 'copy' button is shown; Marcel presses it.
- A new target is started on a new node in the server cloud, with Educa 1.0 automatically added.
- The new target automatically joins the other target to form a cluster of Educa targets (clusters can be identified by name; all targets that share the same cluster name are in the same cluster).

Regarding clusters, I think there are actually a couple of options for managing them:

1) As described in this use case, each cluster target runs the same distribution and some external load balancer distributes the incoming requests between the cluster targets. Each cluster target "reports for duty" with that load balancer.
2) There is actually some kind of cluster manager running somewhere (or everywhere) in the cluster, and it decides which distributions are installed on which targets, so targets can run a subset of the whole set of components, or everything, depending on decisions the cluster manager makes.

Removing a cluster node:

- After some time Marcel wants to remove a target from the cluster to perform maintenance.
- Marcel goes back to the provisioning server and presses the '-' button.
- The first target is now automatically a single-instance cluster.
- The second target is gone.
- The second target is automatically removed from the load balancer.

Updating:

- Dion has finished Educa 1.1 and uploads the new artifacts to the provisioning server.
- Marcel approves the update for one or more targets in the cluster (depending on whether he wants to do a rolling update or simply update the cluster as quickly as possible).

Note: of course you can define a whole new distribution for every update, but updating a target actually creates a new version of the software going to that target anyway, and you can always roll back to a previous version, so there is no fundamental need to create a new distribution every time. You can if you want to, but the end result on the target is exactly the same.

On 15 Oct 2010, at 15:22, Martijn van Berkum wrote:

> Thanks for the feedback. I agree on splitting the administrator role in two roles: one generic sysop and one per application (1 or more tenants). Based on that I added a third role, Marcel the Educa Administrator, and explained the various roles a little bit more. Notes above.

> About the multiple composites, from my point of view this is not a valid use case for Amdatu management/deployment.

That's interesting, so an application can never become bigger than one node/target? I think for big applications you do want to be able to partition the application, running parts of it on one target and parts on another.
Just like multi tenancy is scaling in one direction, this is scaling in the other. It could, and I think should, be the responsibility of a (possibly application-specific or agnostic) cluster manager to actually decide how to do this partitioning. I do think we don't need to provide this right away, but it does not hurt to discuss how we would implement it when we need to.

> Just like on the generic Internet, every application should itself be prepared for unexpectedly updated REST APIs, services that are down, changed dependencies and other horrors. This should not be managed centrally; that is 'old' thinking from a viewpoint that everything can be controlled.

Well, if you want single components to actually scale out by themselves instead of doing that at an application level, then you end up with cluster managers for each component. That does not fundamentally differ. I totally agree that a component should simply try to satisfy its dependencies and deal with all the dynamics in this environment (which is actually hard to do if your dependencies are all REST APIs, because they currently have no mechanism whatsoever for things like discovery and notifications, like the OSGi service registry provides, even for remote services).

> Really service oriented architecture means being very service oriented: if the other party decides not to show up, does something else, or wants something else, you adapt rather than demand another contract. Graceful degradation and design for failure are common architectural design goals following this philosophy.

Agreed.

> Rolling restarts/updates/deployments: I put it in just as a check for us that this could be a very common use case, although I know this is really hard. Not only if you have only 2 nodes in a cluster and want to update one, but also when you want to update thousands of servers.
> For example, the new Twitter interface was gradually introduced over a few weeks; some users got it much earlier than others, so apparently they have some kind of rolling update mechanism for that.

I'm actually wondering why you're stating that these rolling updates are hard. What makes them hard? I can agree that doing schema updates on relational databases as part of updates can be hard or time consuming, especially if you need to be able to roll them back too, but since we're not using relational databases anymore, and having one central database used by all components is a bad idea from a component perspective anyway, I don't see big problems.

Greetings, Marcel
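For what it's worth, the control loop behind such a rolling update is easy to sketch; the hard parts are the ones raised above (persistent state, schemas, mixed-version traffic). Below is a minimal, purely illustrative Python simulation; none of these names are real Amdatu or EC2 APIs, and the health check is a stand-in.

```python
# Purely illustrative simulation of a rolling update across cluster targets.
# This is not real Amdatu API; node names and version strings are made up.

class Target:
    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.in_balancer = True

    def healthy(self):
        # Stand-in for a real health check (HTTP ping, OSGi bundle states, ...).
        return True

def rolling_update(targets, new_version):
    """Update one target at a time so the cluster keeps serving traffic."""
    for t in targets:
        t.in_balancer = False      # 1. drain: take target out of the load balancer
        old = t.version
        t.version = new_version    # 2. provision the new distribution
        if not t.healthy():        # 3. verify before taking traffic again
            t.version = old        #    (in reality: roll back to previous version)
        t.in_balancer = True       # 4. rejoin the load balancer
        # Invariant: all *other* targets are still in the balancer, so at most
        # one node is out of service at any moment.

cluster = [Target("educa-%d" % i, "1.0") for i in range(3)]
rolling_update(cluster, "1.1")
print([t.version for t in cluster])  # ['1.1', '1.1', '1.1']
```

The mixed-version window in step 2 is exactly where the schema point from earlier in the thread bites: while one target runs 1.1 and the others still run 1.0, both versions must be able to cope with whatever the persistent data looks like.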

