Hi All,

Reading the discussions, I have the following remarks/questions:
-1- The extinction of the 'system administrator' role

In his current job, Leon is responsible for server management from hardware to software: the physical hardware and all 'basic' software that runs on it, except the actual application that runs on the server. That includes managing the database server, the web server, the OS, the (J)VM, etc. In his new position, however, what is Leon responsible for? If Amazon EC2 is used, he is probably only responsible for making sure the Amazon contract is renewed and paid each year. He does no maintenance on physical hardware, he does not manage cluster nodes, and he has no shell access to any machine. In effect, his job has ceased to exist. I'm not saying this is a bad thing, but I wonder who takes over his tasks and responsibilities in the new picture. It seems to me these tasks are taken over partially by Amazon (hardware, OS) and partially by Amdatu (provisioning, failover).

But what if something bad happens? What if a database server goes down? What if an application runs out of memory? What if the JVM crashes? Traditionally, the answer is "call Leon". In the new picture, however, his answer will be "well, that's not my responsibility anymore, is it?". In an Amazon cloud it would probably mean that a server needs to be restarted and/or re-provisioned. As long as no relevant persistent data is stored on the server, that should be OK. In any case, I think we should add a use case about all the bad stuff that can happen (OOM, crashes, BSOD, hardware failure, power outage, etc.) and describe who is responsible for recovering from each kind of failure.

-2- Schema updates

Marcel stated: "I'm actually wondering why you're stating that these rolling updates are hard. What makes them hard? I can agree that doing schema updates on relational databases as part of updates can be hard or time consuming, especially if you need to be able to roll them back too, but since we're not using relational databases anymore, and having one central database that is used by all components is a bad idea from a component perspective anyway, I don't see big problems."

I don't think we are done with relational databases. It is true that the current Amdatu implementation uses no RDBMS, but I think Amdatu should support many storage engines, and it is up to the application to decide which storage is actually used. Furthermore, schema updates are hard most of the time, not only with relational databases. Using Cassandra (NoSQL), for example, it is extremely hard; it is not supported at all. In SQL you could write an update script; in Cassandra you just can't (column families cannot be changed). Schema updates are always tricky, and much harder than software updates/rollbacks. In many cases schema rollbacks will even be impossible, assuming you don't want to lose any data.

Thinking about it, provisioning/updating/rolling back an application is much easier than updating/rolling back persistent data. We are focusing on making it extremely easy to manage distributed applications, where we should focus much more on persistent data. Reliability of persistent data is much more important than the application: software can always be reinstalled; lost data is lost forever. So my point is that the use cases should focus more on dealing with persistent storage. On the one hand we say that any storage can be used and it is up to the storage engine to support clustering, data synchronization, schema updates and schema rollbacks. On the other hand, if we require all that from a storage engine, there are not many storage engines we can support.

-3- Multiple composites

I agree that we should support running different components belonging to the same application on different servers.
I think we should not think about an application "running on some server" anymore. An application doesn't run on a server; it runs in the cloud, and each node in the cloud provides a part of it.

Regards, Ivo

From: amdatu-developers-bounces at amdatu.org [mailto:[email protected]] On Behalf Of Marcel Offermans
Sent: Saturday, 23 October 2010 17:52
To: amdatu-developers at amdatu.org
Subject: Re: [Amdatu-developers] Use Cases

Some feedback on the cloud deployment use case (http://amdatu.org/confluence/display/Amdatu/Amdatu+-+Cloud+Deployment+Use+Case):

1) I think at the lowest level, Amdatu should always be built on top of some "infrastructure as a service", in other words, some kind of cloud infrastructure. That can be a public cloud, such as Amazon's EC2, or a private cloud (running Eucalyptus on a set of your own servers).

2) The role of Leon should be the one in charge of that IaaS layer. Taking 1) into account, he should just be responsible for providing a cloud. He should not yet do any installation of OSGi + management agent, in my opinion, because that should be the responsibility of Marcel, who manages the Educa deployment and can make the trade-off between involving more hardware and running more stuff on one node. I'm assuming there is some relationship between managing the Educa deployment and actually getting paid by customers who want a certain level of service: in my opinion that is more Marcel's role than Leon's.

Consideration: do we also want to support a fixed, unmanageable cloud, in other words just a fixed set of machines that do not run any cloud-supporting software at all? This would be just a bunch of machines that run OSGi with a management agent directly. I'm hesitant about supporting this scenario. On the other hand, if Leon's role is only to provide a cloud infrastructure, an unmanageable, fixed cloud simply means he's mostly out of work in this case.

In my view, initial deployment would look like this:
- Leon has set up (or rented) a cloud.
- Marcel installs a provisioning server in the cloud.
- Marcel creates a number of targets, which automatically triggers the activation of the same number of cloud nodes that already have OSGi and a management agent running.
- The interface of the provisioning server shows an overview of all connected targets, and shows that every target is ready but none of them is running a distribution.
- In the provisioning server an overview of distributions is shown.
- Dion has uploaded a set of OSGi bundles and other resources. Together those define the application version Educa 1.0.
- Marcel chooses a target and selects the distribution 'Educa 1.0' to install on that node.
- The agent on the target retrieves all needed bundles and bootstraps itself.
- Educa 1.0 is now running.

Adding or removing cluster nodes:

- After setting up a single Educa 1.0 target, Marcel wants to add a second target with the same application in the cloud, clustered automatically with the first target.
- Marcel navigates back to the target overview of the provisioning server.
- Next to the already created Educa target a 'copy' button is shown; Marcel presses it.
- A new target is started on a new node in the server cloud, with Educa 1.0 automatically added.
- The new target automatically joins the other target to form a cluster of Educa targets (clusters can be identified by name; all targets that share the same cluster name are in the same cluster).

Regarding clusters, I think there are actually a couple of options for managing them:

1) As described in this use case, each cluster target runs the same distribution and some external load balancer distributes the incoming requests between the cluster targets. Each cluster target "reports for duty" with that load balancer.
2) There is actually some kind of cluster manager running somewhere (or everywhere) in the cluster, and it decides which distributions are installed on which targets, so targets can run a subset of the whole set of components, or everything, depending on decisions the cluster manager makes.

Removing a cluster node:

- After some time Marcel wants to remove a target from the cluster to perform maintenance.
- Marcel goes back to the provisioning server and presses the '-' button.
- The first target is now automatically a single-instance cluster.
- The second target is gone.
- The second target is automatically removed from the load balancer.

Updating:

- Dion has finished Educa 1.1 and uploads the new artifacts to the provisioning server.
- Marcel approves the update for one or more targets in the cluster (depending on whether he wants to do a rolling update or simply update the cluster as quickly as possible).

Note: of course you can define a whole new distribution for every update, but updating a target actually creates a new version of the software going to that target anyway, and you can always roll back to a previous version, so there is no fundamental need to create a new distribution every time. You can if you want to, but the end result on the target is exactly the same.

On 15 Oct 2010, at 15:22, Martijn van Berkum wrote:

> Thanks for the feedback. I agree on splitting the administrator role in two roles: one generic sysop and one per application (1 or more tenants). Based on that I added a third role, Marcel the Educa Administrator, and explained the various roles a little bit more. Notes above.

> About the multiple composites, from my point of view this is not a valid use case for Amdatu management/deployment.

That's interesting, so an application can never become bigger than one node/target? I think for big applications you do want to be able to partition the application, running parts of it on one target and parts on another.
Just like multi tenancy is scaling in one direction, this is scaling in the other. It could, and I think should, be the responsibility of a (possibly application-specific or agnostic) cluster manager to actually decide how to do this partitioning. I do think we don't need to provide this right away, but it does not hurt to discuss how we would implement it when we need to.

> Just like on the generic Internet, every application should itself be prepared for unexpectedly updated REST APIs, services that are down, changed dependencies and other horrors. This should not be managed centrally; that is 'old' thinking from a viewpoint that everything can be controlled.

Well, if you want single components to actually scale out by themselves instead of doing that at an application level, then you end up with cluster managers for each component. That does not fundamentally differ. I totally agree that a component should simply try to satisfy its dependencies and deal with all the dynamics in this environment (which is actually hard to do if your dependencies are all REST APIs, because they currently have no mechanism whatsoever for things like discovery and notifications, like the OSGi service registry provides, even for remote services).

> Really service oriented architecture means being very service oriented: if the other party decides not to show up, does something else, or wants something else, you adapt rather than demand another contract. Graceful degradation and design for failure are common architectural design goals following this philosophy.

Agreed.

> Rolling restarts/updates/deployments: I put it in just as a check for us that this could be a very common use case, although I know this is really hard. Not only if you have only 2 nodes in a cluster and want to update one, but also when you want to update thousands of servers.
> For example, the new Twitter interface was gradually introduced over a few weeks; some users got it much earlier than others, so apparently they have some kind of rolling update mechanism for that.

I'm actually wondering why you're stating that these rolling updates are hard. What makes them hard? I can agree that doing schema updates on relational databases as part of updates can be hard or time consuming, especially if you need to be able to roll them back too, but since we're not using relational databases anymore, and having one central database used by all components is a bad idea from a component perspective anyway, I don't see big problems.

Greetings, Marcel
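For what it's worth, the control loop behind such a rolling update is easy to sketch; the hard parts are the ones raised above (persistent state, schemas, mixed-version traffic). Below is a minimal, purely illustrative Python simulation; none of these names are real Amdatu or EC2 APIs, and the health check is a stand-in.

```python
# Purely illustrative simulation of a rolling update across cluster targets.
# This is not real Amdatu API; node names and version strings are made up.

class Target:
    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.in_balancer = True

    def healthy(self):
        # Stand-in for a real health check (HTTP ping, OSGi bundle states, ...).
        return True

def rolling_update(targets, new_version):
    """Update one target at a time so the cluster keeps serving traffic."""
    for t in targets:
        t.in_balancer = False      # 1. drain: take target out of the load balancer
        old = t.version
        t.version = new_version    # 2. provision the new distribution
        if not t.healthy():        # 3. verify before taking traffic again
            t.version = old        #    (in reality: roll back to previous version)
        t.in_balancer = True       # 4. rejoin the load balancer
        # Invariant: all *other* targets are still in the balancer, so at most
        # one node is out of service at any moment.

cluster = [Target("educa-%d" % i, "1.0") for i in range(3)]
rolling_update(cluster, "1.1")
print([t.version for t in cluster])  # ['1.1', '1.1', '1.1']
```

The mixed-version window in step 2 is exactly where the schema point from earlier in the thread bites: while one target runs 1.1 and the others still run 1.0, both versions must be able to cope with whatever the persistent data looks like.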

