All,

In capturing my thoughts on storage, my thinking backed into the driver model.  
While we have the beginnings of such a model today, I see the following 
deficiencies:

Multiple Models: The Storage, Hypervisor, and Security layers each have a 
slightly different model for allowing system functionality to be extended or 
substituted.  These differences raise the barrier to entry for vendors seeking 
to extend CloudStack and accrete code paths that must be maintained and 
verified.
Leaky Abstraction:  Plugins are registered through a Spring configuration file. 
 In addition to being operator-unfriendly (most sysadmins are not Spring 
experts, nor do they want to be), this exposes the core bootstrapping mechanism 
to operators.  A misconfiguration could therefore disrupt the 
injection/configuration of internal management server components.  Essentially, 
we would be handing them a loaded shotgun pointed at our right foot.
Nondeterministic Load/Unload Model:  Because the core loading mechanism is 
Spring, the management server has little control over the timing and order of 
component loading/unloading.  Changes to the Management Server's component 
dependency graph could break a driver by causing it to be started at an 
unexpected time.
Lack of Execution Isolation: As Spring components, plugins are loaded into the 
same execution context as core management server components.  Therefore, an 
errant plugin can corrupt the entire management server.  

For the next revision of the plugin/driver mechanism, I would like to see us 
migrate towards a standard pluggable driver model that supports all of the 
management server's extension points (e.g. network devices, storage devices, 
hypervisors, etc.) with the following capabilities:

Consolidated Lifecycle and Startup Procedure:  Drivers share a common state 
machine and categorization (e.g. network, storage, hypervisor) that permits the 
deterministic calculation of initialization and destruction order (i.e. network 
layer drivers -> storage layer drivers -> hypervisor drivers).  
Inter-dependencies would be supported between plugins sharing the same category 
(a sketch follows this list).
In-process Installation and Upgrade: Adding or upgrading a driver does not 
require the management server to be restarted.  This capability implies a 
system that supports the simultaneous execution of multiple driver versions and 
the ability to suspend work on a resource while the underlying driver instance 
is replaced.
Execution Isolation: The deployment packaging and execution environment allows 
different (and potentially conflicting) versions of dependencies to be used 
simultaneously.  Additionally, plugins would be sufficiently sandboxed to 
protect the management server against driver instability. 
Extension Data Model: Drivers provide a property bag with a metadata descriptor 
to validate and render vendor-specific data.  The contents of this property bag 
would be provided to every driver operation invocation at runtime.  The 
metadata descriptor would be a lightweight description that provides a label 
resource key, a description resource key, a data type (string, date, number, 
boolean), a required flag, and an optional length limit (also sketched after 
this list).
Introspection: Administrative APIs/UIs allow operators to understand which 
drivers are installed in the system, their configuration, and their current 
state.
Discoverability: Optionally, drivers can be discovered via a project repository 
definition (similar to Yum), allowing drivers to be acquired remotely and 
operators to be notified of update availability.  The project would also 
provide, free of charge, certificates to sign plugins.  This mechanism would 
support local mirroring for air-gapped management networks.
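
To make the lifecycle/categorization item a bit more concrete, here is a minimal 
sketch of how deterministic, category-based ordering might look.  None of these 
types exist today; the names (Driver, DriverCategory, DriverState, 
DriverLifecycle) are assumptions purely for illustration, and the states are 
placeholders pending the state model discussion below.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    enum DriverCategory {
        // Declaration order doubles as initialization order:
        // network -> storage -> hypervisor; destruction would run in reverse.
        NETWORK, STORAGE, HYPERVISOR
    }

    enum DriverState {
        // Placeholder states -- the exact state model still needs to be worked out.
        INSTALLED, STARTING, RUNNING, SUSPENDED, STOPPED, FAILED
    }

    interface Driver {
        String name();
        DriverCategory category();
        void start();
        void stop();
    }

    final class DriverLifecycle {
        // Orders drivers by category so start-up is deterministic; inter-driver
        // dependencies would only be honored within a category.
        static List<Driver> startOrder(List<Driver> drivers) {
            List<Driver> ordered = new ArrayList<Driver>(drivers);
            Collections.sort(ordered, new Comparator<Driver>() {
                public int compare(Driver a, Driver b) {
                    return a.category().compareTo(b.category());
                }
            });
            return ordered;
        }
    }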
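
The extension data model could similarly be as small as the following descriptor 
sketch.  Again, the type and field names (FieldDescriptor, DriverConfiguration) 
are assumptions for discussion, not an existing or proposed API surface.

    import java.util.Map;

    enum FieldType { STRING, DATE, NUMBER, BOOLEAN }

    // Describes one vendor-specific property so the management server can
    // validate it and the UI can render it without understanding its meaning.
    final class FieldDescriptor {
        final String labelKey;        // resource key for the field's label
        final String descriptionKey;  // resource key for the field's help text
        final FieldType type;
        final boolean required;
        final Integer maxLength;      // optional; null means no limit

        FieldDescriptor(String labelKey, String descriptionKey, FieldType type,
                        boolean required, Integer maxLength) {
            this.labelKey = labelKey;
            this.descriptionKey = descriptionKey;
            this.type = type;
            this.required = required;
            this.maxLength = maxLength;
        }
    }

    // The property bag handed to every driver operation invocation at runtime.
    interface DriverConfiguration {
        Map<String, FieldDescriptor> descriptors();
        Map<String, String> values();
    }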

Fundamentally, I do not want to turn CloudStack into an erector set with more 
screws than nuts, which is a risk with highly pluggable architectures.  As such, 
I think we would need to tightly bind the scope of drivers and their behaviors 
to prevent the loss of system usability and stability.  My thinking is that 
drivers would be packaged into a custom JAR, a CAR (CloudStack ARchive), 
structured as follows:

META-INF
    MANIFEST.MF
    driver.yaml (driver metadata (e.g. version, name, description, etc.) 
    serialized in YAML format; a hypothetical example follows this layout)
    LICENSE (a text file containing the driver's license)
lib (driver dependencies)
classes (driver implementation)
resources (driver message files and potentially JS resources)
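
Purely for illustration, a driver.yaml might look something like the following; 
every field name here is a placeholder that the community would need to agree 
upon:

    # Hypothetical driver.yaml -- field names are illustrative only.
    name: acme-block-storage
    version: 1.0.0
    category: storage
    vendor: Acme, Inc.
    description: Driver for a (fictional) Acme block storage array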

The management server would acquire drivers through a simple scan of a URL 
(e.g. a file directory, an S3 bucket, etc.).  For every CAR object found, the 
management server would create an execution environment (likely a dedicated 
ExecutorService and ClassLoader) and transition the state of the driver to 
Running (the exact state model would need to be worked out); a rough sketch of 
this scan appears below.  To be really nice, we could develop a custom Ant 
task/Maven plugin/Gradle plugin to create CARs.   I can also imagine 
opportunities to add hooks to this model to register instrumentation with JMX 
and to integrate with authorization.
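
As a rough sketch of that scan, assuming a local directory of CARs, a plain 
URLClassLoader, and a single-threaded executor per driver (none of these 
classes exist in CloudStack today, and real CAR handling, state transitions, 
and sandboxing would be considerably more involved):

    import java.io.File;
    import java.io.FilenameFilter;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    final class CarScanner {

        // Per-driver execution environment: a dedicated class loader and a
        // dedicated executor so an errant driver cannot starve core work.
        static final class DriverEnvironment {
            final URLClassLoader classLoader;
            final ExecutorService executor;

            DriverEnvironment(URLClassLoader classLoader, ExecutorService executor) {
                this.classLoader = classLoader;
                this.executor = executor;
            }
        }

        static DriverEnvironment load(File car) throws Exception {
            // A separate class loader per CAR keeps driver dependencies from
            // colliding with each other; fully isolating drivers from the core
            // classpath would require additional parent-filtering work.
            URLClassLoader loader =
                    new URLClassLoader(new URL[] { car.toURI().toURL() });
            ExecutorService executor = Executors.newSingleThreadExecutor();
            // ...read META-INF/driver.yaml, instantiate the driver on its
            // executor, and walk it through the (to-be-defined) state machine
            // to Running...
            return new DriverEnvironment(loader, executor);
        }

        static void scan(File directory) throws Exception {
            File[] cars = directory.listFiles(new FilenameFilter() {
                public boolean accept(File dir, String name) {
                    return name.endsWith(".car");
                }
            });
            if (cars == null) {
                return;
            }
            for (File car : cars) {
                load(car);
            }
        }
    }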

To keep the scope of this email confined, we would introduce the general notion 
of a Resource, and (hand wave hand wave) eventually compartmentalize the 
execution of work around a resource [1].  This (hand-waved) 
compartmentalization would give us the controls necessary to safely and 
reliably perform in-place driver upgrades.  For an initial release, I would 
recommend implementing the abstractions, loading mechanism, extension data 
model, and discovery features.  With these capabilities in place, we could then 
attack the in-place upgrade model.

If we were to adopt such a pluggable capability, we would have the opportunity 
to decouple the vendor and CloudStack release schedules.  For example, if a 
vendor were introducing a new product that required a new or updated driver, 
they would no longer need to wait for a CloudStack release to support it.  They 
would also gain the ability to fix high-priority defects in the same manner. 

I have hand-waved a number of issues that would need to be resolved before such 
an approach could be implemented.  However, I think we need to decide, as a 
community, whether it is worth devoting energy and effort to enhancing the 
plugin/driver model, and to agree on the goals of that effort, before diving 
head first into the deep rabbit hole of design/implementation.  

Thoughts? (/me ducks)
-John

[1]: My opinions on the matter from CloudStack Collab 2013 -> 
http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
