Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

John Burwell Tue, 20 Aug 2013 15:32:41 -0700

Mike,

Before we can dig into timelines or implementations, I think we need to get 
consensus on the problem to be solved and the goals.  Once we have a proper 
understanding of the scope, I believe we can chunk the work across a set of 
development lifecycles.  The subject is vast, but it also has a far-reaching 
impact on both the storage and network layer evolution efforts.  As such, I 
believe we need to start addressing it as part of the next release.


As a separate thread, we need to discuss the timeline for the next release.  I 
think we need to avoid the time compression caused by the overlap of the 4.1 
stabilization effort and 4.2 development.  Therefore, I don't think we should 
consider development of the next release started until the first 4.2 RC is 
released.  I will try to open a separate discuss thread for this topic, as well 
as tying in the discussion of release code names.

Thanks,
-John

On Aug 20, 2013, at 6:22 PM, Mike Tutkowski <mike.tutkow...@solidfire.com> 
wrote:

> Hey John,
> 
> I think this is some great stuff. Thanks for the write up.
> 
> It looks like you have ideas around what might go into a first release of
> this plug-in framework. Were you thinking we'd have enough time to squeeze
> that first rev into 4.3? I'm just wondering (it's not a huge deal to hit
> that release for this) because we would only have about five weeks.
> 
> Thanks
> 
> 
> On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jburw...@basho.com> wrote:
> 
>> All,
>> 
>> In capturing my thoughts on storage, my thinking backed into the driver
>> model.  While we have the beginnings of such a model today, I see the
>> following deficiencies:
>> 
>> 
>>   1. *Multiple Models*: The Storage, Hypervisor, and Security layers
>>   each have a slightly different model for allowing system functionality to
>>   be extended/substituted.  These differences increase the barrier to entry
>>   for vendors seeking to extend CloudStack and accrete code paths to be
>>   maintained and verified.
>>   2. *Leaky Abstraction*:  Plugins are registered through a Spring
>>   configuration file.  In addition to being operator unfriendly (most
>>   sysadmins are not Spring experts nor do they want to be), we expose the
>>   core bootstrapping mechanism to operators.  Therefore, a misconfiguration
>>   could negatively impact the injection/configuration of internal management
>>   server components.  Essentially, we are handing them a loaded shotgun
>>   pointed at our right foot.
>>   3. *Nondeterministic Load/Unload Model*:  Because the core loading
>>   mechanism is Spring, the management server has little control over the
>>   timing and order of component loading/unloading.  Changes to the
>>   Management Server's component dependency graph could break a driver by
>>   causing it to be started at an unexpected time.
>>   4. *Lack of Execution Isolation*: As a Spring component, plugins are
>>   loaded into the same execution context as core management server
>>   components.  Therefore, an errant plugin can corrupt the entire management
>>   server.
>> 
>> 
>> For the next revision of the plugin/driver mechanism, I would like to see us
>> migrate towards a standard pluggable driver model that supports all of the
>> management server's extension points (e.g. network devices, storage
>> devices, hypervisors, etc) with the following capabilities:
>> 
>> 
>>   - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
>>   common state machine and categorization (e.g. network, storage, hypervisor,
>>   etc) that permits the deterministic calculation of initialization and
>>   destruction order (i.e. network layer drivers -> storage layer drivers ->
>>   hypervisor drivers).  Plugin inter-dependencies would be supported between
>>   plugins sharing the same category.
>>   - *In-process Installation and Upgrade*: Adding or upgrading a driver
>>   does not require the management server to be restarted.  This capability
>>   implies a system that supports the simultaneous execution of multiple
>>   driver versions and the ability to suspend execution of work on a
>>   resource while the underlying driver instance is replaced.
>>   - *Execution Isolation*: The deployment packaging and execution
>>   environment supports different (and potentially conflicting) versions of
>>   dependencies to be simultaneously used.  Additionally, plugins would be
>>   sufficiently sandboxed to protect the management server against driver
>>   instability.
>>   - *Extension Data Model*: Drivers provide a property bag with a
>>   metadata descriptor to validate and render vendor specific data.  The
>>   contents of this property bag will be provided to every driver operation
>>   invocation at runtime.  The metadata descriptor would be a lightweight
>>   description that provides a label resource key, a description resource key,
>>   data type (string, date, number, boolean), required flag, and optional
>>   length limit.
>>   - *Introspection*: Administrative APIs/UIs allow operators to
>>   understand which drivers are in the system, their configuration, and
>>   their current state.
>>   - *Discoverability*: Optionally, drivers can be discovered via a
>>   project repository definition (similar to Yum) allowing drivers to be
>>   remotely acquired and operators to be notified regarding update
>>   availability.  The project would also provide, free of charge, certificates
>>   to sign plugins.  This mechanism would support local mirroring to support
>>   air gapped management networks.
>> 
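The deterministic start order described under "Consolidated Lifecycle and Startup Procedure" can be sketched roughly as follows. This is only an illustration in Python, not a proposed implementation: the category list comes from the example in the text, while the function names and the shape of the `drivers` mapping are assumptions made for the sketch.

```python
from graphlib import TopologicalSorter

# Category order taken from the example in the proposal:
# network layer drivers -> storage layer drivers -> hypervisor drivers.
CATEGORY_ORDER = ["network", "storage", "hypervisor"]


def initialization_order(drivers):
    """Return driver names in a deterministic start order.

    `drivers` (hypothetical shape) maps a driver name to a tuple of
    (category, set of driver names it depends on).  Dependencies are only
    allowed between drivers that share a category.
    """
    order = []
    for category in CATEGORY_ORDER:
        members = {name: deps for name, (cat, deps) in drivers.items()
                   if cat == category}
        for name, deps in members.items():
            outside = deps - set(members)
            if outside:
                raise ValueError(
                    f"{name} depends on drivers outside {category}: {sorted(outside)}")
        # Feed nodes in sorted order so ties are broken the same way every run.
        graph = {name: sorted(members[name]) for name in sorted(members)}
        order.extend(TopologicalSorter(graph).static_order())
    return order


def destruction_order(drivers):
    """Drivers are destroyed in the reverse of their start order."""
    return list(reversed(initialization_order(drivers)))
```

Because categories are walked in a fixed order and names are sorted before the topological sort, the same set of drivers always yields the same sequence, which is the property the Spring-based loading lacks.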
>> 
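To make the "Extension Data Model" item more concrete, here is a minimal sketch of a metadata descriptor and a validator for the property bag. The descriptor fields mirror the ones listed in the text (label resource key, description resource key, data type, required flag, optional length limit); the class name, function name, and error strings are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from numbers import Number
from typing import Optional

# The four data types named in the proposal, mapped to Python types
# purely for the purposes of this sketch.
TYPES = {"string": str, "date": date, "number": Number, "boolean": bool}


@dataclass(frozen=True)
class PropertyDescriptor:  # hypothetical name
    label_key: str
    description_key: str
    data_type: str
    required: bool = False
    max_length: Optional[int] = None


def validate(descriptors, bag):
    """Return a list of error strings; an empty list means the bag is valid."""
    errors = []
    for name, d in descriptors.items():
        if name not in bag:
            if d.required:
                errors.append(f"{name}: required")
            continue
        value = bag[name]
        wrong_type = not isinstance(value, TYPES[d.data_type])
        # bool is a subclass of int in Python, so reject it for "number".
        if d.data_type == "number" and isinstance(value, bool):
            wrong_type = True
        if wrong_type:
            errors.append(f"{name}: expected {d.data_type}")
        elif d.max_length is not None and len(str(value)) > d.max_length:
            errors.append(f"{name}: longer than {d.max_length}")
    return errors
```

The same descriptors could drive both validation and UI rendering, since the label and description keys point at message resources rather than literal text.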
>> Fundamentally, I do not want to turn CloudStack into an erector set with
>> more screws than nuts which is a risk with highly pluggable architectures.
>> As such, I think we would need to tightly bound the scope of drivers and
>> their behaviors to prevent the loss of system usability and stability.  My
>> thinking is that drivers would be packaged into a custom JAR, a CAR
>> (CloudStack ARchive), that would be structured as follows:
>> 
>> 
>>   - META-INF
>>      - MANIFEST.MF
>>      - driver.yaml (driver metadata (e.g. version, name, description,
>>      etc) serialized in YAML format)
>>      - LICENSE (a text file containing the driver's license)
>>   - lib (driver dependencies)
>>   - classes (driver implementation)
>>   - resources (driver message files and potentially JS resources)
>> 
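Since a CAR is described as a custom JAR, and a JAR is a ZIP archive, the layout above could be produced and checked with a few lines of code. The sketch below is an assumption-laden illustration: the `driver.yaml` field names follow the examples in the text (version, name, description), but the values, the helper names, and the choice to treat the three META-INF entries as mandatory are all hypothetical.

```python
import io
import zipfile

# Hypothetical driver.yaml content; the values are made up for illustration.
DRIVER_YAML = """\
name: example-storage-driver
version: 1.0.0
description: Example driver used to illustrate the CAR layout
"""

# Assumption: these three entries are treated as mandatory in this sketch.
REQUIRED_ENTRIES = (
    "META-INF/MANIFEST.MF",
    "META-INF/driver.yaml",
    "META-INF/LICENSE",
)


def build_car(classes, libs=None, resources=None, license_text="(license text)"):
    """Return the bytes of a CAR; each mapping is relative path -> bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as car:
        car.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        car.writestr("META-INF/driver.yaml", DRIVER_YAML)
        car.writestr("META-INF/LICENSE", license_text)
        for prefix, files in (("lib", libs), ("classes", classes),
                              ("resources", resources)):
            for path, data in (files or {}).items():
                car.writestr(f"{prefix}/{path}", data)
    return buf.getvalue()


def missing_entries(car_bytes):
    """List the required entries that are absent from the archive."""
    with zipfile.ZipFile(io.BytesIO(car_bytes)) as car:
        names = set(car.namelist())
    return [entry for entry in REQUIRED_ENTRIES if entry not in names]
```

A check like `missing_entries` is the sort of thing a build plugin or the management server could run before accepting an archive.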
>> 
>> The management server would acquire drivers through a simple scan of a URL
>> (e.g. file directory, S3 bucket, etc).  For every CAR object found, the
>> management server would create an execution environment (likely a dedicated
>> ExecutorService and Classloader), and transition the state of the driver to
>> Running (the exact state model would need to be worked out).  To be really
>> nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to
>> create CARs.  I can also imagine opportunities to add hooks to this
>> model to register instrumentation information with JMX and authorization.
>> 
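The scan-and-load step above might look something like the following sketch. It covers only the local-directory case and the per-driver executor; Python has no direct equivalent of a dedicated Classloader, so that part is not modeled. The state names, class name, and worker count are assumptions, since the text notes that the exact state model still needs to be worked out.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from pathlib import Path


class State(Enum):  # hypothetical states; the real model is undecided
    DISCOVERED = "discovered"
    RUNNING = "running"
    STOPPED = "stopped"


class DriverEnvironment:
    """One isolated execution environment per discovered CAR."""

    def __init__(self, car_path):
        self.car_path = car_path
        self.state = State.DISCOVERED
        self.executor = None

    def start(self):
        # A dedicated pool keeps one driver's work off the other drivers' threads.
        self.executor = ThreadPoolExecutor(
            max_workers=2, thread_name_prefix=self.car_path.stem)
        self.state = State.RUNNING

    def submit(self, fn, *args):
        if self.state is not State.RUNNING:
            raise RuntimeError(f"{self.car_path.name} is not running")
        return self.executor.submit(fn, *args)

    def stop(self):
        if self.executor is not None:
            self.executor.shutdown(wait=True)
        self.state = State.STOPPED


def load_drivers(directory):
    """Scan a directory for *.car files and start an environment for each."""
    environments = {}
    for car_path in sorted(Path(directory).glob("*.car")):
        env = DriverEnvironment(car_path)
        env.start()
        environments[car_path.stem] = env
    return environments
```

Sorting the scan results keeps the discovery order stable across restarts; a real loader would combine this with the category-based ordering from the lifecycle bullet.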
>> To keep the scope of this email confined, we would introduce the general
>> notion of a Resource, and (hand wave hand wave) eventually compartmentalize
>> the execution of work around a resource [1].  This (hand waved)
>> compartmentalization would allow us the controls necessary to safely and
>> reliably perform in-place driver upgrades.  For an initial release, I would
>> recommend implementing the abstractions, loading mechanism, extension data
>> model, and discovery features.  With these capabilities in place, we could
>> attack the in-place upgrade model.
>> 
>> If we were to adopt such a pluggable capability, we would have the
>> opportunity to decouple the vendor and CloudStack release schedules.  For
>> example, if a vendor were introducing a new product that required a new or
>> updated driver, they would no longer need to wait for a CloudStack release
>> to support it.  They would also gain the ability to fix high priority
>> defects in the same manner.
>> 
>> I have hand waved a number of issues that would need to be resolved before
>> such an approach could be implemented.  However, I think we need to decide,
>> as a community, that it is worth devoting energy and effort to enhancing
>> the plugin/driver model, and agree on the goals of that effort, before
>> diving head first into the deep rabbit hole of design/implementation.
>> 
>> Thoughts? (/me ducks)
>> -John
>> 
>> [1]: My opinions on the matter from CloudStack Collab 2013 ->
>> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
>> 
> 
> 
> 
> -- 
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkow...@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the
> cloud<http://solidfire.com/solution/overview/?video=play>
> *™*
