Re: [DISCUSS/PROPOSAL] Upgrading Driver Model

Darren Shepherd Tue, 20 Aug 2013 16:45:47 -0700

I know this isn't terribly useful, but I've been drawing a lot of squares and 
circles and lines that connect those squares and circles lately and I have a 
lot of architectural ideas for CloudStack.  At the rate I'm going it will take 
me about two weeks to put together a discussion/proposal for the community.  
What I'm thinking is a superset of what you've listed out and should align with 
your idea of a CAR.  The focus has a lot to do with modularity and 
extensibility.


So more to come soon....  I will say one thing, though: with Java you end up 
having a hard time dynamically loading and unloading modules.  There are plenty 
of frameworks that try really hard to do this right, like OSGi, but it's darn 
near impossible to do it right because of class loading and GC issues (and 
that's why Eclipse has you restart after installing plugins even though it is 
built on OSGi).

I do believe that CloudStack should be capable of zero-downtime maintenance, 
and I have ideas around that, but at the end of the day, for plenty of practical 
reasons, you still need a JVM restart if modules change.

Darren

On Aug 20, 2013, at 3:39 PM, Mike Tutkowski <mike.tutkow...@solidfire.com> 
wrote:

> I agree, John - let's get consensus first, then talk time tables.
> 
> 
> On Tue, Aug 20, 2013 at 4:31 PM, John Burwell <jburw...@basho.com> wrote:
> 
>> Mike,
>> 
>> Before we can dig into timelines or implementations, I think we need to
>> get consensus on the problem to be solved and the goals.  Once we have a
>> proper understanding of the scope, I believe we can chunk the work across
>> a set of development lifecycles.  The subject is vast, but it also has a
>> far reaching impact on both the storage and network layer evolution
>> efforts.  As such, I believe we need to start addressing it as part of
>> the next release.
>> 
>> As a separate thread, we need to discuss the timeline for the next
>> release.  I think we need to avoid the time compression caused by the
>> overlap of the 4.1 stabilization effort and 4.2 development.  Therefore, I
>> don't think we should consider development of the next release started
>> until the first 4.2 RC is released.  I will try to open a separate discuss
>> thread for this topic, as well as tying off the discussion of release code
>> names.
>> 
>> Thanks,
>> -John
>> 
>> On Aug 20, 2013, at 6:22 PM, Mike Tutkowski <mike.tutkow...@solidfire.com>
>> wrote:
>> 
>>> Hey John,
>>> 
>>> I think this is some great stuff. Thanks for the write up.
>>> 
>>> It looks like you have ideas around what might go into a first release of
>>> this plug-in framework. Were you thinking we'd have enough time to squeeze
>>> that first rev into 4.3? I'm just wondering (it's not a huge deal to hit
>>> that release for this) because we would only have about five weeks.
>>> 
>>> Thanks
>>> 
>>> 
>>> On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jburw...@basho.com> wrote:
>>> 
>>>> All,
>>>> 
>>>> In capturing my thoughts on storage, my thinking backed into the driver
>>>> model.  While we have the beginnings of such a model today, I see the
>>>> following deficiencies:
>>>> 
>>>> 
>>>>  1. *Multiple Models*: The Storage, Hypervisor, and Security layers
>>>>  each have a slightly different model for allowing system functionality
>>>>  to be extended/substituted.  These differences increase the barrier to
>>>>  entry for vendors seeking to extend CloudStack and accrete code paths
>>>>  to be maintained and verified.
>>>>  2. *Leaky Abstraction*:  Plugins are registered through a Spring
>>>>  configuration file.  In addition to being operator unfriendly (most
>>>>  sysadmins are not Spring experts nor do they want to be), we expose the
>>>>  core bootstrapping mechanism to operators.  Therefore, a
>>>>  misconfiguration could negatively impact the injection/configuration of
>>>>  internal management server components, essentially handing them a
>>>>  loaded shotgun pointed at our right foot.
>>>>  3. *Nondeterministic Load/Unload Model*:  Because the core loading
>>>>  mechanism is Spring, the management server has little control over the
>>>>  timing and order of component loading/unloading.  Changes to the
>>>>  Management Server's component dependency graph could break a driver by
>>>>  causing it to be started at an unexpected time.
>>>>  4. *Lack of Execution Isolation*: As a Spring component, plugins are
>>>>  loaded into the same execution context as core management server
>>>>  components.  Therefore, an errant plugin can corrupt the entire
>>>>  management server.
>>>> 
>>>> 
>>>> For the next revision of the plugin/driver mechanism, I would like to see
>>>> us migrate towards a standard pluggable driver model that supports all of
>>>> the management server's extension points (e.g. network devices, storage
>>>> devices, hypervisors, etc) with the following capabilities:
>>>> 
>>>> 
>>>>  - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
>>>>  common state machine and categorization (e.g. network, storage,
>>>>  hypervisor, etc) that permits the deterministic calculation of
>>>>  initialization and destruction order (i.e. network layer drivers ->
>>>>  storage layer drivers -> hypervisor drivers).  Plugin
>>>>  inter-dependencies would be supported between plugins sharing the same
>>>>  category.
>>>>  - *In-process Installation and Upgrade*: Adding or upgrading a driver
>>>>  does not require the management server to be restarted.  This
>>>>  capability implies a system that supports the simultaneous execution of
>>>>  multiple driver versions and the ability to suspend the execution of
>>>>  work on a resource while the underlying driver instance is replaced.
>>>>  - *Execution Isolation*: The deployment packaging and execution
>>>>  environment allow different (and potentially conflicting) versions of
>>>>  dependencies to be used simultaneously.  Additionally, plugins would be
>>>>  sufficiently sandboxed to protect the management server against driver
>>>>  instability.
>>>>  - *Extension Data Model*: Drivers provide a property bag with a
>>>>  metadata descriptor to validate and render vendor specific data.  The
>>>>  contents of this property bag will be provided to every driver operation
>>>>  invocation at runtime.  The metadata descriptor would be a lightweight
>>>>  description that provides a label resource key, a description resource
>>>>  key, data type (string, date, number, boolean), required flag, and
>>>>  optional length limit.
>>>>  - *Introspection*: Administrative APIs/UIs allow operators to
>>>>  understand which drivers are in the system, their configuration, and
>>>>  their current state.
>>>>  - *Discoverability*: Optionally, drivers can be discovered via a
>>>>  project repository definition (similar to Yum) allowing drivers to be
>>>>  remotely acquired and operators to be notified regarding update
>>>>  availability.  The project would also provide, free of charge,
>>>>  certificates to sign plugins.  This mechanism would support local
>>>>  mirroring to support air gapped management networks.
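
[Editorial sketch, not part of the original message.] The consolidated
lifecycle bullet above can be illustrated with a small Java example. It
assumes, hypothetically, that driver categories are an enum declared in
initialization order and that the state model has only three states; the
proposal explicitly leaves the real state model open, and dependencies
between plugins in the same category are ignored here. All type and method
names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical categories, declared in the initialization order from the proposal. */
enum DriverCategory { NETWORK, STORAGE, HYPERVISOR }

/** Hypothetical lifecycle states; the proposal does not define the real state model. */
enum DriverState { INSTALLED, RUNNING, STOPPED }

final class Driver {
    final String name;
    final DriverCategory category;
    DriverState state = DriverState.INSTALLED;

    Driver(String name, DriverCategory category) {
        this.name = name;
        this.category = category;
    }
}

final class DriverLifecycle {
    /** Sorts by category, then by name, so the start order is deterministic. */
    static List<Driver> startOrder(List<Driver> drivers) {
        List<Driver> ordered = new ArrayList<>(drivers);
        ordered.sort(Comparator
                .comparing((Driver d) -> d.category)
                .thenComparing(d -> d.name));
        return ordered;
    }

    /** Destruction order is the exact reverse of the start order. */
    static List<Driver> stopOrder(List<Driver> drivers) {
        List<Driver> ordered = startOrder(drivers);
        Collections.reverse(ordered);
        return ordered;
    }
}
```

Sorting on the enum's declaration order is only one way to get a
deterministic sequence; a fuller design would need a topological sort once
intra-category dependencies are modeled.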
>>>> 
>>>> 
>>>> Fundamentally, I do not want to turn CloudStack into an erector set with
>>>> more screws than nuts, which is a risk with highly pluggable
>>>> architectures.  As such, I think we would need to tightly bound the scope
>>>> of drivers and their behaviors to prevent the loss of system usability
>>>> and stability.  My thinking is that drivers would be packaged into a
>>>> custom JAR, a CAR (CloudStack ARchive), that would be structured as
>>>> follows:
>>>> 
>>>> 
>>>>  - META-INF
>>>>     - MANIFEST.MF
>>>>     - driver.yaml (driver metadata (e.g. version, name, description,
>>>>     etc) serialized in YAML format)
>>>>     - LICENSE (a text file containing the driver's license)
>>>>  - lib (driver dependencies)
>>>>  - classes (driver implementation)
>>>>  - resources (driver message files and potentially JS resources)
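
[Editorial sketch, not part of the original message.] Since a CAR is
described as a custom JAR, the layout above could be produced and inspected
with the standard java.util.jar classes. The example below builds a tiny
archive in memory and lists its entries. The class and method names are
hypothetical, and the driver.yaml and LICENSE contents are supplied by the
caller because the proposal does not define a schema.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

final class CarSketch {
    /** Builds a tiny in-memory CAR with the layout listed above. */
    static byte[] buildCar(String driverYaml, String license) throws IOException {
        Manifest manifest = new Manifest();
        // Set Manifest-Version explicitly; the JAR format expects it in the main section.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // This constructor writes META-INF/MANIFEST.MF as the first entry.
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            addEntry(jar, "META-INF/driver.yaml", driverYaml);
            addEntry(jar, "META-INF/LICENSE", license);
            addEntry(jar, "lib/", null);        // driver dependencies
            addEntry(jar, "classes/", null);    // driver implementation
            addEntry(jar, "resources/", null);  // message files and similar
        }
        return bytes.toByteArray();
    }

    private static void addEntry(JarOutputStream jar, String name, String content)
            throws IOException {
        jar.putNextEntry(new JarEntry(name));
        if (content != null) {
            jar.write(content.getBytes(StandardCharsets.UTF_8));
        }
        jar.closeEntry();
    }

    /** Lists entry names; JarInputStream consumes the leading manifest itself. */
    static List<String> listEntries(byte[] car) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(car))) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

A loader could use a listing like this to reject archives that lack
META-INF/driver.yaml before creating any execution environment.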
>>>> 
>>>> 
>>>> The management server would acquire drivers through a simple scan of a
>>>> URL (e.g. file directory, S3 bucket, etc).  For every CAR object found,
>>>> the management server would create an execution environment (likely a
>>>> dedicated ExecutorService and Classloader), and transition the state of
>>>> the driver to Running (the exact state model would need to be worked
>>>> out).  To be really nice, we could develop a custom Ant task/Maven
>>>> plugin/Gradle plugin to create CARs.  I can also imagine opportunities
>>>> to add hooks to this model to register instrumentation information with
>>>> JMX and authorization.
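
[Editorial sketch, not part of the original message.] The scan described
above might look roughly like the following for the local-directory case.
It assumes CARs are files ending in ".car" (the proposal does not name an
extension) and pairs each one with its own URLClassLoader and
ExecutorService. DriverEnvironment and CarScanner are hypothetical names,
and a real loader would also have to expose the nested lib/ and classes/
entries, which a plain URLClassLoader pointed at the archive does not do.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical per-driver execution environment: one class loader, one executor. */
final class DriverEnvironment implements AutoCloseable {
    final Path car;
    final URLClassLoader classLoader;
    final ExecutorService executor;

    DriverEnvironment(Path car) throws IOException {
        this.car = car;
        // The archive is opened lazily, on the first class or resource lookup.
        this.classLoader = new URLClassLoader(
                new URL[] { car.toUri().toURL() },
                DriverEnvironment.class.getClassLoader());
        this.executor = Executors.newSingleThreadExecutor();
    }

    @Override
    public void close() throws IOException {
        executor.shutdown();
        classLoader.close();
    }
}

final class CarScanner {
    /** Creates one environment per *.car file found directly under the directory. */
    static List<DriverEnvironment> scan(Path directory) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> cars = Files.newDirectoryStream(directory, "*.car")) {
            for (Path car : cars) {
                found.add(car);
            }
        }
        // Directory iteration order is unspecified, so sort for a stable result.
        Collections.sort(found);
        List<DriverEnvironment> environments = new ArrayList<>();
        for (Path car : found) {
            environments.add(new DriverEnvironment(car));
        }
        return environments;
    }
}
```

Giving each driver a separate class loader is what would let conflicting
dependency versions coexist; it does not by itself sandbox a misbehaving
driver, and discarding such a loader cleanly is where the class loading and
GC concerns raised at the top of this thread come in.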
>>>> 
>>>> To keep the scope of this email confined, we would introduce the general
>>>> notion of a Resource, and (hand wave hand wave) eventually
>>>> compartmentalize the execution of work around a resource [1].  This
>>>> (hand waved) compartmentalization would allow us the controls necessary
>>>> to safely and reliably perform in-place driver upgrades.  For an initial
>>>> release, I would recommend implementing the abstractions, loading
>>>> mechanism, extension data model, and discovery features.  With these
>>>> capabilities in place, we could attack the in-place upgrade model.
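
[Editorial sketch, not part of the original message.] Of the features
recommended for an initial release, the extension data model is the most
concretely specified: a descriptor with a label resource key, a description
resource key, a data type, a required flag, and an optional length limit.
A minimal Java rendering of that descriptor and a validator for a property
bag might look like this. All names are hypothetical, values are assumed to
arrive as strings, and dates are assumed to be ISO-8601; none of that is
stated in the proposal.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** The four data types named in the proposal. */
enum PropertyType { STRING, DATE, NUMBER, BOOLEAN }

/** Hypothetical descriptor for one entry of a driver's property bag. */
final class PropertyDescriptor {
    final String name;
    final String labelKey;
    final String descriptionKey;
    final PropertyType type;
    final boolean required;
    final Integer maxLength; // null means no length limit

    PropertyDescriptor(String name, String labelKey, String descriptionKey,
                       PropertyType type, boolean required, Integer maxLength) {
        this.name = name;
        this.labelKey = labelKey;
        this.descriptionKey = descriptionKey;
        this.type = type;
        this.required = required;
        this.maxLength = maxLength;
    }
}

final class PropertyBagValidator {
    /** Returns one message per violation; an empty list means the bag is valid. */
    static List<String> validate(List<PropertyDescriptor> descriptors, Map<String, String> bag) {
        List<String> errors = new ArrayList<>();
        for (PropertyDescriptor d : descriptors) {
            String value = bag.get(d.name);
            if (value == null) {
                if (d.required) errors.add(d.name + ": required");
                continue;
            }
            if (d.maxLength != null && value.length() > d.maxLength) {
                errors.add(d.name + ": longer than " + d.maxLength);
            }
            if (!matchesType(d.type, value)) {
                errors.add(d.name + ": not a valid " + d.type);
            }
        }
        return errors;
    }

    private static boolean matchesType(PropertyType type, String value) {
        try {
            switch (type) {
                case NUMBER:  new BigDecimal(value); return true;
                case DATE:    LocalDate.parse(value); return true; // assumes ISO-8601
                case BOOLEAN: return value.equals("true") || value.equals("false");
                default:      return true; // STRING accepts any text
            }
        } catch (NumberFormatException | DateTimeParseException e) {
            return false;
        }
    }
}
```

The same descriptor list could drive UI rendering, with the label and
description keys resolved against the message files in the CAR's resources
directory.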
>>>> 
>>>> If we were to adopt such a pluggable capability, we would have the
>>>> opportunity to decouple the vendor and CloudStack release schedules.
>>>> For example, if a vendor were introducing a new product that required a
>>>> new or updated driver, they would no longer need to wait for a
>>>> CloudStack release to support it.  They would also gain the ability to
>>>> fix high priority defects in the same manner.
>>>> 
>>>> I have hand waved a number of issues that would need to be resolved
>>>> before such an approach could be implemented.  However, I think we need
>>>> to decide, as a community, that it is worth devoting energy and effort
>>>> to enhancing the plugin/driver model, and agree on the goals of that
>>>> effort, before diving head first into the deep rabbit hole of
>>>> design/implementation.
>>>> 
>>>> Thoughts? (/me ducks)
>>>> -John
>>>> 
>>>> [1]: My opinions on the matter from CloudStack Collab 2013 ->
>>>> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
>>> 
>>> 
>>> 
>>> --
>>> *Mike Tutkowski*
>>> *Senior CloudStack Developer, SolidFire Inc.*
>>> e: mike.tutkow...@solidfire.com
>>> o: 303.746.7302
>>> Advancing the way the world uses the
>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>> *™*
> 
> 
> -- 
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkow...@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the
> cloud<http://solidfire.com/solution/overview/?video=play>
> *™*
