Hi,

I wanted to start a conversation in the hope that we as a community can create
a specification for something I am calling module schemas. Before I get into
that, I want to provide a little background.

This all started a few years ago when hiera first came out. Data separation in
the form of parameters and automatic hiera lookups quickly became the norm, and
reusable modules exploded into what the Forge is today. Because of the
popularity of hiera, however, data validation is now a major problem. Without
good data, excellent modules become useless.

Puppet 4 and stdlib brought many new functions and ways to validate incoming
data, and I now consider Puppet 4 to be a loosely typed language. Hell,
there was even this a long time ago:
https://github.com/puppetlabs/puppetlabs-kwalify
But Puppet only does so much. While having validation reside in code might
make troubleshooting a snap, there is still a delay in the feedback loop when
the code is tightly coupled with an external “database” of data: data that is
entered by non-Puppet developers who don’t know YAML or data structures.

So with that said, I want to introduce something new to Puppet module
development, called module schemas. A module schema is a specification that
details the inner workings of a module. For now, this means a detailed
specification of all the parameters for the classes and defined types in a
module, whose goal is to make it impossible to insert a bad data structure.
Ideally, though, we could specify much more (functions, types, providers,
templates), even hiera calls in odd places like templates and functions. These
usually go undocumented, are hard to reference, and typically require reading
the source code.
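To make the idea more concrete, here is a minimal sketch of how a tool might hold a module schema in memory once it has been loaded. The class and field names (ParamSpec, ResourceSpec, ModuleSchema, kind, required) are hypothetical and are not taken from the linked apache schema. The one Puppet-specific rule encoded is that automatic hiera parameter lookup only applies to class parameters, so defined types are excluded from the list of hiera keys.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


# Hypothetical in-memory model of a module schema; names are illustrative only.
@dataclass
class ParamSpec:
    type: str                # e.g. "String", "Boolean", "Hash"
    default: Any = None
    required: bool = False
    description: str = ""


@dataclass
class ResourceSpec:
    name: str                # e.g. "apache" or "apache::vhost"
    kind: str = "class"      # "class" or "define"
    params: Dict[str, ParamSpec] = field(default_factory=dict)


@dataclass
class ModuleSchema:
    module: str              # e.g. "puppetlabs-apache"
    resources: Dict[str, ResourceSpec] = field(default_factory=dict)

    def hiera_keys(self) -> Dict[str, ParamSpec]:
        """Map each fully qualified hiera key (class::param) to its spec.

        Automatic parameter lookup only applies to classes, so defined
        types are skipped.
        """
        return {
            f"{res.name}::{pname}": spec
            for res in self.resources.values()
            if res.kind == "class"
            for pname, spec in res.params.items()
        }


# Illustrative values, not copied from the real apache schema.
schema = ModuleSchema("puppetlabs-apache", {
    "apache": ResourceSpec("apache", params={
        "default_vhost": ParamSpec("Boolean", default=True),
        "serveradmin": ParamSpec("String", default="root@localhost"),
    }),
    "apache::vhost": ResourceSpec("apache::vhost", kind="define", params={
        "docroot": ParamSpec("String", required=True),
    }),
})
print(sorted(schema.hiera_keys()))  # ['apache::default_vhost', 'apache::serveradmin']
```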

What does such a schema look like?

Here is an example schema for the apache module, which contains 446 parameters!
https://github.com/logicminds/puppet_module_schemas/blob/master/apache_schema.yaml

The most immediate use case for such a schema is hiera validation, as I have
outlined here: http://logicminds.github.io/blog/2016/01/16/testing-hiera-data
It works AWESOME! We are validating the hiera data itself, not just the YAML,
and doing it in under 500 ms for every single file on every commit.
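The core of such a check can be sketched in a few lines. This is not the implementation from the blog post above. It assumes the hiera file has already been parsed into a dict (YAML parsing needs a third-party library, so it is left out) and that the schema has been flattened into a hypothetical mapping of hiera key to Puppet type name. Keys for modules the schema does not cover are skipped; unknown keys inside a covered module are reported, which catches typos.

```python
from typing import Any, Dict, List

# Hypothetical type checks loosely mirroring Puppet data types.
TYPE_CHECKS = {
    "String": lambda v: isinstance(v, str),
    "Integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "Boolean": lambda v: isinstance(v, bool),
    "Array": lambda v: isinstance(v, list),
    "Hash": lambda v: isinstance(v, dict),
}


def validate_hiera(data: Dict[str, Any], schema: Dict[str, str]) -> List[str]:
    """Return human-readable errors; an empty list means the data is valid."""
    errors = []
    for key, value in data.items():
        module = key.split("::", 1)[0]
        # Only check keys that belong to a module covered by the schema.
        if not any(k.startswith(module + "::") for k in schema):
            continue
        expected = schema.get(key)
        if expected is None:
            errors.append(f"{key}: unknown parameter")
        elif not TYPE_CHECKS[expected](value):
            errors.append(f"{key}: expected {expected}, got {type(value).__name__}")
    return errors


# Illustrative flattened schema and hiera data (note the typo in serveradmn).
schema = {"apache::default_vhost": "Boolean", "apache::serveradmin": "String"}
data = {
    "apache::default_vhost": "yes",
    "apache::serveradmn": "root@localhost",
    "ntp::servers": [],
}
for err in validate_hiera(data, schema):
    print(err)
```

Running it prints one type error for apache::default_vhost and flags apache::serveradmn as an unknown parameter, while ntp::servers is ignored because the schema does not describe the ntp module.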

As a community we need a solution for validating hiera data, and it's my
belief that schemas are the way to go. After all, hiera data now lives in
modules with no easy way to validate it.

Other use cases that come to mind:

  - generating documentation: many modules on the Forge contain a static map
of the parameters used inside the module. If a schema were present, we could
generate that same map automatically.

  - feeding other third-party tools like puppet-strings
  
  Parameter specification lookup
  - Imagine a face that shows internal Puppet module specifications. I am not
talking about puppet-strings; this would detail the parameters for a given
class, or an example value for a given parameter.
    
    Scenario:
      - puppet module puppetlabs/apache (outputs all the classes and parameters
for that module in a specified format, JSON or YAML)
      - puppet module puppetlabs-apache::class_name (outputs all the parameters
for the class in a specified format, JSON or YAML)
      - puppet module puppetlabs-apache::class_name::param1 (outputs an
example value for that parameter, as well as the default value, in a specified
format, JSON or YAML)
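The three lookup scenarios in the list above map naturally onto a single query function over a loaded schema. The sketch below is Python rather than an actual Puppet face, emits JSON only (YAML output would need a third-party library), and assumes a hypothetical schema layout with an "example" field per parameter; the published schema may name things differently. Unknown class or parameter names simply raise KeyError here.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical schema layout: module -> classes -> params -> spec.
SCHEMA: Dict[str, Any] = {
    "module": "puppetlabs-apache",
    "classes": {
        "apache": {
            "serveradmin": {"type": "String", "default": "root@localhost",
                            "example": "webmaster@example.com"},
            "default_vhost": {"type": "Boolean", "default": True,
                              "example": False},
        },
    },
}


def describe(schema: Dict[str, Any], class_name: Optional[str] = None,
             param: Optional[str] = None) -> str:
    """Answer the three lookup scenarios as a JSON string."""
    classes = schema["classes"]
    if class_name is None:
        # Scenario 1: every class in the module with its parameter names.
        result: Any = {cls: sorted(params) for cls, params in classes.items()}
    elif param is None:
        # Scenario 2: the full parameter specs for one class.
        result = classes[class_name]
    else:
        # Scenario 3: example and default value for a single parameter.
        spec = classes[class_name][param]
        result = {"default": spec.get("default"), "example": spec.get("example")}
    return json.dumps(result, indent=2, sort_keys=True)


print(describe(SCHEMA, "apache", "serveradmin"))
```

Because the answers come straight from the schema file, none of these queries needs to parse any Puppet code.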

Foreman and the Puppet Console need this level of detail as well. Currently,
both of these solutions spend quite a bit of time parsing code to show
parameters in their UIs. It would be much easier if a schema detailing this
information were available. Think of the speed improvements that could be had
if this information were “cached” in a file. These solutions currently load or
intelligently scan all the Puppet code in every Puppet environment to get the
parameters and defaults.
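As a rough illustration of the caching idea, a UI or documentation generator could read parameters and defaults from a schema file instead of scanning code. This sketch assumes a hypothetical JSON rendering of the schema (the linked schema is YAML, which would need a third-party parser) and builds the same (class, parameter, type, default) rows that could back either a UI form or a generated README table.

```python
import json
import os
import tempfile
from typing import Any, List, Tuple

Row = Tuple[str, str, str, Any]


def load_param_rows(schema_path: str) -> List[Row]:
    """Read a cached schema file and return (class, param, type, default) rows.

    No Puppet code is parsed; everything comes from the schema file.
    """
    with open(schema_path, encoding="utf-8") as fh:
        schema = json.load(fh)
    rows: List[Row] = []
    for cls, params in sorted(schema["classes"].items()):
        for name, spec in sorted(params.items()):
            rows.append((cls, name, spec["type"], spec.get("default")))
    return rows


def render_table(rows: List[Row]) -> str:
    """Plain-text table, e.g. for a generated README parameter section."""
    return "\n".join(f"{c:<10} {p:<15} {t:<8} {d!r}" for c, p, t, d in rows)


# Illustrative cached schema written to a temporary file.
sample = {"classes": {"apache": {
    "serveradmin": {"type": "String", "default": "root@localhost"},
    "default_vhost": {"type": "Boolean", "default": True},
}}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "schema.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sample, fh)
    rows = load_param_rows(path)
print(render_table(rows))
```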

Here is how we can create a schema (which I even automated with
puppet-retrospec, https://github.com/nwops/puppet-retrospec.git):
http://logicminds.github.io/blog/2016/01/15/how-to-build-a-module-schema/

However, we all need to agree on something before schemas can ever be a
“thing”: we need a schema for module schemas. This is important because once
third-party tools or scripts start using schemas, any later change to the
schema format will break them. Tools need a specification to work from.
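One way a "schema for module schemas" can protect tools is a structural check plus an explicit version field, so a tool can refuse a schema it does not understand instead of silently misreading it. The rules below (a schema_version key, a classes mapping, a required type per parameter, and a fixed set of type names) are hypothetical placeholders for whatever the community agrees on.

```python
from typing import Any, Dict, List

# Hypothetical meta-schema rules; not an agreed specification.
SUPPORTED_VERSIONS = {1}
ALLOWED_TYPES = {"String", "Integer", "Boolean", "Array", "Hash", "Any"}


def check_schema_document(doc: Dict[str, Any]) -> List[str]:
    """Return structural problems with a module schema; empty means OK."""
    problems = []
    if doc.get("schema_version") not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported schema_version: {doc.get('schema_version')!r}")
    classes = doc.get("classes")
    if not isinstance(classes, dict):
        return problems + ["'classes' must be a mapping"]
    for cls, params in classes.items():
        if not isinstance(params, dict):
            problems.append(f"{cls}: parameters must be a mapping")
            continue
        for name, spec in params.items():
            if not isinstance(spec, dict) or "type" not in spec:
                problems.append(f"{cls}::{name}: missing 'type'")
            elif spec["type"] not in ALLOWED_TYPES:
                problems.append(f"{cls}::{name}: unknown type {spec['type']!r}")
    return problems


good = {"schema_version": 1, "classes": {"apache": {"serveradmin": {"type": "String"}}}}
print(check_schema_document(good))  # []
```

Versioning the format lets the schema evolve later: old tools reject a newer version loudly rather than breaking in confusing ways.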

So, with this in mind, here is the example schema again:
https://github.com/logicminds/puppet_module_schemas/blob/master/apache_schema.yaml
How can it be improved? What should we add?

About the only change I have been pondering is adding another object for the
types themselves:
https://github.com/logicminds/puppet_module_schemas/blob/master/specification_with_types.yaml

What are your thoughts?  What steps do we need to take to make this a supported 
specification?  What would you desire in a module schema?

Am I the only one that thinks this is a killer solution?


Corey Osman

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/27236109-21A1-461F-B02D-10ACAB9D3118%40nwops.io.