Re: [Puppet-dev] RFC - A specification for module schemas

Corey Osman Sat, 30 Jan 2016 19:29:43 -0800


On Saturday, January 30, 2016 at 9:29:02 AM UTC-8, Gareth Rushgrove wrote:
>
> I think there are some interesting ideas here but I want to pull out 
> the problems, and then jot down a few thoughts. Lots of comments 
> inline. 
>
>
> On 30 January 2016 at 04:45, Corey Osman <co...@logicminds.biz> wrote:
> > Hi, 
> > 
> > I wanted to bring up a conversation in hopes that we as a community can
> > create a specification for something I am calling module schemas.  Before I
> > get into that I want to provide a little background info.
> > 
> > This all started a few years ago when hiera first came out. Data separation
> > in the form of parameters and auto hiera lookups quickly became the norm and
> > reusable modules exploded into what the forge is today.  Because of the
> > popularity of hiera, data validation is now a major problem though.  Without
> > good data, excellent modules become useless.
> > 
> > Puppet 4 and stdlib brought many new functions and ways to validate incoming
> > data, and I consider puppet 4 to be a loosely typed language now.
> > Hell, there was even this a long time ago:
> > https://github.com/puppetlabs/puppetlabs-kwalify  But puppet only does so
> > much, and while having validation reside in code might make troubleshooting
> > a snap, there is still a delay in the feedback loop when the code is tightly
> > coupled with an external “database” of data.  Data that is inserted by non
> > puppet developers who don’t know YAML or data structures.
> > 
>
> This appears to be the core problem, and I think it's worth spelling 
> out separate from the proposed implementation. 
>


I think everyone at every level has fat-fingered a typo or inserted invalid
data at some point.
While this is a problem, it also speaks highly of Puppet, because I can
have my grandmother flip a feature flag to install some complicated thing
on a bunch of systems by editing some simple text. ;)
 

>
> Given a set of hiera data, and given a set of puppet modules, how can 
> I tell if my hiera data is valid? 
>
> > So with that said I want to introduce something new to puppet module
> > development, called module schemas.  A module schema is a specification that
> > details the inner workings of a module.
>
> Just to be a little pedantic, this isn't the inner-workings, but the 
> interface the module presents to the user. 
>

Yea, that is much better. 
 

>
> > For right now this means a
> > detailed specification of all the parameters for classes and definitions
> > used inside a module, whose goal is to make it impossible to insert a bad
> > data structure.  But ideally, we can specify so much more (functions, types,
> > providers, templates), even hiera calls in weird places like templates and
> > functions, which are usually things that do not get documented, are hard
> > to reference, and usually require looking at source code.
> > 
>
> A clarifying question. Are you imagining this as something that is 
> created and maintained by hand, and kept as a concrete thing (i.e a 
> file in git alongside the code) or as a serialisation that is 
> generated as needed (and potentially cached by the consumer) from the 
> Puppet code? 
>

Yes, I do envision a static schema file that the developer would have to
keep up to date.  The reason is that Puppet (as far as I can see, and
definitely in 3.x) cannot tell me the schema of a given parameter.  Keep in
mind I haven't used 4.3 yet. With 4.3.x I think the serialization would be
the better route, since Puppet can spit that info out.  Having an external
file in the codebase does allow for easier consumption, at the risk that it
might be out of date.  It also keeps us from having to load Puppet and parse
out that information every time.
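
To make the static-file idea a bit more concrete, here is a rough sketch of
the kind of check I have in mind, written in Python only to keep it short.
The schema layout (fully qualified parameter name mapped to an expected
type), the validate_hiera and type_matches helpers, and the sample values are
all made up for illustration. This is not the layout of apache_schema.yaml,
and a real tool would load both sides from YAML files.

```python
# Hypothetical schema layout: fully qualified parameter name -> expected type.
# Illustrative only; not the layout used in apache_schema.yaml.
SCHEMA = {
    "apache::default_vhost": bool,
    "apache::serveradmin": str,
    "apache::keepalive_timeout": int,
}


def type_matches(value, expected):
    """Check a value against an expected type, treating booleans strictly."""
    # bool is a subclass of int in Python, so a bare isinstance check
    # would accept True where an integer is expected.
    if isinstance(value, bool):
        return expected is bool
    return isinstance(value, expected)


def validate_hiera(data, schema):
    """Return a list of human-readable errors for one parsed hiera data file."""
    errors = []
    for key, value in sorted(data.items()):
        expected = schema.get(key)
        if expected is None:
            errors.append(f"{key}: not a known parameter")
        elif not type_matches(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


# Stand-in for a parsed hiera YAML file.
hiera_data = {
    "apache::default_vhost": "yes",           # wrong type: string, not boolean
    "apache::serveradmin": "root@localhost",  # fine
    "apache::keepalive_timout": 15,           # typo in the key
}

for error in validate_hiera(hiera_data, SCHEMA):
    print(error)
```

Running it prints one line per problem: the wrong type on the first key and
the misspelled key on the last one.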
 

>
> > What does such a schema look like? 
> > 
> > Here is an example schema for the apache module, which contains 446
> > parameters!
> > https://github.com/logicminds/puppet_module_schemas/blob/master/apache_schema.yaml
> > 
> > The most immediate use case for such a schema is hiera validation as I have
> > outlined here:
> > http://logicminds.github.io/blog/2016/01/16/testing-hiera-data.  Which works
> > AWESOME!  We are validating hiera data and not YAML and doing it under 500
> > ms for every commit on every single file.
> >
> > As a community we need a solution for validating hiera data.  It's my belief
> > that schemas are the way to go.   After all hiera data is now in modules
> > with no way to easily validate.
> > 
> > Other use cases that come to mind: 
> > 
> >   - generating documentation (Many modules on the forge usually contain a
> > static map of parameters used inside the module).   If a schema was present,
> > we could just generate that same map automatically.
> > 
>
> Strings is looking at documentation generation from Puppet code. You 
> don't actually need a schema as an intermediary format here either.
>
> >   - useful for other 3rd party tools like puppet strings 
> > 
> >   Parameter specification lookup
> >   - Imagine a face that shows internal puppet module specifications.  I am
> > not talking about puppet-strings, this would detail the parameters given a
> > class, or an example parameter value given a parameter name.
> > 
>
> I think in both this and the above case what you're really saying is
> that there should be a high-level API/library for parsing Puppet code
> and extracting information in a useful format? Or similar to the above
> are you seeing this as something that is managed separately from the
> code?
>

This would be extremely useful. Currently every third party tool
(puppet-lint, retrospec, strings, foreman, console)
implements its own way to get the same information.  It would
definitely make things easier for all of us if there were a higher level API.
Seems like this could be done with a new face that is not part of the core.
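
As a sketch of what such a higher-level API could hand back, here is a toy
lookup in Python that answers at module, class, or parameter level. The index
layout, the describe function, and the class and parameter names are
hypothetical placeholders, not output from any existing tool.

```python
import json

# Hypothetical in-memory index of one module's interface.
# Layout and values are placeholders for illustration only.
MODULE_INDEX = {
    "apache": {
        "apache::class_name": {
            "param1": {"type": "String", "default": "foo", "example": "bar"},
            "param2": {"type": "Integer", "default": 80, "example": 8080},
        },
        "apache::other_class": {
            "enabled": {"type": "Boolean", "default": True, "example": False},
        },
    },
}


def describe(index, module, class_name=None, param=None):
    """Return module-, class-, or parameter-level detail for a module."""
    classes = index[module]
    if class_name is None:
        return classes            # every class with all of its parameters
    params = classes[class_name]
    if param is None:
        return params             # all parameters of one class
    return params[param]          # type, default and example for one parameter


# Parameter-level lookup, serialized as JSON for a consuming tool.
detail = describe(MODULE_INDEX, "apache", "apache::class_name", "param1")
print(json.dumps(detail, indent=2))
```

A consumer like Foreman could call the same function once per module and
keep the result, rather than parsing the Puppet code itself.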


> >     Scenario:
> >       - puppet module puppetlabs/apache  (outputs all the parameters and
> >         classes for that module in a specified format, json or yaml)
> >       - puppet module puppetlabs-apache::class_name  (outputs all the
> >         parameters for the class in a specified format, json or yaml)
> >       - puppet module puppetlabs-apache::class_name::param1  (outputs an
> >         example value for that parameter, as well as the default value, in a
> >         specified format, json or yaml)
> > 
> > Foreman and Puppet Console need this level of detail as well.  Currently,
> > both of these solutions spend quite a bit of time parsing code to show
> > parameters for UI display.   It would be much easier if a schema was
> > available that detailed this level of data.  Think of the speed improvements
> > that could be had if this information was “cached” in a file.
>
> Caching could be part of the API, but it's probably more useful for 
> caching to be part of the consumer (ie. Foreman or whatever), because 
> knowing when to bust the cache is often context specific. 
>
> > These
> > solutions currently load or intelligently scan all the puppet code for every
> > puppet environment to get the parameters and defaults.
> > 
> > Here is how we can create a schema:
> > http://logicminds.github.io/blog/2016/01/15/how-to-build-a-module-schema/
> > (which I even automated with puppet-retrospec,
> > https://github.com/nwops/puppet-retrospec.git)
> > 
> > However, we all need to agree on something before schemas can ever be a
> > “thing”.  We need a schema for module schemas.  This is important because as
> > soon as 3rd party tools or scripts start to use schemas and later we decide
> > the schema needs changing, everything breaks.  Tools need a specification to
> > work from.
> > 
> > So with this in mind and an example schema here:
> > https://github.com/logicminds/puppet_module_schemas/blob/master/apache_schema.yaml
> >
> > How can this be improved?  What should we add? 
> > 
> > About the only change I was pondering was adding another object for the
> > types themselves.
> > https://github.com/logicminds/puppet_module_schemas/blob/master/specification_with_types.yaml
> > 
> > What are your thoughts?  What steps do we need to take to make this a 
> > supported specification?  What would you desire in a module schema? 
> > 
> > Am I the only one that thinks this is a killer solution? 
> > 
>
> I think I agree with RI that we want to have a standard way to 
> "introspect the classes and extract this metadata 
> automagically" and from there build those tools. 
>
> For more relevant context. My points above are mainly about a concern 
> that maintaining a schema by hand separate to the puppet code itself 
> (which describes the same thing) is a perilous path. RI's point was 
> that you already have the Puppet code. 
>

Yes, very true. But this only makes sense if your Puppet code is written
for 4.x.
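
If the schema does stay a hand-maintained file, a commit hook could at least
flag when it drifts from the code. A minimal sketch in Python, assuming the
parameter names declared in the code have already been extracted by some
other tool (that part is not shown); schema_drift and the sample names are
hypothetical.

```python
def schema_drift(code_params, schema_params):
    """Compare parameter names found in code with those listed in a schema."""
    code, schema = set(code_params), set(schema_params)
    return {
        # Declared in the Puppet code but never added to the schema.
        "missing_from_schema": sorted(code - schema),
        # Still listed in the schema but no longer present in the code.
        "stale_in_schema": sorted(schema - code),
    }


# Placeholder inputs; a real check would read these from the module and
# from the schema file.
from_code = ["apache::class_name::param1", "apache::class_name::param3"]
from_schema = ["apache::class_name::param1", "apache::class_name::param2"]

report = schema_drift(from_code, from_schema)
print(report)
```

An empty list on both sides means the schema and the code agree on which
parameters exist; it says nothing about whether the types still match.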
  

>
> The other option is to flip this on its head, and generate Puppet
> code from a schema. As luck would have it I've been doing some of this 
> with surprising success recently with the Kubernetes module. 
>

Yeah, that's pretty dope.  I thought Swagger was just for making REST APIs.
 

>
> https://github.com/garethr/garethr-kubernetes 
> https://github.com/garethr/puppet-swagger-generator 
>
> Kubernetes has a Swagger schema 
> (https://github.com/garethr/garethr-kubernetes/blob/master/v1.json), 
> which describes lots of information about the Kubernetes API, 
> including about the resources and properties of resources. 
>
> In this case it's worth it, mainly as a time saving mechanism to both 
> create and to maintain the code. The Kubernetes module has about 200
> lines of Ruby written by me, and about 16000 lines of Ruby written by
> the generator. 
>
> It would be totally possible to generate pure Puppet code from a 
> schema in a similar way. However, I'm not sure it would solve many 
> actual problems outside maybe shorthands for things like complex
> types. Ultimately you'd be taking a pure data format (likely with lots 
> of repetition and all the fun of pure data) and generating in Puppet 
> something that's actually more succinct. It's worth noting in the 
> Kubernetes examples that the schema itself is generated from the 
> Kubernetes Go source code, it's not hand crafted. 
>
> So, an API that gives you an intermediary format describing the code 
> would likely be a good thing, and several tools could use it. I have a 
> sneaking suspicion this might already be in Puppet Strings :) 
>
> running strings with --emit-json-stdout gives you a JSON schema which 
> is defined here: 
>
>
> https://github.com/puppetlabs/puppetlabs-strings/blob/master/json_dom.md#defined-types
>  
>
> And Strings is now available to install as a gem, so should be able to 
> be used as a library for a Hiera data validator. 
>
> We could probably take a run at something at the Contributor Summit if 
> a few people are interested? 
>
> Phew, finished 
>
> Gareth 
>
>
>
> > 
> > Corey Osman 
> > -- 
> > You received this message because you are subscribed to the Google Groups
> > "Puppet Developers" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to puppet-dev+...@googlegroups.com.
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/puppet-dev/27236109-21A1-461F-B02D-10ACAB9D3118%40nwops.io.
> > For more options, visit https://groups.google.com/d/optout.
>
>
>
> -- 
> Gareth Rushgrove 
> @garethr 
>
> devopsweekly.com 
> morethanseven.net 
> garethrushgrove.com 
>

