Thanks for your tips, Reid, especially the bit about "data_hash". I'll be 
sure to keep that in mind if I end up writing such a backend. Unfortunately 
there's no budget for this, so it would definitely be an 'in-house' job. It's 
possible that I might be able to use the http_data_hash plugin you 
mentioned with Elasticsearch, as it talks HTTP. 
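Since Elasticsearch answers plain HTTP GETs, pointing an HTTP-based backend 
at it mostly comes down to building the right per-node URL. A rough sketch 
(the index name 'hieradata' and the one-document-per-certname layout are my 
assumptions, not anything from the plugin):

```ruby
# Hypothetical URL scheme: one Elasticsearch document per node, keyed by
# the node's certname, in an index named 'hieradata'.
def es_doc_url(base_uri, certname)
  "#{base_uri}/hieradata/_doc/#{certname}"
end

# e.g. es_doc_url('http://localhost:9200', 'web01.example.com')
```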

Cheers, 
Nick

On Tuesday, April 3, 2018 at 2:32:25 AM UTC+10, Reid Vandewiele wrote:
>
> Hey Nick,
>
> A particular phrase you used caught my attention: "Elasticsearch holds the 
> Hiera config for a number of nodes."
>
> There's a lot about putting together the words "elasticsearch" and "hiera 
> backend" that can sound scary if it's done wrong, but I have seen backends 
> built to solve the "config for individual nodes" problem in a way that 
> complements Hiera's default yaml backend system, without noticeably 
> sacrificing performance, by using a carefully limited number of calls to 
> the external backend per catalog compile. Most generalized data that 
> doesn't need to change frequently or programmatically is still stored in 
> yaml files alongside the code.
>
> When that's done, the implementing hiera.yaml file may look something like 
> this:
>
> hierarchy:
>   - name: 'Per-node data'
>     data_hash: elasticsearch_data
>     uri: 'http://localhost:9200'
>     path: "%{trusted.certname}"
>   - name: 'Yaml data'
>     data_hash: yaml_data
>     paths:
>       - "role/%{trusted.extensions.pp_role}"
>       - "datacenter/%{trusted.extensions.pp_datacenter}"
>       - "common"
>
>
> The most important bit showcased here is that for performance, the 
> *data_hash* backend type is used. Hiera can make thousands of lookup 
> calls per catalog compile, so something like lookup_key can get expensive 
> over an API. data_hash front-loads all the work, returning a batch of data 
> from one operation which is then cached and consulted for the numerous 
> lookups that'll come from automatic parameter lookup.
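> To make the batching concrete, here is a rough sketch of what the 
> data-fetching core of such a backend could look like (this is not anyone's 
> actual plugin; the endpoint, index name, and document layout are 
> assumptions). The point is that a single GET returns the whole hash, which 
> Hiera then caches for all subsequent lookups in the compile:

```ruby
require 'net/http'
require 'json'
require 'uri'

# One HTTP round trip per compile: fetch the node's whole document and
# return it as a hash. Hiera caches a data_hash result, so the thousands
# of individual lookups that follow never touch the network.
def fetch_node_data(base_uri, certname)
  uri = URI("#{base_uri}/hieradata/_doc/#{certname}")
  response = Net::HTTP.get_response(uri)
  return {} unless response.is_a?(Net::HTTPSuccess)
  parse_es_source(response.body)
end

# Extract the _source hash from an Elasticsearch GET response body.
def parse_es_source(body)
  JSON.parse(body).fetch('_source', {})
end
```

> A real backend would wrap this in the Puppet::Functions data_hash 
> dispatch and take the URI and path from the hierarchy level's options 
> rather than hard-coding them.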
>
> There's an example of how to do that in 
> https://github.com/uphillian/http_data_hash.
>
> To John's point, I wouldn't hesitate to run your use case by an expert if 
> you have the option.
>
> Cheers,
> ~Reid
>
> On Monday, April 2, 2018 at 7:47:37 AM UTC-7, John Bollinger wrote:
>>
>>
>>
>> On Saturday, March 31, 2018 at 5:59:12 AM UTC-5, [email protected] 
>> wrote:
>>>
>>> Thanks for your response John, 
>>>
>>> I appreciate you taking a quick look around to see if anyone else has 
>>> already done this. I had come to the same conclusion: if someone already 
>>> has, they most likely haven't shared it. 
>>>
>>> You raise valid points about ES being generally pretty unsuitable as a 
>>> Hiera backend. However, the project I am working on already has an 
>>> Elasticsearch instance running in it, so there would be next to no 
>>> performance overhead for me. It uses a web interface to write out YAML 
>>> files that are fed into a Hiera for a 'puppet apply' run which configures 
>>> various aspects of the system. By using Elastic instead of YAML files, I 
>>> can eliminate some of the issues surrounding concurrent access; it also 
>>> means backups are simplified, as I'd just need to back up ES.
>>>
>>
>>
>> With an ES instance already running, I agree that you have negligible 
>> additional *memory* overhead to consider, but that doesn't do anything 
>> about *performance* overhead.  Nevertheless, the (speculative) 
>> performance impact is not necessarily big; you might well find it entirely 
>> tolerable, especially for the kind of usage you describe.  It will depend 
>> in part on how, exactly, you implement the details.
>>
>>
>>> Is writing a proof-of-concept Hiera backend something that someone with 
>>> reasonable coding skills could knock out in a few hours? 
>>>
>>>
>> It depends on what degree of integration you want to achieve.  If you 
>> start with the existing YAML back end, and simply hack it to retrieve its 
>> target YAML objects from ES instead of from the file system, then yes, I 
>> think that could be done in a few hours.  It would mean ES offering up 
>> relatively few, relatively large chunks of YAML, which I am supposing would 
>> be stored as whole objects in the database.  I think that would meet your 
>> concurrency and backup objectives.
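>> If it helps, the "hack the YAML back end" idea could be as small as the 
>> following (entirely hypothetical: I'm assuming each YAML file is stored 
>> whole as a string field, here called 'raw_yaml', in one ES document):

```ruby
require 'json'
require 'yaml'

# Hypothetical: the per-node YAML file is stored verbatim in a 'raw_yaml'
# string field of an Elasticsearch document. Given the JSON body of an ES
# GET, recover the same hash the stock yaml_data backend would have read
# from a file on disk.
def yaml_from_es_response(body)
  source = JSON.parse(body).fetch('_source', {})
  YAML.safe_load(source.fetch('raw_yaml', '')) || {}
end
```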
>>
>> If you want a deeper integration, such as having your back end performing 
>> individual key lookups in ES, then you might hack up an initial 
>> implementation in a few hours, but I would want a lot longer to test it 
>> out. I would want someone with detailed knowledge of Hiera and its 
>> capabilities to oversee the testing, too, or at least to review it.  Even 
>> more so to whatever extent you have in mind to implement Hiera 
>> prioritization, merging behavior, interpolations, and / or other operations 
>> affecting what data Hiera presents to callers.  If there is an actual 
>> budget for this then I believe Puppet, Inc. offers consulting services, or 
>> I'm sure you could find a third-party consultant if you prefer.
>>
>>
>> John
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/f06d9780-9d86-4986-9714-a10dbb33e1d7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
