Thanks for your tips, Reid, especially the bit about "data_hash". I'll be 
sure to keep that in mind if I end up writing such a backend. Unfortunately 
there's no budget for this, so it would definitely be an 'in-house' job. It's 
possible that I might be able to use the http_data_hash plugin you 
mentioned with Elasticsearch, as it talks HTTP. 
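Since Elasticsearch answers plain HTTP GETs, pointing an HTTP-based backend 
at it mostly comes down to building the right per-node URL. A rough sketch 
(the index name 'hieradata' and the one-document-per-certname layout are my 
assumptions, not anything from the plugin):

```ruby
# Hypothetical URL scheme: one Elasticsearch document per node, keyed by
# the node's certname, in an index named 'hieradata'.
def es_doc_url(base_uri, certname)
  "#{base_uri}/hieradata/_doc/#{certname}"
end

# e.g. es_doc_url('http://localhost:9200', 'web01.example.com')
```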

Cheers, 
Nick

On Tuesday, April 3, 2018 at 2:32:25 AM UTC+10, Reid Vandewiele wrote:
>
> Hey Nick,
>
> A particular phrase you used caught my attention: "Elasticsearch holds the 
> Hiera config for a number of nodes."
>
> There's a lot about putting together the words "elasticsearch" and "hiera 
> backend" that can sound scary if it's done wrong, but I have seen backends 
> built to solve the "config for individual nodes" problem in a way that 
> complements Hiera's default yaml backend system, without noticeably 
> sacrificing performance, by using a carefully limited number of calls to 
> the external backend per catalog compile. Most generalized data that 
> doesn't need to change frequently or programmatically is still stored in 
> yaml files alongside the code.
>
> When that's done, the implementing hiera.yaml file may look something like 
> this:
>
> hierarchy:
>   - name: 'Per-node data'
>     data_hash: elasticsearch_data
>     uri: 'http://localhost:9200'
>     path: "%{trusted.certname}"
>   - name: 'Yaml data'
>     data_hash: yaml_data
>     paths:
>       - "role/%{trusted.extensions.pp_role}"
>       - "datacenter/%{trusted.extensions.pp_datacenter}"
>       - "common"
>
>
> The most important bit showcased here is that for performance, the 
> *data_hash* backend type is used. Hiera can make thousands of lookup 
> calls per catalog compile, so something like lookup_key can get expensive 
> over an API. data_hash front-loads all the work, returning a batch of data 
> from one operation which is then cached and consulted for the numerous 
> lookups that'll come from automatic parameter lookup.
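> To make the batching concrete, here is a rough sketch of what the 
> data-fetching core of such a backend could look like (this is not anyone's 
> actual plugin; the endpoint, index name, and document layout are 
> assumptions). The point is that a single GET returns the whole hash, which 
> Hiera then caches for all subsequent lookups in the compile:

```ruby
require 'net/http'
require 'json'
require 'uri'

# One HTTP round trip per compile: fetch the node's whole document and
# return it as a hash. Hiera caches a data_hash result, so the thousands
# of individual lookups that follow never touch the network.
def fetch_node_data(base_uri, certname)
  uri = URI("#{base_uri}/hieradata/_doc/#{certname}")
  response = Net::HTTP.get_response(uri)
  return {} unless response.is_a?(Net::HTTPSuccess)
  parse_es_source(response.body)
end

# Extract the _source hash from an Elasticsearch GET response body.
def parse_es_source(body)
  JSON.parse(body).fetch('_source', {})
end
```

> A real backend would wrap this in the Puppet::Functions data_hash 
> dispatch and take the URI and path from the hierarchy level's options 
> rather than hard-coding them.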
>
> There's an example of how to do that in 
> https://github.com/uphillian/http_data_hash.
>
> To John's point, I wouldn't hesitate to run your use case by an expert if 
> you have the option.
>
> Cheers,
> ~Reid
>
> On Monday, April 2, 2018 at 7:47:37 AM UTC-7, John Bollinger wrote:
>>
>>
>>
>> On Saturday, March 31, 2018 at 5:59:12 AM UTC-5, [email protected] 
>> wrote:
>>>
>>> Thanks for your response John, 
>>>
>>> I appreciate you taking a quick look around to see if anyone else has 
>>> already done this. I had come to the same conclusion: if someone already 
>>> has, they most likely haven't shared it. 
>>>
>>> You raise valid points about ES being generally pretty unsuitable as a 
>>> Hiera backend. However, the project I am working on already has an 
>>> Elasticsearch instance running in it, so there would be next to no 
>>> performance overhead for me. It uses a web interface to write out YAML 
>>> files that are fed into a Hiera for a 'puppet apply' run which configures 
>>> various aspects of the system. By using Elastic instead of YAML files, I 
>>> can eliminate some of the issues surrounding concurrent access; it also 
>>> means backups are simplified, as I'd just need to back up ES.
>>>
>>
>>
>> With an ES instance already running, I agree that you have negligible 
>> additional *memory* overhead to consider, but that doesn't do anything 
>> about *performance* overhead.  Nevertheless, the (speculative) 
>> performance impact is not necessarily big; you might well find it entirely 
>> tolerable, especially for the kind of usage you describe.  It will depend 
>> in part on how, exactly, you implement the details.
>>
>>
>>> Is writing a proof-of-concept Hiera backend something that someone with 
>>> reasonable coding skills could knock out in a few hours? 
>>>
>>>
>> It depends on what degree of integration you want to achieve.  If you 
>> start with the existing YAML back end, and simply hack it to retrieve its 
>> target YAML objects from ES instead of from the file system, then yes, I 
>> think that could be done in a few hours.  It would mean ES offering up 
>> relatively few, relatively large chunks of YAML, which I am supposing would 
>> be stored as whole objects in the database.  I think that would meet your 
>> concurrency and backup objectives.
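>> If it helps, the "hack the YAML back end" idea could be as small as the 
>> following (entirely hypothetical: I'm assuming each YAML file is stored 
>> whole as a string field, here called 'raw_yaml', in one ES document):

```ruby
require 'json'
require 'yaml'

# Hypothetical: the per-node YAML file is stored verbatim in a 'raw_yaml'
# string field of an Elasticsearch document. Given the JSON body of an ES
# GET, recover the same hash the stock yaml_data backend would have read
# from a file on disk.
def yaml_from_es_response(body)
  source = JSON.parse(body).fetch('_source', {})
  YAML.safe_load(source.fetch('raw_yaml', '')) || {}
end
```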
>>
>> If you want a deeper integration, such as having your back end performing 
>> individual key lookups in ES, then you might hack up an initial 
>> implementation in a few hours, but I would want a lot longer to test it 
>> out. I would want someone with detailed knowledge of Hiera and its 
>> capabilities to oversee the testing, too, or at least to review it.  Even 
>> more so to whatever extent you have in mind to implement Hiera 
>> prioritization, merging behavior, interpolations, and / or other operations 
>> affecting what data Hiera presents to callers.  If there is an actual 
>> budget for this then I believe Puppet, Inc. offers consulting services, or 
>> I'm sure you could find a third-party consultant if you prefer.
>>
>>
>> John
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/f06d9780-9d86-4986-9714-a10dbb33e1d7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
