Thanks for your tips, Reid, especially the bit about "data_hash". I'll be sure to keep that in mind if I end up writing such a backend. Unfortunately there's no budget for this, so it would definitely be an 'in-house' job. It's possible that I might be able to use the http_data_hash plugin you mentioned with Elasticsearch, as it talks HTTP.
Cheers,
Nick

On Tuesday, April 3, 2018 at 2:32:25 AM UTC+10, Reid Vandewiele wrote:
> Hey Nick,
>
> A particular phrase you used caught my attention: "Elasticsearch holds the
> Hiera config for a number of nodes."
>
> There's a lot about putting the words "elasticsearch" and "hiera backend"
> together that can sound scary if it's done wrong, but I have seen backends
> built to solve the "config for individual nodes" problem in a way that
> complements Hiera's default yaml backend system, without noticeably
> sacrificing performance, by using a carefully limited number of calls to
> the external backend per catalog compile. Most generalized data that
> doesn't need to change frequently or programmatically is still stored in
> yaml files alongside the code.
>
> When that's done, the implementing hiera.yaml file may look something like
> this:
>
>     hierarchy:
>       - name: 'Per-node data'
>         data_hash: elasticsearch_data
>         uri: 'http://localhost:9200'
>         path: "%{trusted.certname}"
>       - name: 'Yaml data'
>         data_hash: yaml_data
>         paths:
>           - "role/%{trusted.extensions.pp_role}"
>           - "datacenter/%{trusted.extensions.pp_datacenter}"
>           - "common"
>
> The most important bit showcased here is that for performance, the
> *data_hash* backend type is used. Hiera can make thousands of lookup
> calls per catalog compile, so something like lookup_key can get expensive
> over an API. data_hash front-loads all the work, returning a batch of data
> from one operation which is then cached and consulted for the numerous
> lookups that'll come from automatic parameter lookup.
>
> There's an example of how to do that in
> https://github.com/uphillian/http_data_hash.
>
> To John's point, I wouldn't hesitate to run your use case by an expert if
> you have the option.
>
> Cheers,
> ~Reid
>
> On Monday, April 2, 2018 at 7:47:37 AM UTC-7, John Bollinger wrote:
>>
>> On Saturday, March 31, 2018 at 5:59:12 AM UTC-5, [email protected] wrote:
>>>
>>> Thanks for your response John,
>>>
>>> I appreciate you taking a quick look around to see if anyone else has
>>> already done this. I had come to the same conclusion: if someone has
>>> already done it, they most likely haven't shared it.
>>>
>>> You raise valid points about ES being generally pretty unsuitable as a
>>> Hiera backend. However, the project I am working on already has an
>>> Elasticsearch instance running in it, so there would be next to no
>>> performance overhead for me. It uses a web interface to write out YAML
>>> files that are fed into Hiera for a 'puppet apply' run which configures
>>> various aspects of the system. By using Elastic instead of YAML files, I
>>> can eliminate some of the issues surrounding concurrent access; it also
>>> means backups are simplified, as I'd just need to back up ES.
>>>
>> With an ES instance already running, I agree that you have negligible
>> additional *memory* overhead to consider, but that doesn't do anything
>> about *performance* overhead. Nevertheless, the (speculative) performance
>> impact is not necessarily big; you might well find it entirely tolerable,
>> especially for the kind of usage you describe. It will depend in part on
>> how, exactly, you implement the details.
>>
>>> Is writing a proof-of-concept Hiera backend something that someone with
>>> reasonable coding skills would be able to knock out in a few hours?
>>>
>> It depends on what degree of integration you want to achieve.
>> If you start with the existing YAML back end, and simply hack it to
>> retrieve its target YAML objects from ES instead of from the file system,
>> then yes, I think that could be done in a few hours. It would mean ES
>> offering up relatively few, relatively large chunks of YAML, which I am
>> supposing would be stored as whole objects in the database. I think that
>> would meet your concurrency and backup objectives.
>>
>> If you want a deeper integration, such as having your back end perform
>> individual key lookups in ES, then you might hack up an initial
>> implementation in a few hours, but I would want a lot longer to test it
>> out. I would want someone with detailed knowledge of Hiera and its
>> capabilities to oversee the testing, too, or at least to review it. Even
>> more so to whatever extent you have in mind to implement Hiera
>> prioritization, merging behavior, interpolations, and/or other operations
>> affecting what data Hiera presents to callers. If there is an actual
>> budget for this, then I believe Puppet, Inc. offers consulting services,
>> or I'm sure you could find a third-party consultant if you prefer.
>>
>> John
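
A minimal sketch of the kind of data_hash function Reid describes might look
something like the Ruby below. The function name elasticsearch_data, the
hiera/data index layout in Elasticsearch, and the assumption that both the
uri and path values from the example hierarchy level arrive in the options
hash are all illustrative, not taken from any existing backend:

    require 'json'
    require 'net/http'
    require 'uri'

    # Hiera 5 data_hash function: called once per hierarchy level per catalog
    # compile; Hiera caches the returned hash and serves the many individual
    # lookups from that cache.
    Puppet::Functions.create_function(:elasticsearch_data) do
      dispatch :elasticsearch_data do
        param 'Hash', :options
        param 'Puppet::LookupContext', :context
      end

      def elasticsearch_data(options, context)
        # Assumed document layout: per-node data stored as a JSON document
        # whose id is the node's certname, e.g.
        # http://localhost:9200/hiera/data/<certname>
        url = URI("#{options['uri']}/hiera/data/#{options['path']}/_source")
        response = Net::HTTP.get_response(url)

        if response.is_a?(Net::HTTPSuccess)
          JSON.parse(response.body)
        else
          # Treat a missing document as "no data at this level".
          {}
        end
      end
    end

In practice you might fold the certname into the uri itself (Hiera
interpolates %{trusted.certname} in uri values just as it does in paths)
rather than passing it as a separate setting.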
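
The quicker hack John outlines, where Elasticsearch simply serves up the same
YAML text that would otherwise live on disk, could be sketched roughly as
below. The 'yaml' field name and the hiera/node document path are again
invented for illustration; the stored text is parsed much as the stock
yaml_data backend would parse a file:

    require 'json'
    require 'net/http'
    require 'uri'
    require 'yaml'

    # Same data_hash calling convention as above; the only difference is that
    # the Elasticsearch document holds one YAML string rather than structured
    # JSON fields.
    Puppet::Functions.create_function(:elasticsearch_yaml_data) do
      dispatch :elasticsearch_yaml_data do
        param 'Hash', :options
        param 'Puppet::LookupContext', :context
      end

      def elasticsearch_yaml_data(options, context)
        url = URI("#{options['uri']}/hiera/node/#{options['path']}/_source")
        response = Net::HTTP.get_response(url)
        return {} unless response.is_a?(Net::HTTPSuccess)

        # The document carries the node's YAML verbatim in a 'yaml' field.
        data = YAML.safe_load(JSON.parse(response.body)['yaml'].to_s)
        data.is_a?(Hash) ? data : {}
      end
    end

Storing whole YAML documents this way keeps the few-large-reads access
pattern John mentions, which is also what keeps the concurrent-access and
backup story simple.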
