Luke Kanies <[email protected]> writes:
> On Aug 28, 2010, at 3:44 AM, Daniel Pittman wrote:
>> Luke Kanies <[email protected]> writes:
>>
>> G'day. Having finally gotten free of some crash-priority engineering I have a
>> chance to look this over.
>>
>>> Rein, Paul, and I had a call today discussing whether we should produce a
>>> 1.6 (I said no, unless there are high priority tickets that really need to
>>> be worked on), and then what the design goals of 2.0 should be. I took
>>> notes on our discussion and attempted to produce a doc capturing it all:
>>>
>>> http://projects.puppetlabs.com/projects/facter/wiki/ArchitectureForTwoDotOh
>>>
>>> Comments appreciated.
>>
>> It looks pretty good to me, and the subsequent discussion has clarified some
>> of the bits I was uncertain about when it came to our internal use of the
>> facts.
>>
>> From my PoV one of the big gains would be making adding a new fact more like
>> writing a munin plugin[1] than it currently is, although it is fairly simple
>> and direct right now.
>
> Yeah, I want to make this easier, but especially easier for sysadmins to do
> in a way they're familiar with, which usually means not writing ruby and not
> knowing special Facterisms.
*nod* FWIW, stealing the scoped names from below, something like this would
work for me:
#!/bin/sh
echo -n "net.rimspace.memtotal="; awk '/^MemTotal:/ { print $2 }' /proc/meminfo
exit 0
In other words: when invoked, emit your facts as "(qualified.)name=value" on
STDOUT, and exit 0 for success, non-zero for failure.
Allowing JSON output would also be nice, perhaps akin to:
#!/bin/sh
# an unlikely JSON result source, and hand written JSON, yay!
echo "com.puppetlabs.encoding=text/x-json" # future proof!
echo '{ "net.rimspace.memtotal": 16435456 }'
exit 0
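Purely as illustration, a consumer for both conventions might look something
like this in Ruby (the method name and behaviour are my invention, not
anything facter does today):

```ruby
require 'json'

# Hypothetical parser for executable-fact output: plain "name=value"
# lines by default, switching to a JSON document when the script leads
# with the com.puppetlabs.encoding marker suggested above.
def parse_fact_output(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  if lines.first == 'com.puppetlabs.encoding=text/x-json'
    JSON.parse(lines.drop(1).join("\n"))
  else
    lines.each_with_object({}) do |line, facts|
      name, _, value = line.partition('=')
      facts[name] = value
    end
  end
end
```

The point being that a dumb shell script and a JSON-speaking plugin could
share one entry point.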
>> The main thing missing from this documentation, and which has bitten us in
>> practice, is a lack of "community standards" for how facts should be
>> presented.
>>
>> For example, we have a "mem_in_mb" fact to work around the human-focused
>> values being returned from the default memory fact, or the difficulty in
>> returning a boolean fact to puppet. (0 and Ruby false are both "true",
>> apparently. :)
>
> Hmm. Yeah, this looks to be missing. What's the best way to fix that?
Given that false.to_s == "false" and 0.to_s == "0", I would suggest that
puppet make the gross assumption that something returning a string matching
/^(true|false)$/ is a puppet boolean, and something matching /^[0-9]+$/ is a
puppet integer, as though they wrote:
$boolean_fact = false # or = true
$integer_fact = 0 # float is left as an exercise for the reader
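A sketch of that rule, in case it helps (the method name is mine; this is not
current puppet or facter behaviour):

```ruby
# Hypothetical coercion: fact values arrive as strings, so map the
# obvious boolean and integer spellings onto native types and pass
# anything else through untouched. Floats: exercise for the reader.
def coerce_fact_value(value)
  case value
  when /\A(true|false)\z/ then value == 'true'
  when /\A[0-9]+\z/       then Integer(value, 10)
  else value
  end
end
```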
>> I would also be very happy to see more explicitness than is mentioned here
>> about what sort of data types Facter handles: It sounds like y'all are
>> thinking of something akin to JSON-level "rich" data structures, which I
>> would be very happy with, rather than YAML-with-Ruby-classes "rich" data
>> structures. (...or even plain "any Ruby object is fine" results. :)
>
> Yeah - this is definitely raw data, not ruby objects. We already support
> providing Facter output as a YAML hash, but we're going to support hashes of
> hashes of hashes of arrays of... You get the idea. And it'll all be in
> multiple formats.
*nod* FWIW, in Perl YAML is literally an order of magnitude slower in either
direction than JSON, mostly (I think) in the encoder startup. So, allowing
JSON there would make Perl plugins a universe nicer to write.
>> WRT the point about grouping of facts, and resolution: to my mind, this would
>> be a nice place to use a qualified name, and a search path:
>>
>> com.puppetlabs.memory
>> net.rimspace.memory
>>
>> facter search path: rimspace.net, puppetlabs.com
[...]
> Hmm. I've never thought of this. I've always kind of hated the
> reverse-domain style naming, but it's certainly common, and my opinions have
> often been wrong on this. Anyone else desirous of this?
I don't much like it either, because it is cumbersome. What I would propose
instead would be:
> It wouldn't translate all that well to Puppet variables -
> $com::puppetlabs::ipaddress::eth0?
Given a search path of 'rimspace.net, puppetlabs.com', the transformation
would be:
net.rimspace.example1
net.rimspace.example2
com.puppetlabs.example1
com.puppetlabs.nested.eth0
com.puppetlabs.nested.eth1
$example1 is the 'net.rimspace.example1' value, because we search my namespace
before yours. $example2 is 'net.rimspace.example2', and $nested is a hash:
{ eth0 => com.puppetlabs.nested.eth0, eth1 => ... }
Anything outside the search path wouldn't get shortened:
com.example.whatever => $com => { example => { whatever => $value } }
...or even where things in the fact search path get imported to variables, but
nothing else does:
# We don't search for com.example.whatever, but we can sure use it!
$alias = $facts["com.example.whatever"] # is this fatal for unset keys?
As long as the result was predictable rather than too DWIMish, I think
folks would work it out. Some magic around modules having their own facts
in an automatically generated and searched namespace might also help.
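To make that concrete, here is a rough Ruby sketch of the shortening rule as I
imagine it (entirely hypothetical; names and behaviour are all mine):

```ruby
# Hypothetical shortening of qualified fact names against a search
# path: earlier path entries shadow later ones, and any remainder
# below a shortened name collapses into a hash.
def shorten_facts(facts, search_path)
  # 'rimspace.net' searches the reversed-domain prefix 'net.rimspace.'
  prefixes = search_path.map { |d| d.split('.').reverse.join('.') + '.' }
  result = {}
  prefixes.each do |prefix|
    facts.each do |name, value|
      next unless name.start_with?(prefix)
      head, *rest = name.delete_prefix(prefix).split('.')
      if rest.empty?
        result[head] = value unless result.key?(head)
      else
        nested = (result[head] ||= {})
        key = rest.join('.')
        nested[key] = value unless nested.key?(key)
      end
    end
  end
  result
end
```

So with a search path of 'rimspace.net, puppetlabs.com',
net.rimspace.example1 wins the $example1 slot, com.puppetlabs.nested.*
collapses into $nested, and com.example.whatever is left alone, exactly as in
the examples above.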
> I think it would also make overrides and things a bit complicated - is a
> fact in com.puppetlabs a fact that we shipped the code for, or a fact about
> a Puppet Labs project?
IMO, it shouldn't matter a lick, although nothing prevents an arbitrary
declaration that 'core.' is a reserved namespace or the like.
> Should a Solaris IP address have a different path than a Red Hat IP? What
> if you override the default IP fact with a custom one?
I would encourage qualification based on the *author* of the fact, not the
target of it — but I would also suggest that folks already do have this
problem.
($sd_virtual, ugly hack that I wrote to replace the previous ugly hack, I am
looking at you here ;)
> However, I can totally see naming the resolution mechanisms that way - we
> need a unique way of naming and finding resolutions, both for logging and
> testing, and this is probably a good option. That way we can track where
> the code is from (i.e., com.puppetlabs fact resolutions are part of the
> core).
*nod* Obviously the idea needs some polish, but I think it covers the common
use cases reasonably well and sanely, and without introducing too many new
sharp edges for developers to cut themselves on.
[...]
>> FWIW, I would be worried about any fact that was "refresh once per boot": an
>> awful lot of things can change dynamically, including hostname, memory
>> capacity, disk capacity, number of CPUs, and a bunch of other frequently
>> static things about a host.
>>
>> (In fact, much of the engineering we did was to give us more capacity to
>> dynamically change many of those aspects, and part of it done by renaming
>> hosts as we rebuild them on the fly. Gotta love emergencies.)
>
> I have the same worry, but we need some ability to store fact data over time.
*nod*
>> The only other capability that would be interesting would be to be able to
>> dynamically query facts from other nodes: at the moment we use mcollective to
>> query facter facts dynamically on a manual basis, but anything we do manually
>> is usually a pointer to something that we want to automate eventually...
>
> We'll be having a central inventory (fact) store very soon, which will give
> you this, but can you provide more about what you do with it?
Right now? Nothing. This is a "future" bit of work, not a right now bit of
work. I can tell you about the problem I have and why I think this is the
right solution for it, however:
My problem is that I have a whole bunch of virtual machines on the network,
and we are preferentially using virtual to physical services now.
These live in a set of places: two storage servers, plus one execution node
that runs a KVM virtual machine, which then hosts Linux running OpenVZ to
provide the actual instances of the machine.
I want my Nagios monitoring to reflect that set of dependencies, so that when
an OpenVZ host inside KVM, or the hardware under the KVM, goes down I only get
the lowest-level alerts and the rest stay silent.
Using mcollective I can hook into the virtualization system and trigger a
puppet run on the Nagios servers when these configurations change, so I have
that part sorted out.
What I want is for the Nagios host to query a fact about another machine:
Given host foo, find the host that contains it. (none, or bar)
Given host bar, find the host that contains it. (baz)
...and so forth. Ideally no more than three layers deep and all. :)
foo and bar can't tell me which machine they are hosted on, because they don't
have that visibility, but their host can tell me it contains them.
So, what I kind of need is to refresh the facts about the host when the
contained machine is moved, in a way that doesn't depend on prior knowledge
about which machines are involved.
(Theoretically I could trigger a puppet refresh on every OpenVZ host, or every
KVM host, and do it that way, but that ... costs.)
> How concerned are you about security?
Not enormously, because this is not security-critical information. If it
makes a security difference that I publish this data to an attacker then
I have already lost; I just don't know it yet.
However, I can see an advantage to an ACL system for this sort of information
because some of it might actually be security critical — information about
memory or CPU use could be a side-channel for inferring crypto keys, for
example.
So, if I was developing this as part of puppet I would look to allow ACL
specification matching at least IP ranges against regexp or glob fact-name
patterns.
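A toy sketch of what I mean, with a made-up rule structure (IPAddr and glob
matching via File.fnmatch are just the laziest tools to hand):

```ruby
require 'ipaddr'

# Hypothetical ACL: each rule pairs an IP range with a glob over fact
# names; a requester may read a fact if any rule matches both.
def fact_readable?(acl, requester_ip, fact_name)
  ip = IPAddr.new(requester_ip)
  acl.any? do |rule|
    rule[:range].include?(ip) && File.fnmatch(rule[:facts], fact_name)
  end
end

acl = [
  { range: IPAddr.new('192.168.0.0/24'), facts: 'net.rimspace.*' },
  { range: IPAddr.new('10.0.0.0/8'),     facts: '*.memtotal'     },
]
```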
> I've been really hesitant to build a system that allows anyone to read and
> write anyone else's facts.
Write would make me worry enormously, but reading them on demand is quite
attractive to me.
Daniel
--
✣ Daniel Pittman ✉ [email protected] ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons
--
You received this message because you are subscribed to the Google Groups
"Puppet Developers" group.