On Nov 26, 2009, at 12:23 PM, Markus Roberts wrote:

>>> I'm thinking that it might be a better idea to solve this problem
>>> than to hack around it.  The main solution I'm thinking of is
>>> essentially requiring some kind of shared back-end or requiring a
>>> shared cache such as memcached.
>>
>> My concern here is that we don't want to cache the facts, we want to
>> have them at our disposal on several masters.  The issue with caching
>> is that you're never sure the content will be there (that's even the
>> whole point of a cache).
>> So what happens if memcached decides to purge the facts for a given
>> host and said host asks for a catalog?
>
> Technically, yes, but Luke's broader point stands; there are a number
> of solutions (e.g. MemcacheDB and Tokyo Tyrant) that use the memcached
> protocol but are persistent.

Yep.  Additionally, though, there really is a good bit of caching  
going on on the server now, and doing so with memcached makes a lot  
more sense in many cases.

The complication is that most of that caching is a traditional cache  
-- we can get new data if it goes stale -- but the fact information  
isn't really a cache in that sense.

>> What we need is a (more) persistent shared storage for this. And the
>> only one we have at the moment is storeconfigs/thin_storeconfigs.
>
> As Luke noted, memcached (and other systems that use the protocol)
> should be quite doable; there are also a slew of other options (such
> as MagLev and the other NoSQL systems).  This would be a prime
> candidate for plugins.
>
>> Granted those are performance suckers (less so, of course, for
>> thin_storeconfigs), so they might not be useful for large sites
>> (which of course need several masters).
>
> I suspect that the performance issues are resolvable.

Probably, although I'm not convinced it's possible to do so without  
changing technologies (Brice, how's your research into TokyoCabinet et  
al going?).

Really, though, just to store the fact information, performance won't  
be nearly as big a problem.

>>> A shared cache with memcached should be pretty close to trivial --
>>> just another terminus type.  This obviously adds another dependency,
>>> but only in those cases where you 1) have multiple masters, 2) don't
>>> have client binding to an individual master, and 3) aren't using
>>> some common back-end (one of which will be available from us with
>>> this information by the next major release).
>
> Having an additional dependency for an optional feature seems quite  
> reasonable.
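For what it's worth, the "just another terminus type" idea can be sketched roughly like this.  The names are hypothetical and the shape is simplified from Puppet's indirector terminus API; a plain Hash stands in for a memcached (or MemcacheDB/Tokyo Tyrant) client, which would offer an equivalent get/set interface:

```ruby
require 'yaml'

# Hedged sketch of a shared-cache facts terminus.  Class and method
# names are illustrative, not Puppet's actual terminus API.
class FactsMemcacheTerminus
  def initialize(store = {})
    @store = store  # would be a real memcached client in practice
  end

  # Save a node's facts under a well-known key.
  def save(node_name, facts)
    @store["facts/#{node_name}"] = YAML.dump(facts)
  end

  # Fetch a node's facts; returns nil if the backend purged (or never
  # had) them -- exactly the failure mode Brice is worried about.
  def find(node_name)
    raw = @store["facts/#{node_name}"]
    raw && YAML.load(raw)
  end
end

terminus = FactsMemcacheTerminus.new
terminus.save('web01', 'osfamily' => 'Debian')
catalog_facts = terminus.find('web01')
```

A persistent backend (MemcacheDB, Tokyo Tyrant) plugs in behind the same interface, which is what makes the "plugins" suggestion attractive.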
>
>> * we bend the REST model to POST the facts and get the catalog as
>> the result (i.e. one transaction, like now, but POSTed)
>
> Sure.  I mean, if you're willing to contort HTTP and pretend it's an
> RPC system (which is what REST is), a little extra bending to make it
> actually work shouldn't be that objectionable.  Are there any "REST
> purists" on this bus, and if so, have you thought about how
> paradoxical that is?  If we're all pragmatists here, this may be the
> simplest/most reliable solution.

AFAIK there haven't been any purist arguments in the group, and as you  
say, that would be pretty silly.

My only concern is how big a change to the existing architecture it  
requires, and whether it ends up as a one-off solution that's painful  
to maintain over time.
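To make the "POST the facts, get the catalog back" option concrete, here is a toy end-to-end sketch.  The URL path, the YAML payloads, and the single-request socket server standing in for a master are all illustrative, not Puppet's real wire format:

```ruby
require 'socket'
require 'net/http'
require 'yaml'

# Toy stand-in for a master that answers one "POST facts, get catalog"
# request on a random local port.
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]

handler = Thread.new do
  client = server.accept
  client.gets                                 # discard request line
  headers = {}
  while (line = client.gets) && line != "\r\n"
    key, value = line.split(': ', 2)
    headers[key.downcase] = value.strip
  end
  facts = YAML.load(client.read(headers['content-length'].to_i))
  body  = YAML.dump('resources'   => [],
                    'environment' => facts['environment'])
  client.write("HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n\r\n#{body}")
  client.close
end

# The client POSTs its facts and gets the compiled catalog in the
# reply -- one transaction, as in the current model, but over POST.
facts    = { 'hostname' => 'web01', 'environment' => 'production' }
response = Net::HTTP.new('127.0.0.1', port).post(
  '/catalog/web01', YAML.dump(facts), 'Content-Type' => 'text/yaml'
)
catalog = YAML.load(response.body)
handler.join
```

Because the facts ride along in the catalog request itself, no master ever needs to have seen that client before -- which is the whole appeal for the multimaster case.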

>> * we HTTP pipeline the facts POST and the catalog GET in the same
>> stream (though I'm not sure the LB wouldn't split the request into
>> pieces and direct those to different upstream masters).
>
> I have low confidence in this working, if only based on the number of
> ways I can imagine it going wrong.

I concur.

>> * we ask users to use a LB with a client-IP-hash load balancing
>> scheme (i.e. clients are always sent to the same master).
>
> That could work, though it makes rollover more granular and may
> require a more sophisticated LB setup.  If puppetmasters can come on
> and off line without requiring a system-wide hiatus, the LB is going
> to have to be pretty savvy.
>
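For reference, the client-IP-hash scheme Brice describes looks something like this in nginx (hostnames are illustrative; note that clients behind a shared NAT all hash to the same master, and removing a master reshuffles some clients, which is the rollover wrinkle Markus raises):

```
upstream puppetmasters {
    ip_hash;                          # same client IP -> same backend
    server master1.example.com:8140;
    server master2.example.com:8140;
}

server {
    listen 8140;
    location / {
        proxy_pass http://puppetmasters;
    }
}
```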
>> * we implement a master-to-master protocol (or a ring a la Spread,
>> or using a message queue/topic).  If a client asks master A for a
>> catalog, A contacts the other masters to find the one with the
>> freshest facts, compiles, and sends back the catalog.
>
> Message bus style, perhaps (they all listen and cache everything), but
> any time you tell your systems to form a committee performance
> plummets--if we assume that each puppetmaster can serve up to k
> clients we've taken it from something that scales O(n) or O(n log n)
> to something that's O(n^2) or worse.  How do you deal with
> slow-responding peers?  Does A keep a copy?  How do you deal with
> server failure (e.g. the machine with the most current copy goes
> down)?

I think this is a complicated solution to what should not be a  
complicated problem.

>> * we don't care, and ask users wanting multiple masters to use a
>> shared filesystem (whatever it is) to share the YAML-dumped facts.
>
> It could work.  It could also fail due to various race conditions.
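The main race Markus alludes to -- a master reading a half-written facts file -- is avoidable on the write side with the usual tempfile-and-rename discipline.  A rough sketch (paths and facts are illustrative; note that rename is only atomic within a single directory, and NFS adds its own caveats):

```ruby
require 'tmpdir'
require 'yaml'

# Write the facts to a temp file in the target directory, then
# rename() it into place, so concurrent readers only ever see a
# complete YAML file.
def save_facts(dir, node_name, facts)
  final = File.join(dir, "#{node_name}.yaml")
  tmp   = File.join(dir, ".#{node_name}.yaml.#{Process.pid}")
  File.write(tmp, YAML.dump(facts))
  File.rename(tmp, final)  # atomic replace on POSIX filesystems
  final
end

Dir.mktmpdir do |dir|
  path = save_facts(dir, 'web01', 'osfamily' => 'Debian')
  puts YAML.load_file(path).inspect
end
```

This doesn't help with the other failure modes (stale facts, filesystem availability), but it does close the obvious torn-read window.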


This basically says that multiple masters is really complicated and  
you shouldn't do it, which is not where we want to end up.

IMO, the right approach is to have a node manager capable of  
functioning as an inventory server (holding all fact/node data), and  
then have the servers query that (with the same kind of caching  
they're doing now).

This gets you essentially everything you need, and all it says is:  If  
you want multimaster, you have to have an inventorying node manager.
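In code, the shape I have in mind is roughly the following.  All names are hypothetical; `inventory` would wrap an HTTP call to the node manager, and each master keeps the same sort of short-lived local cache in front of it that the masters use today:

```ruby
# Sketch of an inventory-backed fact source with a local TTL cache.
# The inventory object must respond to facts_for(node_name); here it
# stands in for a query against the inventorying node manager.
class CachingFactSource
  def initialize(inventory, ttl = 60)
    @inventory = inventory
    @ttl       = ttl
    @cache     = {}
  end

  def facts_for(node_name)
    entry = @cache[node_name]
    return entry[:facts] if entry && Time.now - entry[:at] < @ttl

    facts = @inventory.facts_for(node_name)  # authoritative lookup
    @cache[node_name] = { facts: facts, at: Time.now }
    facts
  end
end
```

Any master can then compile for any node: a cache miss falls through to the inventory server rather than to "hope the client talked to me last time".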

Which, conveniently, we're about to put a 0.1 of out this coming  
week.  Well, technically it's just the node manager part, but we'll  
quickly be adding the inventory bits.

-- 
Education is when you read the fine print. Experience is what you get
if you don't. -- Pete Seeger
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
