* Luke Kanies <[email protected]> [091130 08:10]:
> On Nov 29, 2009, at 10:45 AM, Nigel Kersten wrote:
> 
> >
> >
> > On Sun, Nov 29, 2009 at 6:43 AM, Ohad Levy <[email protected]> wrote:
> >
> >
> > On Sun, Nov 29, 2009 at 6:31 AM, Christian Hofstaedtler <[email protected]>  
> > wrote:
> >  load sharing
> > and/or
> >  fault tolerance
> >
> > And both of these things are currently very complicated to do as
> > long as the client has only one hostname to talk to.
> >
> > Why not change this?
> >
> > +1 - I think it's acceptable for each client to connect to one server  
> > and keep on using that server for its whole "puppet run".
> >
> > I think it's perfectly reasonable, and would make failover a lot  
> > simpler.
> >
> > Do we have all the information internally to tell when an error  
> > indicates that a server is unavailable?
> 
> Hah, I doubt it.
> 
> > Would we give up and consider it a failure every time we can't find  
> > a puppet:/// file resource? So we'd be changing behavior when  
> > someone typos a puppet URI? Should the behavior be different if we  
> > time out retrieving rather than not being able to find it at all?
> 
> Urgh, no way.  The connection itself needs to fail, not just have some  
> random exception - probably, anything other than a timeout wouldn't  
> constitute a failure.

> > However, does this really help load balancing?
> >
> > Say one server in a pair is overloaded and timing out on file  
> > resources... do clients simply start their run all over again with  
> > the other server? That seems kind of inefficient.... given that they  
> > may have all progressed quite far into their run.
> 
> With the 'retry' functionality in ruby, the caller never knows of a  
> problem unless none of the servers work.  You pick a new server,  
> reconnect, and keep going.  I can't think of anything that would  
> reasonably restart the whole run itself.
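A rough sketch of that retry-based failover (hostnames and the yielded "connect" block are illustrative, not Puppet's actual code):

```ruby
# Sketch of retry-based failover: walk a list of candidate servers and
# fall through to the next one when a connection-level failure occurs.
def with_failover(servers)
  remaining = servers.dup
  server = remaining.shift
  begin
    yield server
  rescue => e
    server = remaining.shift   # pick the next candidate, if any
    retry if server
    raise                      # every server failed: surface the error
  end
end

# The caller only sees an error if none of the servers work.
result = with_failover(%w[puppet1.example.com puppet2.example.com]) do |host|
  raise "timeout" if host == "puppet1.example.com"  # simulate a dead server
  "connected to #{host}"
end
# result => "connected to puppet2.example.com"
```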

I had a simpler mechanism in mind:

Do the server selection only once per run. See if it "works" or
outright fails, and then stick to this server. ***

If the server fails in the middle of the current run, you get a
failed run, and the next one will (hopefully) work again.

This way you also don't need to worry _that much_ about keeping all
manifests and files completely in sync across all servers _all the
time_ - which is another problem that is not easily solved.
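That once-per-run selection could be sketched like this (all class and helper names are made up for illustration):

```ruby
# Sketch of once-per-run server selection: probe the candidates once,
# stick to the first one that responds, and fail the whole run if that
# server dies mid-run.
class RunSession
  class NoServerError < StandardError; end

  def initialize(servers, probe)
    # Selection happens exactly once, at the start of the run.
    @server = servers.find { |s| probe.call(s) }
    raise NoServerError, "no server responded" unless @server
  end

  # Every request during the run goes to the same chosen server;
  # there is no mid-run failover.
  def request
    yield @server
  end
end

up = { "a.example.com" => false, "b.example.com" => true }
session = RunSession.new(up.keys, ->(s) { up[s] })
chosen = session.request { |s| s }  # => "b.example.com"
```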


*** If there is an "intelligent" node manager in between, the server 
selection is done twice, in this order:

 * client picks an initial server to talk to (FT)
 * client asks this server (which is really the node manager), whom
   to talk to (LB)
 * client sends facts to this server (and sticks to it 'til the 
   end/failure of the run)
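That two-step order might look roughly like this (hostnames and the assignment table are hypothetical):

```ruby
# Sketch of the two-step selection with a node manager in between.
# Step 1 (FT): pick any reachable node manager. Step 2 (LB): ask it
# which master to use, then stick to that master for the whole run.
def pick_reachable(hosts, reachable)
  hosts.find { |h| reachable.call(h) } or raise "no node manager reachable"
end

reachable   = ->(h) { h != "nm1.example.com" }  # simulate nm1 being down
assignments = { "nm2.example.com" => "master3.example.com" }  # LB decision

manager = pick_reachable(%w[nm1.example.com nm2.example.com], reachable)
master  = assignments.fetch(manager)
# The client now sends its facts to `master` and keeps using it until
# the run ends or fails.
```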

> > I'd really like to be able to combine both. Shared state for load  
> > balanced pairs, multiple servers in the client config for failover  
> > and restarting the current run.
> >
> >
> > a simple solution might be to implement a DNS SRV record (e.g. like  
> > LDAP), which would allow the client to decide which puppetmaster to  
> > connect to.
> > This in time could be enhanced to also consider server load etc. (so  
> > it could try another server or wait for a while).
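For illustration, an SRV-based pick with Ruby's stdlib resolver could look like this (the `_puppet._tcp` record name and the simplified selection rule are assumptions, not an existing convention):

```ruby
require "resolv"

# Sketch of SRV-based master selection. SRV names follow RFC 2782's
# _service._proto form; a client could fetch the records with:
#
#   Resolv::DNS.open do |dns|
#     dns.getresources("_puppet._tcp.example.com",
#                      Resolv::DNS::Resource::IN::SRV)
#   end
#
# and then pick a target: lowest priority wins, and within the same
# priority we prefer the higher weight (a simplification of RFC 2782's
# weighted random selection).
SRV = Struct.new(:priority, :weight, :target)

def pick_srv(records)
  records.min_by { |r| [r.priority, -r.weight] }.target
end

records = [
  SRV.new(10, 5,  "master1.example.com"),
  SRV.new(10, 60, "master2.example.com"),
  SRV.new(20, 0,  "backup.example.com"),
]
pick_srv(records)  # => "master2.example.com"
```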
> >
> > This is essentially what we're doing now. We have simple monitoring  
> > in place so all our clients can check the load of the puppet server  
> > their DNS view points to, and fall back to an alternate server if  
> > the load is too high.
> >
> >
> > I would be happy not to add any additional dependencies (even though  
> > memcache is acceptable) - especially a database; e.g. if I have 5  
> > locations where I need HA + load sharing, I don't want to end up  
> > maintaining 5 sets of clusters.
> >
> > ++
> >
> > I can't see a way around shared state for efficient load balancing,  
> > but think that being able to provide a list of puppet servers to the  
> > clients would greatly help with failover.

You don't really need shared state for LB, as long as the client
sticks to one server _and_ does not reconnect to another server
during the same run.


  Christian

-- 
christian hofstaedtler
