Re: [Linux-HA] time to fork heartbeat?

David Lang Wed, 11 Aug 2010 17:23:56 -0700

On Thu, 12 Aug 2010, Dejan Muhamedagic wrote:

> On Wed, Aug 11, 2010 at 03:59:34PM -0700, David Lang wrote:
>> On Thu, 12 Aug 2010, Dejan Muhamedagic wrote:
>>
>>> On Wed, Aug 11, 2010 at 02:44:36PM -0700, David Lang wrote:
>> I currently manage over a hundred clusters of machines. With v1 style
>> configs this is easy to integrate into the other server management tools;
>> if changes had to be done strictly via the crm shell, this is much more
>> complicated.
>
> Why would that be? If you do
>
> crm configure edit
>
> it will take you straight to the editor and the saved changes are
> going to be applied to the cluster. There's also (in v1.1)
>
> crm configure filter
>
> which can be used with say sed. There's also a way to load/save
> configurations from/to regular files.


If I can load/save configs from regular files, and the format of those files is
documented so that I can edit them (either manually or programmatically), that
is roughly equivalent to having the configs in plain text (and at that point I
start to question why not just use the plain text version :-)

> As for the management, how do you make a node standby now?
> hb_standby, right? How's that different from "crm node standby"?

that's not the type of thing that's a problem.

the type of thing that's a problem is changing the config.

Sometimes I do this with vi, sometimes I do this with scripts to build the
haresources line from scratch, sometimes I use sed on haresources, etc. Having
to make the changes by interacting with a menu/gui is a major step backwards.
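
As a rough sketch of the "build the haresources line from scratch" case: the
function name, node name, address, and service below are made up for
illustration, and it assumes the usual v1 layout of a preferred node followed
by its resources, with :: separating a resource script from its arguments.

```python
def haresources_line(node, resources):
    """Build one haresources line: preferred node, then its resources.

    Each resource is either a bare script name ("apache") or a
    (name, arg, ...) tuple, which is rendered as name::arg::...
    """
    parts = [node]
    for res in resources:
        if isinstance(res, str):
            parts.append(res)
        else:
            parts.append("::".join(str(field) for field in res))
    return " ".join(parts)


# Hypothetical node name, IP address, and service, for illustration only.
line = haresources_line("node1", [("IPaddr", "192.168.1.10"), "apache"])
print(line)
```

Something like this can be driven from whatever inventory the other server
management tools already keep, which is the point of having a plain text
format to target.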

I can very much understand how a good menu/gui tool could make it easier for a
beginner to get started, or to explore the possibilities, but for widespread
production use the ability to have plain text files to manipulate is critical.

>>>> This is really starting to sound like we need to fork heartbeat back to the
>>>> 2.x or thereabouts when it could work for simple things easily.
>>>
>>> I can understand the way you feel. But I don't think that there
>>> is a need to maintain the Heartbeat v1 bits separately. With
>>> Heartbeat 3.x you need to install in addition just the
>>> cluster-glue package (perhaps named differently in various
>>> distributions).
>>
>> what would that do? would it let us use v1 style configs where they are
>> sufficient?
>
> Yes. I doubt very much that v1 functionality got broken with the
> split.

Good, so to use the v1 functionality I need to install heartbeat +
cluster-glue?

>>>> does anyone have a good handle on where we should start and what bugs
>>>> have been fixed since then (as opposed to new features added, components
>>>> split out, etc)?
>>>
>>> The mercurial repository is the ultimate source.
>>
>> yes, that is the ultimate source, but it's far more painful to have to start
>> from scratch than if someone who is familiar with the codebase can provide a
>> map.
>
> The heartbeat codebase as well as the libraries (clplumbing),
> i.e. the parts which are relevant to v1, haven't changed much in
> the last few years.

that's what I figured, and the reason I was asking the question.

>>>> I've been watching things get more and more complicated over time, and I
>>>> recognise that to solve complex problems you sometimes need that
>>>> complexity, but there are a LOT of problems that aren't that complex.
>>>> Heartbeat has been making it harder and harder to do simple things, and
>>>> with the difficulty in figuring out what version 3.0.2 is doing that Igor
>>>> is experiencing, and the inability to take a simple config and convert it
>>>> to the new format, it is sounding like it may be time to fork.
>>>
>>> I completely agree that increased complexity is a problem and
>>> particularly in HA solutions. And it is possible to create very
>>> complex configurations with Pacemaker, and at the same time make
>>> it hard (or impossible) for humans to understand what the
>>> cluster does.
>>
>> and sometimes such complexity is needed, but sometimes it's not.
>
> I'd say that running something one can't understand is at least
> unmaintainable.

but if all I'm doing is the simple stuff, I don't need to understand all the 
complex stuff, I just need to learn the part that I'm using.

>>> However, if you want to run a configuration comparable to v1,
>>> i.e. a simple active-passive or active-active setup, a Pacemaker
>>> cluster is quite manageable.  Right now it has all the tools to
>>> make it much easier to manage than a haresources based cluster.
>>> Once you give it a try, you probably won't look back.
>>
>> the problem is that the learning curve has been made so steep that even
>> people who are familiar with clusters (and earlier versions of heartbeat)
>> have problems setting up these simple clusters.
>
> I hope that the situation got a bit better recently. One still
> needs quite a bit of time to devote to learn it, but simple
> clusters should really not be a problem anymore.
>
>> the fact that we are on day 2 or 3 of Igor's problem and can't even figure
>> out what's happening because the logs aren't showing anything is a very bad
>> sign.
>
> Those logs have always been the same.

Could you please take a look at what Igor has been posting and see if you can
figure out why the logs stop within a minute or so of heartbeat starting
(before it starts/stops any resources) and why it doesn't log _anything_ for a
long time (at least 40 min)?

The logs are not showing stuff that I (and others who have responded) are used
to seeing in the 2.x versions that we have deployed, so I assumed that this was
due to logging changes (I have never used logd, so I didn't know what changes
it had, for example).

>> I really don't want to have heartbeat fork, but as the project has grown new
>> features and then split off the resource management stuff, the difficulty in
>> getting the simple things working has been growing.
>>
>> most of us who didn't need that complexity just ignored it as long as the
>> haresources configs continued to work.
>
> And, for the time being, they should work. Don't know what the
> future will bring, didn't notice much interest in supporting that.
> Perhaps somebody from Linbit can comment too.
>
>> at this point it seems like either the haresources configs need to be
>> un-deprecated and supported, or something else. but the current situation is
>> getting unreasonable.
>
> If there are enough shops interested in running v1, then somebody
> will probably support it too.

I think there are, and that's why I started this thread. The ideal result would
be to not fork, and have the v1 style configs supported in the latest version.

it's not that people want to run the v1 code, but the v1 configs are very 
minimalist, and pretty easy to understand. If that satisfies your needs (which 
it does for a lot of people), going to all the added complication of the newer 
stuff is a lot of cost for very little return.

Yes, there are times when you need to run clusters of more than 2 machines,
need to load balance, need to shift processes around to keep a lot of different
applications running on one cluster without overloading any one box, etc.

But most people start out with things running on a single machine, and then
need to make that thing HA. Going to a 2-machine cluster with simple failover
is all they (initially) need. It's only after people run such clusters for a
while that they start looking at larger clusters and more complex tasks.

David Lang
