On Thu, 12 Aug 2010, Dejan Muhamedagic wrote:

> On Wed, Aug 11, 2010 at 05:22:56PM -0700, David Lang wrote:
>> On Thu, 12 Aug 2010, Dejan Muhamedagic wrote:
>>
>>> On Wed, Aug 11, 2010 at 03:59:34PM -0700, David Lang wrote:
>>>> On Thu, 12 Aug 2010, Dejan Muhamedagic wrote:
>>>>
>>>>> On Wed, Aug 11, 2010 at 02:44:36PM -0700, David Lang wrote:
>>
>>>>>> I've been watching things get more and more complicated over time, and I
>>>>>> recognise that to solve complex problems you sometimes need that
>>>>>> complexity, but there are a LOT of problems that aren't that complex.
>>>>>> Heartbeat has been making it harder and harder to do simple things, and
>>>>>> with the difficulty in figuring out what version 3.0.2 is doing that Igor
>>>>>> is experiencing, and the inability to take a simple config and convert it
>>>>>> to the new format, it is sounding like it may be time to fork.
>>>>>
>>>>> I completely agree that increased complexity is a problem,
>>>>> particularly in HA solutions. And it is possible to create very
>>>>> complex configurations with Pacemaker, and at the same time make
>>>>> it hard (or impossible) for humans to understand what the
>>>>> cluster does.
>>>>
>>>> and sometimes such complexity is needed, but sometimes it's not.
>>>
>>> I'd say that running something one can't understand is at least
>>> unmaintainable.
>>
>> but if all I'm doing is the simple stuff, I don't need to understand all the
>> complex stuff, I just need to learn the part that I'm using.
>
> Well, you said it. I'm not sure what exactly "complex stuff"
> refers to.

more than two machines, active-active to start with.

the simple haresources config (at startup, box X defaults to running the
following resources) covers a LOT of ground, especially if one of those
resources can be control of a shared drive (either physically shared or
logically shared via drbd)
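
to illustrate how little there is to that model, here is a minimal Python
sketch (not heartbeat code) that splits one haresources-style line into the
preferred node and its resources. The node name, IP address, drbd resource
name, device, and mount point are made-up placeholders, and the parsing is
deliberately simplified (no comments or line continuations).

```python
# Sketch only: split one haresources-style line into the preferred node and
# the resources that node runs by default. All values in `example` are
# hypothetical placeholders, not taken from anyone's real config.

def parse_haresources_line(line):
    """Return (preferred_node, [(resource_name, [args...]), ...])."""
    fields = line.split()
    node, resources = fields[0], fields[1:]
    parsed = []
    for res in resources:
        # Resource arguments are separated from the name by "::".
        name, *args = res.split("::")
        parsed.append((name, args))
    return node, parsed


example = (
    "boxX IPaddr::192.168.1.10 drbddisk::r0 "
    "Filesystem::/dev/drbd0::/data::ext3 apache"
)
node, resources = parse_haresources_line(example)
```

here boxX is the default owner, and the drbddisk and Filesystem entries stand
in for "control of a shared drive"; the other box takes the same list over on
failure.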

>>>> the fact that we are on day 2 or 3 of Igor's problem and can't even figure
>>>> out what's happening because the logs aren't showing anything is a very bad
>>>> sign.
>>>
>>> Those logs have always been the same.
>>
>> Could you please take a look at what Igor has been posting and see if you can
>> figure out why the logs stop within a minute or so of heartbeat starting
>> (before it starts/stops any resources) and doesn't log _anything_ for a long
>> time (at least 40 min)
>>
>> the logs are not showing stuff that I (and others who have responded) are
>> used to seeing in the 2.x versions that we have deployed, so I assumed that
>> this was due to logging changes (I have never used logd, so I didn't know
>> what changes it had for example)
>
> Unfortunately, I forgot almost everything about v1 and can't
> provide any useful input. Don't know what kind of logging is
> missing.

he's running 3.0.x

he has one sample in e-mail where he started heartbeat manually and it did the
following

on the box that auto-failback pointed to

initialization
stop all services
notice that it needed to be active
start all services
received an external kill signal
stop all services
exit

on the other box
initialization
stop all services
received an external kill signal
stop all services
exit


what he's getting normally is

initialization

with nothing else unless one of the boxes shuts down (at which point the other
takes over, but he hasn't posted logs from that scenario)

so what _should_ be happening after the first few seconds of startup? when 
initdead expires something _should_ happen, but we don't see anything in the 
logs.
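
one way to pin down exactly where the logs go quiet is to scan them for long
gaps between timestamps. The following is a rough Python sketch, not a
heartbeat tool. It assumes each log line carries a timestamp shaped like
2010/08/11_17:22:56; if the real log uses a different layout, the pattern and
format string need adjusting. The sample lines are invented placeholders, not
Igor's logs.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp layout (e.g. "2010/08/11_17:22:56"); adjust to the real log.
STAMP = re.compile(r"\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}")


def find_silences(lines, threshold=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs where the log was silent
    for longer than `threshold` between two timestamped lines."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.search(line)
        if not match:
            continue  # skip lines without a recognisable timestamp
        current = datetime.strptime(match.group(0), "%Y/%m/%d_%H:%M:%S")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


# Invented sample lines for illustration only.
sample = [
    "heartbeat[1234]: 2010/08/11_17:00:05 info: placeholder startup message",
    "heartbeat[1234]: 2010/08/11_17:00:40 info: placeholder message",
    "heartbeat[1234]: 2010/08/11_17:41:10 info: placeholder message after silence",
]
gaps = find_silences(sample)
```

feeding it the lines of the real log file from each box would show whether
the silence starts before or after the point where initdead should have
expired.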

David Lang
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
