Lars Marowsky-Bree wrote:
> On 2007-02-22T06:55:37, Alan Robertson <[EMAIL PROTECTED]> wrote:
> 
>> It doesn't mean that AT ALL.  Failover time is normally 90% dominated by
>> resource agent time.  And increasing CPU time in a multi-process,
>> multi-processor situation where networking delays and scheduling delays
>> are typically higher than CPU time means that even the wall-clock delay
>> that isn't due to resource agents probably amounts to 5% at most.
>> If resource agents are 90% of that time, then the delay is probably
>> more like 0.5%.
> 
> That much is certainly true.
> 
>> Really?  It has caught dozens of bugs.  Different tools find different
>> bugs.  None are perfect.  Somehow you're saying that we should take
>> weapons for finding bugs out of our arsenal because when Andrew disables
>> them, they don't find any bugs for him.
> 
> Has it caught bugs recently?

Andrew's been writing most of the newer code.  Newer code has more bugs
than older code.  Andrew has it disabled.  What a surprise that it isn't
finding any bugs in his code.

> And no, I'm not saying to rip it out. I'm saying to disable it for
> production shipments, so the question becomes: Has it caught any bugs on
> production systems?
> 
> For CTS runs and stuff, the malloc safeguards are good. One might even
> consider making/leaving them as the default.
> 
> I totally don't see the point of our own allocator, given that glibc's
> is obviously - as the numbers clearly show - significantly faster, even
> if the overall impact may be small. So, why maintain it? It's pointless
> by now - it was useful once, but the system libraries have become
> better.
> 
> I don't object to the safeguards code. I object to the allocator. That
> bit no longer makes sense. 
> 
> And, for production systems, even the safeguards are questionable,
> because those bugs have all been caught during debugging.
> 
> I know this is your code, and you're attached to it, but please at least
> address the matter of the safeguards and having our own allocator
> separately.

They are intimately tied together - and probably the cause of the
inefficiency you're complaining about.

Any time you're talking about turning off a safeguard for what is at
best a very small improvement in performance, I don't see the value.  I
like to be able to debug things when they go wrong.

The relevant acronym is RAS:
        Reliability - aided by using this during debugging
        Availability - (improving R improves A)
        Serviceability - the ability to debug things in the field

So, the patches help all three letters of RAS - for a small performance
penalty.

Let's see what the web site says:
The basic goal of the High Availability Linux project is to:

    *Provide a high-availability (clustering) solution for Linux
     which promotes reliability, availability, and serviceability
     (RAS) through a community development effort.

So, the goal listed on the web site seems to indicate that RAS is
important.  Much more important than a small performance hit.

If Linux-HA consumed lots of CPU, and the first paragraph of the web
site said "enhance performance" or something similar, I'd certainly
agree.  This isn't about it being my code.  It's about something much
simpler - the reason the project exists.

It's about the right perspective for the task at hand.


-- 
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
