RE: [Full-Disclosure] Windoze almost managed to 200x repeat 9/11

joe Fri, 24 Sep 2004 09:54:28 -0700

I read that article differently than you. 

It seems you read it as saying that a system backup (i.e., something backing up
data) failed. I read it as saying that an operator didn't reboot the system and
that the software designed to catch and handle that failed.

"An improperly trained employee failed to reset the system, leading it to
shut down without warning, the official said. Backup systems failed because
of a software failure,"

Note "Backup systemS", i.e., the operator was the first line of defense, an
automated system was the backup, and it didn't fire, probably due to a
failure to test it.

> The article implied (though didn't outright state it) that the Unix
systems did not include regular reboots.

I think that stretches what they wrote, but it is probably accurate, though
several large companies I know reboot their UNIX systems every Sunday
maintenance window right along with their Windows systems. Anyway, what your
reading implies to me is that the vendor knew how to code UNIX apps and
didn't know how to code Windows apps.

I think you are absolutely incorrect about why the reboot was needed. It
wasn't to clear memory; it was to reset the system tick counter so that
GetTickCount (a 32-bit millisecond count that wraps around after about 49.7
days) doesn't overflow its DWORD.

  joe

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Barry Fitzgerald
Sent: Friday, September 24, 2004 11:15 AM
To: Frank Knobbe
Cc: [EMAIL PROTECTED]
Subject: Re: [Full-Disclosure] Windoze almost managed to 200x repeat 9/11

Frank Knobbe wrote:

>On Fri, 2004-09-24 at 09:15, Barry Fitzgerald wrote:
>>The article doesn't make the situation entirely clear.  Did the app 
>>intentionally restart the system and foul it?  Did the restart occur 
>>because the app crashed?
>
>No, no, the problem was "human error" because a tech didn't reboot the 
>system. It's clearly operator error, not a problem with any systems at 
>all.
>
I disagree: if the system were engineered properly, a reboot would not be
necessary to keep it from falling on its face.

The article implied (though didn't state outright) that the Unix systems
did not include regular reboots. I don't know enough about the engineering
of the system to say whether this was caused by the app, the OS, or some
dependency issue.

But in a critical system of this nature, relying on scheduled reboots to keep
it running signals to me that there's a problem in the system.

>Unfortunately, there is some truth in this. We (and not just the media) 
>are starting to put blame on humans far too quickly. Is this justified?
>On one hand, they are only tools for us to do our job. On the other 
>hand, they are products that we should be able to rely on. Who do we 
>blame? Operators or products?
>
That depends on the situation. If a system can be engineered to operate
properly on its own, then it should be. All else is operator error. I
think it depends most on whether it is rational to require the system to
handle the task automatically.

If the backup fails because said user forgets to change the backup tapes,
then the problem is human error.
If the backup fails because said product doesn't properly flush its buffers
and sends all data to /dev/null, then the issue is software error, even if
it's a known condition for which a procedure has been put in place to work
around it. The case for automating this is rational, and the behavior is
supposed to be built into the system, so the failure is an engineering error.

The second scenario is similar to what we had here. All a reboot does is
ensure that memory has been cleared. If their developers don't know how
to do this in code, or if they choose OSes that can't reliably do this, then
fire the developers, the decision makers, or both, because they didn't
do their jobs and people could have died because of that.

             -Barry

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.netsys.com/full-disclosure-charter.html
