Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-09 Thread Valeri Galtsev

On Tue, September 9, 2014 9:33 am, Mark Tinberg wrote:
>
> On Sep 8, 2014, at 10:25 AM, Valeri Galtsev 
> wrote:
>
>> Mark Tinberg wrote:
>>>
>>>> A lack of updates can also mean that there is a lack of effort or
>>>> competence in tracking down and fixing bugs, or not a large enough
>>>> customer base with the same bugs to generate sufficient, actionable
>>>> bug reports; it is not necessarily or even primarily a signal of
>>>> quality.
>>
>> You may be right. But in many cases you may be wrong. I'm stealing
>> someone's else example (Hm.., maybe about 5-7 years old): ATI releases
>> driver for their boards as rarely as every 6 Months. Which confirms
>> careful work on debugging each released one. NVIDIA to the contrary
>> releases drives as often as every other Month, so they don't seem to put
>> enough effort into debugging each of them. Indeed, they are buggy in my
>> experience. You, the customer, do at least part of their job: by
>> discovering and reporting bugs ("artefacts" etc).
>
> While that is an interesting point I think that graphics drivers and
> firmware are sufficiently different in development practices that you may
> not be able to generalize from one to the other, graphics drivers are
> about cutting edge software features and performance, firmware is about
> long term stability and low level hardware details.
>

Naturally... I never said mine is not a layman's opinion ;-) Hence
layman's comparison...

Valeri


Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-09 Thread Valeri Galtsev

On Tue, September 9, 2014 9:37 am, Mark Tinberg wrote:
>
> On Sep 8, 2014, at 11:57 AM, Valeri Galtsev 
> wrote:
>
>>
>> On Mon, September 8, 2014 9:19 am, Mark Tinberg wrote:
>>>
>>> On Sep 6, 2014, at 3:42 PM, Valeri Galtsev 
>>> wrote:
>>>
>>>> On Sat, September 6, 2014 2:27 pm, Jonathan Billings wrote:
>>>>>
>>>>> I choose vendors that make it relatively painless to apply the
>>>>> firmware updates under Linux.
>>>>
>>>> This is only so for either very rich, who can afford to have stand by
>>>> hardware to replace bricked by flashing box, or very happy to the
>>>> level they don't care that the box will not come back up in next 5
>>>> min. I am definitely neither of two…
>>>
>>> I’ve used mostly Dell and have done a thousand firmware updates in my
>>> time
>>> and I’ve never seen a piece of hardware bricked, their update system
>>> takes
>>> all due precaution, so the problem just isn’t as dire as you make it
>>> out
>>> to be, even anecdotally it is statistically improbable, either I am a
>>> massive outlier or you are way overestimating.
>>
>> Certainly the last: it is me who is scared to take 1:1 chance.
>> Speaking of Dell: we use only lowest end of their boxes (think 32 GB
>> quad
>> core CPU _Desktop_ today) which are en par with others price wise, yet
>> very reliable. Never had to flash BIOS on these, and never had one
>> failed
>> because of me not having BIOS updates flashed routinely... As far as
>> servers are concerned, these are tyan mostly. I do not re-flash their
>> BIOS
>> routinely. (And I doubt they release tons of BIOS upgrades, at the very
>> most one per board during its lifetime which never sounded "do it or
>> your
>> box is dead tomorrow".) So, I maybe flashed BIOS 3-4 times per maybe 200
>> machines... Never had box bricked due to flashing. (Still...) And never
>> had failure due to not doing "preventive" re-flashing. But after all:
>> maybe I'm just extremely lucky ;-) and at the same time awfully scared
>> (to
>> fix something that shouldn't be broken IMHO).
>>
>
> My sense that the bugs which are being fixed by the server firmware
> vendors updates are very very rare but they have big enough customers who
> demand fixes to spend the engineering effort in finding and fixing these
> subtle issues whereas the whitebox vendors don’t sell enough of a single
> model and don’t have the high-touch relationship with customers to really
> care, there are probably just as many subtle bugs in their designs but
> they are focused on getting the next motherboard manufactured, not fixing
> problems with last years model that they don’t sell anymore.
>

Maybe. Still, I wouldn't place Tyan among the "small volume" manufacturers.
My _feeling_ is that they work more on BIOS debugging, and they do not make
tremendous changes in the BIOS from one model to its successor, only the
necessary ones. No experimenting, thus less chance of introducing new bugs.
After all, they have been in the [small] server board business forever. But
that is just my _feeling_ from my experience dealing with a variety of
hardware (theirs and others'). FWIW.

I'm really happy to see we are on the same page about the (virtual absence
of) vital flaws that need to be fixed yesterday in decent server boards...

Valeri


Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-09 Thread Mark Tinberg

On Sep 8, 2014, at 11:57 AM, Valeri Galtsev  wrote:

> 
> On Mon, September 8, 2014 9:19 am, Mark Tinberg wrote:
>> 
>> On Sep 6, 2014, at 3:42 PM, Valeri Galtsev 
>> wrote:
>> 
>>> On Sat, September 6, 2014 2:27 pm, Jonathan Billings wrote:
>>>>
>>>> I choose vendors that make it relatively painless to apply the firmware
>>>> updates under Linux.
>>> 
>>> This is only so for either very rich, who can afford to have stand by
>>> hardware to replace bricked by flashing box, or very happy to the level
>>> they don't care that the box will not come back up in next 5 min. I am
>>> definitely neither of two…
>> 
>> I’ve used mostly Dell and have done a thousand firmware updates in my time
>> and I’ve never seen a piece of hardware bricked, their update system takes
>> all due precaution, so the problem just isn’t as dire as you make it out
>> to be, even anecdotally it is statistically improbable, either I am a
>> massive outlier or you are way overestimating.
> 
> Certainly the last: it is me who is scared to take 1:1 chance.
> Speaking of Dell: we use only lowest end of their boxes (think 32 GB quad
> core CPU _Desktop_ today) which are en par with others price wise, yet
> very reliable. Never had to flash BIOS on these, and never had one failed
> because of me not having BIOS updates flashed routinely... As far as
> servers are concerned, these are tyan mostly. I do not re-flash their BIOS
> routinely. (And I doubt they release tons of BIOS upgrades, at the very
> most one per board during its lifetime which never sounded "do it or your
> box is dead tomorrow".) So, I maybe flashed BIOS 3-4 times per maybe 200
> machines... Never had box bricked due to flashing. (Still...) And never
> had failure due to not doing "preventive" re-flashing. But after all:
> maybe I'm just extremely lucky ;-) and at the same time awfully scared (to
> fix something that shouldn't be broken IMHO).
> 

My sense is that the bugs being fixed by the server vendors’ firmware
updates are very, very rare, but those vendors have big enough customers
demanding fixes to justify the engineering effort of finding and fixing these
subtle issues. The whitebox vendors don’t sell enough of a single model and
don’t have the high-touch relationship with customers to really care. There
are probably just as many subtle bugs in their designs, but they are focused
on getting the next motherboard manufactured, not on fixing problems with
last year’s model that they don’t sell anymore.

— 
Mark Tinberg
mtinb...@wisc.edu



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-09 Thread Mark Tinberg

On Sep 8, 2014, at 10:25 AM, Valeri Galtsev  wrote:

> Mark Tinberg wrote:
>> 
>>> A lack of updates can also mean that there is a lack of effort or
>>> competence
>>> is tracking down and fixing bugs, or not a large enough customer base
>>> with
>>> the same bugs to generate sufficient, actionable, bug reports, it is not
>>> necessarily or even primarily a signal of quality.
> 
> You may be right. But in many cases you may be wrong. I'm stealing
> someone's else example (Hm.., maybe about 5-7 years old): ATI releases
> driver for their boards as rarely as every 6 Months. Which confirms
> careful work on debugging each released one. NVIDIA to the contrary
> releases drives as often as every other Month, so they don't seem to put
> enough effort into debugging each of them. Indeed, they are buggy in my
> experience. You, the customer, do at least part of their job: by
> discovering and reporting bugs ("artefacts" etc).

While that is an interesting point, I think that graphics drivers and firmware
are sufficiently different in development practices that you may not be able to
generalize from one to the other: graphics drivers are about cutting-edge
software features and performance, while firmware is about long-term stability
and low-level hardware details.

— 
Mark Tinberg
mtinb...@wisc.edu



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-08 Thread Valeri Galtsev

On Mon, September 8, 2014 2:45 pm, Keith Keller wrote:
> On 2014-09-08, Valeri Galtsev  wrote:
>>
>> I gave on the SiperMicro quite a while ago. Not because of BIOS, but
>> because of hardware engineering flaws. Which at least manifests itself
>> with system boards for AMD CPUs. These (AMD) boards work reliably for
>> only
>> 2-4 years, after that they die. Not all of them, but about 50% of
>> SuperMicro AMD server and mostly workstation boards (I have no
>> experience
>> with their low end desktop boards if they exist) are dead after 3-4
>> years
>
> Huh.  I have a bunch of SuperMicro boards with AMD CPUs, and have had
> only one die completely, and that was a DOA that I returned before
> putting into production.
>
> Are you saying dead-dead, like completely unusable, or sorta dead, where
> you get spurious and unexplained errors?

It begins with random occasional errors, and ends up totally dead over the
course of a couple of weeks to a couple of months. You pull the CPUs and RAM
from the dead one, stick them into another board (I'm tempted to say "Tyan
this time" ;-), and they work. At this point you can't flash the BIOS, not
in house. My hunch is that this is an engineering flaw: it looks like the
board topology isn't too good around one of the CPU sockets, so it works
only marginally (without much reserve) while the system board is new, and
then fails with slight gradual degradation of components... Maybe the ripple
on the leads is below but close to the tolerable limit. Or the capacitances
and inductances [of the board leads] involved are such. I can't offer [much]
more detail on what I observed; it's been some time since I banged my head
against that. And you can imagine how happy I was to forget about it after I
gave up on them. Anyway, there are still SuperMicro boards in our stalls
which are still kicking. So I'm not saying all of them fail, I just don't
care to learn on my own hide which do and which do not.

Oh, BTW, all electrolytic capacitors on these strangely dead SuperMicro
boards are OK. All of us have seen those around the CPUs on dead system
boards (mostly manufactured during a certain period of time), mostly bulging
and leaking; that is not the case with these strangely dead boards. Some
capacitors can lose capacitance to some extent without showing any visible
signs, but good engineering usually takes that into account and uses
suitably larger ones, so that it doesn't even come close to the margin
during the equipment's life.

Valeri


Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-08 Thread m . roth
Keith Keller wrote:
> On 2014-09-08, Valeri Galtsev  wrote:
>>
>> I gave on the SiperMicro quite a while ago. Not because of BIOS, but
>> because of hardware engineering flaws. Which at least manifests itself
>> with system boards for AMD CPUs. These (AMD) boards work reliably for
>> only 2-4 years, after that they die. Not all of them, but about 50% of
>> SuperMicro AMD server and mostly workstation boards (I have no
>> experience with their low end desktop boards if they exist) are dead
>> after 3-4 years
>
> Huh.  I have a bunch of SuperMicro boards with AMD CPUs, and have had
> only one die completely, and that was a DOA that I returned before
> putting into production.
>
> Are you saying dead-dead, like completely unusable, or sorta dead, where
> you get spurious and unexplained errors?
>
I know I've ranted before, but on Penguin's high-end compute rack mount
servers, with, I think, an H8QG m/b, running 64-cores, we've gotten some
where a heavy compute user process has crashed the system, and we even
canned a script and put that on a fresh, basic CentOS install, and sent it
back, and Penguin's replaced m/b's. Several of them. Some more than once,
honestly.

And then there's the engineering, where for two of the DIMMs, I need to
unplug the main connector from the PSU, because that's the only way to
pull the ears down far enough to remove the DIMMS.

We won't mention all the DIMMs that they've replaced (he says, meaning to
call them about one system *again*)

  mark



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-08 Thread Keith Keller
On 2014-09-08, Valeri Galtsev  wrote:
>
> I gave on the SiperMicro quite a while ago. Not because of BIOS, but
> because of hardware engineering flaws. Which at least manifests itself
> with system boards for AMD CPUs. These (AMD) boards work reliably for only
> 2-4 years, after that they die. Not all of them, but about 50% of
> SuperMicro AMD server and mostly workstation boards (I have no experience
> with their low end desktop boards if they exist) are dead after 3-4 years

Huh.  I have a bunch of SuperMicro boards with AMD CPUs, and have had
only one die completely, and that was a DOA that I returned before
putting into production.

Are you saying dead-dead, like completely unusable, or sorta dead, where
you get spurious and unexplained errors?

--keith

-- 
kkel...@wombat.san-francisco.ca.us




Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-08 Thread Valeri Galtsev

On Mon, September 8, 2014 9:19 am, Mark Tinberg wrote:
>
> On Sep 6, 2014, at 3:42 PM, Valeri Galtsev 
> wrote:
>
>> On Sat, September 6, 2014 2:27 pm, Jonathan Billings wrote:
>>>
>>> I choose vendors that make it relatively painless to apply the firmware
>>> updates under Linux.
>>
>> This is only so for either very rich, who can afford to have stand by
>> hardware to replace bricked by flashing box, or very happy to the level
>> they don't care that the box will not come back up in next 5 min. I am
>> definitely neither of two…
>
> I’ve used mostly Dell and have done a thousand firmware updates in my time
> and I’ve never seen a piece of hardware bricked, their update system takes
> all due precaution, so the problem just isn’t as dire as you make it out
> to be, even anecdotally it is statistically improbable, either I am a
> massive outlier or you are way overestimating.

Certainly the latter: it is me who is scared to take a 1:1 chance.
Speaking of Dell: we use only the lowest end of their boxes (think 32 GB quad
core CPU _Desktop_ today), which are on par with others price-wise, yet
very reliable. Never had to flash the BIOS on these, and never had one fail
because I had not flashed BIOS updates routinely... As far as
servers are concerned, these are mostly Tyan. I do not re-flash their BIOS
routinely. (And I doubt they release tons of BIOS upgrades; at the very
most one per board during its lifetime, and none ever sounded like "do it or
your box is dead tomorrow".) So, I have flashed a BIOS maybe 3-4 times across
maybe 200 machines... Never had a box bricked by flashing. (Still...) And
never had a failure due to not doing "preventive" re-flashing. But after all:
maybe I'm just extremely lucky ;-) and at the same time awfully scared (to
fix something that shouldn't be broken IMHO).

Valeri


Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-08 Thread Valeri Galtsev

On Mon, September 8, 2014 9:48 am, m.r...@5-cent.us wrote:
> Mark Tinberg wrote:
>>
>>> On Sat, Sep 06, 2014 at 09:46:36AM -0500, Valeri Galtsev wrote:
>>>> But that is exactly what I said: if the hardware was released and sold
>>>> with this piece of crap BIOS, then you shouldn't be buying that junk
>>>> in the first place. Or at least stop buying the crap made by _this_
>>>> manufacturer in a future. I'm still not convinced. Any better reasons?
>>
>>> In my experience, all code has bugs.  Instead of trying to find some
>>> vendor that has magically released hardware with bug-free firmware, I
>>> choose vendors that make it relatively painless to apply the firmware
>>> updates under Linux.
>
>> A lack of updates can also mean that there is a lack of effort or
>> competence
>> is tracking down and fixing bugs, or not a large enough customer base
>> with
>> the same bugs to generate sufficient, actionable, bug reports, it is not
>> necessarily or even primarily a signal of quality.

You may be right. But in many cases you may be wrong. I'm stealing
someone else's example (hm, maybe about 5-7 years old): ATI releases
drivers for their boards as rarely as every 6 months, which confirms
careful work on debugging each released one. NVIDIA, on the contrary,
releases drivers as often as every other month, so they don't seem to put
enough effort into debugging each of them. Indeed, they are buggy in my
experience. You, the customer, do at least part of their job by
discovering and reporting bugs ("artefacts" etc.).

>
> I might also point out that there are really *not* a lot of BIOS
> manufacturers. AMI, and - is Phoenix still doing them? - and Dell claims
> it's got its own, but who knows what they've rebranded. Once you consider
> that, then you need to consider the board maker. Some seem to do a lot
> better job of qa/qc than others. For example, some folks here like
> Supermicro, where I *REALLY* don't - many of our Penguins, which are
> rebranded SuperMicro, have a *lot* of issues with the m/b.

I gave up on SuperMicro quite a while ago. Not because of the BIOS, but
because of hardware engineering flaws, which at least manifest themselves
in system boards for AMD CPUs. These (AMD) boards work reliably for only
2-4 years; after that they die. Not all of them, but about 50% of
SuperMicro AMD server and (mostly) workstation boards (I have no experience
with their low end desktop boards, if they exist) are dead after 3-4 years
- just my experience. Nothing like that with Tyan boards; I may have seen
one out of 50 or 70 Tyan boards die (an event I don't even care to
recollect); the rest keep working for 10+ years (during which time the box
is re-purposed twice, as I cannot throw away something that still works,
so I have to find a new use for the now weaker machine). For desktops we use
Dell; while as cheap on the lower end as others, they have proved reliable
for us. And as somebody mentioned, it can be any brand inside with a Dell
sticker on top. One of my Linux friends, who complains that they change
chipsets almost on a daily basis, was calling it (not them, but what they
do, I guess) D'hell ;-)

Valeri

>
>  mark
>
> mark
>
>



Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-08 Thread Keith Keller
On 2014-09-08, Mark Tinberg  wrote:
>
> On Sep 7, 2014, at 1:35 AM, Keith Keller  
> wrote:
>
> This is why I would say that firmware updates are part of the preventative 
> maintenance in the same way kernel updates are, if the bug was already fixed 
> and if you had flashed this during a normal maintenance window you never 
> would have had an unplanned maintenance to fix or recover from the problem.  
> Manufacturers don't tend to update firmware without real bug reports from the 
> field, why wait until you've had a failure due to some already-fixed corner 
> case.

I do actually read the release notes (sporadically) for my controller
firmware releases, at least (usually not BIOS, I admit).  If I think I
might hit one of their bug fixes then I will schedule an update.

--keith

-- 
kkel...@wombat.san-francisco.ca.us
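
The release-note triage described above (read the notes, update only if a
listed fix looks like it could bite your own setup) can be sketched roughly
as below. This is only an illustration: the function name, the sample
release-note entries and the keywords are all made up, and no vendor's real
release-note format is assumed.

```python
def fixes_worth_scheduling(release_notes, local_keywords):
    """Return the release-note entries that mention any keyword describing this machine."""
    wanted = [keyword.lower() for keyword in local_keywords]
    return [note for note in release_notes
            if any(keyword in note.lower() for keyword in wanted)]


# Hypothetical release-note entries and configuration keywords (not from any real vendor).
notes = [
    "Fixed rare controller reset during rebuild of RAID 6 units",
    "Corrected fan speed reporting on chassis type B",
    "Fixed timeout with drives larger than 2 TB",
]
my_setup = ["raid 6", "2 tb"]

hits = fixes_worth_scheduling(notes, my_setup)
for entry in hits:
    print("relevant fix:", entry)
print("schedule update" if hits else "skip this release")
```

Plain substring matching is crude; it is meant only to show the decision
"does any fix apply to me?", and a human still has to read the matching
entries.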




Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-08 Thread m . roth
Mark Tinberg wrote:
>
>> On Sat, Sep 06, 2014 at 09:46:36AM -0500, Valeri Galtsev wrote:
>>> But that is exactly what I said: if the hardware was released and sold
>>> with this piece of crap BIOS, then you shouldn't be buying that junk in
>>> the first place. Or at least stop buying the crap made by _this_
>>> manufacturer in a future. I'm still not convinced. Any better reasons?
>
>> In my experience, all code has bugs.  Instead of trying to find some
>> vendor that has magically released hardware with bug-free firmware, I
>> choose vendors that make it relatively painless to apply the firmware
>> updates under Linux.

> A lack of updates can also mean that there is a lack of effort or
> competence
> is tracking down and fixing bugs, or not a large enough customer base with
> the same bugs to generate sufficient, actionable, bug reports, it is not
> necessarily or even primarily a signal of quality.

I might also point out that there are really *not* a lot of BIOS
manufacturers. AMI, and - is Phoenix still doing them? - and Dell claims
it's got its own, but who knows what they've rebranded. Once you consider
that, then you need to consider the board maker. Some seem to do a lot
better job of qa/qc than others. For example, some folks here like
Supermicro, where I *REALLY* don't - many of our Penguins, which are
rebranded SuperMicro, have a *lot* of issues with the m/b.

mark



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-08 Thread Mark Tinberg

On Sep 7, 2014, at 1:35 AM, Keith Keller  
wrote:

> On 2014-09-06, Valeri Galtsev  wrote:
>> 
>> ... I've mentined manufacturers in another reply: tyan, lsi, 3ware, ati...
> 
> Even 3ware has had buggy firmwares.  I once had to flash a 3ware card
> years into production because it was not until then that this particular
> bug was exposed by my configuration.

This is why I would say that firmware updates are part of preventative
maintenance in the same way kernel updates are. If the bug was already fixed
and you had flashed the update during a normal maintenance window, you never
would have had unplanned maintenance to fix or recover from the problem.
Manufacturers don’t tend to update firmware without real bug reports from the
field, so why wait until you’ve had a failure due to some already-fixed corner
case?

— 
Mark Tinberg
mtinb...@wisc.edu



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-08 Thread Mark Tinberg

On Sep 6, 2014, at 3:42 PM, Valeri Galtsev  wrote:

> On Sat, September 6, 2014 2:27 pm, Jonathan Billings wrote:
>> 
>> I choose vendors that make it relatively painless to apply the firmware
>> updates under Linux.
> 
> This is only so for either very rich, who can afford to have stand by
> hardware to replace bricked by flashing box, or very happy to the level
> they don't care that the box will not come back up in next 5 min. I am
> definitely neither of two…

I’ve used mostly Dell and have done a thousand firmware updates in my time, and
I’ve never seen a piece of hardware bricked. Their update system takes all due
precaution, so the problem just isn’t as dire as you make it out to be. Even
anecdotally it is statistically improbable: either I am a massive outlier or
you are way overestimating.

— 
Mark Tinberg
mtinb...@wisc.edu
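
How much a clean track record proves depends heavily on its size. A rough
sketch, assuming every flash is an independent trial with the same chance of
bricking the box (an assumption, nobody in the thread measured this): after n
flashes with zero failures, the 95% upper bound on the per-flash failure
probability is 1 - 0.05^(1/n), which is close to 3/n (the "rule of three").
The function name below is made up for illustration.

```python
def zero_failure_upper_bound(trials, confidence=0.95):
    """Upper bound on the per-trial failure probability after `trials` tries with no failure."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    # Solve (1 - p) ** trials = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


# 1000 is the "thousand firmware updates" figure; 4 is the "3-4 flashes" figure.
for flashes in (4, 1000):
    exact = zero_failure_upper_bound(flashes)
    approx = min(1.0, 3.0 / flashes)
    print(f"{flashes} flashes, no bricks: p < {exact:.4f} (rule of three: {approx:.4f})")
```

Under that assumption a thousand clean updates bound the risk at about 0.3%
per flash, while three or four clean flashes only bound it at about 53%, so
the two experiences are not directly comparable.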



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-08 Thread Mark Tinberg

On Sep 6, 2014, at 2:27 PM, Jonathan Billings  wrote:

> On Sat, Sep 06, 2014 at 09:46:36AM -0500, Valeri Galtsev wrote:
>> But that is exactly what I said: if the hardware was released and sold
>> with this piece of crap BIOS, then you shouldn't be buying that junk in
>> the first place. Or at least stop buying the crap made by _this_
>> manufacturer in a future. I'm still not convinced. Any better reasons?
> 
> In my experience, all code has bugs.  Instead of trying to find some
> vendor that has magically released hardware with bug-free firmware, I
> choose vendors that make it relatively painless to apply the firmware
> updates under Linux.

A lack of updates can also mean that there is a lack of effort or competence in
tracking down and fixing bugs, or not a large enough customer base with the
same bugs to generate sufficient, actionable bug reports; it is not
necessarily or even primarily a signal of quality.

There is little churn in firmware updates, changes for change’s sake. Pretty
much every published update fixes some real corner case that someone ran into,
and I’d rather fix it on my system before running into it in production. There
are few feelings more stupid than having something go sideways due to a bug
that someone already fixed for you.

— 
Mark Tinberg
mtinb...@wisc.edu



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-07 Thread Valeri Galtsev

On Sun, September 7, 2014 8:55 pm, Keith Keller wrote:
> On 2014-09-07, Valeri Galtsev  wrote:
>>
>> I guess after that I should declare myself to be lucky. None out of more
>> than a couple of dozens of 3ware cards ever did harm for me. I did once
>> had one of them fried (my clumsiness most likely), which then just
>> didn't
>> come up (3ware just replaced card without a question asked). Could yours
>> be _slightly_ fried?
>
> The first card could have been slightly fried; it came back up after a
> reboot, and would kernel panic again within a few days.  Since I had
> what I thought was a good second card I never bothered to test the first
> one thoroughly.  After the second card ate the array I bailed on the old
> cards completely; had the 9650 been easy to obtain I would have, but it
> was pretty much EOL by then.
>
> The 9650 that died last month refused to be recognized on cold boot, so I
> think it's totally gone.  It's old enough that it's not worth my time
> trying to figure out whether it's revivable.
>

Indeed, lucky me. As of this moment I have six 9650s in production boxes,
and have had them for at least 6 years, during which time none of them ever
failed on me (including any trouble with arrays). Knocking on wood. I must
say, though, that I do prefer the most reliable drives. And I always have
arrays checked at least once a week through the 3ware scheduler (this causes
a walk through the whole surface of each drive, thus ensuring that bad
blocks, if any, do not stay undiscovered...).

Valeri


Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-07 Thread Keith Keller
On 2014-09-07, Valeri Galtsev  wrote:
>
> I guess after that I should declare myself to be lucky. None out of more
> than a couple of dozens of 3ware cards ever did harm for me. I did once
> had one of them fried (my clumsiness most likely), which then just didn't
> come up (3ware just replaced card without a question asked). Could yours
> be _slightly_ fried?

The first card could have been slightly fried; it came back up after a
reboot, and would kernel panic again within a few days.  Since I had
what I thought was a good second card I never bothered to test the first
one thoroughly.  After the second card ate the array I bailed on the old
cards completely; had the 9650 been easy to obtain I would have, but it
was pretty much EOL by then.

The 9650 that died last month refused to be recognized on cold boot, so I
think it's totally gone.  It's old enough that it's not worth my time
trying to figure out whether it's revivable.

--keith


-- 
kkel...@wombat.san-francisco.ca.us




Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-07 Thread Valeri Galtsev

On Sun, September 7, 2014 1:04 pm, Keith Keller wrote:
> On 2014-09-07, Valeri Galtsev  wrote:
>>
>> It doesn't sound like you are flashing all 3ware cards you have in
>> production every time new firmware release it out. It doesn't sound
>> either
>> like you had fatal failure of production box because of bug in 3ware
>> firmware. Correct me if I'm wrong, otherwise I see you on the same page
>> with me: i.e. not flashing new firmware as a part of "routine update" of
>> production machine (together with system/software updates).
>
> Well, I think we are on the same page now.  I think I (and some other
> folks) interpreted your posts as "if you have to flash the firmware, it
> was a crappy firmware, and you should switch vendors" which (as someone
> else noted) would soon leave you with no vendors.

Great... and my fault, I'm often a bit extreme in expressions ;-(

>
> To summarize, I think our page says "update the firmware only when
> necessary on production-level hardware".

Yes. Of which during last one and a half decades I had none.

>
> FWIW, I did have a different 3ware card eat its array, though I do
> suspect some user (i.e., me) error.  I had a 9650 card which was having
> problems with kernel panics.  I suspected a hardware failure, so I moved
> the array to another 9650 in the same box, which may not have had a BBU.
> Unfortunately that card showed worse problems a few weeks later: not
> only did it kernel panic, but it also trashed the array pretty much
> completely.  (Of course I had backups, and this was a dev box, not
> public-facing, but it was still frustrating.)  At the time the 9650 was
> old enough that the 9750 series was out, and that card has been fairly
> solid.  (Also FWIW, my last 9650 card had the same issue a few weeks
> ago; fortunately it did not eat its array.)

I guess after that I should declare myself lucky. None of my more than a
couple of dozen 3ware cards ever did me any harm. I did once have one of
them fried (my clumsiness, most likely), and it then just didn't come up
(3ware replaced the card with no questions asked). Could yours be
_slightly_ fried? If it is the card's internal RAM controller chip that is
slightly fried (if you overheat a chip badly enough, impurity diffusion can
mess up the profile in the chip so that it no longer keeps up with high
frequencies; I've seen things like that, though not in 3ware), then the
card's internal computer (the one doing the RAID function) will
occasionally produce total garbage, which could cause just about anything.
Kernel panics with that card would then be likely from time to time, as it
will occasionally talk gibberish back to the kernel. Just a shot in the dark.

Valeri

>
> So to add a page to our book, "always have backups even if you trust
> your hardware!"  :)
>
> --keith
>
> --
> kkel...@wombat.san-francisco.ca.us



Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-07 Thread Keith Keller
On 2014-09-07, Valeri Galtsev  wrote:
>
> It doesn't sound like you are flashing all 3ware cards you have in
> production every time a new firmware release is out. Nor does it sound
> like you had a fatal failure of a production box because of a bug in 3ware
> firmware. Correct me if I'm wrong, otherwise I see you on the same page
> with me: i.e. not flashing new firmware as a part of the "routine update"
> of a production machine (together with system/software updates).

Well, I think we are on the same page now.  I think I (and some other
folks) interpreted your posts as "if you have to flash the firmware, it
was a crappy firmware, and you should switch vendors" which (as someone
else noted) would soon leave you with no vendors.

To summarize, I think our page says "update the firmware only when
necessary on production-level hardware".

FWIW, I did have a different 3ware card eat its array, though I do
suspect some user (i.e., me) error.  I had a 9650 card which was having
problems with kernel panics.  I suspected a hardware failure, so I moved
the array to another 9650 in the same box, which may not have had a BBU.
Unfortunately that card showed worse problems a few weeks later: not
only did it kernel panic, but it also trashed the array pretty much
completely.  (Of course I had backups, and this was a dev box, not
public-facing, but it was still frustrating.)  At the time the 9650 was
old enough that the 9750 series was out, and that card has been fairly
solid.  (Also FWIW, my last 9650 card had the same issue a few weeks
ago; fortunately it did not eat its array.)

So to add a page to our book, "always have backups even if you trust
your hardware!"  :)

--keith

-- 
kkel...@wombat.san-francisco.ca.us




Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-07 Thread Valeri Galtsev

On Sun, September 7, 2014 1:35 am, Keith Keller wrote:
> On 2014-09-06, Valeri Galtsev  wrote:
>>
> >> ... I've mentioned manufacturers in another reply: tyan, lsi, 3ware,
>> ati...
>
> Even 3ware has had buggy firmwares.  I once had to flash a 3ware card
> years into production because it was not until then that this particular
> bug was exposed by my configuration.

It doesn't sound like you are flashing all 3ware cards you have in
production every time a new firmware release is out. Nor does it sound
like you had a fatal failure of a production box because of a bug in 3ware
firmware. Correct me if I'm wrong, otherwise I see you on the same page
with me: i.e. not flashing new firmware as a part of the "routine update"
of a production machine (together with system/software updates).

Valeri

>
> --keith
>
> --
> kkel...@wombat.san-francisco.ca.us



Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread Keith Keller
On 2014-09-06, Valeri Galtsev  wrote:
>
> ... I've mentioned manufacturers in another reply: tyan, lsi, 3ware, ati...

Even 3ware has had buggy firmwares.  I once had to flash a 3ware card
years into production because it was not until then that this particular
bug was exposed by my configuration.

--keith

-- 
kkel...@wombat.san-francisco.ca.us




Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread John R Pierce

On 9/6/2014 4:02 PM, Valeri Galtsev wrote:

> That doesn't mean that you have to flash firmware onto an LSI controller
> every so often after you have placed it into production because the
> original version of the firmware is crap, and the updated version will
> turn out to be crap several months after its release, and so on. You did
> have a nice thing before you flashed, which alas was different hardware
> from what you needed. You flashed a different version to modify the
> hardware. After that the hardware was exactly what you needed it to be.
> And from this point on you don't need to flash it, unless you decide to
> change its functions back to what they were with the original firmware.


for some unfathomable reason, internal SAS cards in IT (Initiator
Target) mode are nearly unobtainium.  The external ones cost stupid money,
at least as much as high-end SAS RAID cards, and I really don't
understand it.


ok, I do understand it...   MS Windows prefers using hardware raid since 
the built in storage management is dreadful.




--
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread Valeri Galtsev

On Sat, September 6, 2014 4:52 pm, John R Pierce wrote:
> On 9/6/2014 1:53 PM, Valeri Galtsev wrote:
>> ... I've mentioned manufacturers in another reply: tyan, lsi, 3ware,
>> ati...
>
> A few months ago, I had to flash the firmware on an LSI 2008 (aka 9211-8i)
> because I needed the card in "IT" (Initiator Target) mode rather than
> "IR" (Integrated RAID), and this requires different firmware AND card
> BIOS.  This was surprisingly difficult to accomplish: the system had a
> UEFI BIOS, on which the LSI Logic firmware flasher wouldn't operate under
> MS-DOS, so I had to discover and use the bizarro world known as the
> UEFI Shell to run the firmware flash utility.
>

That doesn't mean that you have to flash firmware onto an LSI controller
every so often after you have placed it into production because the
original version of the firmware is crap, and the updated version will
turn out to be crap several months after its release, and so on. You did
have a nice thing before you flashed, which alas was different hardware
from what you needed. You flashed a different version to modify the
hardware. After that the hardware was exactly what you needed it to be.
And from this point on you don't need to flash it, unless you decide to
change its functions back to what they were with the original firmware.

This whole thing is way different from what I was originally displeased
with (i.e. the "necessity" to apply firmware updates to fix something that
turns out to be broken in an older, crappy version of the firmware).

So: LSI is still on my list of great hardware manufacturers. (Even though
my favorite is 3ware, I forgot to mention one other good one: Areca, whose
place will be after LSI in my book.) And I don't care how hard it is to
flash an LSI card (which you had to do _before_ you placed it into
production). In the worst case you could hire someone to do it for
you. After doing it yourself you can be extremely proud of yourself:
now you know that you are worth your salary. But certainly you knew it
before that ;-)

Valeri

>
>
> --
> john r pierce  37N 122W
> somewhere on the middle of the left coast



Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread John R Pierce

On 9/6/2014 1:53 PM, Valeri Galtsev wrote:

> ... I've mentioned manufacturers in another reply: tyan, lsi, 3ware, ati...


A few months ago, I had to flash the firmware on an LSI 2008 (aka 9211-8i)
because I needed the card in "IT" (Initiator Target) mode rather than
"IR" (Integrated RAID), and this requires different firmware AND card
BIOS.  This was surprisingly difficult to accomplish: the system had a
UEFI BIOS, on which the LSI Logic firmware flasher wouldn't operate under
MS-DOS, so I had to discover and use the bizarro world known as the
UEFI Shell to run the firmware flash utility.




--
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread Valeri Galtsev

On Sat, September 6, 2014 2:16 pm, Keith Keller wrote:
> On 2014-09-06, Valeri Galtsev  wrote:
>> I get rackmount
>> ones assembled by a small company (companies) at about 1/2 the cost of
>> similar hardware from Dell. Those are for the most part based on Tyan
>> barebones. And during at least the last decade I never had a "must flash
>> a newer BIOS" situation with any of those boxes.
>
> You have been lucky, then.

... I've mentioned manufacturers in another reply: tyan, lsi, 3ware, ati...

> I agree that flashing the firmware should be
> a rare event, but expecting the rate to be exactly 0 is an unreasonable
> expectation.
>
> I have had to flash a BIOS once, and a BMC once, in about 10 years of
> buying server hardware.

Great, I'm happy to be on the same page with you!

> (Yes, flashing a BMC probably wouldn't brick a
> box, but it'd brick getting a remote console, which for me is almost as
> serious.)  I consider that an acceptable bug rate.
>
> Flashing a RAID controller is actually more frightening

I meant raid controller off the shelf, before you start using it on the
new box you are building with newly released 3TB drives... I'm still on
the same page with you ;-)

Valeri

> to me--flashing
> the BIOS isn't likely to hose your data, but an undetected bad flash on
> a RAID controller could.  Sadly my flash rate of my controllers is
> slightly higher than my BIOS flash rate.
>
> --keith
>
>
>
> --
> kkel...@wombat.san-francisco.ca.us



Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread Valeri Galtsev

On Sat, September 6, 2014 2:27 pm, Jonathan Billings wrote:
> On Sat, Sep 06, 2014 at 09:46:36AM -0500, Valeri Galtsev wrote:
>> But that is exactly what I said: if the hardware was released and sold
>> with this piece of crap BIOS, then you shouldn't be buying that junk in
>> the first place. Or at least stop buying the crap made by _this_
>> manufacturer in the future. I'm still not convinced. Any better reasons?
>
> In my experience, all code has bugs.  Instead of trying to find some
> vendor that has magically released hardware with bug-free firmware,

I've found a few: tyan for system board, 3ware and LSI for raid
controller, ATI for video card...

> I
> choose vendors that make it relatively painless to apply the firmware
> updates under Linux.

This only works for those who are either very rich, and can afford standby
hardware to replace a box bricked by flashing, or so carefree that they
don't mind if the box does not come back up in the next 5 minutes. I am
definitely neither of the two...

Valeri

>
> --
> Jonathan Billings 



Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread Jonathan Billings
On Sat, Sep 06, 2014 at 09:46:36AM -0500, Valeri Galtsev wrote:
> But that is exactly what I said: if the hardware was released and sold
> with this piece of crap BIOS, then you shouldn't be buying that junk in
> the first place. Or at least stop buying the crap made by _this_
> manufacturer in the future. I'm still not convinced. Any better reasons?

In my experience, all code has bugs.  Instead of trying to find some
vendor that has magically released hardware with bug-free firmware, I
choose vendors that make it relatively painless to apply the firmware
updates under Linux.

-- 
Jonathan Billings 


Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread Keith Keller
On 2014-09-06, Valeri Galtsev  wrote:
> I get rackmount
> ones assembled by a small company (companies) at about 1/2 the cost of
> similar hardware from Dell. Those are for the most part based on Tyan
> barebones. And during at least the last decade I never had a "must flash
> a newer BIOS" situation with any of those boxes.

You have been lucky, then.  I agree that flashing the firmware should be
a rare event, but expecting the rate to be exactly 0 is an unreasonable
expectation.

I have had to flash a BIOS once, and a BMC once, in about 10 years of
buying server hardware.  (Yes, flashing a BMC probably wouldn't brick a
box, but it'd brick getting a remote console, which for me is almost as
serious.)  I consider that an acceptable bug rate.

Flashing a RAID controller is actually more frightening to me--flashing
the BIOS isn't likely to hose your data, but an undetected bad flash on
a RAID controller could.  Sadly my flash rate of my controllers is
slightly higher than my BIOS flash rate.

--keith



-- 
kkel...@wombat.san-francisco.ca.us




Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread Valeri Galtsev

On Sat, September 6, 2014 10:07 am, John R Pierce wrote:
> On 9/6/2014 7:46 AM, Valeri Galtsev wrote:
>> But that is exactly what I said: if the hardware was released and sold
>> with this piece of crap BIOS, then you shouldn't be buying that junk in
>> the first place. Or at least stop buying the crap made by _this_
>> manufacturer in the future. I'm still not convinced. Any better reasons?
>
> with that approach, you'd quickly find yourself with zero vendors left.

No, I'm still buying Dell desktops. I gave up on their rackmount boxes
(with Dell you don't have the flexibility to choose your preferred, say,
RAID cards: step left, step right and you are shot ;-). I get rackmount
ones assembled by a small company (companies) at about 1/2 the cost of
similar hardware from Dell. Those are for the most part based on Tyan
barebones. And during at least the last decade I never had a "must flash
a newer BIOS" situation with any of those boxes. If I ever flashed a new
BIOS, it was only once, before I put the box in production.

Of course, I flashed the BIOS of my laptop (after unsoldering the EPROM,
dumping the original BIOS content, editing the darn thing with a hex
editor, flashing it onto a new EPROM chip, and then sticking that into a
socket I soldered to the laptop system board in place of the EPROM chip),
but that is different. The world will not stop spinning if my laptop stays
dead for a couple of days, or weeks, or forever. With any of the servers
it's way different.

Valeri

>
>
>
> --
> john r pierce  37N 122W
> somewhere on the middle of the left coast



Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread John R Pierce

On 9/6/2014 7:46 AM, Valeri Galtsev wrote:

> But that is exactly what I said: if the hardware was released and sold
> with this piece of crap BIOS, then you shouldn't be buying that junk in
> the first place. Or at least stop buying the crap made by _this_
> manufacturer in the future. I'm still not convinced. Any better reasons?


with that approach, you'd quickly find yourself with zero vendors left.



--
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread Valeri Galtsev

On Sat, September 6, 2014 9:21 am, Steven Tardy wrote:
> On Sat, Sep 6, 2014 at 9:34 AM, Valeri Galtsev 
> wrote:
>
>>
>> I was always fascinated: why [some] people are dying to upgrade
>> firmware?
>> It doesn't matter whether by firmware you mean system board BIOS, or
>> firmware of some card. Why taking chance having your machine hosed?
>
>
> Because BIOS updates often fix corner case issues/bugs.
> The BIOS release notes for this PowerEdge 2970 server:
>   http://downloads.dell.com/bios/PE2970-040201BIOS.txt
> includes:
>   * Fixed intermittent SATA Drive B not found error.

But that is exactly what I said: if the hardware was released and sold
with this piece of crap BIOS, then you shouldn't be buying that junk in
the first place. Or at least stop buying the crap made by _this_
manufacturer in the future. I'm still not convinced. Any better reasons?

Valeri

>
> The likelihood of a BIOS upgrade going bad if due diligence is done to
> verify the BIOS upgrade is for that hardware is practically zero.



Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread Steven Tardy
On Sat, Sep 6, 2014 at 9:34 AM, Valeri Galtsev 
wrote:

>
> I was always fascinated: why [some] people are dying to upgrade firmware?
> It doesn't matter whether by firmware you mean system board BIOS, or
> firmware of some card. Why taking chance having your machine hosed?


Because BIOS updates often fix corner case issues/bugs.
The BIOS release notes for this PowerEdge 2970 server:
  http://downloads.dell.com/bios/PE2970-040201BIOS.txt
includes:
  * Fixed intermittent SATA Drive B not found error.

If due diligence is done to verify that the BIOS upgrade is for that
hardware, the likelihood of the upgrade going bad is practically zero.


Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-06 Thread Valeri Galtsev

On Fri, September 5, 2014 2:20 pm, m.r...@5-cent.us wrote:
> By the bye, about firmware updates: I like Dell's the best of all. HP, run
> it from some kind of DOS, and hope. Dell, you can do from a running CentOS
> system (I've done it a few times), and unlike everyone else's firmware
> updates, it says, "collecting information", then *tells* you that a) this
> update is, in fact, for this hardware (and so won't brick it), and b)
> whether it's newer than what's installed.

I have always been fascinated: why are [some] people dying to upgrade
firmware? It doesn't matter whether by firmware you mean the system board
BIOS or the firmware of some card. Why take the chance of having your
machine hosed? If the current firmware version is crap, then you shouldn't
buy any hardware from this manufacturer in the first place. If the current
version is OK, why bother re-flashing and risk killing the [whatever]
board? Beats me. The only time I felt it was justified was when new
firmware [for a 3ware RAID adapter] added support for hard drives above
2TB capacity.

Can anybody offer an argument that can change my mind?

Valeri

PS Of course, you can be that rich and have 3 redundant machines for
everything, so you wouldn't care about each particular one ;-)

>
> mark
>



Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-05 Thread John Plemons
Hey, just coming into this conversation. Here is an idea: why not
install a SATA card that supports AHCI into the machine? I'm
guessing there is a free PCI or PCI-E slot.


Such cards are made; here are links I found quickly with a Google search.
Bang for buck, it could be the cheapest option.


http://www.lycom.com.tw/PE-126.htm

http://www.lycom.com.tw/PE-125.htm  ( Better card )

It could save a bunch of headaches.

Which series of 2970 are you running, II or III?

John




Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-05 Thread m . roth
Jason Pyeron wrote:
>> From: m.r...@5-cent.us

>> Dumb question: these machines are getting very long in the tooth, but
>> you're putting SSD's in them? New, or newer machines, would
>
>> solve a lot of problems
>
> Their warranties are good for another few years... And the money is not :)

Warning: danger, Will Robinson.

One of the main things that pushed us to surplus ours was interesting:
inside of a month, 4? 5? more? of them had the PERC fail, fatally. Amazing
quality control (and they were in about three different rooms, including
one or more in the datacenter, so it wasn't the environment).

Refurbed machines are also an option

   mark



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-05 Thread Jason Pyeron
> -Original Message-
> From: m.r...@5-cent.us
> Sent: Friday, September 05, 2014 15:19
> To: CentOS mailing list
> 
> Jason Pyeron wrote:
> >> From: m.r...@5-cent.us
> >> Jason Pyeron wrote:
> >> >> From: Jason Pyeron
> >> >> > [mailto:centos-boun...@centos.org] On Behalf Of John R Pierce
> >> >> > On 8/31/2014 2:03 PM, Jason Pyeron wrote:
> 
> >> >> > > Yes. They support internal SATA drives, we are changing
> >> >> > from spinning drives to SSD. I am working with Dell to get a
> >> >> > BIOS patch, but I wont hold my breath.
> 
> Dumb question: these machines are getting very long in the tooth, but
> you're putting SSD's in them? New, or newer machines, would 

32GB SSD for the boot device, not on the raid arrays.

> solve a lot of
> problems

Their warranties are good for another few years... And the money is not :)

> 
> >> >> >
> >> >> > is the SATA interface in AHCI mode or legacy IDE emulation?
> >> >>
> >> >> Good question, I will ask Dell. The BIOS only has Off and
> >> >> Auto as choices. Is there a preference I should shoot for?
> >> >
> >> > So the dell tech says it only supports ATA (IDE) mode.
> >> [Sorry for the
> >> > accidental forward]
> >> >
> >> > Now I have to find an alternative to supporting a SSD boot
> >> > device on a SATA port in IDE (ATA) mode.
> >> >
> >> Ok, I see - it's an old 2970 - I see the manuals on Dell's
> >> site were last revised in 2011. We got rid of all our 2950's (except for
> >> one, I think, or two, and they're another team's). IIRC, they did have a
> >> choice of AHCI or RAID, and I think there may have been one other option.
> >> Unless this is
> >
> > I think that is on the PERC controller. The Onboard SATA A/B ports are the
> > issue.
>
> Nope. That's the kind of stuff that's only in the BIOS - it's certainly
> not on a PERC.

Will go over it again with a fine tooth comb.

> 
> > We have some with 40 pin IDE, but I am ignoring them.
> 
> And to that I have one response: MTBF. You need to talk to management
> about spending some money

Step 1. Make more money.
Step 2. Replace 40 of them when the support contract expires.

> >
> > Both IDE and SATA mother boards have the same BIOS version!?!?!
> >
> Presumably from when the switchover was happening.
> 
> H... have you spoken to Dell, or looked on their website, for a
> firmware update for the BIOS?

Running the latest BIOS.

> 
> mark
> 


--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                      PD Inc. http://www.pdinc.us -
- Principal Consultant              10 West 24th Street #100    -
- +1 (443) 269-1555 x333            Baltimore, Maryland 21218   -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This message is copyright PD Inc, subject to license 20080407P00. 


Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-05 Thread m . roth
By the bye, about firmware updates: I like Dell's the best of all. HP, run
it from some kind of DOS, and hope. Dell, you can do from a running CentOS
system (I've done it a few times), and unlike everyone else's firmware
updates, it says, "collecting information", then *tells* you that a) this
update is, in fact, for this hardware (and so won't brick it), and b)
whether it's newer than what's installed.
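
The two checks described above (is this package for this hardware, and is
it newer than what is installed) can be sketched in a few lines of Python.
This is only an illustration of the idea: the function names, the
parameters and the version strings below are hypothetical, and this is not
Dell's updater or its interface.

```python
from typing import Iterable, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '4.2.1' into (4, 2, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def preflight(system_model: str,
              installed: str,
              package_models: Iterable[str],
              package_version: str) -> Tuple[bool, str]:
    """Return (ok_to_flash, reason) for a hypothetical update package.

    Assumes both version strings have the same number of dotted fields.
    """
    # Check (a): the package must list this exact system model.
    supported = {model.strip().lower() for model in package_models}
    if system_model.strip().lower() not in supported:
        return False, "package is not for this hardware"
    # Check (b): the package must be strictly newer than what is installed.
    if parse_version(package_version) <= parse_version(installed):
        return False, "installed firmware is already current"
    return True, "package matches hardware and is newer"


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(preflight("PowerEdge 2970", "3.0.2", ["PowerEdge 2970"], "4.2.1"))
```

Refusing to proceed unless both checks pass is what removes most of the
"run it and hope" risk; it does nothing about a flash that is interrupted
halfway through.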

mark



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-05 Thread m . roth
Jason Pyeron wrote:
>> From: m.r...@5-cent.us
>> Jason Pyeron wrote:
>> >> From: Jason Pyeron
>> >> > [mailto:centos-boun...@centos.org] On Behalf Of John R Pierce
>> >> > On 8/31/2014 2:03 PM, Jason Pyeron wrote:

>> >> > > Yes. They support internal SATA drives, we are changing
>> >> > from spinning drives to SSD. I am working with Dell to get a
>> >> > BIOS patch, but I wont hold my breath.

Dumb question: these machines are getting very long in the tooth, but
you're putting SSDs in them? New, or newer, machines would solve a lot of
problems.

>> >> >
>> >> > is the SATA interface in AHCI mode or legacy IDE emulation?
>> >>
>> >> Good question, I will ask Dell. The BIOS only has Off and
>> >> Auto as choices. Is there a preference I should shoot for?
>> >
>> > So the dell tech says it only supports ATA (IDE) mode.
>> [Sorry for the
>> > accidental forward]
>> >
>> > Now I have to find an alternative to supporting a SSD boot
>> > device on a SATA port in IDE (ATA) mode.
>> >
>> Ok, I see - it's an old 2970 - I see the manuals on Dell's
>> site were last revised in 2011. We got rid of all our 2950's (except for
>> one, I think, or two, and they're another team's). IIRC, they did have a
>> choice of AHCI or RAID, and I think there may have been one other option.
>> Unless this is
>
> I think that is on the PERC controller. The Onboard SATA A/B ports are the
> issue.

Nope. That's the kind of stuff that's only in the BIOS - it's certainly
not on a PERC.

> We have some with 40 pin IDE, but I am ignoring them.

And to that I have one response: MTBF. You need to talk to management
about spending some money
>
> Both IDE and SATA mother boards have the same BIOS version!?!?!
>
Presumably from when the switchover was happening.

H... have you spoken to Dell, or looked on their website, for a
firmware update for the BIOS?

mark



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-05 Thread Jason Pyeron
> -Original Message-
> From: m.r...@5-cent.us
> Sent: Friday, September 05, 2014 14:50
> 
> Jason Pyeron wrote:
> >> -Original Message-
> >> From: Jason Pyeron
> >> Sent: Sunday, August 31, 2014 18:16
> >>
> >> > -Original Message-
> >> > From: centos-boun...@centos.org
> >> > [mailto:centos-boun...@centos.org] On Behalf Of John R Pierce
> >> > Sent: Sunday, August 31, 2014 17:34
> >> > To: centos@centos.org
> >> > Subject: Re: [CentOS] Install Centos 6 x86_64 on Dell
> >> > PowerEdge 2970 and aSSD (hardware probing issues)
> >> >
> >> > On 8/31/2014 2:03 PM, Jason Pyeron wrote:
> >> > > Yes. They support internal SATA drives, we are changing
> >> > from spinning drives to SSD. I am working with Dell to get a
> >> > BIOS patch, but I wont hold my breath.
> >> >
> >> > is the SATA interface in AHCI mode or legacy IDE emulation?
> >>
> >> Good question, I will ask Dell. The BIOS only has Off and
> >> Auto as choices. Is there a preference I should shoot for?
> >
> > So the dell tech says it only supports ATA (IDE) mode. [Sorry for the
> > accidental forward]
> >
> > Now I have to find an alternative to supporting a SSD boot device on a
> > SATA port in IDE (ATA) mode.
> >
> Ok, I see - it's an old 2970 - I see the manuals on Dell's site were last
> revised in 2011. We got rid of all our 2950's (except for one, I think, or
> two, and they're another team's). IIRC, they did have a choice of AHCI or
> RAID, and I think there may have been one other option. Unless this is

I think that is on the PERC controller. The Onboard SATA A/B ports are the issue.

> *really* old, I can't imagine that they actually have a physical IDE or
> EIDE interface, so there should be some way around this.

We have some with 40 pin IDE, but I am ignoring them.

Both the IDE and the SATA motherboards have the same BIOS version!?

-Jason 

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                      PD Inc. http://www.pdinc.us -
- Principal Consultant              10 West 24th Street #100    -
- +1 (443) 269-1555 x333            Baltimore, Maryland 21218   -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This message is copyright PD Inc, subject to license 20080407P00. 




Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-05 Thread m . roth
Jason Pyeron wrote:
>> -Original Message-
>> From: Jason Pyeron
>> Sent: Sunday, August 31, 2014 18:16
>>
>> > -Original Message-
>> > From: centos-boun...@centos.org
>> > [mailto:centos-boun...@centos.org] On Behalf Of John R Pierce
>> > Sent: Sunday, August 31, 2014 17:34
>> > To: centos@centos.org
>> > Subject: Re: [CentOS] Install Centos 6 x86_64 on Dell
>> > PowerEdge 2970 and aSSD (hardware probing issues)
>> >
>> > On 8/31/2014 2:03 PM, Jason Pyeron wrote:
>> > > Yes. They support internal SATA drives, we are changing
>> > > from spinning drives to SSD. I am working with Dell to get a
>> > > BIOS patch, but I won't hold my breath.
>> >
>> > is the SATA interface in AHCI mode or legacy IDE emulation?
>>
>> Good question, I will ask Dell. The BIOS only has Off and
>> Auto as choices. Is there a preference I should shoot for?
>
> So the dell tech says it only supports ATA (IDE) mode. [Sorry for the
> accidental forward]
>
> Now I have to find an alternative to supporting a SSD boot device on a
> SATA port in IDE (ATA) mode.
>
Ok, I see - it's an old 2970 - I see the manuals on Dell's site were last
revised in 2011. We got rid of all our 2950's (except for one, I think, or
two, and they're another team's). IIRC, they did have a choice of AHCI or
RAID, and I think there may have been one other option. Unless this is
*really* old, I can't imagine that they actually have a physical IDE or
EIDE interface, so there should be some way around this.

   mark



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-09-05 Thread Jason Pyeron
> -Original Message-
> From: Jason Pyeron 
> Sent: Sunday, August 31, 2014 18:16
> 
> > -Original Message-
> > From: centos-boun...@centos.org 
> > [mailto:centos-boun...@centos.org] On Behalf Of John R Pierce
> > Sent: Sunday, August 31, 2014 17:34
> > To: centos@centos.org
> > Subject: Re: [CentOS] Install Centos 6 x86_64 on Dell 
> > PowerEdge 2970 and aSSD (hardware probing issues)
> > 
> > On 8/31/2014 2:03 PM, Jason Pyeron wrote:
> > > Yes. They support internal SATA drives, we are changing
> > > from spinning drives to SSD. I am working with Dell to get a
> > > BIOS patch, but I won't hold my breath.
> > 
> > is the SATA interface in AHCI mode or legacy IDE emulation?
> 
> Good question, I will ask Dell. The BIOS only has Off and 
> Auto as choices. Is there a preference I should shoot for?

So the Dell tech says it only supports ATA (IDE) mode. [Sorry for the
accidental forward]

Now I have to find another way to support an SSD boot device on a SATA
port in IDE (ATA) mode.

-Jason






Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-08-31 Thread John R Pierce
On 8/31/2014 3:15 PM, Jason Pyeron wrote:
> Good question, I will ask Dell. The BIOS only has Off and Auto as choices. Is 
> there a preference I should shoot for?

AHCI is pretty much required for SSD support.
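
One rough way to see which mode the kernel actually ended up with is to look at the driver bound to the storage controller. Below is a minimal Python sketch of that check, not a definitive tool. It assumes `lspci -k` is available and prints its usual "Kernel driver in use:" line under each device. The helper names (`storage_drivers`, `uses_ahci`, `read_lspci`) and the list of class strings are made up for illustration.

```python
import subprocess

# Device class strings to look for in lspci output (an assumed, non-exhaustive list).
STORAGE_CLASSES = ("SATA controller", "IDE interface", "RAID bus controller")


def storage_drivers(lspci_text):
    """Map each storage-controller line of `lspci -k` output to its bound driver (or None)."""
    result = {}
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device entry.
            current = line if any(c in line for c in STORAGE_CLASSES) else None
            if current is not None:
                result[current] = None
        elif current is not None and "Kernel driver in use:" in line:
            # Indented detail line belonging to the current storage device.
            result[current] = line.split(":", 1)[1].strip()
    return result


def uses_ahci(lspci_text):
    """True if any storage controller in the text is bound to the ahci driver."""
    return any(drv == "ahci" for drv in storage_drivers(lspci_text).values())


def read_lspci():
    """Run `lspci -k` on the local machine and return its output as text."""
    return subprocess.check_output(["lspci", "-k"]).decode("utf-8", "replace")
```

On a running system, `storage_drivers(read_lspci())` lists each controller with its driver. Anything other than `ahci` there suggests the port is being driven in a legacy or vendor-specific mode.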



-- 
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-08-31 Thread Jason Pyeron

> -Original Message-
> From: centos-boun...@centos.org 
> [mailto:centos-boun...@centos.org] On Behalf Of John R Pierce
> Sent: Sunday, August 31, 2014 17:34
> To: centos@centos.org
> Subject: Re: [CentOS] Install Centos 6 x86_64 on Dell 
> PowerEdge 2970 and aSSD (hardware probing issues)
> 
> On 8/31/2014 2:03 PM, Jason Pyeron wrote:
> > Yes. They support internal SATA drives, we are changing
> > from spinning drives to SSD. I am working with Dell to get a
> > BIOS patch, but I won't hold my breath.
> 
> is the SATA interface in AHCI mode or legacy IDE emulation?

Good question, I will ask Dell. The BIOS only has Off and Auto as choices. Is 
there a preference I should shoot for?




Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-08-31 Thread John R Pierce
On 8/31/2014 2:03 PM, Jason Pyeron wrote:
> Yes. They support internal SATA drives, we are changing from spinning drives 
> to SSD. I am working with Dell to get a BIOS patch, but I won't hold my breath.

is the SATA interface in AHCI mode or legacy IDE emulation?



-- 
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-08-31 Thread Rainer Duffner

On 31.08.2014 at 23:03, Jason Pyeron wrote:

>> Is that actually a supported configuration (in the Dell-sense)?
>
> Yes. They support internal SATA drives, we are changing from spinning drives
> to SSD. I am working with Dell to get a BIOS patch, but I won't hold my breath.

You can always try to install RHEL 6 and open a ticket with Red Hat if that
fails, too.




Re: [CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and aSSD (hardware probing issues)

2014-08-31 Thread Jason Pyeron
> -Original Message-
> From: centos-boun...@centos.org 
> [mailto:centos-boun...@centos.org] On Behalf Of Rainer Duffner
> Sent: Sunday, August 31, 2014 16:54
> To: CentOS mailing list
> Subject: Re: [CentOS] Install Centos 6 x86_64 on Dell 
> PowerEdge 2970 and aSSD (hardware probing issues)
> 
> 
> On 31.08.2014 at 21:52, Jason Pyeron wrote:
> 
> > I have a fleet of 2970s and we are upgrading the hard drives
> > on the motherboard SATA ports (A/B, not the PERC backplane).
> > When a "detecting hardware" is performed the system crashes,
> > reboots and gives an E1422 error code (useless video:
> > https://www.youtube.com/watch?v=PhyMeUHJar4).
> > 
> > We narrowed it down to a motherboard BIOS issue: if we
> > remove the SSD or add noprobe to the kernel command line the
> > installer does not crash.
> 
> Is that actually a supported configuration (in the Dell-sense)?
> 

Yes. They support internal SATA drives, we are changing from spinning drives to 
SSD. I am working with Dell to get a BIOS patch, but I won't hold my breath.

> Which is the "primary" hard drive then? SATA or PERC?
> 

It will be the SATA.

> Have you booted any other OS on it?

CentOS 5 and 6, both 32-bit and 64-bit.

> FreeBSD 10?
> CentOS7?

About to try that.

> 
> Ubuntu?
> 
> Note that I have no idea about Dell servers. I've never 
> worked with them in my professional life - but my experience 
> is that trying the same thing more than three times in a row 

Agreed, but there are 20+ servers and 20+ SSDs. The SSDs work fine in
the non-PE2970 systems with RHEL/CentOS 6 64-bit.

> is a waste of time (and nerves: I can literally see my life 
> being shortened by watching server-BIOS boot-up screens.)

-Jason 


 
