> > First, there's a huge difference, by percentage, in the number of UP boxes
> > that don't work vs the number of SMP boxes that don't work.  
> 
> Ah, an observation relevant to the linux-smp list!  Let's see:
> 
> Do you have some hard figures on that?  According to the linux-smp FAQ
> (and from years' membership of this list) I would expect that the "huge
> difference" is something like 98% (SMP) to 99.5% (UP) (or even higher in
> both with even smaller margins) with the SMP problems concentrated in a
> few, very specific, motherboards and hardware combinations.  I'd bow to
> any authoritative, non-anecdotal evidence, though.

No, I don't.  All I know is what I see on this list, and quite frankly it
scares the hell out of me to think of trying to support 2.0 SMP.  Don't
have this argument with me; tell Alan Cox.  If he had told us we should
support it, we probably would have.  We've been *wanting* to support it
for quite some time.  But the simple fact is that 2.0 SMP isn't ready 
for prime time, whether you like it or not.  

Just as I can't *prove* that it isn't with hard numbers, you certainly
can't prove that it is.

> > Second, this isn't a "RH" conservative turn at all.  RH has *never* provided
> > or supported an SMP kernel, EVER.  There is no turn.  This is the way it
> > always has been.  So, this issue lies with whether Dell wants to go out on
> > a limb and do something custom. 
> 
> Again, having done SMP almost exclusively since shortly after 2.0.0 was
> released with nearly flawless operation even when the 2.0.x kernel
> really did have deadlock problems, I don't view this as "going out on a
> limb".  
> 
> However, I do have some very current, very relevant data on this that I
> can contribute to the list.  Recall that the original discussion
> addressed Dell PowerEdge systems being sold with RH linux installed,
> multiple processors, and a UP kernel.  Just what is the "risk"
> associated with running them 2.0.36 SMP?

Why don't you ask Dell?  It's their choice, not ours.  Our position is
that we don't support it on any machine, and our recommendation is that
they *should* wait for our next release.  I've seen it run fine on 2300's
myself.  I've seen various 2.0 kernels crash in a blaze of fury on 2300's.

> The 12 systems that form the core of our beowulf, all running stock,
> unpatched 2.0.36 have been kept under continual load of 1-1.5 per CPU
> (2-3 overall) with moderate continuous and occasionally heavy/bursty
> network traffic.  Their uptimes range from let's see, 105 days to 111
> days.  Now mind you, these systems aren't running Red Hat.  They are
> actually running a melange of old Slackware, a few upgrades of e.g.
> modutils and the like, libc 5.4.44 (I know, we're out of touch;-) and an
> absolutely stock, unpatched 2.0.36.  Then there are the other twelve or
> thirteen SMP systems in our department net on desktops -- mostly PPro's
> and PII's -- with a fair range of hardware onboard and with uptimes that
> range from 5 days to almost 200.  I think that the record in our
> department is a system that has to be running 2.0.33 or maybe even
> 2.0.32 and has been up for almost a year.  The way it's going, it will
> stay up until we "upgrade" to Red Hat this spring.

That's *great*.   But point me to the proof *prior* to that 2.0.36 kernel
that SMP was going to be rock solid.  You can't do it.  The fact that
you've now tested 2.0.36 and it appears very good doesn't help us very 
much.  Things have to be field tested for a time before people are going
to put lots of faith in it.

> Now, two questions:  
> 
> First, are you implying that Red Hat is LESS stable with 2.0.x linux SMP
> than this hard data clearly suggests (that is, should we reconsider
> converting to Red Hat as our base distribution)?  I certainly hope not.
> ;-)

Under SMP?  It could be, I don't know.  But the fact that this is the
first kernel I've seen that is clearly good doesn't help the situation.

> Second, >>where is the limb<< that Dell is supposedly going out on?  A
> hundred days of uptime isn't the whole story, the story isn't finished
> -- there have been NO CRASHES WHATSOEVER on the twelve workhorse
> compute-server systems.

Great.  Point me to where you saw this situation *before* 2.0.36.  

> As far as I'm concerned, on all but a tiny fraction of SMP systems
> 2.0.36 is awesomely, boringly STABLE.  Alan Cox, et al., have done a
> fabulous job -- they're down to patching bugs that affect only a
> minuscule fraction of all SMP systems built for a few users a wee bit of
> the time; at that, usually ones (too old or too new) with some feature
> or another that departs from the approximate "standard".

The simple fact is that by the time 2.0 got stable, most work was focused
on 2.2.  That meant getting fixes for 2.0 would have been much more 
difficult.  The kernel folks were basically telling us *not* to try to
support it as 2.2 was so close and would be better. 

> Now, we're still looking forward to installing 2.2.x (right after the RH
> upgrade).  Reports on this list (and my own still growing base of
> experience with it) have it both technically superior in many ways and
> amazingly stable for its revision number (although I would expect LESS
> stable than 2.0.36, making it surprising that RH plans to support SMP
> only under 2.2.x if stability is really your primary concern).  
> 
> I certainly hope that we aren't "going out on a limb" by expecting that
> Red Hat will support SMP operation, with 2.0.x or 2.2.x, any LESS
> flawless than we are already observing with our existing hodge-podge
> linux installation and 2.0.36.  I also hope that we can look forward to
> at least modest support from Red Hat in the event that we encounter
> difficulties -- we are buying RH CD's fairly regularly now and are
> mostly self-supporting via membership in the various relevant linux
> lists anyway.

Even if 2.2 is less stable (I don't think I buy that...I think the bulk
of the problems we see these days are with new boards coming out that
have weird things happening), we will support it.  Why?  Because the
kernel folks tell us it is supportable.  

> > We do provide all sources, but we do *not* install them all by default.
> > That would be dumb.  We've never done that and we likely never will.
> > *Most* people don't need to recompile their kernel and thus don't need
> > the kernel sources.  But, there is a kernel-source binary RPM (as Doug
> 
> Our experience here obviously differs.  However, I'd be happy to poll
> DULUG, our Duke Linux User's Group and get some hard data on:

Fine, but your sample group is tainted.

>   a) Whether RH users have, on the average, needed to rebuild a kernel
> and
>   b) Whether RH users (including ones that haven't needed to rebuild a
> kernel) would consider it "dumb" to include the make ready kernel
> sources in the standard install, at least as a button/question mediated
> option.  I mean, it >>is<< trivial to make it a question in the standard
> install and let the user decide, isn't it?  Freedom of choice and all
> that?

We're not adding more questions to the install just for this.  If you want
them, do individual package selection and TURN THEM ON.  YOU HAVE THE POWER.

> In fact, given that DULUG is "right next door" to Red Hat, it might be
> reasonable for Red Hat to consider adopting Duke students as a
> prototyping "fishbowl".  It's a big enough population to be

We do that with the NCSU LUG already via installfests and the like.  At
last check DULUG didn't even have regular meetings, just a mailing
list.  Has that changed?

> statistically significant, but small enough to be controllable, the
> members are, if anything, brighter than average and highly motivated.
> At the moment, I and a few other DULUG members with similar experience
> provide most of the real support to these students.  I cannot help but
> believe that Red Hat could improve their product and support by working
> with these students, analyzing their problems and complaints, and
> working out scalable solutions that prevent the problems from occurring
> or minimize the difficulty of fixing them when they do.

Sure, there are lots of folks to work with to try to get problems solved.
We've never had a shortage of that.  What we do have is a shortage of
resources to work with each and every one of them.  That's why we have
bugzilla and the various mailing lists (including lists specific to our
public beta).  I'd encourage you to have them organize an installfest and
invite some RH folks to help.  We'd probably send someone.  I'd also 
encourage you to have them join our lists, play with the betas, and
provide feedback via already established channels.  It's not like we're
sitting in the dark over here.

> As an indirect support person for Red Hat (and hardly qualified for the
> role except in the most general linux/unix terms, although now that I
> run 5.2 at home my RH-specific skills are rapidly improving) I must say
> that my opinion of Red Hat hasn't changed too much over the last few
> years.  

Well, our goals haven't changed all that much, either.  ;-)

> > I've done exactly the above in the past.  
> 
> Then why didn't the RH person do just this when the original poster of
> the Dell question contacted RH for help?  Why is "How do I make my RH

Because it isn't supported!

> system run SMP" virtually a linux-smp FAQ?  Why is the immediate problem
> encountered by most of these people that they've gotten clean/current
> kernel sources and have a terrible time making them install according to
> the Red Hat prescription?  Why is the number of people who have
> privately contacted me to say "Go get 'em, Rob" continually increasing
> as I am giving voice to some concerns that are manifestly shared by
> quite a few individuals out there?  Obviously, the word isn't getting
> out.  As the maker of "linux for the masses" you have to know that
> nobody actually reads manuals -- the only thing many users know of Red
> Hat is what they see during their install and learn from their
> friends...

Why do I keep getting personal mail telling me that *I* am right?  Because
there are different opinions in the world on how things ought to be done.

Why do we even *bother* with package selection code at all?  Hell, by
your observation we should just install everything that will fit on
their disk, just like MS does.  I like to think we're better than that,
not worse.

In the future we will have better tools for package management.  People
will start to get used to the fact that they need to look through those
packages if something is "missing" (i.e. kernel sources) and install them.

> > You're arguing too many issues here.  The question is whether or not
> > 2.0 SMP is supportable or not.  We don't really think it is, especially
> > given how close we are to 2.2 SMP shipping.
> 
> <Stunned :-o>  
> 
> You don't think 2.0.>>36<< is supportable, although it is so stable it
> is basically moribund and abandoned, but you think that 2.2.(x \approx
> 5) (a brand new "stable" release) is, in spite of the fact that your
> existing 5.2 release runs 2.0.36 flawlessly in nearly all SMP boxes in
> existence (including the Dells under primary discussion, never forget!)
> while installing 2.2.x requires upgrading a half dozen key systems
> components irreversibly, some of which are >>still<< broken for 2.2 in
> the most recent "experimental" 2.2.x RPM's?  Brother, we have very
> different definitions of the word supportable and stable...

I'm not arguing this any more.  Go argue it with the core kernel 
developers.

> I'm >>glad<< RH is going to support 2.2.x and, at last, SMP.  I think
> that it is >>silly<< for Red Hat to not help Dell put a functional SMP
> version of 5.2 on their high end SMP servers in the meantime.  I think
> the data (NOT opinion, >>data<<) above makes it very clear that any
> argument about stability or supportability is specious.  Most Dell SMP
> server purchasers would probably consider well over 1000 days continuous
> uptime in aggregate acceptable.  On the other hand, permitting Dell to
> distribute RH 5.2 with UP kernels on SMP systems makes Dell look stupid
> and RH look bad.  Or is it the other way around?

Thanks for your opinion.  I don't share it, but I don't see a reason to
argue any longer.

> Perhaps Red Hat doesn't value their business relationship with Dell.

Dumb comment.

> Perhaps Red Hat doesn't care about, umm, "irritating" the customers who
> buy SMP Dell systems to run their ISP or whatever only to find that it
> is pre-installed with a UP kernel, that no instructions are provided for
> actually using the other processors, and that when they call RH for help
> the service people get snooty and say "We don't do SMP".  Perhaps Red
> Hat doesn't care about its customers?  I hope not...

Dell knows that they should warn people that SMP configuration isn't
supported and that they have to do work to get it working if they want
it *now*.  That's the reason this got started...at some point they
didn't.  The person who originally pointed this out has now apologized
for starting this whole mess and appears satisfied with the responses
thus far.  That's good enough for me.

> Obviously you guys are very well defended on this issue, but in cold,
> hard business terms, it would cost Red Hat on the order of ten
> man-hours, max, to put together a "custom" 2.0.36 SMP version of the 5.2
> CD for Dell's private use.  It would cost you no more time than that to
> add a 2.0.36-0.7smp RPM to the CD (and add an "install kernel source?"
> and a "beware, try-at-your-own-risk-and-join-linux-smp-for-help install
> smp kernel" question to the original install).  It would cost even less
> than this to provide a drop-in smp solution (a downloadable RPM) for
> those persons who call requesting it, and I'll bet such a thing already
> exists anyway (which would reduce the cost still further, but your
> service people still need to be directed to use it).  I personally think
> that the expected profits from such an investment, in customer
> satisfaction and goodwill translated into increased sales on Dell SMP
> platforms alone (not to mention the gazillion other SMP server platforms
> assembled by folks all over the country), are likely worth it.  But I'm
> not a Red Hat manager and RH clearly has problems providing adequate
> support as it is, so you could be right not to do it...

It would take *you* ten minutes to contribute the "HOWTO" on how to do
this with what currently exists.  Here's a framework for you:

install kernel-source RPM
remove the comment in front of SMP=1
read RH manual section foo

You write it up, I'll give it to our support folks to use as a canned
response when asked about this.  Deal?  It will have a disclaimer, though,
that this isn't supported and not to come to us for further help.
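
To give you a head start on the second step, here is a rough sketch of
what "remove the comment in front of SMP=1" could look like if you wanted
to script it.  It is only an illustration, not something we ship or
support.  It assumes the kernel source tree has a top-level Makefile with
a commented-out "# SMP = 1" line; the function name enable_smp and the
/usr/src/linux path in the usage note are made up for the example.

```python
import re

# Matches a line that is just a commented-out "SMP = 1" (spacing may vary).
# "# SMP_PROF = 1" does not match, so the profiling option stays off.
_SMP_LINE = re.compile(r"^#[ \t]*(SMP[ \t]*=[ \t]*1)[ \t]*$", re.MULTILINE)


def enable_smp(makefile_text: str) -> str:
    """Return the Makefile text with the first '# SMP = 1' line uncommented.

    Text that has no such commented-out line is returned unchanged.
    """
    return _SMP_LINE.sub(r"\1", makefile_text, count=1)


if __name__ == "__main__":
    # Hypothetical excerpt of a kernel Makefile, for demonstration only.
    sample = (
        "# NOTE! SMP is experimental.\n"
        "# SMP = 1\n"
        "# SMP_PROF = 1\n"
    )
    print(enable_smp(sample), end="")
```

To use it for real you would read the Makefile (for example
/usr/src/linux/Makefile, if that is where your sources live), pass the
text through enable_smp, write it back, and then rebuild the kernel as
usual.  Keep a copy of the original Makefile first.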

> > > I know for a fact that Dell can manage an SMP NT install on those
> > > particular beasts...however poor it may be.
> > 
> > So?
> 
> Interesting response for somebody selling a product in competition with
> NT on multiprocessing servers.  Perhaps Red Hat >>doesn't<< care about
> their business relationship with Dell or the multiprocessing server
> market...who'd have thought it?  How is Linus going to achieve world
> domination if linux users buying the "premier" linux distribution get so
> frustrated that they are forced to return to NT?  >>NEVER<< give a
> corporate manager who's taken the "risk" of trying linux at all a
> >>good<< excuse to go back to NT or you'll never win them back.  It will
> be "Been there, done that, no way" forever whenever somebody suggests
> reconverting to linux in the future...

It's sunny outside.  What illogical conclusions can you go draw from that?

You've taken this issue to new depths.  It's a simple matter that Dell
should warn people before selling Linux on an SMP box.  They're going to
do that (the fact they might not have in one case is an oversight).
You want Dell to preconfigure SMP support on those boxes.  Go tell Dell.
You want Red Hat to provide SMP kernels.  No.  We're not devoting 
engineering resources to do that now as they are tied up making SMP and
other things work well in 5.9.


--Donnie

--
   Donnie Barnes    http://www.redhat.com/~djb    [EMAIL PROTECTED]   "Bah."
   Challenge Diversity.  Ignore People.  Live Life.  Use Linux.  879. V. 
                The more you cry, the less you'll pee.


-
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]