Dear Brian,

Thanks for your help.

Adding the changes to the Makefile did make a difference. The test
programs supplied by Christian worked for a while, but the system still
crashes eventually, after about 20-30 transfers of 1024 bytes.

It does seem that what you are saying is going in the right direction.

What do you think the implications would be of any of the following?

1. Ditching Red Hat for SUSE (say).
2. Downloading and building a bog-standard 2.4 kernel.
3. Downloading and building a bog-standard 2.6 kernel.

Kind regards,
Rob



P.S. We have dual Pentium IIIs.
P.P.S. I've found -freorder-blocks in the GCC documentation:

    -freorder-blocks
    Reorder basic blocks in the compiled function in order to reduce number of
    taken branches and improve code locality.
    Enabled at levels `-O2', `-O3'.

http://m68hc11.serveftp.org/doc/gcc_3.html#SEC16




                                                                                
                                                                 
From:     "Brian F. G. Bidulock" <[EMAIL PROTECTED]>
Sent by:  [EMAIL PROTECTED]cet.urjc.es
To:       [EMAIL PROTECTED]
cc:       linux-streams@gsyc.escet.urjc.es
Subject:  Re: [Linux-streams] TLI failure
Date:     23/02/2005 11:35
Please respond to bidulock




Robert.Wendes,

Gee Robert, this is the first time you mentioned the kernel you were
running on.  Perhaps you could also mention which exact kernel (there
are about 6 smp kernels in EL3), and what processors you are using.

Nevertheless, I did a little investigation into the EL3 2.4.21 kernels
and discovered a problem which will cause much grief if you are running
the i686 hugemem kernel (say, on Dell PowerEdge 2650/2680 dual Xeons).

Almost all 2.4 series kernels (in almost all distributions) above 2.4.18
have the following lines in /usr/src/linux-2.4/arch/i386/Makefile:

             ifdef CONFIG_MPENTIUMIII
             CFLAGS += -march=i686
             endif

             ifdef CONFIG_MPENTIUM4
             CFLAGS += -march=i686
             endif

EL3 kernels in the 2.4.21 series have the following:

             ifdef CONFIG_MPENTIUMIII
             CFLAGS += $(call check_gcc,-march=pentium3,-march=i686)
             endif

             ifdef CONFIG_MPENTIUM4
             CFLAGS += $(call check_gcc,-march=pentium4,-march=i686)
             endif
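
The check_gcc call above takes two flags: it uses the first if the compiler
accepts it and falls back to the second otherwise. A minimal sketch of that
probe in Python follows. The helper is hypothetical and only mirrors the name
of the make macro; it assumes the compiler takes the usual gcc options (-S,
-o, -xc) and treats a missing compiler the same as a rejected flag.

```python
import os
import subprocess


def check_gcc(cc, preferred, fallback):
    """Return `preferred` if `cc` accepts it on an empty C input, else `fallback`."""
    # Compile an empty input to assembly and throw the result away;
    # only the exit status matters.
    cmd = [cc, preferred, "-S", "-o", os.devnull, "-xc", os.devnull]
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # Compiler not found or not executable: treat the flag as rejected.
        return fallback
    return preferred if result.returncode == 0 else fallback


if __name__ == "__main__":
    # "gcc" is an assumption; substitute the compiler the kernel was built with.
    print(check_gcc("gcc", "-march=pentium3", "-march=i686"))
```

Running it once per installed compiler shows which -march value each one
would end up with.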

Also note that Redhat always adds the following:

             CFLAGS+=-freorder-blocks

If you check the GCC 3.2.3 compiler that the kernel is compiled with
(you can check this with 'cat /proc/version'), you will find that it
honors the -march=pentium3 and -march=pentium4 flags, as well as the
-freorder-blocks flag.  If you check the gcc296 compiler (read
/usr/src/linux-2.4/Documentation/Changes and understand that gcc3 will
not necessarily compile a 2.4 kernel, but the 2.96 one should), you will
find that gcc296 does not honor the -march=pentium3 nor the
-march=pentium4 flags and, guess what?, it won't honor the
-freorder-blocks flag either!  Of course that means that the compiler
recommended for compiling the stock 2.4.21 kernel will not compile a RH
2.4.21 kernel.  Looking in the kernel-2.4.spec file for the cited EL3
kernel, you will find BuildRequires: gcc >= 2.96-98, yet the gcc296
compiler that ships with EL3 (gcc 2.96-128) will not compile an ix86
kernel because it will not honor the -freorder-blocks flag.

(If you look at the output of rpmbuild --rebuild -vv on the source rpm
for this kernel, you won't believe the warnings and crap that get spewed
out from compiling a 2.4 kernel with a gcc 3 compiler...)

Now, back to the -march flags.  If you have a Pentium 4, just about any
Pentium 4, anaconda will install the i686-hugemem (smp) kernel as the
default boot kernel.  The i686-hugemem kernel is the only kernel that
has CONFIG_MPENTIUMIII in its .config file.  (Strangely enough, this
-march modification is not in any other RH kernel in the 2.4.21 series
except for EL3 series even though CONFIG_MPENTIUMIII is set for bigmem
(smp) kernels in other (RH 7.x, 9, ...) kernels.)

Another difference is that all EL3 kernels are compiled -g for debug.
Yet the LiS and strinet kernel modules are not.  Other RH releases have
a kernel compiled with -g (the 2.4.blah-blah.debug kernel) and the other
boot kernels compiled without -g.  Under EL3, *all* kernels are compiled
-g (I suppose so RH can support 'em).  That can be a non-architecture
related problem, particularly if gcc 3 enhances linkages to support
debugging.  It's no surprise that this doesn't happen with other
RH releases.

So, now, you likely have a kernel compiled for -g -march=pentium3
-freorder-blocks and a LiS STREAMS package and strinet driver compiled
for -march=i686 without -g nor -freorder-blocks.  I don't even know what
-freorder-blocks does ('cause it's not in the gcc manual): is that a
RedHat thing?
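
To see such a mismatch at a glance, the two flag sets can be compared
mechanically. The following is only a sketch: the function names are
hypothetical, and the list of prefixes that count as "code generation" flags
is an assumption made for illustration, not a complete classification of gcc
options.

```python
import shlex

# Prefixes of flags that affect generated code; warning flags (-W...) and
# -pipe are deliberately ignored.  This list is an illustrative assumption.
CODEGEN_PREFIXES = ("-march=", "-mpreferred-stack-boundary=", "-f", "-g", "-O")


def codegen_flags(cflags):
    """Return the set of code-generation flags found in a CFLAGS string."""
    return {f for f in shlex.split(cflags) if f.startswith(CODEGEN_PREFIXES)}


def flag_mismatch(kernel_cflags, module_cflags):
    """Return (flags only in the kernel build, flags only in the module build)."""
    kernel = codegen_flags(kernel_cflags)
    module = codegen_flags(module_cflags)
    return sorted(kernel - module), sorted(module - kernel)


if __name__ == "__main__":
    only_kernel, only_module = flag_mismatch(
        "-Wall -g -O2 -march=pentium3 -freorder-blocks",
        "-Wall -O2 -march=i686",
    )
    print("kernel only:", only_kernel)
    print("module only:", only_module)
```

Anything reported on either side is a candidate for the kind of mismatch
described above.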

Kernels and kernel modules compiled with different primary compiler
flags will result in a kernel that crashes mysteriously and is difficult
to impossible to debug.  Differences in -g and -march combined are
likely enough to kill.  We had this problem a year or two ago until we
found the -mpreferred-stack-boundary=2 problem with gcc 2.96 vs. gcc
2.95.3.

If you look in the SRPM you will find the change in the
linux-2.4.21-selected-ac-bits.patch, and I suppose the ac means Alan
Cox.  The -g CFLAGS change is in the spec changelog under Rik van Riel.
So there you go.  Because even a stock 2.4.25 kernel does not have this
change, the problem looks isolated to EL3.

Having gone through the -mpreferred-stack-boundary=2 grief myself, I
can sympathize with your plight.  However, it is not LiS, strinet or
libxnet that is causing your grief, it is the vendor of your overpriced
kernel that you have to thank for mucking with architecture flags.

I will stick some checks in the autoconf macros to see if I can fix this
in the longer term.  I had some reports of mysterious strtst failures on
dual-Xeon a number of months ago and was mysteriously failing strtst on
WhiteBox myself.

The short term workaround is as follows:

After doing a ./configure on the LiS/strxnet package look in the top
level Makefile for KERNEL_CFLAGS.  It will look something like

             KERNEL_CFLAGS = -Wall -Wno-trigraphs -Werror -O2 \
                         -fomit-frame-pointer -fno-strict-aliasing \
                         -fno-common -pipe -mpreferred-stack-boundary=2 \
                         -march=i686

Edit this line to add things like -g -freorder-blocks and change
-march=i686 to -march=pentium3 depending on your exact configuration.
Then perform a make and make install as normal.  If you run configure
again, it will overwrite the Makefile and you will have to make the
change again.
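
Since the edit has to be repeated after every configure run, it can be
scripted. The sketch below is one way to do it under stated assumptions: the
function name is hypothetical, it works on the Makefile text as a string, and
it assumes the assignment is written as "KERNEL_CFLAGS = ..." with backslash
continuations as in the example above. Adjust the -march value and the extra
flags to match your own kernel.

```python
import re


def patch_kernel_cflags(makefile_text, march="pentium3",
                        extra=("-g", "-freorder-blocks")):
    """Rewrite -march and append missing flags in the KERNEL_CFLAGS assignment."""
    out, seen, inside = [], set(), False
    for line in makefile_text.split("\n"):
        if not inside and re.match(r"\s*KERNEL_CFLAGS\s*=", line):
            inside, seen = True, set()
        if inside:
            # Replace whatever -march value configure chose.
            line = re.sub(r"-march=[\w.-]+", "-march=" + march, line)
            seen.update(line.split())
            if not line.rstrip().endswith("\\"):
                # Last physical line of the assignment: add what is missing.
                missing = [f for f in extra if f not in seen]
                line = line.rstrip() + "".join(" " + f for f in missing)
                inside = False
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Patch the top-level Makefile in the current directory.
    with open("Makefile") as fh:
        text = fh.read()
    with open("Makefile", "w") as fh:
        fh.write(patch_kernel_cflags(text))
```

Flags already present are not added a second time, so running the script
twice leaves the Makefile unchanged.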

I hope that solves your problem.

--brian

P.S.  Please, please, please:  detailed bug reports are invaluable.
Do not report kernel crashes without the complete output from
./configure and the generated config.log.

I have wanted to put a bugreport script into the package for a while and
I will soon do that so that a bugreport template will be generated with
all the information that can be automatically gleaned filled in.  But,
until then, please provide as much information as possible.  To avoid
excessively sized mails to this list, you can send attachments directly
to me, or join the [EMAIL PROTECTED] mailing list and post
them there.

On Tue, 22 Feb 2005, [EMAIL PROTECTED] wrote:

> Dear All,
>
> I've been in communication with  Brian Bidulock and Christian Hildner
> regarding problems with XTI.
>
> This is to record my thoughts at present, because I don't know where it's
> going, and perhaps some other poor soul will benefit from my experience.
>
> I am using Redhat Linux with a 2.4.21-27.0.2.ELsmp kernel and dual Intel
> processors.
>
> I can configure, make and install the driver. I can run the test
> programs.
> The original GCOM site indicates that the driver is o.k. with SMP, which
> would tend to be supported by the test programs.
>
> My legacy program from AIX uses tcp. It compiles against the driver. It
> doesn't run.
> The sample programs from HOB compiled after changes to the include files.
> I think that's all I did, but when it ran
> it crashed the kernel on transferring data.
>
> My thought process has worked as follows:
> 1. Are there any test programs that come with the driver which do work in
> this situation?
>   a) test-xnet superficially seems representative. GDB wouldn't 'follow
>      a child fork', so it was down to taking a long look at the code.
>      Although this code does execute the XTI interface, it does so over
>      a pipe, and as a result none of the data structures need to be
>      populated for it to run. So it's not very representative, and not
>      really a comprehensive test for any transport other than a pipe. In
>      the here and now, it would be a brave person who would say that tcp
>      was not a driving force.
>   b) test-inet_tcp also seemed a good candidate, but inspection of the
>      code (few comments, as ever) revealed that it wasn't what I was
>      looking for, being more streams based.
>   c) I couldn't find anything else that fitted the bill.
>
> 2. I'm sure that the test programs are comprehensive in their own way,
> but they're certainly not all-encompassing, and don't cover tcp over XTI.
> 3. Because the test programs work on my SMP box, it might lead me to
> believe that the driver software was sound.
> 4. My belief now is that the driver isn't sound, because no user space
> program should bring the system down.
> 5. How do I report it? How do I report what? Unless someone has the
> time/machine to debug the driver it is unlikely to get fixed, even if I
> could offer more information about the bug, which I can't. I could spend
> project time debugging the driver, but it is becoming more difficult to
> justify the amount of time I am spending not getting
> my legacy code going.
> 6. Having spent a fair few days believing that XTI would work on this
> driver, I feel that my project would have been better served by just
> converting the legacy software to sockets.
> 7. O.k., so as Christian says it's beta, but I've used beta stuff before,
> and at that stage you would expect it to
> be a little better than crashing the system.
>
> At a crossroads... is XTI too old hat, even though this driver software
> isn't finished?
>
>
>
> _______________________________________________
> Linux-streams mailing list
> Linux-streams@gsyc.escet.urjc.es
> http://gsyc.escet.urjc.es/mailman/listinfo/linux-streams

--
Brian F. G. Bidulock    ¦ The reasonable man adapts himself to the ¦
[EMAIL PROTECTED]    ¦ world; the unreasonable one persists in  ¦
http://www.openss7.org/ ¦ trying  to adapt the  world  to himself. ¦
                        ¦ Therefore  all  progress  depends on the ¦
                        ¦ unreasonable man. -- George Bernard Shaw ¦