Dear Brian,

Thanks for your help. Adding the changes to the makefile did make a
difference. The test programs supplied by Christian worked for a while,
but the system eventually crashes after about 20-30 transfers of 1024
bytes.

It does seem that what you are saying is going in the right direction.
What do you think the implications would be of any of the following:

1. Ditching Redhat for Suse (say)?
2. Downloading and building a bog-standard 2.4 kernel?
3. Downloading and building a bog-standard 2.6 kernel?

kind regards
Rob

p.s. We have dual pentium3s.

p.p.s. I've found that

    -freorder-blocks
        Reorder basic blocks in the compiled function in order to
        reduce number of taken branches and improve code locality.
        Enabled at levels `-O2', `-O3'.

http://m68hc11.serveftp.org/doc/gcc_3.html#SEC16


"Brian F. G. Bidulock" <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
23/02/2005 11:35
Please respond to bidulock

To: [EMAIL PROTECTED]
cc: linux-streams@gsyc.escet.urjc.es
Subject: Re: [Linux-streams] TLI failure


Robert.Wendes,

Gee Robert, this is the first time you mentioned the kernel you were
running on. Perhaps you could also mention which exact kernel (there
are about 6 smp kernels in EL3), and what processors you are using.

Nevertheless, I did a little investigation into the EL3 2.4.21 kernels
and discovered a problem which will cause much grief if you are running
the i686 hugemem kernel (say, on Dell PowerEdge 2650/2680 dual Xeons).

Almost all 2.4 series kernels (in almost all distributions) above
2.4.18 have the following lines in
/usr/src/linux-2.4/arch/i386/Makefile:

    ifdef CONFIG_MPENTIUMIII
    CFLAGS += -march=i686
    endif

    ifdef CONFIG_MPENTIUM4
    CFLAGS += -march=i686
    endif

EL3 kernels in the 2.4.21 series have the following:

    ifdef CONFIG_MPENTIUMIII
    CFLAGS += $(call check_gcc,-march=pentium3,-march=i686)
    endif

    ifdef CONFIG_MPENTIUM4
    CFLAGS += $(call check_gcc,-march=pentium4,-march=i686)
    endif

Also note that Redhat always adds the following:

    CFLAGS+=-freorder-blocks

If you check the GCC 3.2.3 compiler that the kernel is compiled with
(you can check this with 'cat /proc/version'), you will find that it
honors the -march=pentium3 and -march=pentium4 flags, as well as the
-freorder-blocks flag.

If you check the gcc296 compiler (read
/usr/src/linux-2.4/Documentation/Changes and understand that gcc3 will
not necessarily compile a 2.4 kernel, but the 2.96 one should), you
will find that gcc296 honors neither the -march=pentium3 nor the
-march=pentium4 flag and, guess what?, it won't honor the
-freorder-blocks flag either!

Of course that means that the compiler recommended for compiling the
stock 2.4.21 kernel will not compile a RH 2.4.21 kernel. Looking in the
kernel-2.4.spec file for the cited EL3 kernel, you will find
BuildRequires: gcc >= 2.96-98, yet the gcc296 compiler that ships with
EL3 (gcc 2.96-128) will not compile an ix86 kernel because it will not
honor the -freorder-blocks flag. (If you look at the output of
rpmbuild --rebuild -vv on the source rpm for this kernel, you won't
believe the warnings and crap that get spewed out from compiling a 2.4
kernel with a gcc 3 compiler...)

Now, back to the -march flags. If you have a Pentium 4, just about any
Pentium 4, anaconda will install the i686-hugemem (smp) kernel as the
default boot kernel. The i686-hugemem kernel is the only kernel that
has CONFIG_MPENTIUMIII in its .config file.
(Strangely enough, this -march modification is not in any other RH
kernel in the 2.4.21 series except for the EL3 series, even though
CONFIG_MPENTIUMIII is set for bigmem (smp) kernels in other (RH 7.x,
9, ...) kernels.)

Another difference is that all EL3 kernels are compiled -g for debug.
Yet the LiS and strinet kernel modules are not. Other RH releases have
a kernel compiled with -g (the 2.4.blah-blah.debug kernel) and the
other boot kernels compiled without -g. Under EL3, *all* kernels are
compiled -g (I suppose so RH can support 'em). That can be a
non-architecture related problem, particularly if gcc 3 enhances
linkages to support debugging. It's no surprise that this doesn't
happen with other RH releases.

So, now, you likely have a kernel compiled for -g -march=pentium3
-freorder-blocks and a LiS STREAMS package and strinet driver compiled
for -march=i686 without -g or -freorder-blocks. I don't even know what
-freorder-blocks does ('cause it's not in the gcc manual): is that a
RedHat thing?

Kernels and kernel modules compiled with different primary compiler
flags will result in a kernel that crashes mysteriously and is
difficult to impossible to debug. Differences in -g and -march combined
are likely enough to kill. We had this problem a year or two ago until
we found the -mpreferred-stack-boundary=2 problem with gcc 2.96 vs.
gcc 2.95.3.

If you look in the SRPM you will find the change in the
linux-2.4.21-selected-ac-bits.patch, and I suppose the ac means Alan
Cox. The -g CFLAGS change is in the spec changelog under Rik van Riel.

So there you go. Because even a stock 2.4.25 kernel does not have this
change, the problem looks isolated to EL3. Having gone through the
-mpreferred-stack-boundary=2 grief myself, I can sympathize with your
plight. However, it is not LiS, strinet or libxnet that is causing your
grief; it is the vendor of your overpriced kernel that you have to
thank for mucking with architecture flags.
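
To make the mismatch described above concrete, here is a rough sketch
(not part of LiS, strinet or any Red Hat tooling) that compares the
"primary" flags in two CFLAGS strings. The two flag strings are
illustrative assumptions pieced together from this message, not copied
from a real build log, and the function names are hypothetical.

```python
# Sketch only: compare the "primary" flags of two CFLAGS strings.
# The flag strings at the bottom are illustrative assumptions taken
# from the discussion in this thread, not from a real build.

TRACKED_SWITCHES = ("-g", "-freorder-blocks")
TRACKED_OPTIONS = ("-march", "-mpreferred-stack-boundary")


def primary_flags(cflags):
    """Return a dict describing the tracked flags in a CFLAGS string."""
    found = {name: False for name in TRACKED_SWITCHES}
    found.update({name: None for name in TRACKED_OPTIONS})
    for token in cflags.split():
        if token in TRACKED_SWITCHES:
            found[token] = True
        else:
            name, sep, value = token.partition("=")
            if sep and name in TRACKED_OPTIONS:
                found[name] = value
    return found


def flag_mismatches(kernel_cflags, module_cflags):
    """Return {flag: (kernel_value, module_value)} for flags that differ."""
    kernel = primary_flags(kernel_cflags)
    module = primary_flags(module_cflags)
    return {
        name: (kernel[name], module[name])
        for name in kernel
        if kernel[name] != module[name]
    }


# Hypothetical inputs: an EL3-style kernel versus a module built with
# the package defaults.
kernel_cflags = ("-O2 -g -march=pentium3 -freorder-blocks "
                 "-mpreferred-stack-boundary=2")
module_cflags = "-O2 -march=i686 -mpreferred-stack-boundary=2"

for name, (k, m) in sorted(flag_mismatches(kernel_cflags,
                                           module_cflags).items()):
    print(f"{name}: kernel={k!r} module={m!r}")
```

Any flag this reports is one where the module build disagrees with the
kernel build. Only the handful of flags named in this message are
tracked; a real check would need the actual flags from both builds.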

I will stick some checks in the autoconf macros to see if I can fix
this in the longer term. I had some reports of mysterious strtst
failures on dual-Xeon a number of months ago and was mysteriously
failing strtst on WhiteBox myself.

The short term workaround is as follows:

After doing a ./configure on the LiS/strxnet package, look in the top
level Makefile for KERNEL_CFLAGS. It will look something like

    KERNEL_CFLAGS = -Wall -Wno-trigraphs -Werror -O2 \
            -fomit-frame-pointer -fno-strict-aliasing \
            -fno-common -pipe -mpreferred-stack-boundary=2 \
            -march=i686

Edit this line to add things like -g -freorder-blocks and change
-march=i686 to -march=pentium3, depending on your exact configuration.
Then perform a make and make install as normal. If you run configure
again, it will overwrite the Makefile and you will have to make the
change again.

I hope that solves your problem.

--brian

P.S. Please, please, please: detailed bug reports are invaluable. Do
not report kernel crashes without the complete output from ./configure
and the generated config.log. I have wanted to put a bugreport script
into the package for a while and I will soon do that, so that a
bugreport template will be generated with all the information that can
be automatically gleaned filled in. But, until then, please provide as
much information as possible. To avoid excessively sized mails to this
list, you can send attachments directly to me, or join the
[EMAIL PROTECTED] mailing list and post them there.

On Tue, 22 Feb 2005, [EMAIL PROTECTED] wrote:

> Dear All,
>
> I've been in communication with Brian Bidulock and Christian Hildner
> regarding problems with XTI.
>
> This is to record my thoughts at present, because I don't know where
> it's going, and perhaps some other poor soul will benefit from my
> experience.
>
> I am using Redhat Linux with a 2.4.21-27.0.2.ELsmp kernel and dual
> Intel processors.
>
> I can configure, make and install the driver. I can run the test
> programs.
> The original GCOM site indicates that the driver is o.k. with SMP,
> which would tend to be supported by the test programs.
>
> My legacy program from AIX uses tcp. It compiles against the driver.
> It doesn't run.
> The sample programs from HOB compiled after changes to the include
> files. I think that's all I did, but when it ran it crashed the
> kernel on transferring data.
>
> My thought process has worked as follows:-
>
> 1. Are there any test programs that come with the driver which do
>    work in this situation?
>    a) test-xnet superficially seems representative. GDB wouldn't
>       'follow a child fork', so it was down to taking a long look at
>       the code. Although this code does execute the XTI interface, it
>       does so over a pipe, and as a result none of the data
>       structures need to be populated for it to run. So it's not very
>       representative, and not really a comprehensive test for any
>       transport other than a pipe. In the here and now, it would be a
>       brave person who would say that tcp was not a driving force.
>    b) test-inet_tcp also seemed a good candidate, but inspection of
>       the code (few comments as ever) revealed that it wasn't what I
>       was looking for, being more streams based.
>    c) I couldn't find anything else that fitted the bill.
>
> 2. I'm sure that the test programs are comprehensive in their own
>    way, but they're certainly not all encompassing, and don't cover
>    tcp over XTI.
> 3. Because the test programs work on my SMP box, it might lead me to
>    believe that the driver software was sound.
> 4. My belief now is that the driver isn't sound, because no user
>    space program should bring the system down.
> 5. How do I report it? How do I report what! Unless someone has the
>    time/machine to debug the driver it is unlikely to get fixed, even
>    if I could offer more information about the bug, which I can't.
>    I could spend project time debugging the driver, but it is
>    becoming more difficult to justify the amount of time I am
>    spending not getting my legacy code going.
> 6. Having spent a fair few days believing that XTI would work on this
>    driver, I felt that my project would have been better served by
>    just converting the legacy software to sockets.
> 7. O.k., so as Christian says it's beta, but I've used beta stuff
>    before, and at that stage you would expect it to be a little
>    better than crashing the system.
>
> At a crossroads... is XTI too old hat? even though this driver
> software isn't finished?
>
> _______________________________________________
> Linux-streams mailing list
> Linux-streams@gsyc.escet.urjc.es
> http://gsyc.escet.urjc.es/mailman/listinfo/linux-streams

--
Brian F. G. Bidulock    ¦ The reasonable man adapts himself to the ¦
[EMAIL PROTECTED]       ¦ world; the unreasonable one persists in  ¦
http://www.openss7.org/ ¦ trying to adapt the world to himself.    ¦
                        ¦ Therefore all progress depends on the    ¦
                        ¦ unreasonable man. -- George Bernard Shaw ¦
_______________________________________________
Linux-streams mailing list
Linux-streams@gsyc.escet.urjc.es
http://gsyc.escet.urjc.es/mailman/listinfo/linux-streams