Re: More on BTX halted / crashes trying to use -stable /boot/loader
> Regardless, the check I added should be there as it was an uncovered
> error condition.

... and I just realised I deleted your patch.  D'oh!

> >BTW Jim, the stuff you're working on sounds really cool.  Thanks for
> >taking it on!
>
> Apparently I am a glutton for punishment.  Expect more bugfix patches
> in the near future.

Heh.  If you're interested in maintaining this, maybe we need to get you
commit access...

--
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
           V I C T O R Y   N O T   V E N G E A N C E

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: More on BTX halted / crashes trying to use -stable /boot/loader
At 17:44 -0800 12/7/00, Mike Smith wrote:
>This is probably an OK workaround.  I think that there's something
>fundamentally wrong with the 'net' filesystems getting called for an open
>against a disk device, but I've paged out all the libstand state and I
>can't get it back fast enough to comment more usefully. 8(

A "better" thing to do would be for tftp_open to check the dv_type of
the struct devsw member of struct open_file to see if it is a network
device.  However, stand.h has a comment stating that the dv_type
member of struct devsw is an "opaque type constant, arch-dependant".
Since tftp.c is in the arch-neutral libstand(3), I figured it would be
bad for tftp.c to gain knowledge of the "opaque" dv_type field.

Regardless, the check I added should be there as it was an uncovered
error condition.

>BTW Jim, the stuff you're working on sounds really cool.  Thanks for
>taking it on!

Apparently I am a glutton for punishment.  Expect more bugfix patches
in the near future.

Jim Browne [EMAIL PROTECTED]
"We lost our lease.  You lose culture" - sign on SF Arts Comission Bldg

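A minimal sketch of the dv_type check described above, for illustration
only.  The structures are trimmed-down stand-ins rather than the real
stand.h definitions, the f_dev field name is an assumption, and the
SKETCH_DEVT_* constants are hypothetical: the message itself points out
that libstand treats dv_type as opaque, which is exactly why this
approach was not taken.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical type constants; libstand treats dv_type as opaque. */
enum { SKETCH_DEVT_DISK = 1, SKETCH_DEVT_NET = 2 };

/* Trimmed-down stand-ins for the libstand structures. */
struct devsw {
    const char *dv_name;
    int         dv_type;
};

struct open_file {
    struct devsw *f_dev;      /* assumed name for the devsw member */
    void         *f_devdata;  /* device-specific data */
};

/* Refuse the open unless the underlying device is a network device. */
static int
tftp_open_typed(const struct open_file *f)
{
    if (f->f_dev == NULL || f->f_dev->dv_type != SKETCH_DEVT_NET)
        return (EINVAL);
    /* ...the real open would set up the TFTP request here... */
    return (0);
}

/* Usage: a disk-backed file is rejected, a network-backed one is not. */
static void
demo_typed_open(void)
{
    struct devsw disk = { "disk", SKETCH_DEVT_DISK };
    struct devsw net = { "pxe", SKETCH_DEVT_NET };
    struct open_file on_disk = { &disk, NULL };
    struct open_file on_net = { &net, NULL };

    assert(tftp_open_typed(&on_disk) == EINVAL);
    assert(tftp_open_typed(&on_net) == 0);
}
```

The trade-off is the one named in the message: the check is exact, but
it makes arch-neutral code depend on a value that each architecture is
free to define differently.
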
Re: More on BTX halted / crashes trying to use -stable /boot/loader
This is probably an OK workaround.  I think that there's something
fundamentally wrong with the 'net' filesystems getting called for an open
against a disk device, but I've paged out all the libstand state and I
can't get it back fast enough to comment more usefully. 8(

BTW Jim, the stuff you're working on sounds really cool.  Thanks for
taking it on!

> At 16:02 -0800 12/7/00, Jim Browne wrote:
> >When TFTP tries to open a file, it is expecting struct open_file
> >member f_devdata to be a pointer to a socket number.  When currdev
> >is "pxe", that assumption is correct.  When currdev is "disk*", that
> >assumption is incorrect.  Specifically, tftp.c does:
> >
> >tftpfile->iodesc = io = socktodesc(*(int *) (f->f_devdata));
> >
> >In my case, that often winds up making tftpfile->iodesc = 0.  That
> >parameter is later passed in tftp_makereq to sendrecv as the iodesc,
> >which via sendudp (and possibly the ARP functions) winds up calling
> >netif_put.  netif_put derefs the bogus iodesc to get a function
> >pointer for the put function of the network interface and calls it.
> >WHAM.  QED. :)
>
> How does this look?
>
> *** tftp.c      Thu Dec  7 16:20:02 2000
> --- tftp2.c     Thu Dec  7 16:20:55 2000
> *** tftp_open(path, f)
> *** 257,260
> --- 257,262
>
>       tftpfile->iodesc = io = socktodesc(*(int *) (f->f_devdata));
> +     if (io == NULL)
> +         return (EINVAL);
>       io->destip = servip;
>       tftpfile->off = 0;
>
> (I suppose I could have included this earlier.  Ugh.)
>
> Jim Browne [EMAIL PROTECTED]
> "We lost our lease.  You lose culture" - sign on SF Arts Comission Bldg

Re: More on BTX halted / crashes trying to use -stable /boot/loader
At 16:02 -0800 12/7/00, Jim Browne wrote:
>When TFTP tries to open a file, it is expecting struct open_file
>member f_devdata to be a pointer to a socket number.  When currdev
>is "pxe", that assumption is correct.  When currdev is "disk*", that
>assumption is incorrect.  Specifically, tftp.c does:
>
>tftpfile->iodesc = io = socktodesc(*(int *) (f->f_devdata));
>
>In my case, that often winds up making tftpfile->iodesc = 0.  That
>parameter is later passed in tftp_makereq to sendrecv as the iodesc,
>which via sendudp (and possibly the ARP functions) winds up calling
>netif_put.  netif_put derefs the bogus iodesc to get a function
>pointer for the put function of the network interface and calls it.
>WHAM.  QED. :)

How does this look?

*** tftp.c      Thu Dec  7 16:20:02 2000
--- tftp2.c     Thu Dec  7 16:20:55 2000
*** tftp_open(path, f)
*** 257,260
--- 257,262

      tftpfile->iodesc = io = socktodesc(*(int *) (f->f_devdata));
+     if (io == NULL)
+         return (EINVAL);
      io->destip = servip;
      tftpfile->off = 0;

(I suppose I could have included this earlier.  Ugh.)

Jim Browne [EMAIL PROTECTED]
"We lost our lease.  You lose culture" - sign on SF Arts Comission Bldg

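To see the effect of the two added lines in isolation, here is a small
self-contained sketch.  Everything in it is a simplified stand-in: the
socket table size, socktodesc_stub and tftp_open_checked are
hypothetical names for this illustration, not the libstand code, and
the real socktodesc may validate its argument differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_SOPEN_MAX 4          /* assumed table size for this sketch */

struct iodesc {
    unsigned int destip;            /* only the field the patch touches */
};

static struct iodesc sockets[SKETCH_SOPEN_MAX];

/* Stand-in lookup: returns NULL for a socket number outside the table. */
static struct iodesc *
socktodesc_stub(int sock)
{
    if (sock < 0 || sock >= SKETCH_SOPEN_MAX)
        return (NULL);
    return (&sockets[sock]);
}

struct open_file {
    void *f_devdata;                /* assumed to point at a socket number */
};

/* Mirrors the patched lines: bail out instead of using a NULL iodesc. */
static int
tftp_open_checked(struct open_file *f, unsigned int servip)
{
    struct iodesc *io;

    io = socktodesc_stub(*(int *)f->f_devdata);
    if (io == NULL)
        return (EINVAL);
    io->destip = servip;
    return (0);
}

/* Usage: a valid socket number succeeds, an out-of-range value fails. */
static void
demo_null_check(void)
{
    int sock = 1;        /* what a network device would store */
    int junk = 0x7fff;   /* stands in for the first word of a disk descriptor */
    struct open_file netf = { &sock };
    struct open_file diskf = { &junk };

    assert(tftp_open_checked(&netf, 0x0a000001) == 0);
    assert(sockets[1].destip == 0x0a000001);
    assert(tftp_open_checked(&diskf, 0x0a000001) == EINVAL);
}
```

Note the limit of this kind of check: it only catches values that fall
outside the socket table.  If the first word of a disk descriptor
happens to look like a valid socket number, the lookup succeeds and the
open carries on with the wrong descriptor.
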
Re: More on BTX halted / crashes trying to use -stable /boot/loader
At 00:13 -0800 12/7/00, Mike Smith wrote:
> >  The option works wonderfully for /boot/pxeboot.  But it turns out
> >  that the normal /boot/loader, when compiled with the above
> >  option, will crash horribly whenever it tries to open() a file and
> >  can't find it in the UFS filesystem on disk... it falls through the
> >  filesystem list until it hits the tftp FS and BEWM.  Explosion.
> >
> >  I sure would appreciate it if one of the bootstrap gurus could take
> >  a look at what happens when the tftp open routine is called from a
> >  normal disk-based /boot/loader!
>
>Probably hits an uninitialised function vector; this would be a good
>catch for someone looking to learn a bit about the loader and libstand.

Devsw "pxedisk" treats struct open_file member f_devdata as a pointer
to a socket number[1].  Other devsw drivers treat f_devdata as a
pointer to a struct i386_devdesc[2].

When you boot via PXE, sys/boot/i386/loader/main.c sets the current
device to "pxedisk".  If you do not boot via PXE, your current device
is likely to be some take on "disk".

When TFTP tries to open a file, it is expecting struct open_file
member f_devdata to be a pointer to a socket number.  When currdev
is "pxe", that assumption is correct.  When currdev is "disk*", that
assumption is incorrect.  Specifically, tftp.c does:

tftpfile->iodesc = io = socktodesc(*(int *) (f->f_devdata));

In my case, that often winds up making tftpfile->iodesc = 0.  That
parameter is later passed in tftp_makereq to sendrecv as the iodesc,
which via sendudp (and possibly the ARP functions) winds up calling
netif_put.  netif_put derefs the bogus iodesc to get a function
pointer for the put function of the network interface and calls it.
WHAM.  QED. :)

I happen to be knee deep in this code right now as I am adding two
things: support for booting from a flash based FS and porting the
netboot Ethernet drivers to work under libstand(3) so I can use
loader(8) with an AMD LANCE compatible chip.

I was lurking until my code was finished, but your problem (which I
was debugging today for my own configuration) is a good opportunity to
speak up.

I think the correct solution is to not overload f_devdata.  Perhaps
another field should be added to struct open_file specifically for a
socket number and perhaps some error checking code is in order? :)

I have to have my code working yesterday, so I'll keep plugging along
on a solution.  I'll email patches when finished.  However, there are
others who are far more familiar with this code than I, so pointers
are appreciated, especially from Alpha aware people.  (I haven't even
looked at the Alpha version of loader(8).)

[1] sys/boot/i386/libi386/pxe.c function pxe_open towards the bottom.
    Actually, pxe.c just overwrites what is likely a pointer to an
    i386_devdesc that was allocated by i386_parsedev (i.e. memory
    leak).

[2] sys/boot/i386/libi386/devicename.c function i386_parsedev

On a final note: why is netif_drivers defined in pxe.c rather than
conf.c?  I'm currently working around that with a Makefile define, but
I really think the definition of netif_drivers belongs in conf.c,
especially if one is to have more than one netif_driver compiled into
the binary (i.e. "pxe" and "ether").

Jim Browne [EMAIL PROTECTED]
"We lost our lease.  You lose culture" - sign on SF Arts Comission Bldg

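The "don't overload f_devdata" idea could look roughly like the sketch
below.  This is an illustration under assumptions, not a patch: the
f_sock field, the sock_to_desc helper, the table size and the
two-member disk descriptor are all hypothetical, and the real struct
open_file carries more members than shown.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SOPEN_MAX 4          /* assumed table size for this sketch */

struct iodesc {
    int io_placeholder;             /* real iodesc contents omitted */
};

static struct iodesc sockets[SKETCH_SOPEN_MAX];

/* Hypothetical stand-in for a disk device descriptor. */
struct disk_devdesc {
    int d_unit;
    int d_slice;
};

/* open_file with a dedicated socket field instead of reusing f_devdata. */
struct open_file {
    void *f_devdata;    /* always a device descriptor, never a socket */
    int   f_sock;       /* hypothetical: socket number, -1 if not networked */
};

/* Range-checked lookup so a bad socket number yields NULL. */
static struct iodesc *
sock_to_desc(int sock)
{
    if (sock < 0 || sock >= SKETCH_SOPEN_MAX)
        return (NULL);
    return (&sockets[sock]);
}

/* A network filesystem reads f_sock and never touches f_devdata. */
static struct iodesc *
net_desc_for(const struct open_file *f)
{
    return (sock_to_desc(f->f_sock));
}

/* Usage: the disk file has no socket; the network file maps to its slot. */
static void
demo_separate_field(void)
{
    struct disk_devdesc dd = { 0, 1 };
    struct open_file diskf = { &dd, -1 };
    struct open_file netf = { NULL, 2 };

    assert(net_desc_for(&diskf) == NULL);
    assert(net_desc_for(&netf) == &sockets[2]);
}
```

With a separate field there is no cast from a descriptor pointer to
int *, so a disk-backed file can never be mistaken for a socket number,
and pxe_open would no longer need to overwrite the descriptor pointer
mentioned in footnote [1].
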
Re: More on BTX halted / crashes trying to use -stable /boot/loader
:I've already looked at this, investigating a problem reported in
:connection with PR 21559.  I'll probably sort it out in the next day
:or two, unless someone else gets there first.
:
:--
:Robert Nordier
:
:[EMAIL PROTECTED]
:[EMAIL PROTECTED]

    That'd be great.  When you have a patch, if you email it to me I
    will test it on a box that I know crashes on the current problem.

						-Matt

Re: More on BTX halted / crashes trying to use -stable /boot/loader
Matt Dillon wrote:

>     I sure would appreciate it if one of the bootstrap gurus could take
>     a look at what happens when the tftp open routine is called from a
>     normal disk-based /boot/loader!

I've already looked at this, investigating a problem reported in
connection with PR 21559.  I'll probably sort it out in the next day
or two, unless someone else gets there first.

--
Robert Nordier

[EMAIL PROTECTED]
[EMAIL PROTECTED]

Re: More on BTX halted / crashes trying to use -stable /boot/loader
>     Ah ha!  I think I found the culprit, but I'm not sure exactly where
>     in the tftp code the problem is occurring.
>
>     I have been setting LOADER_TFTP_SUPPORT in /etc/make.conf so my pxeboot
>     file uses tftp to get the kernel rather than NFS (since NFS appears to
>     only be able to get [rootfs]:/kernel, which is the wrong kernel for a
>     diskless boot).

This isn't the case; pxeboot will load whichever kernel you've specified
in your loader config.

>     The option works wonderfully for /boot/pxeboot.  But it turns out
>     that the normal /boot/loader, when compiled with the above
>     option, will crash horribly whenever it tries to open() a file and
>     can't find it in the UFS filesystem on disk... it falls through the
>     filesystem list until it hits the tftp FS and BEWM.  Explosion.
>
>     I sure would appreciate it if one of the bootstrap gurus could take
>     a look at what happens when the tftp open routine is called from a
>     normal disk-based /boot/loader!

Probably hits an uninitialised function vector; this would be a good
catch for someone looking to learn a bit about the loader and libstand.

Re: More on BTX halted / crashes trying to use -stable /boot/loader
    Ah ha!  I think I found the culprit, but I'm not sure exactly where
    in the tftp code the problem is occurring.

    I have been setting LOADER_TFTP_SUPPORT in /etc/make.conf so my pxeboot
    file uses tftp to get the kernel rather than NFS (since NFS appears to
    only be able to get [rootfs]:/kernel, which is the wrong kernel for a
    diskless boot).

    The option works wonderfully for /boot/pxeboot.  But it turns out
    that the normal /boot/loader, when compiled with the above
    option, will crash horribly whenever it tries to open() a file and
    can't find it in the UFS filesystem on disk... it falls through the
    filesystem list until it hits the tftp FS and BEWM.  Explosion.

    I sure would appreciate it if one of the bootstrap gurus could take
    a look at what happens when the tftp open routine is called from a
    normal disk-based /boot/loader!

						-Matt

:I've experimented a bit more.  If I do an installworld and reboot,
:the machine crashes in one of two ways (randomly):
:
:#1 Crashes with a BTX error
:
:	int 5 err 0 efl 00010206 eip 0012
:	eax 0039 ebx 00023920 ecx 00023934 edx
:	esi edi 000c ebp 000943c8 esp 000943cc
:	cs 002b ds 0033 es 0033 fs 0033 gs 0033 ss 0033
:	cs:eip 62 00 00 00 e8 05 04 00 00 90 31 c0 cd 30 58 01
:	ss:esp 1c 8a 01 00 00 00 00 00 6c 44 09 00 1a 00 00 00
:
:#2 Loader has all sorts of 'can't find file BLAH' errors, stack underflow
:errors, and winds up at an 'ok ' prompt.
:
:Trying to run commands from the prompt sometimes works, sometimes returns
:a 'stack underflow' error.
:
:--
:
:That's with the latest -stable /boot/loader.
:
:If I take that machine and net-boot it, then mount / and copy a
:/boot/loader from March 20th, then reboot the machine, the machine
:now boots just fine.
:
:If I put the -stable /boot/loader back into /boot, the machine dies.
:If I put the March 20th /boot/loader in, the machine boots just fine.
:
:Anybody have any ideas?  What happened to /boot/loader between March
:and now?  I am at a loss.
:
:-r-xr-xr-x  1 root  wheel  163840 Dec  6 17:33 loader.NEW
:-r-xr-xr-x  1 root  wheel  143360 Dec  6 17:47 loader.OLD
:
:						-Matt

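The fall-through described at the top of this message can be pictured
with a toy version of the open path.  This is a sketch under
assumptions: the stub filesystems, the try_open helper and the
trimmed-down struct fs_ops are made up for illustration and only mimic
the general shape of a loader that tries each compiled-in filesystem in
turn.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct open_file {
    void *f_devdata;    /* device-specific data */
};

/* Trimmed-down per-filesystem operations table. */
struct fs_ops {
    const char *fs_name;
    int (*fo_open)(const char *path, struct open_file *f);
};

/* Stub UFS: only knows about one file. */
static int
ufs_open_stub(const char *path, struct open_file *f)
{
    (void)f;
    return (strcmp(path, "/kernel") == 0 ? 0 : ENOENT);
}

static int tftp_calls;  /* counts how often the TFTP stub is reached */

/* Stub TFTP: the real one would dereference f_devdata at this point. */
static int
tftp_open_stub(const char *path, struct open_file *f)
{
    (void)path;
    (void)f;
    tftp_calls++;
    return (ENOENT);
}

static struct fs_ops ufs_stub = { "ufs", ufs_open_stub };
static struct fs_ops tftp_stub = { "tftp", tftp_open_stub };

/* Compiled-in filesystems, tried in order; NULL terminates the list. */
static struct fs_ops *file_system[] = { &ufs_stub, &tftp_stub, NULL };

/* Try each filesystem until one accepts the path. */
static int
try_open(const char *path, struct open_file *f)
{
    int error = ENOENT;

    for (size_t i = 0; file_system[i] != NULL; i++) {
        error = file_system[i]->fo_open(path, f);
        if (error == 0)
            return (0);
    }
    return (error);
}

/* Usage: a hit on UFS never reaches TFTP; a miss falls through to it. */
static void
demo_fallthrough(void)
{
    struct open_file f = { NULL };

    assert(try_open("/kernel", &f) == 0);
    assert(tftp_calls == 0);
    assert(try_open("/boot/missing", &f) == ENOENT);
    assert(tftp_calls == 1);
}
```

A file that exists on disk never reaches the TFTP entry, which fits the
report that the crash shows up only when a file cannot be found on the
UFS filesystem.
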