Re: More on BTX halted / crashes trying to use -stable /boot/loader

2000-12-07 Thread Jim Browne

At 16:02 -0800 12/7/00, Jim Browne wrote:
>When TFTP tries to open a file, it is expecting struct open_file
>member f_devdata to be a pointer to a socket number.  When currdev
>is "pxe", that assumption is correct.  When currdev is "disk*", that
>assumption is incorrect.  Specifically, tftp.c does:
>
>tftpfile->iodesc = io = socktodesc(*(int *) (f->f_devdata));
>
>In my case, that often winds up making tftpfile->iodesc = 0.  That
>parameter is later passed in tftp_makereq to sendrecv as the iodesc,
>which via sendudp (and possibly the ARP functions) winds up calling
>netif_put.  netif_put derefs the bogus iodesc to get a function
>pointer for the put function of the network interface and calls it.
>WHAM.  QED. :)

How does this look?

*** tftp.c  Thu Dec  7 16:20:02 2000
--- tftp2.c Thu Dec  7 16:20:55 2000
*************** tftp_open(path, f)
*** 257,260 ****
--- 257,262 ----
  
      tftpfile->iodesc = io = socktodesc(*(int *) (f->f_devdata));
+     if (io == NULL)
+         return (EINVAL);
      io->destip = servip;
      tftpfile->off = 0;

(I suppose I could have included this earlier.  Ugh.)

Jim Browne[EMAIL PROTECTED]
"We lost our lease.  You lose culture" - sign on SF Arts Comission Bldg


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: More on BTX halted / crashes trying to use -stable /boot/loader

2000-12-07 Thread Mike Smith


This is probably an OK workaround.  I think that there's something 
fundamentally wrong with the 'net' filesystems getting called for an open 
against a disk device, but I've paged out all the libstand state and I 
can't get it back fast enough to comment more usefully. 8(

BTW Jim, the stuff you're working on sounds really cool.  Thanks for 
taking it on!


-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E







Re: More on BTX halted / crashes trying to use -stable /boot/loader

2000-12-07 Thread Jim Browne

At 17:44 -0800 12/7/00, Mike Smith wrote:
>This is probably an OK workaround.  I think that there's something
>fundamentally wrong with the 'net' filesystems getting called for an open
>against a disk device, but I've paged out all the libstand state and I
>can't get it back fast enough to comment more usefully. 8(

A "better" thing to do would be for tftp_open to check the dv_type of 
the struct devsw member of struct open_file to see if it is a network 
device.  However, stand.h has a comment stating that the dv_type 
member of struct devsw is an "opaque type constant, arch-dependant". 
Since tftp.c is in the arch-neutral libstand(3), I figured it would 
be bad for tftp.c to gain knowledge of the "opaque" dv_type field.

Regardless, the check I added should be there as it was an uncovered 
error condition.
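
For illustration, the dv_type check described above might look roughly like the sketch below. The enum values and the helper name tftp_check_device are made up for this example, and the structs are cut down to the two fields involved; this is not the libstand code, only a sketch of the idea under those assumptions.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed device classes, invented for this sketch. */
enum mock_devtype { MOCK_DEVT_DISK = 1, MOCK_DEVT_NET = 2 };

/* Cut-down stand-ins for the libstand structures. */
struct devsw {
	const char *dv_name;
	int         dv_type;    /* the "opaque" type constant */
};

struct open_file {
	struct devsw *f_dev;
	void         *f_devdata;
};

/* Hypothetical helper: refuse the open unless the device is a network one. */
static int
tftp_check_device(const struct open_file *f)
{
	if (f->f_dev == NULL || f->f_dev->dv_type != MOCK_DEVT_NET)
		return (EINVAL);
	return (0);
}
```

The drawback is the one already noted: the arch-neutral code would have to agree with every architecture on what the type constants mean.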

>BTW Jim, the stuff you're working on sounds really cool.  Thanks for
>taking it on!

Apparently I am a glutton for punishment.  Expect more bugfix patches 
in the near future.

Jim Browne[EMAIL PROTECTED]
"We lost our lease.  You lose culture" - sign on SF Arts Comission Bldg





Re: More on BTX halted / crashes trying to use -stable /boot/loader

2000-12-07 Thread Mike Smith

> Regardless, the check I added should be there as it was an uncovered
> error condition.

... and I just realised I deleted your patch.  D'oh!

> >BTW Jim, the stuff you're working on sounds really cool.  Thanks for
> >taking it on!
>
> Apparently I am a glutton for punishment.  Expect more bugfix patches
> in the near future.

Heh.  If you're interested in maintaining this, maybe we need to get you 
commit access...

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E







Re: More on BTX halted / crashes trying to use -stable /boot/loader

2000-12-07 Thread Matt Dillon

:I've already looked at this, investigating a problem reported in
:connection with PR 21559.  I'll probably sort it out in the next day
:or two, unless someone else gets there first.
:
:-- 
:Robert Nordier
:
:[EMAIL PROTECTED]
:[EMAIL PROTECTED]

That'd be great.  When you have a patch, if you email it to me I
will test it on a box that I know crashes on the current problem.

-Matt






Re: More on BTX halted / crashes trying to use -stable /boot/loader

2000-12-07 Thread Robert Nordier

Matt Dillon wrote:
 
> I sure would appreciate it if one of the bootstrap gurus could take
> a look at what happens when the tftp open routine is called from a
> normal disk-based /boot/loader!
 
I've already looked at this, investigating a problem reported in
connection with PR 21559.  I'll probably sort it out in the next day
or two, unless someone else gets there first.

-- 
Robert Nordier

[EMAIL PROTECTED]
[EMAIL PROTECTED]





Re: More on BTX halted / crashes trying to use -stable /boot/loader

2000-12-07 Thread Jim Browne

At 00:13 -0800 12/7/00, Mike Smith wrote:
>> The option works wonderfully for /boot/pxeboot.  But it turns out
>> that the normal /boot/loader, when compiled with the above
>> option, will crash horribly whenever it tries to open() a file and
>> can't find it in the UFS filesystem on disk... it falls through the
>> filesystem list until it hits the tftp FS and BEWM.  Explosion.
>>
>> I sure would appreciate it if one of the bootstrap gurus could take
>> a look at what happens when the tftp open routine is called from a
>> normal disk-based /boot/loader!
>
>Probably hits an uninitialised function vector; this would be a good
>catch for someone looking to learn a bit about the loader and libstand.

Devsw "pxedisk" treats struct open_file member f_devdata as a pointer 
to a socket number[1].  Other devsw drivers treat f_devdata as a 
pointer to a struct i386_devdesc[2].

When you boot via PXE, sys/boot/i386/loader/main.c sets the current 
device to "pxedisk".  If you do not boot via PXE, your current device 
is likely to be some take on "disk".

When TFTP tries to open a file, it is expecting struct open_file 
member f_devdata to be a pointer to a socket number.  When currdev is 
"pxe", that assumption is correct.  When currdev is "disk*", that 
assumption is incorrect.  Specifically, tftp.c does:

tftpfile->iodesc = io = socktodesc(*(int *) (f->f_devdata));

In my case, that often winds up making tftpfile->iodesc = 0.  That 
parameter is later passed in tftp_makereq to sendrecv as the iodesc, 
which via sendudp (and possibly the ARP functions) winds up calling 
netif_put.  netif_put derefs the bogus iodesc to get a function 
pointer for the put function of the network interface and calls it. 
WHAM.  QED. :)
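
To make the failure mode concrete, here is a small stand-alone sketch of the misread. Everything in it is a simplified stand-in: mock_disk_devdesc, the SOPEN_MAX value, the bounds check in socktodesc and the helper tftp_lookup_iodesc are assumptions for illustration, not the real libstand or libi386 definitions.

```c
#include <stddef.h>

/* Simplified stand-ins; the real structures carry many more fields. */
struct iodesc { int destip; };
struct open_file { void *f_devdata; };

/* Hypothetical disk descriptor: its first field is not a socket number. */
struct mock_disk_devdesc {
	int d_magic;
	int d_unit;
};

#define SOPEN_MAX 4                     /* assumed socket table size */
static struct iodesc sockets[SOPEN_MAX];

/* Map a socket number to its descriptor, or NULL when out of range. */
static struct iodesc *
socktodesc(int sock)
{
	if (sock < 0 || sock >= SOPEN_MAX)
		return (NULL);
	return (&sockets[sock]);
}

/* Mirrors the expression in tftp.c: read f_devdata as a pointer to int. */
static struct iodesc *
tftp_lookup_iodesc(struct open_file *f)
{
	return (socktodesc(*(int *)(f->f_devdata)));
}
```

When f_devdata points at a real socket number, the lookup yields a usable descriptor. When it points at the mock disk descriptor with d_magic set to, say, 0x1234, the lookup yields NULL, and any caller that goes on to dereference the result is in trouble.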

I happen to be knee deep in this code right now as I am adding two 
things: support for booting from a flash based FS and porting the 
netboot Ethernet drivers to work under libstand(3) so I can use 
loader(8) with an AMD LANCE compatible chip.  I was lurking until my 
code was finished, but your problem (which I was debugging today for 
my own configuration) is a good opportunity to speak up.

I think the correct solution is to not overload f_devdata.  Perhaps 
another field should be added to struct open_file specifically for a 
socket number and perhaps some error checking code is in order? :)
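
A rough sketch of what a dedicated field could look like follows. The field name f_sock, the MOCK_NO_SOCKET marker and the accessor open_file_socket are all hypothetical; nothing like them exists in stand.h as far as this thread shows.

```c
#include <errno.h>
#include <stddef.h>

#define MOCK_NO_SOCKET (-1)     /* hypothetical "not a network file" marker */

/* Cut-down open_file with an extra, hypothetical socket field. */
struct open_file {
	void *f_devdata;        /* device-private data, no longer overloaded */
	int   f_sock;           /* socket number, or MOCK_NO_SOCKET */
};

/* Hypothetical accessor: fetch the socket number without touching f_devdata. */
static int
open_file_socket(const struct open_file *f, int *sockp)
{
	if (f->f_sock == MOCK_NO_SOCKET)
		return (EINVAL);
	*sockp = f->f_sock;
	return (0);
}
```

With something like this, a disk-backed open_file would carry MOCK_NO_SOCKET and the network filesystems could bail out cleanly instead of guessing at what f_devdata points to.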

I have to have my code working yesterday, so I'll keep plugging along 
on a solution.  I'll email patches when finished.  However, there are 
others who are far more familiar with this code than I, so pointers 
are appreciated especially from Alpha aware people.  (I haven't even 
looked at the Alpha version of loader(8).)

[1] sys/boot/i386/libi386/pxe.c function pxe_open towards the bottom. 
Actually, pxe.c just overwrites what is likely a pointer to an 
i386_devdesc that was allocated by i386_parsedev (i.e. memory leak).
[2] sys/boot/i386/libi386/devicename.c function i386_parsedev

On a final note: why is netif_drivers defined in pxe.c rather than 
conf.c?  I'm currently working around that with a Makefile define, 
but I really think the definition of netif_drivers belongs in conf.c, 
especially if one is to have more than one netif_driver compiled into 
the binary (i.e. "pxe" and "ether")
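
As an illustration of the layout being suggested, a single NULL-terminated table holding both drivers could look like the sketch below. The struct is reduced to a name field, the two driver objects are mocks, and find_netif_driver is a hypothetical lookup helper added only to show the table being walked.

```c
#include <stddef.h>
#include <string.h>

/* Reduced stand-in: only the driver's base name is kept. */
struct netif_driver {
	const char *netif_bname;
};

static struct netif_driver mock_pxe = { "pxe" };
static struct netif_driver mock_ether = { "ether" };

/* One table, in one place, listing every compiled-in driver. */
static struct netif_driver *netif_drivers[] = {
	&mock_pxe,
	&mock_ether,
	NULL
};

/* Hypothetical helper: walk the table looking for a driver by name. */
static struct netif_driver *
find_netif_driver(const char *name)
{
	int i;

	for (i = 0; netif_drivers[i] != NULL; i++)
		if (strcmp(netif_drivers[i]->netif_bname, name) == 0)
			return (netif_drivers[i]);
	return (NULL);
}
```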

Jim Browne[EMAIL PROTECTED]
"We lost our lease.  You lose culture" - sign on SF Arts Comission Bldg





Re: More on BTX halted / crashes trying to use -stable /boot/loader

2000-12-06 Thread Mike Smith

> Ah ha!  I think I found the culprit, but I'm not sure exactly where
> in the tftp code the problem is occurring.
>
> I have been setting LOADER_TFTP_SUPPORT in /etc/make.conf so my pxeboot
> file uses tftp to get the kernel rather than NFS (since NFS appears to
> only be able to get [rootfs]:/kernel, which is the wrong kernel for a
> diskless boot).

This isn't the case; pxeboot will load whichever kernel you've 
specified in your loader config.

> The option works wonderfully for /boot/pxeboot.  But it turns out
> that the normal /boot/loader, when compiled with the above
> option, will crash horribly whenever it tries to open() a file and
> can't find it in the UFS filesystem on disk... it falls through the
> filesystem list until it hits the tftp FS and BEWM.  Explosion.
>
> I sure would appreciate it if one of the bootstrap gurus could take
> a look at what happens when the tftp open routine is called from a
> normal disk-based /boot/loader!

Probably hits an uninitialised function vector; this would be a good 
catch for someone looking to learn a bit about the loader and libstand.

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E







Re: More on BTX halted / crashes trying to use -stable /boot/loader

2000-12-06 Thread Matt Dillon

Ah ha!  I think I found the culprit, but I'm not sure exactly where
in the tftp code the problem is occurring.

I have been setting LOADER_TFTP_SUPPORT in /etc/make.conf so my pxeboot
file uses tftp to get the kernel rather than NFS (since NFS appears to
only be able to get [rootfs]:/kernel, which is the wrong kernel for a
diskless boot).

The option works wonderfully for /boot/pxeboot.  But it turns out
that the normal /boot/loader, when compiled with the above
option, will crash horribly whenever it tries to open() a file and 
can't find it in the UFS filesystem on disk... it falls through the
filesystem list until it hits the tftp FS and BEWM.  Explosion.
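
The fall-through can be pictured with the sketch below: an open routine that walks a NULL-terminated list of filesystems and tries each one in turn. The two filesystems are mocks, try_open is a made-up name, and the loop is a simplification written for this note rather than a copy of the libstand open().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct open_file { void *f_devdata; };

/* Cut-down fs_ops: just a name and an open hook. */
struct fs_ops {
	const char *fs_name;
	int (*fo_open)(const char *path, struct open_file *f);
};

static int tftp_calls;          /* how many times the TFTP hook was reached */

/* Mock UFS that only knows about /kernel. */
static int
mock_ufs_open(const char *path, struct open_file *f)
{
	(void)f;
	return (strcmp(path, "/kernel") == 0 ? 0 : ENOENT);
}

/* Mock TFTP hook; the real one is where the crash happens. */
static int
mock_tftp_open(const char *path, struct open_file *f)
{
	(void)path;
	(void)f;
	tftp_calls++;
	return (ENOENT);
}

static struct fs_ops mock_ufs = { "ufs", mock_ufs_open };
static struct fs_ops mock_tftp = { "tftp", mock_tftp_open };
static struct fs_ops *file_system[] = { &mock_ufs, &mock_tftp, NULL };

/* Try each filesystem in order until one accepts the path. */
static int
try_open(const char *path, struct open_file *f)
{
	int i, error = ENOENT;

	for (i = 0; file_system[i] != NULL; i++) {
		error = file_system[i]->fo_open(path, f);
		if (error == 0)
			break;
	}
	return (error);
}
```

A file that UFS finds never reaches the TFTP hook. A missing file does, which matches the symptom: the loader only blows up when a lookup fails on disk.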

I sure would appreciate it if one of the bootstrap gurus could take 
a look at what happens when the tftp open routine is called from a 
normal disk-based /boot/loader!

-Matt


:I've experimented a bit more.  If I do an installworld and reboot,
:the machine crashes in one of two ways (randomly):
:
:#1 Crashes with a BTX error
:
:   int 5 err 0 efl 00010206 eip 0012
:   eax 0039 ebx 00023920 ecx 00023934 edx 
:   esi  edi 000c ebp 000943c8 esp 000943cc
:   cs 002b ds 0033 es 0033 fs 0033 gs 0033 ss 0033
:   cs:eip  62 00 00 00 e8 05 04 00 00 90 31 c0 cd 30 58 01
:   ss:esp  1c 8a 01 00 00 00 00 00 6c 44 09 00 1a 00 00 00
:
:#2 Loader has all sorts of 'can't find file BLAH' errors, stack underflow
:errors, and winds up at an 'ok ' prompt.
:
:Trying to run commands from the prompt sometimes work, sometimes return
:a 'stack underflow' error.
:
:--
:
:That's with the latest -stable /boot/loader.
:
:If I take that machine and net-boot it, then mount / and copy a
:/boot/loader from March 20th, then reboot the machine, the machine
:now boots just fine.
:
:If I put the -stable /boot/loader back into /boot, the machine dies.
:If I put the March 20th /boot/loader in, the machine boots just fine.
:
:Anybody have any ideas?  What happened to /boot/loader between March
:and now?  I am at a loss.
:
:-r-xr-xr-x   1 root  wheel  163840 Dec  6 17:33 loader.NEW
:-r-xr-xr-x   1 root  wheel  143360 Dec  6 17:47 loader.OLD
:
:   -Matt

