Re: Odd crash

2000-03-16 Thread Warner Losh

In message <[EMAIL PROTECTED]> John Baldwin writes:
: > arpintr+0x85:   pushl   $0xc02f5c60
: > arpintr+0x8a:   pushl   $0x3
: > arpintr+0x8c:   call    log
: > arpintr+0x91:   addl    $0x8,%esp
: > arpintr+0x94:   jmp     arpintr+0x5
: > arpintr+0x99:   leal    0(%esi),%esi
: 
: This instruction does nothing, so I assume this isn't
: optimized code?

This is just padding to make the branch targets come out to a given
alignment.
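
The padding arithmetic can be checked with a short C sketch. It assumes, purely for illustration, that the branch target is aligned to a 4-byte boundary, and it takes the absolute address from the trap output (instruction pointer 0xc01d16ac at arpintr+0x9c). The helper names padding_bytes and arpintr_addr are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of filler bytes needed to round addr up to a multiple of
 * align. align must be a power of two (assumed to be 4 here).
 */
uint32_t padding_bytes(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (addr & (align - 1))) & (align - 1);
}

/*
 * Absolute address of the instruction at arpintr+off, derived from
 * the trap output: the fault was at arpintr+0x9c == 0xc01d16ac.
 */
uint32_t arpintr_addr(uint32_t off)
{
    return 0xc01d16acu - 0x9cu + off;
}
```

Under that assumption, padding_bytes(arpintr_addr(0x99), 4) comes out as 3, which is the distance from arpintr+0x99 to arpintr+0x9c that the leal fills.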

I'm still at a loss for how it even works at all...

Warner


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



RE: Odd crash

2000-03-16 Thread John Baldwin


On 15-Mar-00 Warner Losh wrote:
> 
> I just got an odd crash:
> 
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x8
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc01d16ac
> stack pointer           = 0x10:0xc031e704
> frame pointer           = 0x10:0xc031e70c
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = Idle
> interrupt mask          = 
> kernel: type 12 trap, code=0
> Stopped at  arpintr+0x9c:   movl    0x8(%ebx),%ecx
> db> trace
> arpintr(c02a997b,0,10,10,c5d20010) at arpintr+0x9c
> swi_net_next() at swi_net_next
> db>
> 
> I'm using the realtek driver with a RealTek 8139 built into the SBC
> that I have sitting on my desk.
> 
> rl0: port 0x6000-0x60ff mem 0xf900-0xf9ff irq 11 at device 6.0 on pci0
> rl0: Ethernet address: 00:60:e0:00:7f:c8
> 
> Looking at the disassembled output of ddb, I think that I'm crashing
> at the following place.
>       if (m->m_len < sizeof(struct arphdr) &&
>           (m = m_pullup(m, sizeof(struct arphdr)) == NULL)) {
>               log(LOG_ERR, "arp: runt packet -- m_pullup failed.");
>               continue;
>       }
>       ar = mtod(m, struct arphdr *);
> 
> ==>   if (ntohs(ar->ar_hrd) != ARPHRD_ETHER
>           && ntohs(ar->ar_hrd) != ARPHRD_IEEE802) {
>               log(LOG_ERR,
>                   "arp: unknown hardware address format (%2D)",
>                   (unsigned char *)&ar->ar_hrd, "");
>               m_freem(m);
>               continue;
>       }
> 
> since ar is NULL for some reason.  I have no clue at all why this
> would happen.  This means that m->m_data has to be NULL.  But that
> doesn't make sense because of the m_pullup just before this.  If it
> doesn't return NULL, then I thought that m->m_data was guaranteed to
> be valid.
> 
> I think that there might be a bug in the code generation, but I don't
> know for sure.  If we look at the disassembled output:
> 
> arpintr+0x79:   testl   %eax,%eax
> arpintr+0x7b:   setz    %al
> arpintr+0x7e:   movzbl  %al,%ebx
> arpintr+0x81:   testl   %ebx,%ebx
> arpintr+0x83:   jz      arpintr+0x9c

Functionally, apart from clobbering %ebx, these 5 instructions
are equivalent to:

  testl %eax, %eax
  jnz   arpintr+0x9c
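
The equivalence can be checked by modelling both sequences in C. This is only a sketch of the flag and register effects described above, not output from any tool; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models arpintr+0x79 .. +0x83; returns 1 if the jz is taken. */
int jz_taken_as_compiled(uint32_t eax)
{
    int zf = (eax == 0);        /* testl %eax,%eax */
    uint8_t al = zf ? 1 : 0;    /* setz %al */
    uint32_t ebx = al;          /* movzbl %al,%ebx */

    zf = (ebx == 0);            /* testl %ebx,%ebx */
    return zf;                  /* jz arpintr+0x9c */
}

/* Models the two-instruction form; returns 1 if the jnz is taken. */
int jnz_taken_short_form(uint32_t eax)
{
    int zf = (eax == 0);        /* testl %eax,%eax */

    return !zf;                 /* jnz arpintr+0x9c */
}

/* Value left in %ebx by the compiled sequence. */
uint32_t ebx_after_sequence(uint32_t eax)
{
    return (eax == 0) ? 1u : 0u;
}

/* Both forms agree for a zero and a non-zero %eax. */
void equivalence_check(void)
{
    assert(jz_taken_as_compiled(0) == jnz_taken_short_form(0));
    assert(jz_taken_as_compiled(0xc5d20010u) ==
        jnz_taken_short_form(0xc5d20010u));
}
```

Both forms branch exactly when %eax is non-zero. The only difference is the side effect: on the taken branch the compiled sequence leaves %ebx equal to 0.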

> arpintr+0x85:   pushl   $0xc02f5c60
> arpintr+0x8a:   pushl   $0x3
> arpintr+0x8c:   call    log
> arpintr+0x91:   addl    $0x8,%esp
> arpintr+0x94:   jmp     arpintr+0x5
> arpintr+0x99:   leal    0(%esi),%esi

This instruction does nothing, so I assume this isn't
optimized code?

> arpintr+0x9c:   movl    0x8(%ebx),%ecx
> arpintr+0x9f:   movzwl  0(%ecx),%eax
> arpintr+0xa2:   xchgb   %ah,%al
> arpintr+0xa4:   cmpw    $0x1,%ax
> arpintr+0xa8:   jz      arpintr+0xd8
> arpintr+0xaa:   movzwl  0(%ecx),%eax
> arpintr+0xad:   xchgb   %ah,%al
> arpintr+0xaf:   cmpw    $0x6,%ax
> arpintr+0xb3:   jz      arpintr+0xd8
> arpintr+0xb5:   pushl   $0xc02f5c0e
> arpintr+0xba:   pushl   %ecx
> arpintr+0xbb:   pushl   $0xc02f5ca0
> arpintr+0xc0:   pushl   $0x3
> arpintr+0xc2:   call    log
> 
> So we're between the two log calls, which is good.  Notice that we
> effectively zero %ebx at 7e.  We then jump to 9c if it is zero, and
> then dereference %ebx.  Bang, we're dead.  I think that the jz should
> be a jnz, no?

It looks like the compiler is making bad assumptions and/or trashing
%ebx.

 testl %eax,%eax    ; if %eax == 0, ZF = 1, else ZF = 0
 setz %al           ; if ZF, %al = 1, else %al = 0, so
                    ; %al = !%eax
 movzbl %al,%ebx    ; %ebx = zero-extension of %al
                    ; so %ebx == 0 iff %eax != 0

So, %ebx is 0 (zero) if %eax != 0.  If %eax = m, then
%ebx is zero, and the jump is taken if %eax != NULL, i.e.
m != NULL, so that code generation is correct with respect to the if()
statement at least.  However, the stuff below that bothers me:

  lea (%esi),%esi  ; basically does %esi = %esi

This probably is the

  'ar = mtod(m, struct arphdr *);'

In which case, if this is accurate, then %esi = ar,
and it should be:

  mov 0x8(%esi), %ecx  ; note %esi instead of %ebx

Also, if that is the case, then the jz in question
should jump to the lea instruction instead of the
mov instruction it faulted at.  It seems that the
compiler is assuming that %ebx = m, when in fact
%ebx != m, but is the boolean result of m == NULL.
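
One reading that would reconcile the disassembly with the quoted source, offered as a sketch rather than a confirmed diagnosis: in C, == binds more tightly than =, so the quoted condition (m = m_pullup(...) == NULL) parses as m = (m_pullup(...) == NULL). That stores 0 or 1 in m, which would explain %ebx holding a boolean. The example below uses a hypothetical stand-in, fake_pullup, in place of the real m_pullup, and an integer m so that it compiles cleanly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for m_pullup(): hands back its argument. */
void *fake_pullup(void *p)
{
    return p;
}

/*
 * Same shape as the quoted condition. Because == binds tighter than =,
 * this stores the result of the comparison (0 or 1), not the pointer.
 * m is an integer here so the sketch compiles without diagnostics.
 */
uintptr_t m_as_written(void *p)
{
    uintptr_t m;

    m = fake_pullup(p) == NULL;
    return m;
}

/* With the assignment parenthesised, m receives the pointer itself. */
void *m_parenthesised(void *p)
{
    void *m;

    if ((m = fake_pullup(p)) == NULL)
        return NULL;
    return m;
}

void precedence_check(void)
{
    int packet = 0;

    assert(m_as_written(&packet) == 0);   /* success path: m becomes 0 */
    assert(m_as_written(NULL) == 1);      /* failure path: m becomes 1 */
    assert(m_parenthesised(&packet) == &packet);
}
```

If that reading is right, the compiler output is faithful to the source: on the success path m is 0, and the following load through m faults at a small offset from zero.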

I also

Re: Odd crash

2000-03-15 Thread Joerg Micheel

On Wed, Mar 15, 2000 at 04:46:02PM -0700, Warner Losh wrote:
> 
> I just got an odd crash:
> 
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x8
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc01d16ac
> stack pointer           = 0x10:0xc031e704
> frame pointer           = 0x10:0xc031e70c
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = Idle
> interrupt mask          = 
> kernel: type 12 trap, code=0
> Stopped at  arpintr+0x9c:   movl    0x8(%ebx),%ecx
> db> trace
> arpintr(c02a997b,0,10,10,c5d20010) at arpintr+0x9c
> swi_net_next() at swi_net_next
> db>

I've been chasing a similar bug for two weeks. The kernel crashed
with a double page fault and an access to a very low address
(something like 0x0098). I eliminated the RealTek card
(replaced it with a 3Com) and things got better, but it kept crashing.
Now I have disabled SMP and the crashes happen even more rarely (once
every 2 days instead of once every 3-4 hours). This is a very fast
machine (733), potentially with chipsets that are not supported too well.
I have a few crash dumps I could let people have a look at, or at least
I could send the output of kgdb.

If anyone is interested, I may be able to provide access to
the machine and the crash dumps (you can't ftp them, they
are 1GB+ in size, it would cost me NZ$250).

Joerg
-- 
Joerg B. Micheel                        Email: <[EMAIL PROTECTED]>
Waikato Applied Network Dynamics        Phone: +64 7 8384794
The University of Waikato, CompScience  Fax:   +64 7 8384155
Private Bag 3105                        Pager: +64 868 38222
Hamilton, New Zealand                   Plan:  TINE and the DAG's





Odd crash

2000-03-15 Thread Warner Losh


I just got an odd crash:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x8
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01d16ac
stack pointer           = 0x10:0xc031e704
frame pointer           = 0x10:0xc031e70c
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = 
kernel: type 12 trap, code=0
Stopped at  arpintr+0x9c:   movl    0x8(%ebx),%ecx
db> trace
arpintr(c02a997b,0,10,10,c5d20010) at arpintr+0x9c
swi_net_next() at swi_net_next
db>

I'm using the realtek driver with a RealTek 8139 built into the SBC
that I have sitting on my desk.

rl0: port 0x6000-0x60ff mem 0xf900-0xf9ff irq 11 at device 6.0 on pci0
rl0: Ethernet address: 00:60:e0:00:7f:c8

Looking at the disassembled output of ddb, I think that I'm crashing
at the following place.
        if (m->m_len < sizeof(struct arphdr) &&
            (m = m_pullup(m, sizeof(struct arphdr)) == NULL)) {
                log(LOG_ERR, "arp: runt packet -- m_pullup failed.");
                continue;
        }
        ar = mtod(m, struct arphdr *);

==>     if (ntohs(ar->ar_hrd) != ARPHRD_ETHER
            && ntohs(ar->ar_hrd) != ARPHRD_IEEE802) {
                log(LOG_ERR,
                    "arp: unknown hardware address format (%2D)",
                    (unsigned char *)&ar->ar_hrd, "");
                m_freem(m);
                continue;
        }

since ar is NULL for some reason.  I have no clue at all why this
would happen.  This means that m->m_data has to be NULL.  But that
doesn't make sense because of the m_pullup just before this.  If it
doesn't return NULL, then I thought that m->m_data was guaranteed to
be valid.

I think that there might be a bug in the code generation, but I don't
know for sure.  If we look at the disassembled output:

arpintr+0x79:   testl   %eax,%eax
arpintr+0x7b:   setz    %al
arpintr+0x7e:   movzbl  %al,%ebx
arpintr+0x81:   testl   %ebx,%ebx
arpintr+0x83:   jz      arpintr+0x9c
arpintr+0x85:   pushl   $0xc02f5c60
arpintr+0x8a:   pushl   $0x3
arpintr+0x8c:   call    log
arpintr+0x91:   addl    $0x8,%esp
arpintr+0x94:   jmp     arpintr+0x5
arpintr+0x99:   leal    0(%esi),%esi
arpintr+0x9c:   movl    0x8(%ebx),%ecx
arpintr+0x9f:   movzwl  0(%ecx),%eax
arpintr+0xa2:   xchgb   %ah,%al
arpintr+0xa4:   cmpw    $0x1,%ax
arpintr+0xa8:   jz      arpintr+0xd8
arpintr+0xaa:   movzwl  0(%ecx),%eax
arpintr+0xad:   xchgb   %ah,%al
arpintr+0xaf:   cmpw    $0x6,%ax
arpintr+0xb3:   jz      arpintr+0xd8
arpintr+0xb5:   pushl   $0xc02f5c0e
arpintr+0xba:   pushl   %ecx
arpintr+0xbb:   pushl   $0xc02f5ca0
arpintr+0xc0:   pushl   $0x3
arpintr+0xc2:   call    log
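
For readers mapping the disassembly back to the source: the movzwl / xchgb / cmpw runs from arpintr+0x9f to arpintr+0xb3 look like the two ntohs(ar->ar_hrd) comparisons, with xchgb %ah,%al doing the byte swap and the constants 0x1 and 0x6 matching ARPHRD_ETHER and ARPHRD_IEEE802. A rough C model of that reading follows; swap16 and hrd_accepted are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define ARPHRD_ETHER    1   /* Ethernet hardware type */
#define ARPHRD_IEEE802  6   /* IEEE 802 hardware type */

/* xchgb %ah,%al: swap the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Models arpintr+0x9f .. +0xb3. raw_hrd is ar_hrd as a little-endian
 * load sees it (movzwl 0(%ecx),%eax). Returns 1 if either jz to
 * arpintr+0xd8 would be taken, i.e. the hardware format is accepted.
 */
int hrd_accepted(uint16_t raw_hrd)
{
    if (swap16(raw_hrd) == ARPHRD_ETHER)     /* cmpw $0x1,%ax */
        return 1;
    if (swap16(raw_hrd) == ARPHRD_IEEE802)   /* cmpw $0x6,%ax */
        return 1;
    return 0;
}

void hrd_check(void)
{
    /* Network-order 1 and 6 appear byte-swapped to a little-endian load. */
    assert(hrd_accepted(0x0100) == 1);
    assert(hrd_accepted(0x0600) == 1);
    assert(hrd_accepted(0x0001) == 0);
}
```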

So we're between the two log calls, which is good.  Notice that we
effectively zero %ebx at 7e.  We then jump to 9c if it is zero, and
then dereference %ebx.  Bang, we're dead.  I think that the jz should
be a jnz, no?
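
The fault address lines up with that: movl 0x8(%ebx),%ecx reads from %ebx + 8, so a zero %ebx gives a read at 0x8, as the trap reports. A small C sketch of the arithmetic is below. The struct layout is an assumption (three 32-bit fields with the data pointer third, as the 0x8 displacement suggests on i386), and the names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed i386 layout of the start of an mbuf header: three 32-bit
 * fields, with the data pointer third. Field names are illustrative.
 */
struct mbuf_model {
    uint32_t m_next;     /* offset 0 */
    uint32_t m_nextpkt;  /* offset 4 */
    uint32_t m_data;     /* offset 8 */
};

/* Address read by "movl disp(%base),%reg". */
uint32_t effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;
}

void fault_address_check(void)
{
    assert(offsetof(struct mbuf_model, m_data) == 0x8);
    /* A zero base register reproduces the reported fault address. */
    assert(effective_address(0, 0x8) == 0x8);
}
```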

Warner

