Re: Odd crash
In message <[EMAIL PROTECTED]> John Baldwin writes:
: > arpintr+0x85: pushl $0xc02f5c60
: > arpintr+0x8a: pushl $0x3
: > arpintr+0x8c: call log
: > arpintr+0x91: addl $0x8,%esp
: > arpintr+0x94: jmp arpintr+0x5
: > arpintr+0x99: leal 0(%esi),%esi
:
: This instruction does nothing, so I assume this isn't
: optimized code?

This is just padding to make the branch targets come out to a given
alignment.

I'm still at a loss for how it even works at all...

Warner

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message
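For illustration only, here is the padding arithmetic as a small C sketch. The trap report puts arpintr+0x9c at 0xc01d16ac, so the leal at arpintr+0x99 sits at 0xc01d16a9 and the next instruction starts 3 bytes later. The helper name pad_to_align and the 4-byte alignment are assumptions; the thread only says "a given alignment".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the thread): number of filler bytes
 * needed to move `addr` up to the next multiple of `align`.
 * `align` must be a power of two.
 */
uint32_t
pad_to_align(uint32_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	/* Distance past the previous boundary, flipped into distance to the next. */
	return (align - (addr & (align - 1))) & (align - 1);
}
```

Under the assumed 4-byte alignment, pad_to_align(0xc01d16a9, 4) comes out to 3, which matches the gap between arpintr+0x99 and arpintr+0x9c in the listing.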
RE: Odd crash
On 15-Mar-00 Warner Losh wrote:
>
> I just got an odd crash:
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address = 0x8
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xc01d16ac
> stack pointer = 0x10:0xc031e704
> frame pointer = 0x10:0xc031e70c
> code segment = base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = Idle
> interrupt mask =
> kernel: type 12 trap, code=0
> Stopped at arpintr+0x9c: movl 0x8(%ebx),%ecx
> db> trace
> arpintr(c02a997b,0,10,10,c5d20010) at arpintr+0x9c
> swi_net_next() at swi_net_next
> db>
>
> I'm using the realtek driver with a RealTek 8139 built into the SBC
> that I have sitting on my desk.
>
> rl0: port 0x6000-0x60ff mem 0xf900-0xf9ff irq 11 at device 6.0 on pci0
> rl0: Ethernet address: 00:60:e0:00:7f:c8
>
> Looking at the disassembled output of ddb, I think that I'm crashing
> at the following place.
>
>         if (m->m_len < sizeof(struct arphdr) &&
>             (m = m_pullup(m, sizeof(struct arphdr)) == NULL)) {
>                 log(LOG_ERR, "arp: runt packet -- m_pullup failed.");
>                 continue;
>         }
>         ar = mtod(m, struct arphdr *);
>
> ==>     if (ntohs(ar->ar_hrd) != ARPHRD_ETHER
>             && ntohs(ar->ar_hrd) != ARPHRD_IEEE802) {
>                 log(LOG_ERR,
>                     "arp: unknown hardware address format (%2D)",
>                     (unsigned char *)&ar->ar_hrd, "");
>                 m_freem(m);
>                 continue;
>         }
>
> since ar is NULL for some reason. I have no clue at all why this
> would happen. This means that m->m_data has to be NULL. But that
> doesn't make sense because of the m_pullup just before this. If it
> doesn't return NULL, then I thought that m->m_data was guaranteed to
> be valid.
>
> I think that there might be a bug in the code generation, but I don't
> know for sure.
> If we look at the disassembled output:
>
> arpintr+0x79: testl %eax,%eax
> arpintr+0x7b: setz %al
> arpintr+0x7e: movzbl %al,%ebx
> arpintr+0x81: testl %ebx,%ebx
> arpintr+0x83: jz arpintr+0x9c

Functionally, apart from spamming %ebx, these 5 instructions are
equivalent to:

        testl %eax,%eax
        jnz arpintr+0x9c

> arpintr+0x85: pushl $0xc02f5c60
> arpintr+0x8a: pushl $0x3
> arpintr+0x8c: call log
> arpintr+0x91: addl $0x8,%esp
> arpintr+0x94: jmp arpintr+0x5
> arpintr+0x99: leal 0(%esi),%esi

This instruction does nothing, so I assume this isn't optimized code?

> arpintr+0x9c: movl 0x8(%ebx),%ecx
> arpintr+0x9f: movzwl 0(%ecx),%eax
> arpintr+0xa2: xchgb %ah,%al
> arpintr+0xa4: cmpw $0x1,%ax
> arpintr+0xa8: jz arpintr+0xd8
> arpintr+0xaa: movzwl 0(%ecx),%eax
> arpintr+0xad: xchgb %ah,%al
> arpintr+0xaf: cmpw $0x6,%ax
> arpintr+0xb3: jz arpintr+0xd8
> arpintr+0xb5: pushl $0xc02f5c0e
> arpintr+0xba: pushl %ecx
> arpintr+0xbb: pushl $0xc02f5ca0
> arpintr+0xc0: pushl $0x3
> arpintr+0xc2: call log
>
> So we're between the two log calls, which is good. Notice that we
> effectively zero %ebx at 7e. We then jump to 9c if it is zero, and
> then dereference %ebx. Bang, we're dead. I think that the jz should
> be a jnz, no?

It looks like the compiler is making bad assumptions and/or trashing %ebx.

        testl %eax,%eax         ; if %eax == 0, ZF = 1, else ZF = 0
        setz %al                ; if ZF, %al = 1, else %al = 0, so
                                ; %al = !%eax
        movzbl %al,%ebx         ; %ebx = zero-extended %al
                                ; so %ebx == 0 iff %eax != 0

So, %ebx is 0 (zero) if %eax != 0. If %eax = m, then %ebx is zero, and
the jump is taken if %eax != NULL, i.e. m != NULL, so that code
generation is correct wrt the if() statement at least.
However, the stuff below that bothers me:

        lea (%esi),%esi         ; basically does %esi = %esi

This probably is the 'ar = mtod(m, struct arphdr *);' In which case, if
this is accurate, then %esi = ar, and it should be:

        mov 8(%esi),%ecx        ; note %esi instead of %ebx

Also, if that is the case, then the jz in question should jump to the
lea instruction instead of the mov instruction it faulted at. It seems
that the compiler is assuming that %ebx = m, when in fact %ebx != m, but
is the boolean result of m == NULL. I also
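One reading that would be consistent with %ebx holding a boolean rather than m (offered as a sketch, not something the thread itself concludes) is C operator precedence in the quoted if(): `==` binds tighter than `=`, so `m = m_pullup(...) == NULL` stores the result of the comparison in m. The names struct pkt, fake_pullup and sample_pkt below are hypothetical stand-ins, and the explicit cast only spells out the conversion the quoted form would need.

```c
#include <stddef.h>
#include <stdint.h>

struct pkt { int len; };            /* hypothetical stand-in for struct mbuf */

struct pkt sample_pkt = { 28 };

/* Hypothetical stand-in for an m_pullup() call that succeeds. */
struct pkt *
fake_pullup(struct pkt *m)
{
	return m;
}

/*
 * Grouping as in the quoted source: m = (pullup(m) == NULL).
 * After a successful pullup the comparison is 0, so m becomes 0,
 * which is a null pointer on common platforms.
 */
struct pkt *
m_after_quoted_form(struct pkt *m)
{
	m = (struct pkt *)(intptr_t)(fake_pullup(m) == NULL);
	return m;
}

/* Assignment parenthesized first: m keeps the pointer that was returned. */
struct pkt *
m_after_parenthesized_form(struct pkt *m)
{
	if ((m = fake_pullup(m)) == NULL)
		return NULL;
	return m;
}
```

With the first grouping, a later `m->m_data` style access would start from address 0, which would fit a fault at a small offset such as 0x8; with the second, m still points at the packet.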
Re: Odd crash
On Wed, Mar 15, 2000 at 04:46:02PM -0700, Warner Losh wrote:
>
> I just got an odd crash:
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address = 0x8
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xc01d16ac
> stack pointer = 0x10:0xc031e704
> frame pointer = 0x10:0xc031e70c
> code segment = base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = Idle
> interrupt mask =
> kernel: type 12 trap, code=0
> Stopped at arpintr+0x9c: movl 0x8(%ebx),%ecx
> db> trace
> arpintr(c02a997b,0,10,10,c5d20010) at arpintr+0x9c
> swi_net_next() at swi_net_next
> db>

I have been chasing a similar bug for 2 weeks. The kernel crashed with a
double page fault and access to a very low address (something like
0x0098). I eliminated the RealTek card (replaced by a 3Com) and things
got better. But it kept crashing. Now I have disabled SMP and things
happen even more rarely (once every 2 days instead of once every 3-4
hours). This is a very fast machine (733), potentially with chipsets
that are not supported too well.

I have a few crash dumps I could let people have a look at, or at least
send the output of kdgb. If anyone is interested, I may be able to
provide access to the machine and the crash dumps (you can't ftp them,
they are 1GB+ in size, it would cost me NZ$250).

Joerg
--
Joerg B. Micheel                        Email: <[EMAIL PROTECTED]>
Waikato Applied Network Dynamics        Phone: +64 7 8384794
The University of Waikato, CompScience  Fax: +64 7 8384155
Private Bag 3105                        Pager: +64 868 38222
Hamilton, New Zealand                   Plan: TINE and the DAG's
Odd crash
I just got an odd crash:

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x8
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc01d16ac
stack pointer = 0x10:0xc031e704
frame pointer = 0x10:0xc031e70c
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
interrupt mask =
kernel: type 12 trap, code=0
Stopped at arpintr+0x9c: movl 0x8(%ebx),%ecx
db> trace
arpintr(c02a997b,0,10,10,c5d20010) at arpintr+0x9c
swi_net_next() at swi_net_next
db>

I'm using the realtek driver with a RealTek 8139 built into the SBC
that I have sitting on my desk.

rl0: port 0x6000-0x60ff mem 0xf900-0xf9ff irq 11 at device 6.0 on pci0
rl0: Ethernet address: 00:60:e0:00:7f:c8

Looking at the disassembled output of ddb, I think that I'm crashing
at the following place.

        if (m->m_len < sizeof(struct arphdr) &&
            (m = m_pullup(m, sizeof(struct arphdr)) == NULL)) {
                log(LOG_ERR, "arp: runt packet -- m_pullup failed.");
                continue;
        }
        ar = mtod(m, struct arphdr *);

==>     if (ntohs(ar->ar_hrd) != ARPHRD_ETHER
            && ntohs(ar->ar_hrd) != ARPHRD_IEEE802) {
                log(LOG_ERR,
                    "arp: unknown hardware address format (%2D)",
                    (unsigned char *)&ar->ar_hrd, "");
                m_freem(m);
                continue;
        }

since ar is NULL for some reason. I have no clue at all why this
would happen. This means that m->m_data has to be NULL. But that
doesn't make sense because of the m_pullup just before this. If it
doesn't return NULL, then I thought that m->m_data was guaranteed to
be valid.

I think that there might be a bug in the code generation, but I don't
know for sure.
If we look at the disassembled output:

arpintr+0x79: testl %eax,%eax
arpintr+0x7b: setz %al
arpintr+0x7e: movzbl %al,%ebx
arpintr+0x81: testl %ebx,%ebx
arpintr+0x83: jz arpintr+0x9c
arpintr+0x85: pushl $0xc02f5c60
arpintr+0x8a: pushl $0x3
arpintr+0x8c: call log
arpintr+0x91: addl $0x8,%esp
arpintr+0x94: jmp arpintr+0x5
arpintr+0x99: leal 0(%esi),%esi
arpintr+0x9c: movl 0x8(%ebx),%ecx
arpintr+0x9f: movzwl 0(%ecx),%eax
arpintr+0xa2: xchgb %ah,%al
arpintr+0xa4: cmpw $0x1,%ax
arpintr+0xa8: jz arpintr+0xd8
arpintr+0xaa: movzwl 0(%ecx),%eax
arpintr+0xad: xchgb %ah,%al
arpintr+0xaf: cmpw $0x6,%ax
arpintr+0xb3: jz arpintr+0xd8
arpintr+0xb5: pushl $0xc02f5c0e
arpintr+0xba: pushl %ecx
arpintr+0xbb: pushl $0xc02f5ca0
arpintr+0xc0: pushl $0x3
arpintr+0xc2: call log

So we're between the two log calls, which is good. Notice that we
effectively zero %ebx at 7e. We then jump to 9c if it is zero, and
then dereference %ebx. Bang, we're dead. I think that the jz should
be a jnz, no?

Warner
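To make the register walk-through above easier to follow, here is a rough C model of the five instructions at arpintr+0x79 through arpintr+0x83 plus the load at arpintr+0x9c. It is only a sketch: struct branch_state and emulate_branch are hypothetical names, and the model ignores everything except %eax, %ebx and the zero flag.

```c
#include <stdint.h>

/* Hypothetical record of what the modeled instructions leave behind. */
struct branch_state {
	uint32_t ebx;        /* value of %ebx after movzbl */
	int      jump_taken; /* nonzero if "jz arpintr+0x9c" is taken */
	uint32_t load_addr;  /* address read by "movl 0x8(%ebx),%ecx" */
};

struct branch_state
emulate_branch(uint32_t eax)
{
	struct branch_state s;
	uint32_t zf = (eax == 0);     /* testl %eax,%eax sets ZF when %eax is 0 */
	uint32_t al = zf;             /* setz %al */

	s.ebx = al;                   /* movzbl %al,%ebx */
	s.jump_taken = (s.ebx == 0);  /* testl %ebx,%ebx ; jz arpintr+0x9c */
	s.load_addr = s.ebx + 0x8;    /* only meaningful when the jump is taken */
	return s;
}
```

In this model any non-zero %eax leaves %ebx at 0, takes the jump, and makes the load read from address 0x8, the same value the trap message reports as the fault virtual address.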