Thanks. I think I found the problem. I had programmed RedBoot (ROM startup) into the flash on my board. RedBoot boots and runs very quickly, before I can start GDB using the JTAG debugger, so the caches were already enabled and active when the JTAG debugger took over. After loading my application code via the JTAG interface, I would run my application, which would disable and flush the caches. That flush wrote data cached by RedBoot back to memory, overwriting portions of my freshly loaded program/data (somewhat randomly) and causing it to crash. I have fixed this by adding commands to my GDB init script that disable and flush the caches before I load my application. All works again.
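To illustrate why the order matters, here is a toy host-side model in C of a write-back data cache that a JTAG download bypasses. It is a sketch under stated assumptions, not the ARM940T's real cache: the 8-word RAM, the 4-line direct-mapped cache and the function names (`cpu_store`, `jtag_load`, `cache_clean_invalidate`, `run_scenario`) are all hypothetical, and "flush" is modeled as clean (write back dirty lines) followed by invalidate. No CP15 operations are shown.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy sizes: one word per cache line, direct-mapped. */
#define RAM_WORDS   8
#define CACHE_LINES 4

typedef struct {
    int      valid;
    int      dirty;
    unsigned addr;   /* RAM word index held by this line */
    uint32_t data;
} cache_line;

static uint32_t   ram[RAM_WORDS];
static cache_line dcache[CACHE_LINES];

/* CPU store with the cache on: write-back, so RAM is not updated yet. */
static void cpu_store(unsigned addr, uint32_t value) {
    cache_line *l = &dcache[addr % CACHE_LINES];
    if (l->valid && l->dirty && l->addr != addr)
        ram[l->addr] = l->data;          /* evict the previous owner */
    l->valid = 1;
    l->dirty = 1;
    l->addr  = addr;
    l->data  = value;
}

/* JTAG download: writes RAM directly and never consults the cache.
   Assumes n <= RAM_WORDS. */
static void jtag_load(const uint32_t *image, unsigned n) {
    memcpy(ram, image, n * sizeof *image);
}

/* Modeled "flush": write back every dirty line, then drop all lines. */
static void cache_clean_invalidate(void) {
    for (unsigned i = 0; i < CACHE_LINES; i++) {
        if (dcache[i].valid && dcache[i].dirty)
            ram[dcache[i].addr] = dcache[i].data;
        dcache[i].valid = 0;
        dcache[i].dirty = 0;
    }
}

/* Returns how many words of the downloaded image end up corrupted.
   clean_before_load models flushing the cache from the GDB init script. */
static unsigned run_scenario(int clean_before_load) {
    uint32_t image[RAM_WORDS];
    unsigned bad = 0;

    memset(ram, 0, sizeof ram);
    memset(dcache, 0, sizeof dcache);

    /* 1. The boot monitor runs with the cache on and leaves dirty lines. */
    cpu_store(2, 0xDEAD0002u);
    cpu_store(5, 0xDEAD0005u);

    /* 2. Optionally flush the cache before the debugger downloads code. */
    if (clean_before_load)
        cache_clean_invalidate();

    /* 3. JTAG writes the application image straight into RAM. */
    for (unsigned i = 0; i < RAM_WORDS; i++)
        image[i] = 0x1000u + i;
    jtag_load(image, RAM_WORDS);

    /* 4. The application's startup code disables and flushes the cache. */
    cache_clean_invalidate();

    for (unsigned i = 0; i < RAM_WORDS; i++)
        if (ram[i] != image[i])
            bad++;
    return bad;
}
```

In this model `run_scenario(0)` returns 2, because the two words left dirty by the boot monitor are written back over the new image, while `run_scenario(1)`, which flushes before the download, returns 0.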
Jay

-----Original Message-----
From: Andrew Lunn [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 06, 2006 3:15 PM
To: Jay Foster
Subject: Re: [ECOS] ARM HAL Problems

On Thu, Jul 06, 2006 at 02:58:00PM -0700, Jay Foster wrote:
> I'm working on an ARM9 HAL (ARM940T core, gcc 3.4.3), and am having problems
> with the code crashing pseudo randomly. The frustrating part is that this
> was working great last week. This week, I can not get it to work at all. I
> am loading the code (RAM startup) onto the target board using a JTAG
> emulator. It either crashes on an ASSERT or dies with a data abort or
> prefetch abort. After a couple of days of fruitless debugging, the best I
> can determine is that the CPU registers are getting corrupted by the RTC
> interrupt, causing the code to run off into the weeds in random ways. I
> can't figure out how (if) this is happening. I'm using only IRQ (no FIQ)
> interrupts to avoid nesting problems. Any helpful debugging tips?

A few random things to check, from my past experience.

1) I assume you have asserts enabled? Well, yes, you do, since you say it sometimes dies with an assert.

2) When it has crashed, take a look at the interrupt vector code at 0x0-0x40. Have you dereferenced a null pointer and so corrupted the vectors? Also check the eCos list of interrupts, not just the ARM vectors. It is less likely, but still possible. I once spent a week looking for a bug like this: something corrupts the IRQ vector, 10ms later the timer tick goes off, and then you die. Nasty to find.

3) Make all your stacks bigger, just in case.

4) Check the processor mode when it goes wrong. Interrupt mode? System mode?

5) Try backtracking from an assert/data abort by decoding the stack(s) by hand. Check the other stacks as well, not just the stack for the current CPU mode.

Andrew

--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
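For checking the processor mode when it goes wrong, the mode sits in the low five bits of the CPSR (or of the SPSR saved on entry to an exception mode). A small C helper like the one below can decode a value read out through the debugger, for example by printing `$cpsr` in GDB. This is a sketch: the function names are hypothetical, but the bit assignments (mode in bits [4:0], I at bit 7, F at bit 6, T at bit 5) are those defined by the ARM architecture.

```c
#include <stdint.h>

/* Decode the mode field, CPSR bits [4:0]. */
static const char *arm_mode_name(uint32_t cpsr) {
    switch (cpsr & 0x1Fu) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "invalid";   /* a corrupted CPSR/SPSR can show this */
    }
}

/* I bit (7) set: IRQs disabled. */
static int arm_irqs_masked(uint32_t cpsr) { return (int)((cpsr >> 7) & 1u); }

/* F bit (6) set: FIQs disabled. */
static int arm_fiqs_masked(uint32_t cpsr) { return (int)((cpsr >> 6) & 1u); }

/* T bit (5) set: the core was executing Thumb code. */
static int arm_thumb_state(uint32_t cpsr) { return (int)((cpsr >> 5) & 1u); }
```

For example, a CPSR of 0x600000D3 decodes as Supervisor mode with both IRQ and FIQ masked, while 0x00000092 is IRQ mode with only IRQs masked. Once the mode is known, the matching banked stack pointer tells you which stack to decode by hand.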
