Hi all. We were having problems with multiple sequential progdev calls failing on our ROACH-2 systems. We were testing multiple bof files in a loop, and the ROACH would crash completely with a kernel panic and then reboot itself.
After a great deal of concentrated debugging effort this afternoon by Jack, David, Justin, Ryan, Arindam, Randy, and me, we found the cause of the crashes on repeated progdev calls. It turned out to have nothing to do with programming the chip; rather, it was a problem with memory allocation by the operating system. Jack found that the problem could also be triggered by allocating a huge array in Python, using lots of memory. The root cause was that the kernel thought the ROACH has 768 MB of memory on board, when in fact it has only 512 MB. The fix is to pass the real amount of memory to the kernel in the bootargs.
The systems have been mostly working for a long time (years!), so you may want to check that your systems do in fact know how much memory they have. If you start up top you can see what the kernel thinks, or look in /proc/meminfo.
John
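P.S. If you have a number of boards to check, the /proc/meminfo lookup can be scripted. What follows is only a minimal sketch, not something we ran during the debugging: the function names are made up for illustration, and the 512 MB default is the figure for our boards, so substitute whatever your hardware really has. It relies on the MemTotal line in /proc/meminfo, which is normally a little below the physical RAM size because the kernel reserves some for itself. A MemTotal larger than the physical RAM therefore suggests the kernel has been told about memory that is not there.

```python
def read_mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like: "MemTotal:   515000 kB"
            return int(line.split()[1])
    raise ValueError("no MemTotal line found")


def memory_overreported(mem_total_kb, physical_mb):
    """True if the kernel reports more memory than is physically fitted."""
    return mem_total_kb > physical_mb * 1024


def check_this_machine(physical_mb=512, path="/proc/meminfo"):
    """Read the local meminfo file and compare it against physical_mb.

    physical_mb is an assumption you must supply; 512 matches the
    boards described in this message.
    """
    with open(path) as f:
        total_kb = read_mem_total_kb(f.read())
    return total_kb, memory_overreported(total_kb, physical_mb)
```

Running check_this_machine() on the board itself returns the reported MemTotal in kB together with a flag that is True when that figure exceeds the amount you passed in.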