On 4/12/07, Ashley Pittman <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-04-09 at 11:30 -0600, Matt Funk wrote:
> > The reason I want to run on 32 processors, though, is that it takes (on
> > 32 procs) several hours till my program crashes. Also, I would like to
> > be able to keep the conditions under which it crashes intact as much
> > as possible (i.e. run on 32 procs rather than 1).
> >
> > Does anyone have any advice? I am open to trying out other things as well
> > if possible. I am just starting to learn debugging techniques for a
> > parallel program.
>
> What you are trying to do isn't uncommon; some of us do it most days.
> Having a job which exhibits the problem with only 32 procs and several
> hours isn't a bad reproducer, I've certainly seen much worse. Debugging
> at this scale isn't exactly interactive, but it's small enough to be able
> to make timely progress.
>
> My advice would be first and foremost to look at the core file. I assume
> your program is receiving a SEGV and exiting? Core files can be
> problematic, partly because they aren't always enabled and partly
> because to extract anything useful out of them you need to run the
> debugger with the same environment as the application had. This isn't
> always as easy as it sounds if you are using modules or something like
> that.
One question. When the debuggee app is a 32-PE MPI job, you end up with 32 core files. Would you check each of them manually? Or do you have any trick to parallelize the checking process, say, using a parallel debugger?

Naoya Maruyama
Tokyo Institute of Technology
_______________________________________________
Beowulf mailing list, [EMAIL PROTECTED]
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
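To make the question concrete, here is a rough sketch of the kind of trick I have in mind, short of a real parallel debugger: run gdb in batch mode on every core file concurrently and group the cores by the function names in their backtraces, so that 32 cores collapse into a handful of distinct stacks. The helper names (frame_signature, group_by_signature, triage) are made up for this mail, the regular expression is only a guess at the usual "bt" frame layout, and it assumes gdb is on the PATH and is run in the same environment as the job.

```python
import re
import subprocess
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Matches gdb frame lines such as
#   "#0  0x0000000000400f3a in compute (n=3) at solver.c:42"
#   "#1  main () at main.c:10"
# This pattern is an assumption about the usual "bt" layout, not a spec.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(", re.MULTILINE)


def frame_signature(backtrace_text):
    """Reduce a gdb backtrace to a tuple of function names, innermost first."""
    return tuple(FRAME_RE.findall(backtrace_text))


def group_by_signature(backtraces):
    """Map each distinct stack signature to the sorted list of cores showing it.

    `backtraces` is a dict of core file name -> raw gdb output.
    """
    groups = defaultdict(list)
    for core, text in sorted(backtraces.items()):
        groups[frame_signature(text)].append(core)
    return dict(groups)


def run_gdb(executable, core):
    """Run gdb non-interactively on one core file and return its 'bt' output."""
    result = subprocess.run(
        ["gdb", "--batch", "-ex", "bt", executable, core],
        capture_output=True,
        text=True,
    )
    return result.stdout


def triage(executable, cores, workers=8):
    """Collect backtraces from all cores concurrently, then group them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(lambda c: run_gdb(executable, c), cores))
    return group_by_signature(dict(zip(cores, outputs)))
```

Calling triage("./a.out", list_of_core_paths) would then return one entry per distinct stack, and only one core from each group needs a closer look by hand. Threads are enough here because the work is done in the gdb child processes.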
