All,

As mentioned in another thread, I've recently ported padb, a command-line job inspection tool (kind of like a parallel debugger), to orte and OpenMPI. Padb is an existing, stable product which has worked for a number of years on Slurm and RMS. The orte support is new and not widely tested yet, although it works for all the cases I've tried.

For those who haven't used it, padb is an open source command-line tool which, among other things, can collect stack traces, display MPI message queues and present a lot of process information about parallel jobs to the user in an accessible way. Ideally padb will find its place within the day-to-day workings of OpenMPI developers and become a recommended tool for users as well. It also has a mode where it can be launched automatically to gather information about job hangs without human intervention, and I'd be willing to work with the OpenMPI team to integrate this into the MTT code if desired.

I would encourage you to download it and try it out. If it works for you and you like it, that's great; if not, let me know and I'll do what I can to fix it. There is a website and there are public mailing lists for padb issues, and I am happy to discuss orte-specific issues on this list.

The website is at http://padb.pittman.org.uk and I welcome any feedback, whether here, off-list or on either of the padb mailing lists.

Yours,
Ashley Pittman

--
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk