What's the output of running lstopo from hwloc 1.3.2? (this is the version that's in the OMPI trunk and v1.5 branches)
http://www.open-mpi.org/software/hwloc/v1.3/

Is there any difference from v1.4 hwloc?

http://www.open-mpi.org/software/hwloc/v1.4/

On Feb 21, 2012, at 7:20 PM, Eugene Loh wrote:

> We have some amount of MTT testing going on every night and on ONE of our
> systems v1.5 has been dead since r25914. The system is
>
> Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007
> x86_64 x86_64 x86_64 GNU/Linux
>
> and I'm encountering the problem with Intel (composer_xe_2011_sp1.7.256)
> compilers. I haven't poked around enough yet to figure out what the
> problematic characteristic of this configuration is.
>
> In r25914, orte/mca/odls/base/odls_base_open.c, we get
>
> 222     /* get the number of local sockets unless we were given a number */
> 223     if (0 == orte_default_num_sockets_per_board) {
> 224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
> 225     }
> 226     /* get the number of local processors */
> 227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
> 228     /* compute the base number of cores/socket, if not given */
> 229     if (0 == orte_default_num_cores_per_socket) {
> 230         orte_odls_globals.num_cores_per_socket = orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
> 231     }
>
> Well, we execute the branch at line 224, but num_sockets remains 0. This
> leads to the divide-by-0 at line 230. Digging deeper, the call at line 224
> led us to opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff
> left out):
>
> static int module_get_socket_info(int *num_sockets) {
>     hwloc_topology_t *t = &opal_hwloc_topology;
>     *num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
>     return OPAL_SUCCESS;
> }
>
> Anyhow, SOCKET is somehow an unknown layer, so num_sockets is returning 0.
>
> I can poke around more, but does someone want to advise?
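To illustrate the failure mode described above: the division at line 230 has no protection when the socket count comes back as 0. Below is a minimal sketch of one possible guard, not the actual Open MPI fix. The function name `sketch_cores_per_socket` and the `SKETCH_*` status codes are hypothetical and exist only for this example. It assumes the caller would rather get an error back than divide by a zero socket count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch; these are NOT Open MPI's OPAL_* values. */
enum { SKETCH_OK = 0, SKETCH_ERR_NO_SOCKETS = -1 };

/*
 * Compute cores per socket, refusing to divide when the topology query
 * reported no sockets (the situation in the report, where the socket
 * count stayed at 0).
 *
 * On success, stores the quotient in *out and returns SKETCH_OK.
 * On failure, leaves *out untouched and returns SKETCH_ERR_NO_SOCKETS.
 */
int sketch_cores_per_socket(int num_processors, int num_sockets, int *out)
{
    assert(out != NULL);

    if (num_sockets <= 0) {
        /* Nothing sensible to divide by; let the caller decide what to do. */
        return SKETCH_ERR_NO_SOCKETS;
    }

    *out = num_processors / num_sockets;
    return SKETCH_OK;
}
```

With something along these lines, a caller could check the return value and either report the missing socket layer or fall back to a default, instead of crashing on the divide. Which fallback is appropriate on this machine is still an open question until we see the lstopo output.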
-- 
Jeff Squyres
jsquy...@cisco.com