What's the output of running lstopo from hwloc 1.3.2? (This is the version
that's in the OMPI trunk and v1.5 branch.)

    http://www.open-mpi.org/software/hwloc/v1.3/

Is there any difference with hwloc v1.4?

    http://www.open-mpi.org/software/hwloc/v1.4/
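
As a rough cross-check that does not go through hwloc, the C sketch below counts distinct "physical id" values in the text of /proc/cpuinfo, which is one place the Linux kernel reports socket membership on x86. The helper name count_physical_ids and the MAX_IDS cap are hypothetical. It assumes the kernel on that machine emits the "physical id" field at all; if it does not (possible on some older kernels or virtual machines, which is an assumption worth checking), the function returns 0, mirroring the symptom in the quoted report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_IDS 256 /* hypothetical cap on distinct socket ids tracked */

/* Hypothetical helper: count distinct "physical id" values in the text of
 * /proc/cpuinfo. Returns 0 if the field never appears. */
static int count_physical_ids(const char *cpuinfo)
{
    int ids[MAX_IDS];
    int n = 0;
    const char *line = cpuinfo;

    assert(cpuinfo != NULL);
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        /* On x86 Linux the line looks like "physical id\t: 0". */
        if (strncmp(line, "physical id", 11) == 0) {
            const char *colon = strchr(line, ':');
            int id;
            /* Only accept a colon that sits on this same line. */
            if (colon != NULL && (eol == NULL || colon < eol) &&
                sscanf(colon + 1, "%d", &id) == 1) {
                int seen = 0;
                for (int i = 0; i < n; i++) {
                    if (ids[i] == id) {
                        seen = 1;
                        break;
                    }
                }
                if (!seen && n < MAX_IDS) {
                    ids[n++] = id;
                }
            }
        }
        if (eol == NULL) {
            break; /* last line had no trailing newline */
        }
        line = eol + 1;
    }
    return n;
}
```

To use it, read /proc/cpuinfo into a NUL-terminated buffer and pass it in; a nonzero result alongside an lstopo output that shows no Socket objects would point at hwloc's discovery rather than at the kernel.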


On Feb 21, 2012, at 7:20 PM, Eugene Loh wrote:

> We have some amount of MTT testing going on every night, and on ONE of our
> systems v1.5 has been dead since r25914.  The system is
> 
> Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
> 
> and I'm encountering the problem with Intel (composer_xe_2011_sp1.7.256) 
> compilers.  I haven't poked around enough yet to figure out what the 
> problematic characteristic of this configuration is.
> 
> In r25914, orte/mca/odls/base/odls_base_open.c, we get
> 
>    222     /* get the number of local sockets unless we were given a number */
>    223     if (0 == orte_default_num_sockets_per_board) {
>    224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
>    225     }
>    226     /* get the number of local processors */
>    227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
>    228     /* compute the base number of cores/socket, if not given */
>    229     if (0 == orte_default_num_cores_per_socket) {
>    230         orte_odls_globals.num_cores_per_socket = orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
>    231     }
> 
> Well, we execute the branch at line 224, but num_sockets remains 0.  This 
> leads to the divide-by-0 at line 230.  Digging deeper, the call at line 224 
> led us to opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff 
> left out):
> 
> static int module_get_socket_info(int *num_sockets) {
>    hwloc_topology_t *t = &opal_hwloc_topology;
>    *num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
>    return OPAL_SUCCESS;
> }
> 
> Anyhow, SOCKET is somehow an unknown level in the topology, so
> hwloc_get_nbobjs_by_type() returns 0 and num_sockets ends up 0.
> 
> I can poke around more, but does someone want to advise?
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
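
One defensive option for the divide at line 230 is to never divide by a socket count that is not positive. This is a minimal sketch, not ORTE code: compute_cores_per_socket is a hypothetical helper, and treating an unknown socket count as a single socket is an assumption about acceptable fallback behavior. hwloc_get_nbobjs_by_type() returns 0 when the type's level is unknown and -1 when objects of that type appear at several depths, so both cases fall into the fallback.

```c
#include <assert.h>

/* Hypothetical helper, not part of ORTE: compute cores per socket without
 * dividing by zero when the topology reports no usable socket count. */
static int compute_cores_per_socket(int num_processors, int num_sockets)
{
    if (num_sockets <= 0) {
        /* Assumption: 0 (SOCKET level unknown) or -1 (SOCKET objects at
         * multiple depths) is treated as a single socket. */
        num_sockets = 1;
    }
    assert(num_sockets > 0);
    return num_processors / num_sockets;
}
```

This only hides the symptom; the underlying question of why hwloc reports no SOCKET level on this machine remains.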


-- 
Jeff Squyres
jsquy...@cisco.com

