Or, if there's a specific problem in hwloc (i.e., hwloc proper -- not the component in OMPI), post to hwloc-de...@open-mpi.org.
I *think* that hwloc handles CPU sets of any size.

I bumped the version of hwloc to 1.2.1 (the latest stable release) in both the trunk and v1.5. v1.4 doesn't have hwloc.


On Aug 29, 2011, at 11:57 AM, Ralph Castain wrote:

>
> On Aug 29, 2011, at 8:35 AM, nadia.der...@bull.net wrote:
>
>>
>> devel-boun...@open-mpi.org wrote on 08/29/2011 04:20:30 PM:
>>
>> > From: Ralph Castain <r...@open-mpi.org>
>> > To: Open MPI Developers <de...@open-mpi.org>
>> > Date: 08/29/2011 04:26 PM
>> > Subject: Re: [OMPI devel] known limitation or bug in hwloc?
>> > Sent by: devel-boun...@open-mpi.org
>> >
>> > Actually, I'll eat those words. I was looking at the wrong place.
>> >
>> > Yes, that is a bug in hwloc. It needs to loop over CPU_MAX for those
>> > cases where the bit mask extends over multiple words.
>>
>> But I'm afraid the fix won't be trivial at all: hwloc in itself is coherent:
>> it loops over NUM_BITS, but it uses masks that are NUM_BITS wide
>> (hwloc_bitmap_t set)...
>
> I guess I'm missing that - I just did a search and cannot find any reference
> to OPAL_PAFFINITY_BITMASK_T_NUM_BITS anywhere in paffinity/hwloc after the
> last change.
>
> Can you point me to where you believe a problem exists? Or feel free to
> submit a patch to fix it :-) We can push it upstream to the hwloc folks for
> their consideration.
>
>
>>
>> Regards,
>> Nadia
>> >
>> >
>> > On Aug 29, 2011, at 7:16 AM, Ralph Castain wrote:
>> >
>> > > Actually, if you look closely at the definition of those two
>> > > values, you'll see that it really doesn't matter which one we loop
>> > > over. The NUM_BITS value defines the actual total number of bits in
>> > > the mask. The CPU_MAX is the total number of cpus we can support,
>> > > which was set to a value such that the two are equal (i.e., it's a
>> > > power of two that happens to be an integer multiple of 64).
>> > >
>> > > I believe the original intent was to allow CPU_MAX to be
>> > > independent of address-alignment questions, so NUM_BITS could
>> > > technically be greater than CPU_MAX. Even if this happens, though,
>> > > all that would do is cause the loop to run across more bits than
>> > > required.
>> > >
>> > > So it doesn't introduce a limitation at all. In hindsight, we
>> > > could simplify things by eliminating one of those values and just
>> > > putting a requirement on the number that it be a multiple of 64 so
>> > > it aligns with a memory address.
>> > >
>> > >
>> > > On Aug 29, 2011, at 7:05 AM, Kenneth Lloyd wrote:
>> > >
>> > >> Nadia,
>> > >>
>> > >> Interesting. I haven't tried pushing this to levels above 8 on a
>> > >> particular machine. Do you think that the cpuset / paffinity / hwloc
>> > >> only applies at the machine level, at which time you need to employ
>> > >> a graph with carto?
>> > >>
>> > >> Regards,
>> > >>
>> > >> Ken
>> > >>
>> > >> -----Original Message-----
>> > >> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>> > >> On Behalf Of nadia.derbey
>> > >> Sent: Monday, August 29, 2011 5:45 AM
>> > >> To: Open MPI Developers
>> > >> Subject: [OMPI devel] known limitation or bug in hwloc?
>> > >>
>> > >> Hi list,
>> > >>
>> > >> I'm hitting a limitation with paffinity/hwloc with cpu numbers >= 64.
>> > >>
>> > >> In opal/mca/paffinity/hwloc/paffinity_hwloc_module.c, module_set() is
>> > >> the routine that sets the calling process affinity to the mask given as
>> > >> parameter. Note that "mask" is an opal_paffinity_base_cpu_set_t (so we
>> > >> allow the cpus to be potentially numbered up to
>> > >> OPAL_PAFFINITY_BITMASK_CPU_MAX - 1).
>> > >>
>> > >> The problem with module_set() is that it loops over
>> > >> OPAL_PAFFINITY_BITMASK_T_NUM_BITS bits to check if these bits are set in
>> > >> the mask:
>> > >>
>> > >> for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i) {
>> > >>     if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
>> > >>         hwloc_bitmap_set(set, i);
>> > >>     }
>> > >> }
>> > >>
>> > >> Given "mask"'s type, I think module_set() should instead loop over
>> > >> OPAL_PAFFINITY_BITMASK_CPU_MAX bits.
>> > >>
>> > >> Note that module_set() uses a type for its internal mask that is
>> > >> coherent with OPAL_PAFFINITY_BITMASK_T_NUM_BITS (hwloc_bitmap_t).
>> > >>
>> > >> So I'm wondering whether this is a known limitation I've never heard of
>> > >> or an actual bug?
>> > >>
>> > >> Regards,
>> > >> Nadia
>> > >>
>> > >>
>> > >> _______________________________________________
>> > >> devel mailing list
>> > >> de...@open-mpi.org
>> > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
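
For readers following the thread, the loop-bound question can be illustrated with a small self-contained C sketch. Everything prefixed demo_ or DEMO_ is hypothetical and is not the OMPI or hwloc code. The sketch assumes the mask is an array of 64-bit words, that the width of one word plays the role of OPAL_PAFFINITY_BITMASK_T_NUM_BITS, and that the total capacity plays the role of OPAL_PAFFINITY_BITMASK_CPU_MAX. Under those assumptions, a copy loop bounded by the word width never sees CPUs numbered 64 and above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not the OMPI definitions. */
#define DEMO_WORD_BITS 64u   /* bits in one mask word (assumed NUM_BITS role) */
#define DEMO_CPU_MAX   256u  /* CPUs the whole mask can describe (assumed CPU_MAX role) */
#define DEMO_NUM_WORDS (DEMO_CPU_MAX / DEMO_WORD_BITS)

typedef struct {
    uint64_t word[DEMO_NUM_WORDS];
} demo_cpu_set_t;

/* Set the bit for one CPU in the multi-word mask. */
static void demo_cpu_set(demo_cpu_set_t *m, unsigned cpu)
{
    assert(cpu < DEMO_CPU_MAX);
    m->word[cpu / DEMO_WORD_BITS] |= (uint64_t)1 << (cpu % DEMO_WORD_BITS);
}

/* Return 1 if the bit for this CPU is set, 0 otherwise. */
static int demo_cpu_isset(const demo_cpu_set_t *m, unsigned cpu)
{
    assert(cpu < DEMO_CPU_MAX);
    return (int)((m->word[cpu / DEMO_WORD_BITS] >> (cpu % DEMO_WORD_BITS)) & 1u);
}

/* Count the CPUs that a copy loop bounded by `limit` would visit and find set. */
static unsigned demo_count_seen(const demo_cpu_set_t *m, unsigned limit)
{
    unsigned seen = 0;
    for (unsigned i = 0; i < limit; ++i) {
        if (demo_cpu_isset(m, i)) {
            ++seen;
        }
    }
    return seen;
}

/* Build a mask with CPUs 3, 64 and 130 set, then scan it up to `limit`. */
static unsigned demo_run(unsigned limit)
{
    demo_cpu_set_t m = {{0}};
    demo_cpu_set(&m, 3);
    demo_cpu_set(&m, 64);
    demo_cpu_set(&m, 130);
    return demo_count_seen(&m, limit);
}
```

With this setup, demo_run(DEMO_WORD_BITS) finds only CPU 3, while demo_run(DEMO_CPU_MAX) finds all three CPUs. Whether the real macros relate to each other this way is exactly what the thread is debating, so treat the sketch as an illustration of the concern rather than a statement about the OMPI source.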