Or, if there's a specific problem in hwloc (i.e., hwloc proper -- not the 
component in OMPI), post to hwloc-de...@open-mpi.org.

I *think* that hwloc handles CPU sets of any size.  I bumped the version of 
hwloc to 1.2.1 (the latest stable release) in both the trunk and v1.5.  v1.4 
doesn't have hwloc.
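
For anyone skimming the thread below, here is a minimal sketch of the
loop-bound issue Nadia describes. It is not the OMPI code: the SKETCH_*
macros and the cpu_set_sketch_t type are hypothetical stand-ins, and it
assumes that T_NUM_BITS is the width of a single mask word while CPU_MAX
is the total capacity of the set. If those two values were in fact equal,
the two loop bounds would behave identically.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the OPAL macros named in this thread.
 * Assumption: T_NUM_BITS is the width of one mask word, and CPU_MAX
 * is the total number of CPUs the whole set can describe. */
#define SKETCH_T_NUM_BITS 64u
#define SKETCH_CPU_MAX    1024u
#define SKETCH_NUM_WORDS  (SKETCH_CPU_MAX / SKETCH_T_NUM_BITS)

/* A CPU set stored as an array of 64-bit words. */
typedef struct {
    uint64_t bits[SKETCH_NUM_WORDS];
} cpu_set_sketch_t;

/* Mark one CPU as a member of the set. */
static void cpu_set_bit(cpu_set_sketch_t *s, unsigned cpu)
{
    assert(cpu < SKETCH_CPU_MAX);
    s->bits[cpu / SKETCH_T_NUM_BITS] |= (uint64_t)1 << (cpu % SKETCH_T_NUM_BITS);
}

/* Return 1 if the CPU is a member of the set, 0 otherwise. */
static int cpu_isset(const cpu_set_sketch_t *s, unsigned cpu)
{
    return (int)((s->bits[cpu / SKETCH_T_NUM_BITS] >> (cpu % SKETCH_T_NUM_BITS)) & 1u);
}

/* Count the members found when scanning only the first `limit` bits,
 * mirroring a copy loop that stops at `limit`. */
static unsigned count_seen(const cpu_set_sketch_t *s, unsigned limit)
{
    unsigned n = 0;
    for (unsigned i = 0; i < limit; ++i) {
        if (cpu_isset(s, i)) {
            ++n;
        }
    }
    return n;
}
```

With CPUs 3 and 70 in the set, count_seen(&mask, SKETCH_T_NUM_BITS) finds
only CPU 3, while count_seen(&mask, SKETCH_CPU_MAX) finds both. Under the
assumption above, that is the behaviour reported for cpu numbers >= 64.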


On Aug 29, 2011, at 11:57 AM, Ralph Castain wrote:

> 
> On Aug 29, 2011, at 8:35 AM, nadia.der...@bull.net wrote:
> 
>> 
>> devel-boun...@open-mpi.org wrote on 08/29/2011 04:20:30 PM:
>> 
>> > From: Ralph Castain <r...@open-mpi.org> 
>> > To: Open MPI Developers <de...@open-mpi.org> 
>> > Date: 08/29/2011 04:26 PM 
>> > Subject: Re: [OMPI devel] known limitation or bug in hwloc? 
>> > Sent by: devel-boun...@open-mpi.org 
>> > 
>> > Actually, I'll eat those words. I was looking at the wrong place.
>> > 
>> > Yes, that is a bug in hwloc. It needs to loop over CPU_MAX for those
>> > cases where the bit mask extends over multiple words. 
>> 
>> But I'm afraid the fix won't be trivial at all: hwloc in itself is coherent: 
>> it loops over NUM_BITS, but it uses masks that are NUM_BITS wide 
>> (hwloc_bitmap_t set)... 
> 
> I guess I'm missing that - I just did a search and cannot find any reference 
> to OPAL_PAFFINITY_BITMASK_T_NUM_BITS anywhere in paffinity/hwloc after the 
> last change.
> 
> Can you point me to where you believe a problem exists? Or feel free to 
> submit a patch to fix it :-)  We can push it upstream to the hwloc folks for 
> their consideration.
> 
> 
>> 
>> Regards, 
>> Nadia
>> > 
>> > 
>> > On Aug 29, 2011, at 7:16 AM, Ralph Castain wrote:
>> > 
>> > > Actually, if you look closely at the definition of those two values,
>> > > you'll see that it really doesn't matter which one we loop over. The
>> > > NUM_BITS value defines the actual total number of bits in the mask.
>> > > The CPU_MAX is the total number of cpus we can support, which was set
>> > > to a value such that the two are equal (i.e., it's a power of two that
>> > > happens to be an integer multiple of 64).
>> > > 
>> > > I believe the original intent was to allow CPU_MAX to be independent
>> > > of address-alignment questions, so NUM_BITS could technically be
>> > > greater than CPU_MAX. Even if this happens, though, all that would do
>> > > is cause the loop to run across more bits than required.
>> > > 
>> > > So it doesn't introduce a limitation at all. In hindsight, we could
>> > > simplify things by eliminating one of those values and just putting a
>> > > requirement on the number that it be a multiple of 64 so it aligns
>> > > with a memory address.
>> > > 
>> > > 
>> > > On Aug 29, 2011, at 7:05 AM, Kenneth Lloyd wrote:
>> > > 
>> > >> Nadia,
>> > >> 
>> > >> Interesting. I haven't tried pushing this to levels above 8 on a
>> > >> particular machine. Do you think that the cpuset / paffinity / hwloc
>> > >> only applies at the machine level, at which time you need to employ a
>> > >> graph with carto?
>> > >> 
>> > >> Regards,
>> > >> 
>> > >> Ken
>> > >> 
>> > >> -----Original Message-----
>> > >> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
>> > >> Behalf Of nadia.derbey
>> > >> Sent: Monday, August 29, 2011 5:45 AM
>> > >> To: Open MPI Developers
>> > >> Subject: [OMPI devel] known limitation or bug in hwloc?
>> > >> 
>> > >> Hi list,
>> > >> 
>> > >> I'm hitting a limitation with paffinity/hwloc with cpu numbers >= 64.
>> > >> 
>> > >> In opal/mca/paffinity/hwloc/paffinity_hwloc_module.c, module_set() is
>> > >> the routine that sets the calling process affinity to the mask given as
>> > >> parameter. Note that "mask" is an opal_paffinity_base_cpu_set_t (so we
>> > >> allow the cpus to be potentially numbered up to
>> > >> OPAL_PAFFINITY_BITMASK_CPU_MAX - 1).
>> > >> 
>> > >> The problem with module_set() is that it loops over
>> > >> OPAL_PAFFINITY_BITMASK_T_NUM_BITS bits to check if these bits are set in
>> > >> the mask:
>> > >> 
>> > >> for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i) {
>> > >>     if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
>> > >>         hwloc_bitmap_set(set, i);
>> > >>     }
>> > >> }
>> > >> 
>> > >> Given "mask"'s type, I think module_set() should instead loop over
>> > >> OPAL_PAFFINITY_BITMASK_CPU_MAX bits.
>> > >> 
>> > >> Note that module_set() uses a type for its internal mask that is
>> > >> coherent with OPAL_PAFFINITY_BITMASK_T_NUM_BITS (hwloc_bitmap_t).
>> > >> 
>> > >> So I'm wondering whether this is a known limitation I've never heard of
>> > >> or an actual bug?
>> > >> 
>> > >> Regards,
>> > >> Nadia
>> > >> 
>> > >> 
>> > >> _______________________________________________
>> > >> devel mailing list
>> > >> de...@open-mpi.org
>> > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > > 
>> > 
>> > 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

