devel-boun...@open-mpi.org wrote on 08/29/2011 04:20:30 PM:

> From: Ralph Castain <r...@open-mpi.org>
> To: Open MPI Developers <de...@open-mpi.org>
> Date: 08/29/2011 04:26 PM
> Subject: Re: [OMPI devel] known limitation or bug in hwloc?
> Sent by: devel-boun...@open-mpi.org
> 
> Actually, I'll eat those words. I was looking at the wrong place.
> 
> Yes, that is a bug in hwloc. It needs to loop over CPU_MAX for those
> cases where the bit mask extends over multiple words.
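A minimal sketch of what "the bit mask extends over multiple words" means, assuming a mask stored as an array of unsigned long words. The names DEMO_CPU_MAX, demo_cpu_set_t, demo_cpu_set() and demo_cpu_isset() are hypothetical and are not the actual OPAL definitions. CPU n lives in word n / WORD_BITS at bit n % WORD_BITS, so on a typical LP64 system any CPU numbered 64 or higher lands in the second word or later.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed value for illustration; the real OPAL constant may differ. */
#define DEMO_CPU_MAX   1024
/* Bits per storage word: 64 on typical LP64 systems. */
#define DEMO_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
/* Round up so every CPU in [0, DEMO_CPU_MAX) has a home word. */
#define DEMO_NUM_WORDS ((DEMO_CPU_MAX + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS)

/* Hypothetical multi-word mask, NOT the real opal_paffinity_base_cpu_set_t. */
typedef struct {
    unsigned long words[DEMO_NUM_WORDS];
} demo_cpu_set_t;

/* CPU n lives in word n / DEMO_WORD_BITS, at bit n % DEMO_WORD_BITS. */
static void demo_cpu_set(demo_cpu_set_t *s, size_t cpu)
{
    assert(cpu < DEMO_CPU_MAX);
    s->words[cpu / DEMO_WORD_BITS] |= 1UL << (cpu % DEMO_WORD_BITS);
}

static int demo_cpu_isset(const demo_cpu_set_t *s, size_t cpu)
{
    assert(cpu < DEMO_CPU_MAX);
    return (int)((s->words[cpu / DEMO_WORD_BITS] >> (cpu % DEMO_WORD_BITS)) & 1UL);
}
```

Setting CPU 70 touches only the word that holds bits 64 and up (or 64..95 with 32-bit words); word 0 stays untouched, which is why a loop that stops at one word's worth of bits would never see it.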

But I'm afraid the fix won't be trivial at all: hwloc itself is
consistent. It loops over NUM_BITS, but it also uses masks that are
NUM_BITS wide (hwloc_bitmap_t set), so changing the loop bound alone
would not be enough.

Regards,
Nadia
> 
> 
> On Aug 29, 2011, at 7:16 AM, Ralph Castain wrote:
> 
> > Actually, if you look closely at the definition of those two values,
> > you'll see that it really doesn't matter which one we loop over. The
> > NUM_BITS value defines the actual total number of bits in the mask.
> > CPU_MAX is the total number of CPUs we can support, which was set to
> > a value such that the two are equal (i.e., it's a power of two that
> > happens to be an integer multiple of 64).
> > 
> > I believe the original intent was to allow CPU_MAX to be independent
> > of address-alignment questions, so NUM_BITS could technically be
> > greater than CPU_MAX. Even if this happens, though, all that would do
> > is cause the loop to run across more bits than required.
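To illustrate the point that looping over NUM_BITS instead of CPU_MAX only costs extra iterations, here is a sketch assuming a hypothetical CPU_MAX of 100 (deliberately not a multiple of 64, so the storage holds 128 bits) and a zero-initialised mask. The DEMO_* macros and demo_* functions are illustrative, not OPAL's. As long as the padding bits above CPU_MAX are never set, both loop bounds find the same CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value, deliberately NOT a multiple of 64, so the storage
 * (NUM_BITS) ends up larger than the number of supported CPUs. */
#define DEMO_CPU_MAX   100
#define DEMO_NUM_WORDS ((DEMO_CPU_MAX + 63) / 64)
#define DEMO_NUM_BITS  (DEMO_NUM_WORDS * 64)   /* 128 bits of storage */

/* Hypothetical mask type, not the real OPAL one. */
typedef struct {
    uint64_t words[DEMO_NUM_WORDS];
} demo_mask_t;

static int demo_isset(const demo_mask_t *m, size_t i)
{
    return (int)((m->words[i / 64] >> (i % 64)) & 1u);
}

/* Count the set bits among the first `limit` bit positions. */
static size_t demo_count(const demo_mask_t *m, size_t limit)
{
    size_t n = 0;

    assert(limit <= DEMO_NUM_BITS);
    for (size_t i = 0; i < limit; ++i) {
        n += (size_t)demo_isset(m, i);
    }
    return n;
}
```

With CPUs 3 and 99 set, scanning 100 positions or 128 positions both count 2: the extra 28 iterations only read padding bits, which stay zero because the mask was zero-initialised.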
> > 
> > So it doesn't introduce a limitation at all. In hindsight, we could
> > simplify things by eliminating one of those values and simply
> > requiring that the remaining one be a multiple of 64 so it aligns
> > with a memory address.
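The single-constant simplification could look roughly like this: keep one value and let the compiler enforce that it is a multiple of 64, so the mask occupies whole 64-bit words with no padding. This is a sketch using C11 static_assert; DEMO_CPU_MAX and demo_cpu_set_t are hypothetical names, and 1024 is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single constant replacing the CPU_MAX / NUM_BITS pair;
 * the name and the value 1024 are assumptions for illustration. */
#define DEMO_CPU_MAX 1024

/* C11: reject, at compile time, any value that would leave a partial word. */
static_assert(DEMO_CPU_MAX > 0 && DEMO_CPU_MAX % 64 == 0,
              "DEMO_CPU_MAX must be a positive multiple of 64");

#define DEMO_NUM_WORDS (DEMO_CPU_MAX / 64)

/* The mask fills exactly DEMO_NUM_WORDS 64-bit words with no padding bits,
 * so "number of bits in the mask" and "number of CPUs" are the same value. */
typedef struct {
    uint64_t words[DEMO_NUM_WORDS];
} demo_cpu_set_t;
```

Changing DEMO_CPU_MAX to, say, 100 would then fail to compile instead of silently introducing padding bits.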
> > 
> > 
> > On Aug 29, 2011, at 7:05 AM, Kenneth Lloyd wrote:
> > 
> >> Nadia,
> >> 
> >> Interesting. I haven't tried pushing this to levels above 8 on a
> >> particular machine. Do you think that the cpuset / paffinity / hwloc
> >> only applies at the machine level, at which point you need to employ
> >> a graph with carto?
> >> 
> >> Regards,
> >> 
> >> Ken
> >> 
> >> -----Original Message-----
> >> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
> >> Behalf Of nadia.derbey
> >> Behalf Of nadia.derbey
> >> Sent: Monday, August 29, 2011 5:45 AM
> >> To: Open MPI Developers
> >> Subject: [OMPI devel] known limitation or bug in hwloc?
> >> 
> >> Hi list,
> >> 
> >> I'm hitting a limitation with paffinity/hwloc with CPU numbers >= 64.
> >> 
> >> In opal/mca/paffinity/hwloc/paffinity_hwloc_module.c, module_set() is
> >> the routine that sets the calling process's affinity to the mask given
> >> as a parameter. Note that "mask" is an opal_paffinity_base_cpu_set_t
> >> (so CPUs may potentially be numbered up to
> >> OPAL_PAFFINITY_BITMASK_CPU_MAX - 1).
> >> 
> >> The problem with module_set() is that it loops over
> >> OPAL_PAFFINITY_BITMASK_T_NUM_BITS bits to check whether these bits are
> >> set in the mask:
> >> 
> >> for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i) {
> >>     if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
> >>         hwloc_bitmap_set(set, i);
> >>     }
> >> }
> >> 
> >> Given "mask"'s type, I think module_set() should instead loop over
> >> OPAL_PAFFINITY_BITMASK_CPU_MAX bits.
> >> 
> >> Note that module_set() uses a type for its internal mask that is
> >> consistent with OPAL_PAFFINITY_BITMASK_T_NUM_BITS (hwloc_bitmap_t).
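A minimal sketch of the proposed change, assuming simplified stand-in types: the loop is bounded by the CPU range the source type allows (DEMO_CPU_MAX), and a set CPU that falls outside the destination's capacity is reported rather than skipped. demo_src_mask_t, demo_dst_set_t and demo_translate() are hypothetical; demo_dst_set_t is not the hwloc_bitmap_t API, and none of hwloc's real calls are reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for illustration; the real OPAL/hwloc sizes may differ. */
#define DEMO_CPU_MAX   256
#define DEMO_SRC_WORDS (DEMO_CPU_MAX / 64)
#define DEMO_DST_WORDS 8   /* storage for at most 512 bits */

/* Hypothetical stand-in for opal_paffinity_base_cpu_set_t. */
typedef struct {
    uint64_t words[DEMO_SRC_WORDS];
} demo_src_mask_t;

/* Hypothetical fixed-capacity stand-in for the set module_set() fills.
 * This is NOT the hwloc_bitmap_t API. */
typedef struct {
    size_t   capacity_bits;          /* how many bits this set may hold */
    uint64_t words[DEMO_DST_WORDS];
} demo_dst_set_t;

static int demo_src_isset(const demo_src_mask_t *m, size_t i)
{
    return (int)((m->words[i / 64] >> (i % 64)) & 1u);
}

/* Copy every set CPU, bounding the loop by the range the source type
 * allows (DEMO_CPU_MAX) rather than by the destination's width.
 * Returns -1, leaving dst partially filled, if a set CPU does not fit. */
static int demo_translate(const demo_src_mask_t *src, demo_dst_set_t *dst)
{
    assert(dst->capacity_bits <= 64 * DEMO_DST_WORDS);
    for (size_t i = 0; i < DEMO_CPU_MAX; ++i) {
        if (!demo_src_isset(src, i)) {
            continue;
        }
        if (i >= dst->capacity_bits) {
            return -1;               /* report instead of skipping CPU i */
        }
        dst->words[i / 64] |= UINT64_C(1) << (i % 64);
    }
    return 0;
}
```

With CPU 70 set, a 64-bit destination is rejected while a 256-bit one receives the bit in its second word; this is the sense in which the destination's width, and not just the loop bound, has to match CPU_MAX.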
> >> 
> >> So I'm wondering whether this is a known limitation I've never heard
> >> of, or an actual bug?
> >> 
> >> Regards,
> >> Nadia
> >> 
> >> 
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > 
> 
> 
