With HWLOC_COMPONENTS=no_os, MPICH now works fine, but all tests fail
with Open-MPI (see below).  I know how to resolve this, but am noting
it for the benefit of others.

--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------
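
For anyone reproducing the workaround from a script rather than a shell, here is a minimal sketch of setting HWLOC_COMPONENTS for a child process only. The helper name run_without_os_topology is hypothetical, and the child command is a stand-in that just echoes the variable; in CI it would be whatever mpiexec or test command is being run. The default value is the "no_os,stop" string suggested in the quoted reply below, while plain "no_os" is what was used above.

```python
import os
import subprocess
import sys


def run_without_os_topology(cmd, components="no_os,stop"):
    """Run cmd with HWLOC_COMPONENTS set in a copy of the environment.

    Hypothetical helper: only the child process sees the variable,
    so the parent environment is left untouched.
    """
    env = dict(os.environ)
    env["HWLOC_COMPONENTS"] = components
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process that prints the variable it inherited.
    # Replace this list with the real test command (an assumption here).
    child = [sys.executable, "-c",
             "import os; print(os.environ['HWLOC_COMPONENTS'])"]
    result = run_without_os_topology(child)
    print(result.stdout.strip())
```

Passing a copied environment keeps the setting scoped to one launch, which makes it easy to apply the workaround to the MPICH jobs without affecting other steps in the same CI run.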

Jeff

On Thu, Sep 13, 2018 at 10:36 PM, Brice Goglin <brice.gog...@inria.fr>
wrote:

> If lstopo fails there, run "hwloc-gather-topology foo" and send foo.tar.bz2
>
> As a workaround for ARMCI, you may try setting HWLOC_COMPONENTS=no_os,stop
> in the environment so that hwloc behaves as if the operating system had no
> topology support.
>
> Brice
>
>
>
> Le 14/09/2018 à 06:11, Jeff Hammond a écrit :
>
> All of the job failures have this warning so I am inclined to think they
> are related.  I don't know what I should expect from lstopo inside of
> AWS, but I guess I'll try it.
>
> I was using the hwloc shipped with mpich-3.3b1.  Talk to the MPICH
> team if you want them to upgrade :-)
>
> Jeff
>
> On Thu, Sep 13, 2018 at 8:42 AM, Brice Goglin <brice.gog...@inria.fr>
> wrote:
>
>> This is actually just a warning. Usually it causes the topology to be
>> wrong (like a missing object), but it shouldn't prevent the program from
>> working. Are you sure your programs are failing because of hwloc? Do you
>> have a way to run lstopo on that node?
>>
>> By the way, you shouldn't use hwloc 2.0.0rc2, at least because it's old,
>> it has a broken ABI, and it's a RC :)
>>
>> Brice
>>
>>
>>
>> Le 13/09/2018 à 16:12, Jeff Hammond a écrit :
>>
>> I am running ARMCI-MPI over MPICH in a Travis CI Linux instance and
>> topology is causing it to fail.  I do not care about topology in a
>> virtualized environment.  How do I fix this?
>>
>> ****************************************************************************
>> * hwloc 2.0.0rc2-git has encountered what looks like an error from the operating system.
>> *
>> * Group0 (cpuset 0x00001111,0x11111111) intersects with L3 (cpuset 0x00001000,0x02100002) without inclusion!
>> * Error occurred in topology.c line 1384
>> *
>> * The following FAQ entry in the hwloc documentation may help:
>> *   What should I do when hwloc reports "operating system" warnings?
>> * Otherwise please report this error message to the hwloc user's mailing list
>> * along with the files generated by the hwloc-gather-topology script.
>> ****************************************************************************
>>
>> https://travis-ci.org/jeffhammond/armci-mpi/jobs/425342479 has all of
>> the details.
>>
>> Jeff
>>
>>
>> --
>> Jeff Hammond
>> jeff.scie...@gmail.com
>> http://jeffhammond.github.io/
>>
>>
>> _______________________________________________
>> hwloc-users mailing list
>> hwloc-users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/hwloc-users
>>
>
>
>
> --
> Jeff Hammond
> jeff.scie...@gmail.com
> http://jeffhammond.github.io/
>
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/hwloc-users
>



-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
