I guess the problem is that your Open MPI uses an old hwloc internally; it
may be too old to understand recent XML exports.
Try replacing "Package" with "Socket" everywhere in the XML file.
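
For example (just a sketch, assuming a sed with in-place editing and that the
file is the leo2.xml from my earlier mail; adjust the name to whatever you
saved it as):

  sed -i.bak 's/type="Package"/type="Socket"/g' leo2.xml   # .bak keeps a backup

In hwloc XML exports the object type appears as a type="..." attribute, so
limiting the substitution to that attribute should leave everything else
untouched.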

Brice



On 27/10/2015 15:31, Fabian Wein wrote:
> Thank you very much for the file.
>
> When I try with PETSc, compiled with Open MPI and icc, I get
>
> ----------
> Failed to parse XML input with the minimalistic parser. If it was not
> generated by hwloc, try enabling full XML support with libxml2.
> --------------------------------------------------------------------------
>
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   topology discovery failed
>   --> Returned value Not supported (-8) instead of ORTE_SUCCESS
> -----------
>
> Without HWLOC_XMLFILE exported
>
> I get the well-known
>
> * hwloc has encountered what looks like an error from the operating
> system.
> *
> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset
> 0x0000003f) without inclusion!
> * Error occurred in topology.c line 942
> *
> * The following FAQ entry in a recent hwloc documentation may help:
> *   What should I do when hwloc reports "operating system" warnings?
> * Otherwise please report this error message to the hwloc user's
> mailing list,
> * along with the output+tarball generated by the hwloc-gather-topology
> script.
>
> And the poor scaling
>
> Triad:        55372.8884   Rate (MB/s)
> ------------------------------------------------
> np  speedup
> 1 1.0
> 2 1.03
> 3 2.98
> 4 3.98
> 5 4.95
> 6 5.96
> 7 4.15
> 8 4.73
> 9 5.36
> 10 5.94
> 11 4.79
> 12 5.25
>
> which varies a lot across repetitions but is never better than a maximal
> speedup of 7.
> I have 24 (48) cores, and only one was in use by another process at the
> time.
>
> Using MPICH instead of Open MPI I get no message about the hwloc issue,
> but the same poor and random speedups.
>
> I tried to check the XML file myself via
> xmllint --valid leo_brice.xml  --loaddtd /usr/local/share/hwloc/hwloc.dtd
>
> However, xmllint complains about hwloc.dtd itself:
> /usr/local/share/hwloc/hwloc.dtd:8: parser error : StartTag: invalid
> element name
> <!ELEMENT topology (object)+>
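>
> (Possibly --loaddtd takes no argument, so xmllint treated the DTD path as a
> second XML input; validating against an explicit DTD would then be
> something like
>
>   xmllint --noout --dtdvalid /usr/local/share/hwloc/hwloc.dtd leo_brice.xml
>
> but I'm not sure that is the right invocation.)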
>
> I should mention that I have a mixture of hwloc versions: the most
> recent one installed locally and an older one as part of PETSc.
>
> Any ideas?
>
> Thanks,
>
> Fabian
>
>
>
> On 10/27/2015 10:21 AM, Brice Goglin wrote:
>> Here's the fixed XML. For the record, for each NUMA node I extended
>> the cpuset of the L3 to match the containing NUMA node, and moved all
>> L2 objects to be children of that L3.
>> Now you may load that XML instead of the native discovery by setting
>> HWLOC_XMLFILE=leo2.xml in your environment.
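>> For instance (just a sketch, assuming a bash-like shell, that leo2.xml is
>> in the current directory, and a hypothetical ./my_app binary):
>>
>>   export HWLOC_XMLFILE=$PWD/leo2.xml
>>   lstopo                  # should now display the corrected topology
>>   mpirun -np 12 ./my_app  # my_app is a placeholder for your application;
>>                           # locally launched ranks inherit the variable
>>
>> An absolute path is safer because processes may be started from a
>> different working directory.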
>> Brice
>>
>>
>>
>> On 27/10/2015 10:08, Fabian Wein wrote:
>>> Brice,
>>>
>>> thank you very much for the offer. I attached the XML file.
>>> ..
>>>
>>> * hwloc 1.11.1 has encountered what looks like an error from the
>>> operating system.
>>> *
>>> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset
>>> 0x0000003f) without inclusion!
>>> * Error occurred in topology.c line 981
>>> *
>>> ..
>>>
>>> So if you can afford the time, I appreciate it very much!
>>>
>>> Fabian
>>>
>>>
>>>
>>> On 10/27/2015 09:52 AM, Brice Goglin wrote:
>>>> Hello
>>>>
>>>> This bug is about L3 cache locality only; everything else should be
>>>> fine, including cache sizes. Few applications use that locality
>>>> information, so I assume it doesn't matter for PETSc scaling.
>>>> We can work around the bug by loading an XML topology. There's no easy
>>>> way to build that corrected XML automatically, but I can do it manually
>>>> if you send your current broken topology (run lstopo foo.xml and send
>>>> this foo.xml).
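>>>>
>>>> For example (assuming the hwloc 1.11.1 tools you built are first in
>>>> your PATH):
>>>>
>>>>   lstopo foo.xml             # the .xml extension selects XML output
>>>>   lstopo --of xml > foo.xml  # equivalent, with an explicit output format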
>>>>
>>>> Brice
>>>>
>>>>
>>>>
>>>> On 27/10/2015 09:43, Fabian Wein wrote:
>>>>> Hello,
>>>>>
>>>>> I'm new to the list and new to the MPI business, too.
>>>>>
>>>>> Our 4*12 Opteron 6238 system is very similar to the one from the
>>>>> original poster and I get the same error message.
>>>>> Any use in posting my logs?
>>>>>
>>>>> I compiled the latest hwloc, no change. Our system is Ubuntu 14.04 LTS
>>>>> with kernel 3.13, and our BIOS is not updated.
>>>>>
>>>>> The system scales very well with OpenMP but fails to give any
>>>>> realistic scaling using PETSc (both for the standard streams
>>>>> benchmark and quick tests with a given application).
>>>>>
>>>>> As far as I understand, the system itself is fine and just the
>>>>> information gathering fails, right?
>>>>>
>>>>> Do you know if the hwloc issue relates to our poor PETSc scaling?
>>>>> Is there a way to configure the topology manually?
>>>>>
>>>>> To me it appears that a BIOS update wouldn't help, right? I wouldn't
>>>>> try it if it is not necessary. I'm a user with sudo access, not an
>>>>> administrator, but we have no admin for the system.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Fabian