Thank you very much for the file.

When I try with PETSc, compiled with Open MPI and icc, I get

----------
Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  topology discovery failed
  --> Returned value Not supported (-8) instead of ORTE_SUCCESS
-----------

Without exporting HWLOC_XMLFILE I get the well-known

* hwloc has encountered what looks like an error from the operating system.
*
* L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) without inclusion!
* Error occurred in topology.c line 942
*
* The following FAQ entry in a recent hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the output+tarball generated by the hwloc-gather-topology script.

And the poor scaling:

Triad:        55372.8884   Rate (MB/s)
------------------------------------------------
np  speedup
1 1.0
2 1.03
3 2.98
4 3.98
5 4.95
6 5.96
7 4.15
8 4.73
9 5.36
10 5.94
11 4.79
12 5.25

which varies considerably upon repetition but is never better than a maximal speedup of about 7. I have 24 (48) cores, and only one of them was in use by another process at the time.

Using MPICH instead of Open MPI I get no message about the hwloc issue, but
the same poor and random speedups.

I tried to check the XML file myself via
xmllint --valid leo_brice.xml  --loaddtd /usr/local/share/hwloc/hwloc.dtd

However, xmllint complains about hwloc.dtd itself:
/usr/local/share/hwloc/hwloc.dtd:8: parser error : StartTag: invalid element name
<!ELEMENT topology (object)+>
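For what it's worth, the parser error here likely comes from the invocation rather than from hwloc.dtd itself: xmllint's --loaddtd option takes no path argument, so the DTD file was treated as a second XML input and rejected at its first <!ELEMENT ...> line. Validation against an external DTD goes through --dtdvalid instead. A minimal self-contained sketch, with demo.dtd and demo.xml as hypothetical stand-ins for hwloc.dtd and leo_brice.xml:

```shell
# Stand-in DTD mirroring the first rule of hwloc.dtd
cat > demo.dtd <<'EOF'
<!ELEMENT topology (object)+>
<!ELEMENT object EMPTY>
EOF

# Stand-in document to validate
cat > demo.xml <<'EOF'
<?xml version="1.0"?>
<topology><object/></topology>
EOF

# --dtdvalid (not --loaddtd) validates against an external DTD;
# --noout suppresses re-printing the document on success.
if command -v xmllint >/dev/null; then
  xmllint --noout --dtdvalid demo.dtd demo.xml && echo "valid"
fi
```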

I should mention that I have a mixture of hwloc versions: the most recent
installed locally, and an older one as part of PETSc.

Any ideas?

Thanks,

Fabian



On 10/27/2015 10:21 AM, Brice Goglin wrote:
Here's the fixed XML. For the record, for each NUMA node, I extended
the cpusets of the L3 to match the container NUMA node, and moved all
L2 objects as children of that L3.
Now you may load that XML instead of the native discovery by setting
HWLOC_XMLFILE=leo2.xml in your environment.
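A minimal sketch of that setup, assuming leo2.xml sits in the current directory and lstopo is on the PATH:

```shell
# Point hwloc at the fixed XML instead of native discovery; every
# hwloc-based tool started from this shell (mpirun, lstopo, ...) inherits it.
export HWLOC_XMLFILE="$PWD/leo2.xml"

# Optional check: lstopo should now print the corrected topology.
if command -v lstopo >/dev/null && [ -f "$HWLOC_XMLFILE" ]; then
  lstopo --no-io
fi
```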
Brice



On 27/10/2015 10:08, Fabian Wein wrote:
Brice,

thank you very much for the offer. I attached the xml file
..

* hwloc 1.11.1 has encountered what looks like an error from the
operating system.
*
* L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset
0x0000003f) without inclusion!
* Error occurred in topology.c line 981
*
..

So if you can afford the time, I appreciate it very much!

Fabian



On 10/27/2015 09:52 AM, Brice Goglin wrote:
Hello

This bug is about L3 cache locality only, everything else should be
fine, including cache sizes. Few applications use that locality
information, so I assume it doesn't matter for PETSc scaling.
We can work around the bug by loading an XML topology. There's no easy
way to build that correct XML, but I can do it manually if you send your
current broken topology (lstopo foo.xml and send this foo.xml).
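The export step can be sketched as follows (foo.xml is the placeholder name from above; lstopo infers the output format from the file extension):

```shell
# Dump the currently discovered (broken) topology to XML for sharing.
if command -v lstopo >/dev/null; then
  lstopo foo.xml                 # format inferred from the .xml extension
  lstopo --of xml - | head -n 3  # same data on stdout, for a quick look
fi
```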

Brice



On 27/10/2015 09:43, Fabian Wein wrote:
Hello,

I'm new to the list and new to the mpi-business, too.

Our 4*12 Opteron 6238 system is very similar to the one from the
original poster and I get the same error message.
Any use in posting my logs?

I compiled the latest hwloc; no change. Our system is Ubuntu 14.04 LTS
with kernel 3.13, and our BIOS is not updated.

The system scales fine with OpenMP but fails to give any
realistic scaling using PETSc (both for the standard
STREAM benchmark and for quick tests with a given application).

As far as I understand, the system itself is fine and only the topology
discovery fails, right?

Do you know whether the hwloc issue is related to our poor PETSc
scaling? Is there a way to configure the topology manually?

It appears to me that a BIOS update wouldn't help, right? I wouldn't
try it if it is not necessary. I'm a user with sudo access,
not an administrator, but we have no admin for the system.

Thanks,

Fabian
_______________________________________________
hwloc-users mailing list
hwloc-us...@open-mpi.org
Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
Link to this post:
http://www.open-mpi.org/community/lists/hwloc-users/2015/10/1201.php

_______________________________________________
hwloc-users mailing list
hwloc-us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
Link to this post:
http://www.open-mpi.org/community/lists/hwloc-users/2015/10/1204.php




_______________________________________________
hwloc-users mailing list
hwloc-us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
Link to this post: http://www.open-mpi.org/community/lists/hwloc-users/2015/10/1205.php



_______________________________________________
hwloc-users mailing list
hwloc-us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
Link to this post: 
http://www.open-mpi.org/community/lists/hwloc-users/2015/10/1206.php


--
Dr. Fabian Wein, University of Erlangen-Nuremberg
Department of Mathematics / Excellence Cluster for Engineering of Advanced Materials
phone: +49 9131 85 20849
