Gilles,
a short update on the patched version (3.0.0). After we updated from CentOS
7.3 to 7.4, this version, built with all versions of all compilers, stopped
working with messages like:
$ a.out: symbol lookup error:
/opt/MPI/openmpi-3.0.0p/linux/intel_17.0.5.239/lib/openmpi/mca_mpool_memkind.so:
undefined symbol: memkind_get_kind_by_partition
A closer look showed that the memkind package containing libmemkind.so.0.0.1
was updated from memkind-1.4.0-1.el7.x86_64 to memkind-1.5.0-1.el7.x86_64 by
the CentOS update, and that the older one does contain the
'memkind_get_kind_by_partition' symbol while the new one no longer has this
symbol.
So I will rebuild this version and see what happens (the regular v3.0.0 didn't
have the issue). Nevertheless, I wanted to give you a report about this side
effect of your patch ...
[reading the fine google search results first]
This looks like another instance of
https://github.com/open-mpi/ompi/issues/4466
Most amazing is that only one version of Open MPI (the patched 3.0.0 one)
stopped working instead of all of them. Seems we're lucky. WOW.
I will report on the results of the 3.0.0p rebuild.
best,
Paul Kapinos
$ objdump -S /usr/lib64/libmemkind.so.0.0.1 | grep -i memkind_get_kind_by_partition
7f70 <memkind_get_kind_by_partition>:
    7f76: 77 19  ja  7f91 <memkind_get_kind_by_partition+0x21>
    7f89: 74 06  je  7f91 <memkind_get_kind_by_partition+0x21>
On 10/12/2017 11:21 AM, Gilles Gouaillardet wrote:
> Paul,
>
> Sorry for the typo.
>
> The patch was developed on the master branch.
> Note v1.10 is no more supported, and since passive wait is a new feature, it
> would start at v3.1 or later.
>
> That being said, if you are kind of stuck with 1.10.7, I can try to craft a
> one-off patch in order to help
>
>
> Cheers,
>
> Gilles
>
> Paul Kapinos wrote:
>> Hi Gilles,
>> Thank you for your message and quick patch!
>>
>> You likely mean (instead of the links in your e-mail below)
>> https://github.com/open-mpi/ompi/pull/4331 and
>> https://github.com/open-mpi/ompi/pull/4331.patch
>> for your PR #4331 (note '4331' instead of '4431' :-)
>>
>> I was not able to patch the 1.10.7 release - likely because you develop on a
>> much, much newer version of Open MPI.
>>
>> Q1: on *which* release should patch 4331 be applied?
>>
>> Q2: I assume it is unlikely that this patch would be back-ported to 1.10.x?
>>
>> Best
>> Paul Kapinos
>>
>>
>>
>>
>> On 10/12/2017 09:31 AM, Gilles Gouaillardet wrote:
>>> Paul,
>>>
>>>
>>> I made PR #4331 https://github.com/open-mpi/ompi/pull/4431 in order to
>>> implement this.
>>>
>>> in order to enable passive wait, you simply need to
>>>
>>> mpirun --mca mpi_poll_when_idle true ...
>>>
>>>
>>> fwiw, when you use mpi_yield_when_idle, Open MPI does (highly
>>> oversimplified)
>>>
>>> for (...) sched_yield();
>>>
>>>
>>> as you already noted, top shows 100% CPU usage (a closer look shows the
>>> usage is in the kernel and not in user space).
>>>
>>> that being said, since the process is only yielding, the other running
>>> processes will get most of their time slices, and hence the system
>>> remains pretty responsive.
>>>
>>>
>>> Can you please give this PR a try?
>>>
>>> the patch can be manually downloaded at
>>> https://github.com/open-mpi/ompi/pull/4431.patch
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>>
>>> On 10/12/2017 12:37 AM, Paul Kapinos wrote:
Dear Jeff,
Dear All,
we know about the *mpi_yield_when_idle* parameter [1]. We read [2]. You're
right,
> if an MPI application is waiting a long time for messages,
> perhaps its message passing algorithm should be re-designed
... but we cannot spur the ParaView/VTK developers to rewrite their software,
famous for busy-waiting on any user mouse move with N x 100% CPU load [3].
It turned out that
a) (at least some) spin time is in the MPI_Barrier call (waiting for user
interaction)
b) for Intel MPI and MPICH we found a way to disable this busy wait [4]
c) but for both 'pvserver' and a minimal example (attached), we were not able
to stop the busy waiting with Open MPI: setting the *mpi_yield_when_idle*
parameter to '1' just seems to move the spin activity from userland to the
kernel, staying at 100%, cf. attached screenshots and [5]. The behaviour is
the same for 1.10.4 and 2.0.2.
Well, The Question: is there a way/a chance to effectively disable the busy
wait using Open MPI?
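For reference, an MCA parameter such as this one does not have to be passed on every command line; following Open MPI's standard MCA conventions (a sketch, not specific to any one release), it can be made a per-user default:

```
# $HOME/.openmpi/mca-params.conf -- per-user defaults read by every run
mpi_yield_when_idle = 1
```

Equivalently, `export OMPI_MCA_mpi_yield_when_idle=1` before launching; values given via `mpirun --mca` take precedence over the environment and the file.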
Best,
Paul Kapinos
[1] http://www.open-mpi.de/faq/?category=running#force-aggressive-degraded
[2]
http://blogs.cisco.com/performance/polling-vs-blocking-message-passingprogress
[3]
https://www.paraview.org/Wiki/Setting_up_a_ParaView_Server#Server_processes_always_have_100.25_CPU_usage
[4]
https://public.kitware.com/pipermail/paraview-developers/2017-October/005587.html
[5]
https://serverfault.com/questions/180711/what-exactly-do-the-colors-in-htop-status-bars-mea