[OMPI devel] Please provide the pshmem_finalize symbol
Dear all,

the Score-P community is currently in the process of adding support for the OpenSHMEM API to its performance measurement infrastructure Score-P [1], and we are close to releasing a new major version of it. Now that Open MPI also provides an OpenSHMEM implementation, we have extended our testing to the new 1.8 version of Open MPI. We have already submitted some bug reports while working on this, but this last item isn't really a bug:

The OpenSHMEM standard does not include a shmem_finalize API, but Open MPI provides one and also ensures (via the destructor attribute) that this function is called at the end. When a performance monitor like Score-P intercepts the library calls via weak symbols, it eventually needs to call the original function too. Since the user is free to call this function directly, but Score-P still needs the parallel context to finalize the measurement after exiting main, we need to intercept shmem_finalize and call the real shmem_finalize after we are done. Unfortunately, we can't call the original shmem_finalize, as there is no pshmem_finalize in Open MPI. And without finalizing the Open MPI library, orterun will report errors because the application did not call shmem_finalize.

So our request to the Open MPI community is to provide the pshmem_finalize symbol, even though this function is not (yet) in the OpenSHMEM standard.

Sincerely,
Bert Wesarg

[1] http://www.vi-hps.org/projects/score-p

--
Dipl.-Inf. Bert Wesarg
wiss. Mitarbeiter

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
01062 Dresden
Tel.: +49 (351) 463-42451
Fax: +49 (351) 463-37773
E-Mail: Bert.Wesarg@tu-dresden.
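(For context, the weak-symbol interception pattern Bert describes looks roughly like the sketch below. It is illustrative only, not Score-P source: the monitor_enter/monitor_exit hooks are hypothetical, and the sketch assumes the implementation exports the pshmem_finalize entry point that this message asks for.)

```c
/* Sketch of a PSHMEM-style interposer for shmem_finalize (illustrative,
 * not Score-P code). The tool provides a strong shmem_finalize that
 * records the event and then forwards to the implementation's
 * pshmem_finalize -- the symbol requested above. */

extern void pshmem_finalize(void);   /* the requested p-symbol */

/* hypothetical measurement hooks */
static void monitor_enter(const char *fn) { (void) fn; /* record enter event */ }
static void monitor_exit(const char *fn)  { (void) fn; /* record exit event  */ }

void shmem_finalize(void)
{
    monitor_enter("shmem_finalize");
    pshmem_finalize();               /* hand control back to the real library */
    monitor_exit("shmem_finalize");
}
```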
Re: [OMPI devel] Please provide the pshmem_finalize symbol
here it goes, https://svn.open-mpi.org/trac/ompi/changeset/31751

On Wed, May 14, 2014 at 9:19 AM, Bert Wesarg wrote:
> So our request to the Open MPI community is to provide the pshmem_finalize
> symbol, even though this function is not (yet) in the OpenSHMEM standard.
Re: [OMPI devel] Non-uniform BTL problems in: openib, tcp, sctp, portals4, vader, scif
Good catch. I fixed the TCP BTL (r31753). It is the only BTL I can test, so that's the most I can do here.

However, I never get OPAL_ERR_DATA_VALUE_NOT_FOUND out of the modex call when the key doesn't exist. I looked in dstore and the correct value one should look for is OPAL_ERR_NOT_FOUND. I guess you might want to revise the check in the usnic BTL.

George.

PS: There is an easy way to test this particular case by using the MPMD capabilities of mpiexec. As an example, for a quick NetPIPE run between two processes, one supporting SM and TCP and one supporting only SM (I ignored self here), you can do:

mpirun -np 1 --mca btl tcp,sm,self ./NPmpi -l 5 -u 5 : -np 1 --mca btl sm,self ./NPmpi -l 5 -u 5

On Tue, May 13, 2014 at 2:09 PM, Jeff Squyres (jsquyres) wrote:
> I notice that BTLs are not checking the return value from ompi_modex_recv()
> for OPAL_ERR_DATA_VALUE_NOT_FOUND (indicating that the peer process didn't
> put that modex key). In the BTL context, NOT_FOUND means that that peer
> process doesn't have this BTL, so this local peer process should probably
> mark it as unreachable in add_procs().
>
> This is on both trunk and the v1.8 branch.
>
> The BTLs listed above are not checking/handling ompi_modex_recv() returning
> OPAL_ERR_DATA_VALUE_NOT_FOUND properly. Most of these BTLs do something like
> this:
>
> module_add_procs() {
>     loop over the peers {
>         proc = proc_create(...)
>         if (NULL == proc)
>             error!
>     }
> }
>
> proc_create(...) {
>     if (ompi_modex_recv() != OMPI_SUCCESS)
>         return NULL;
>     ...
> }
>
> The fix is to make proc_create() return something a bit more expressive so
> that add_procs() can tell the difference between "error!" and "you can't
> reach this peer".
>
> I fixed this in the usnic BTL back in late March, but forgot to bring this to
> everyone's attention -- oops. See
> https://svn.open-mpi.org/trac/ompi/ticket/4442
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
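(The shape of the fix Jeff describes might look like the sketch below. The component and function names are invented, and the modex call is abbreviated; this is not actual usnic or TCP BTL source.)

```c
/* Illustrative only -- not actual Open MPI source. proc_create() reports
 * *why* it failed so that add_procs() can tell a genuine error apart from
 * "this peer never published a modex key for this BTL". */
static int mca_btl_foo_proc_create(ompi_proc_t *ompi_proc,
                                   mca_btl_foo_proc_t **proc_out)
{
    void *modex_data = NULL;
    size_t size = 0;
    int rc;

    *proc_out = NULL;
    rc = ompi_modex_recv(&mca_btl_foo_component.super.btl_version,
                         ompi_proc, &modex_data, &size);
    if (OPAL_ERR_NOT_FOUND == rc) {
        /* peer has no foo BTL: not an error, just unreachable via foo */
        return OPAL_ERR_UNREACH;
    } else if (OMPI_SUCCESS != rc) {
        return rc;                   /* a genuine failure */
    }

    /* ... allocate *proc_out and fill it in from modex_data ... */
    return OMPI_SUCCESS;
}
```

In add_procs() the caller can then skip the peer on OPAL_ERR_UNREACH and only abort on a real error.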
[OMPI devel] RFC: Add some basic CUDA-aware support to reductions
WHAT: Add some basic support so that reduction functions can support GPU buffers. All this patch does is move the GPU data into a host buffer before the reduction call and move it back to the GPU after the reduction call. The changes have no effect if CUDA-aware support is not compiled in.

WHY: Users of CUDA-aware support expect reductions to work.

WHEN: After the next con call, May 20, 2014

See attached patch.
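(The idea in the patch can be illustrated with a minimal host-staging wrapper like the one below. This is not the attached patch; it is a standalone sketch using plain MPI and the CUDA runtime, with error checking omitted.)

```c
/* Minimal illustration of host staging around a reduction: copy device
 * data to a host buffer, run the normal host-side reduction, then copy
 * the result back to the device on the root. Not the attached patch. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int reduce_via_host(const double *d_sbuf, double *d_rbuf, int count,
                    MPI_Op op, int root, MPI_Comm comm)
{
    size_t bytes = (size_t) count * sizeof(double);
    double *h_sbuf = malloc(bytes);
    double *h_rbuf = malloc(bytes);
    int rank, rc;

    MPI_Comm_rank(comm, &rank);

    cudaMemcpy(h_sbuf, d_sbuf, bytes, cudaMemcpyDeviceToHost);
    rc = MPI_Reduce(h_sbuf, h_rbuf, count, MPI_DOUBLE, op, root, comm);
    if (rank == root) {
        /* only the root holds a valid reduction result */
        cudaMemcpy(d_rbuf, h_rbuf, bytes, cudaMemcpyHostToDevice);
    }

    free(h_sbuf);
    free(h_rbuf);
    return rc;
}
```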
Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)
Looks like this is a scif bug. From the documentation:

    scif_poll() waits for one of a set of endpoints to become ready to perform
    an I/O operation; it is syntactically and semantically very similar to
    poll(). The SCIF functions on which scif_poll() waits are scif_accept(),
    scif_send(), and scif_recv(). Consult the SCIF API reference manuals for
    details on scif_poll() usage.

So, if it is indeed similar to poll() it should wake up when the file descriptor is closed.

Since that is not the case, I will look through the documentation and see if there is a way other than pthread_cancel.

-Nathan

On Wed, May 14, 2014 at 11:18:05AM +0900, Gilles Gouaillardet wrote:
> Folks,
>
> i would like to comment on r31738:
>
>> There is no reason to cancel the listening thread. It should die
>> automatically when the file descriptor is closed.
>
> i could not agree more
>
>> It is sufficient to just wait for the thread to exit with pthread join.
>
> unfortunately, at least in my test environment (an outdated MPSS 2.1), it
> is *not* :-(
>
> this is what i described in #4615
> https://svn.open-mpi.org/trac/ompi/ticket/4615
> in which i attached scif_hang.c, which evidences that (at least in my
> environment) scif_poll(...) does *not* return after scif_close(...) is
> called, and hence the scif pthread never ends.
>
> this is likely a bug in MPSS and it might have been fixed in a more recent
> release.
>
> Nathan, could you try scif_hang in your environment and report the MPSS
> version you are running?
>
> bottom line, and once again, in my test environment, pthread_join(...)
> without pthread_cancel(...) might cause a hang when the btl/scif module is
> released.
>
> assuming the bug is in old MPSS and has been fixed in recent releases,
> what is the OpenMPI policy?
> a) test the MPSS version and call pthread_cancel() (or do *not* call
>    pthread_join()) if a buggy MPSS is detected?
> b) display an error/warning if a buggy MPSS is detected?
> c) do not call pthread_join() at all? /* a SIGSEGV might occur with older
>    MPSS, but it is in MPI_Finalize() so the impact is limited */
> d) do nothing, let the btl/scif module hang, this is *not* an OpenMPI
>    problem after all?
> e) something else?
>
> Gilles
Re: [OMPI devel] opal_free_list_t annoyance
Indeed, if the constructor is called then the destructor should be as well. Adding the destructor call might be a good idea, despite the fact that it delays everything until the end of the execution. The benefit during the execution is minimal; it only keeps valgrind happy at the end.

Btw, can we merge the two free lists? They look pretty similar (except for the mpool stuff).

George.

On Tue, May 13, 2014 at 5:01 PM, Nathan Hjelm wrote:
> While tracking down memory leaks in components I ran into an interesting
> issue. osc/rdma uses an opal_free_list_t (not an ompi_free_list_t) for
> buffer fragments. The fragment class allocates a buffer in the
> constructor and frees the buffer in the destructor. The problem is that
> the item constructor is called but the destructor is never called.
>
> I looked into the issue and I see what is happening. When growing the free
> list we call the constructor for each item we allocate (see
> opal_free_list.c:113) but the free list destructor does not invoke the
> item destructor. This is different from ompi_free_list_t, which does invoke
> the destructor on each constructed item.
>
> The question is: is this difference intentional? It seems a little odd
> that the free list does not call the item destructor given that it
> calls the constructor. If this is intentional, is there a reason for this
> behavior? If not, I plan on "fixing" the opal_free_list_t destructor to
> call the item destructor.
>
> -Nathan
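(What Nathan proposes amounts to something like the sketch below for the opal_free_list_t destructor. Names and internals are simplified; this is not the actual opal_free_list.c code.)

```c
/* Simplified illustration -- not the actual opal_free_list.c code.
 * Mirror the construction done when the list grows: run the item
 * destructor on every item before releasing the backing memory. */
static void free_list_destruct_sketch(opal_free_list_t *fl)
{
    opal_list_item_t *item;

    /* assume every constructed item is back on the list at teardown */
    while (NULL != (item = opal_list_remove_first(&fl->super))) {
        OBJ_DESTRUCT(item);      /* pairs with the OBJ_CONSTRUCT in grow() */
    }

    /* ... then free the underlying allocations owned by the list ... */
}
```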
Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)
Couple of suggestions:

* detect that this is an older scif lib and just don't build or enable the scif btl
* have a flag that indicates "you should exit", and then tickle the fd so scif_poll exits

Ralph

On May 14, 2014, at 7:45 AM, Nathan Hjelm wrote:
> Looks like this is a scif bug. From the documentation:
>
>     scif_poll() waits for one of a set of endpoints to become ready to
>     perform an I/O operation; it is syntactically and semantically very
>     similar to poll().
>
> So, if it is indeed similar to poll() it should wake up when the file
> descriptor is closed.
>
> Since that is not the case, I will look through the documentation and see
> if there is a way other than pthread_cancel.
>
> -Nathan
> [...]
Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)
Nathan,

> Looks like this is a scif bug. From the documentation:

and from the source code, scif_poll(...) simply calls poll(...), at least in MPSS 2.1.

> Since that is not the case, I will look through the documentation and see
> if there is a way other than pthread_cancel.

what about:

- use a global variable (a boolean called "close_requested")
- update the scif thread so it checks close_requested after each scif_poll, and exits if true
- when closing btl/scif:
  * set close_requested to true
  * scif_connect to myself
  * close this connection
  * pthread_join(...)

that's a bit heavyweight, but it does the job (and we keep an infinite timeout for scif_poll(), so the overhead at runtime is zero).

i can test this approach from tomorrow if needed.

Gilles
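(Gilles' wake-the-poller suggestion corresponds to roughly the following pattern. This is a sketch, not the change that was later committed; error handling is omitted and the endpoint/port bookkeeping is invented.)

```c
/* Sketch of the "connect to myself to wake the poller" idea -- not the
 * committed fix. Error handling omitted; names invented. */
#include <pthread.h>
#include <stdbool.h>
#include <scif.h>

static volatile bool close_requested = false;

static void *scif_listen_thread(void *arg)
{
    struct scif_pollepd pollepd = { .epd = *(scif_epd_t *) arg,
                                    .events = SCIF_POLLIN };

    while (!close_requested) {
        /* infinite timeout, so there is no polling overhead at runtime */
        if (scif_poll(&pollepd, 1, -1) > 0 && !close_requested) {
            /* scif_accept() the incoming connection and handle it ... */
        }
    }
    return NULL;
}

/* teardown: set the flag, then open a throw-away connection to our own
 * listening port so scif_poll() returns and the thread sees the flag */
static void scif_listener_shutdown(pthread_t thread, struct scif_portID self)
{
    close_requested = true;

    scif_epd_t epd = scif_open();
    scif_connect(epd, &self);
    scif_close(epd);

    pthread_join(thread, NULL);
}
```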
Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)
It sounds more like a suboptimal usage of the pthread cancellation helpers than a real issue with pthread_cancel itself. I do agree the usage is not necessarily straightforward, even for a veteran coder, but the related issues belong to the realm of implementation, not to the conceptual level.

George.

On Tue, May 13, 2014 at 10:56 PM, Paul Hargrove wrote:
> George,
>
> Just my USD0.02:
>
> With pthreads many system calls (mostly those that might block) become
> "cancellation points" where the implementation checks if the calling thread
> has been cancelled. This means that a thread making any of those calls may
> simply never return (calling pthread_exit() internally), unless extra work
> has been done to prevent this default behavior. This makes it very hard to
> write code that properly cleans up its resources, including (but not limited
> to) file descriptors and malloc()ed memory. Even if Open MPI is written very
> carefully, one cannot assume that all the libraries it calls (and their
> dependencies, etc.) are written to properly deal with cancellation.
>
> -Paul
>
> On Tue, May 13, 2014 at 7:32 PM, George Bosilca wrote:
>> I heard multiple references to pthread_cancel being known to have bad
>> side effects. Can somebody educate me on this topic please?
>>
>> Thanks,
>> George.
>>
>> On Tue, May 13, 2014 at 10:25 PM, Ralph Castain wrote:
>> > It could be a bug in the software stack, though I wouldn't count on it.
>> > Unfortunately, pthread_cancel is known to have bad side effects, and so
>> > we avoid its use.
>> >
>> > The key here is that the thread must detect that the file descriptor has
>> > closed and exit, or use some other method for detecting that it should
>> > terminate. We do this in multiple other places in the code, without
>> > using pthread_cancel and without hanging. So it is certainly doable.
>> >
>> > I don't know the specifics of why Nathan's code is having trouble
>> > exiting, but I suspect that a simple solution - not involving
>> > pthread_cancel - can be readily developed.
>> >
>> > On May 13, 2014, at 7:18 PM, Gilles Gouaillardet wrote:
>> >> [...]
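(A side note on George's point that the problems are in how the cancellation helpers are used: the usual pattern for keeping pthread_cancel from leaking resources is to register cleanup handlers around the cancellation points, as in the generic sketch below. This is illustrative only and not Open MPI code.)

```c
/* Generic illustration of cleanup handlers so a cancelled thread still
 * releases its resources; not Open MPI code. */
#include <pthread.h>
#include <stdlib.h>

static volatile int done = 0;

static void release_buffer(void *arg)
{
    free(arg);                      /* runs if the thread is cancelled */
}

static void *worker(void *arg)
{
    char *buf = malloc(4096);

    pthread_cleanup_push(release_buffer, buf);
    while (!done) {
        pthread_testcancel();       /* explicit cancellation point */
        /* ... blocking calls in here are also cancellation points;
         * if the thread is cancelled, release_buffer(buf) runs ... */
    }
    pthread_cleanup_pop(1);         /* run the handler on normal exit too */
    return NULL;
}
```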
Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)
On Wed, May 14, 2014 at 07:55:54AM -0700, Ralph Castain wrote:
> Couple of suggestions:
>
> * detect that this is an older scif lib and just don't build or enable the scif btl
>
> * have a flag that indicates "you should exit", and then tickle the fd so scif_poll exits

Thinking along these lines now. I can initiate a connection to the local process, which will wake up the thread. I didn't do that before because scif_poll should return on hangup.

-Nathan
Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)
There seems to be a consensus on the fact that closing an fd should trigger a return from poll. Unfortunately this assumption is wrong, and it is not supported by any documentation available online. To be more clear, all the documentation I know of points in the opposite direction: it is unwise to close a socket some other thread is polling on. As an example, the Linux close man page carries a warning about this usage:

> It is probably unwise to close file descriptors while they may be in use by
> system calls in other threads in the same process. Since a file descriptor
> may be reused, there are some obscure race conditions that may cause
> unintended side effects.

Extra info is available at
http://stackoverflow.com/questions/10561602/closing-a-file-descriptor-that-is-being-polled

George.

On May 13, 2014, at 22:18, Gilles Gouaillardet wrote:
> Folks,
>
> i would like to comment on r31738:
> [...]
Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)
That is exactly how I decided to fix it. It looks like it is working. Please try r31755 when you get a chance.

-Nathan

On Thu, May 15, 2014 at 12:03:53AM +0900, Gilles Gouaillardet wrote:
> what about:
> - use a global variable (a boolean called "close_requested")
> - update the scif thread so it checks close_requested after each scif_poll,
>   and exits if true
> - when closing btl/scif:
>   * set close_requested to true
>   * scif_connect to myself
>   * close this connection
>   * pthread_join(...)
> [...]
Re: [OMPI devel] Please provide the pshmem_finalize symbol
On 05/14/2014 03:15 PM, Mike Dubman wrote:
> here it goes, https://svn.open-mpi.org/trac/ompi/changeset/31751

Thank you very much. I will test against the latest nightly builds for trunk and v1.8 and report back.

Regards,
Bert

--
Dipl.-Inf. Bert Wesarg
wiss. Mitarbeiter

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
01062 Dresden
Tel.: +49 (351) 463-42451
Fax: +49 (351) 463-37773
E-Mail: bert.wes...@tu-dresden.de