> -----Original Message-----
> From: Jack Morgenstein [mailto:ja...@dev.mellanox.co.il]
> Sent: Friday, December 21, 2007 2:32 AM
> To: Tang, Changqing
> Cc: pa...@dev.mellanox.co.il;
> mvapich-disc...@cse.ohio-state.edu;
> gene...@lists.openfabrics.org; Open MPI Developers
> Subject: Re: [ofa-general] [RFC] XRC -- make receiving XRC QP
> independent of any one user process
>
> On Thursday 20 December 2007 18:24, Tang, Changqing wrote:
> > If I have an MPI server process on a node, many other MPI
> > client processes will dynamically connect to and disconnect
> > from the server. The server uses the same XRC domain.
> >
> > Will this cause "kernel" QPs to accumulate for such an
> > application? We want the server to run 365 days a year.
>
> Yes, it will. I have no way of knowing when a given
> receiving XRC QP is no longer needed -- except when the
> domain it belongs to is finally closed.
>
> I don't see that adding a userspace "destroy" verb for this
> QP will help:
This kernel QP is for receiving only, so when there is no activity on it, could the kernel send a heartbeat message to check whether the remote sending QP is still there (still connected)? If it is not, the kernel can safely clean up this QP. In other words, whenever the RC connection is broken, the kernel can destroy this QP.

>
> The only one who actually knows that the XRC QP is no longer
> required is the userspace process which created the QP at the
> remote end of the RC connection of the receiving XRC QP.
>
> This remote process can only send a request to destroy the QP
> to some local process (via its own private protocol).
> However, you pointed out that the process which originally
> created the QP may not be around any more (this was the
> source of the problem which led to the RFC in this thread) --
> and sending the destroy request to all the remote processes
> on that node which it communicates with is REALLY ugly.
>
> I'm not familiar with MPI, so this may be a silly question:
> Can the MPI server process create a new domain for each
> client process, and destroy that domain when the client
> process is done (i.e., is this MPI server process a
> supervisor of resources for distributed computations (but is
> not a participant in these computations)?).

The server could be a process group spread across multiple nodes; a parallel database search engine is one example.

>
> (Actually, what I'm asking -- is it possible to allocate a
> new XRC domain for a distributed computation, and destroy
> that domain at the end of that computation?)

Yes, it could, but that makes the MPI code harder to manage. We also have a concern about connect/accept speed, so we would prefer not to do it this way.

--CQ

>
>
> -- Jack
>