Re: [OMPI devel] RFC: Linuxes shipping libibverbs
> Is it possible that the /sys/class/infiniband directory exists and is
> empty? In which cases?

Do "modprobe ib_core" on a system with no hardware drivers loaded (or no RDMA hardware installed).
Re: [OMPI devel] Communication between entities
I see, thanks for the explanation! I'm afraid you'll have no choice, though, but to relay the message via the local daemon. I know that creates a window of vulnerability, but it cannot be helped. Passing full contact info for all daemons to all procs would take us back a few steps and cause a whole lot of sockets to be opened...

On 5/29/08 8:04 AM, "Leonardo Fialho" wrote:

> Ralph,
>
> I want to implement a receiver-based message log (the RADIC architecture)
> that stores the log file on another node (so that no stable storage is
> necessary).
>
> I developed a wrapper to the PML that manages the messages and stores them
> locally (or on stable storage), but now I need to migrate this "log file"
> to another node. Only the PML needs this file (to recover after a failure),
> but the ORTE daemon stores and manages the files in order to launch them
> when a node dies.
>
> In this approach the ORTE daemons are treated as application "protectors",
> and the applications are the "protected".
>
> Thanks,
> Leonardo
>
> Ralph H Castain escribió:
>> There is no way to send a message to a daemon located on another node
>> without relaying it through the local daemon. The application procs have no
>> knowledge of the contact info for any daemon other than their own, so even
>> using the direct routed module would not work.
>>
>> Can you provide some reason why the normal relay is unacceptable? And why
>> would the PML want to communicate with a daemon, which, after all, is -not-
>> an MPI process and has no idea what a PML is?
>>
>> On 5/29/08 7:41 AM, "Leonardo Fialho" wrote:
>>
>>> Hi All,
>>>
>>> If, inside a PML component, I need to send a message to the ORTE daemon
>>> located on another node, how can I do it?
>>>
>>> Is it safe to create a thread to manage this communication independently,
>>> or does Open MPI have any service to do it (like the RML in the ORTE
>>> environment)?
>>>
>>> I saw a socket connection between the application and the local ORTE
>>> daemon, but I don't want to send the message to the local ORTE daemon and
>>> then have it send the same message to the remote ORTE daemon...
>>>
>>> Thanks,

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] Communication between entities
Ralph,

I want to implement a receiver-based message log (the RADIC architecture) that stores the log file on another node (so that no stable storage is necessary).

I developed a wrapper to the PML that manages the messages and stores them locally (or on stable storage), but now I need to migrate this "log file" to another node. Only the PML needs this file (to recover after a failure), but the ORTE daemon stores and manages the files in order to launch them when a node dies.

In this approach the ORTE daemons are treated as application "protectors", and the applications are the "protected".

Thanks,
Leonardo

Ralph H Castain escribió:
> There is no way to send a message to a daemon located on another node
> without relaying it through the local daemon. The application procs have no
> knowledge of the contact info for any daemon other than their own, so even
> using the direct routed module would not work.
>
> Can you provide some reason why the normal relay is unacceptable? And why
> would the PML want to communicate with a daemon, which, after all, is -not-
> an MPI process and has no idea what a PML is?
>
> On 5/29/08 7:41 AM, "Leonardo Fialho" wrote:
>
>> Hi All,
>>
>> If, inside a PML component, I need to send a message to the ORTE daemon
>> located on another node, how can I do it?
>>
>> Is it safe to create a thread to manage this communication independently,
>> or does Open MPI have any service to do it (like the RML in the ORTE
>> environment)?
>>
>> I saw a socket connection between the application and the local ORTE
>> daemon, but I don't want to send the message to the local ORTE daemon and
>> then have it send the same message to the remote ORTE daemon...
>>
>> Thanks,

-- 
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
Re: [OMPI devel] RFC: Linuxes shipping libibverbs
On May 29, 2008, at 3:27 AM, Pavel Shamis (Pasha) wrote:

>> I got some more feedback from Roland off-list explaining that if
>> /sys/class/infiniband does exist and is non-empty and
>> /sys/class/infiniband_verbs/abi_version does not exist, then this is
>> definitely a case where we want to warn because it implies that the
>> config is screwed up -- RDMA devices are present but not usable.
>
> Is it possible that the /sys/class/infiniband directory exists and is
> empty? In which cases?

Roland consistently said "...and not empty" in e-mails to me, so that's what I assumed. However, Pasha just did a test: on a machine with a ConnectX HCA, he manually removed the mlx4 driver and started the openibd service. /sys/class/infiniband was created, but it was empty.

I guess this is a situation that we want to warn about -- we can simplify the whole deal by making the overriding assumption: if the drivers are loaded at all (such that /sys/class/infiniband exists at all), OMPI should expect to be able to find some RDMA devices. If it doesn't find any, it should issue a warning.

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Communication between entities
There is no way to send a message to a daemon located on another node without relaying it through the local daemon. The application procs have no knowledge of the contact info for any daemon other than their own, so even using the direct routed module would not work.

Can you provide some reason why the normal relay is unacceptable? And why would the PML want to communicate with a daemon, which, after all, is -not- an MPI process and has no idea what a PML is?

On 5/29/08 7:41 AM, "Leonardo Fialho" wrote:

> Hi All,
>
> If, inside a PML component, I need to send a message to the ORTE daemon
> located on another node, how can I do it?
>
> Is it safe to create a thread to manage this communication independently,
> or does Open MPI have any service to do it (like the RML in the ORTE
> environment)?
>
> I saw a socket connection between the application and the local ORTE
> daemon, but I don't want to send the message to the local ORTE daemon and
> then have it send the same message to the remote ORTE daemon...
>
> Thanks,
[OMPI devel] Communication between entities
Hi All,

If, inside a PML component, I need to send a message to the ORTE daemon located on another node, how can I do it?

Is it safe to create a thread to manage this communication independently, or does Open MPI have any service to do it (like the RML in the ORTE environment)?

I saw a socket connection between the application and the local ORTE daemon, but I don't want to send the message to the local ORTE daemon and then have it send the same message to the remote ORTE daemon...

Thanks,

-- 
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
Re: [OMPI devel] RFC: Linuxes shipping libibverbs
> I got some more feedback from Roland off-list explaining that if
> /sys/class/infiniband does exist and is non-empty and
> /sys/class/infiniband_verbs/abi_version does not exist, then this is
> definitely a case where we want to warn because it implies that the
> config is screwed up -- RDMA devices are present but not usable.

Is it possible that the /sys/class/infiniband directory exists and is empty? In which cases?
Re: [OMPI devel] Notes from mem hooks call today
Hi Roland,

Roland Dreier wrote:
> Stick in a separate library then? I don't think we want the complexity
> in the kernel -- I personally would argue against merging it upstream;
> and given that the userspace solution is actually faster, it becomes
> pretty hard to justify.

Memory registration has always been expensive, so it's not in the critical path (it is not used for small messages, and a system-call overhead is nothing for large messages in MPI). Sure, you can have the kernel notify user space through mapped flags, but it's a bit ugly IMHO.

There are cases where the basic registration already uses the same infrastructure as a regcache. For example, on Solaris, Mac OS X, and Linux PowerPC, you really want to register segments as large as possible to limit the IOMMU overhead. You also don't want to register the same page multiple times with overlapping registrations, because the IOMMU space is limited. In short, you already have a registration cache in the driver.

However, if user space is expected to call register/deregister often, then I agree that the cache is better kept in user space. The big picture is that it's not really important where the regcache lives, as long as it's out of MPI.

Patrick