Re: [OMPI devel] RFC: Linuxes shipping libibverbs
On Thu, May 22, 2008 at 04:19:05PM -0400, Jeff Squyres wrote:
> On May 22, 2008, at 4:07 PM, Dirk Eddelbuettel wrote:
>
> > Is there a test I could run for you?
>
> Can you see if /dev/infiniband exists? If it does, the OpenFabrics
> kernel drivers are running. If not, they aren't.

Either that or udev is not configured properly.

> > Also, if this test depends on the Debian kernel packages, then we're
> > back to square one as some folks (like myself) run binary kernels,
> > others may just hand-compile and this test may not work as we may miss
> > the 'Debian trigger' in those cases.
>
> The OpenFabrics kernel drivers are implemented as kernel modules, so
> it's mainly just a question of loading them to start them running.
> For example, in the official OFED distribution, it comes with
> /etc/init.d/openibd -- "start" loads the kernel modules and does all
> the necessary initialization, "stop" unloads everything, etc.

ib_core/mthca/mlx4 should be loaded automatically by hotplug if HW is
present. No need for any additional configuration.

--
Gleb.
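The test discussed above can be sketched in a few lines. This is only an illustration, not anything Open MPI or libibverbs ships: the function name and the three return labels are made up here, and the use of /sys/class/infiniband as a udev-independent hint that ib_core and a hardware driver are loaded is an assumption of this sketch.

```python
import os


def diagnose_openfabrics(dev_dir="/dev/infiniband",
                         sysfs_dir="/sys/class/infiniband"):
    """Classify the host into one of three states.

    The labels returned here are hypothetical names for this sketch.
    """
    # Device nodes exist: the kernel drivers are running and udev works.
    if os.path.isdir(dev_dir):
        return "dev-nodes-present"
    # No device nodes, but sysfs lists at least one IB device: the drivers
    # are loaded, so a udev configuration problem is the likely cause.
    if os.path.isdir(sysfs_dir) and os.listdir(sysfs_dir):
        return "drivers-loaded-no-dev-nodes"
    # Neither is there: the drivers are probably not loaded at all.
    return "no-drivers"
```

Calling `diagnose_openfabrics()` with the defaults checks the real paths; passing other directories makes the function easy to try on a machine without any RDMA hardware.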
Re: [OMPI devel] RFC: Linuxes shipping libibverbs
On Thu, May 22, 2008 at 08:30:52PM +, Dirk Eddelbuettel wrote:
> > > Also, if this test depends on the Debian kernel packages, then we're
> > > back to square one as some folks (like myself) run binary kernels,
> > > others may just hand-compile and this test may not work as we may miss
> > > the 'Debian trigger' in those cases.
> >
> > The OpenFabrics kernel drivers are implemented as kernel modules, so
> > it's mainly just a question of loading them to start them running.
> > For example, in the official OFED distribution, it comes with /etc/
>
> Do you have any information whether OFED is in fact packaged for
> Debian? It may not be, and hence no file ...

AFAIK OFED is not packaged for Debian. Ronald packages IB for Debian.

--
Gleb.
Re: [OMPI devel] RFC: Linuxes shipping libibverbs
On Fri, May 23, 2008 at 09:56:44AM +0300, Gleb Natapov wrote:
> On Thu, May 22, 2008 at 08:30:52PM +, Dirk Eddelbuettel wrote:
> > > > Also, if this test depends on the Debian kernel packages, then we're
> > > > back to square one as some folks (like myself) run binary kernels,
> > > > others may just hand-compile and this test may not work as we may miss
> > > > the 'Debian trigger' in those cases.
> > >
> > > The OpenFabrics kernel drivers are implemented as kernel modules, so
> > > it's mainly just a question of loading them to start them running.
> > > For example, in the official OFED distribution, it comes with /etc/
> >
> > Do you have any information whether OFED is in fact packaged for
> > Debian? It may not be, and hence no file ...
>
> AFAIK OFED is not packaged for debian. Ronald packages IB for debian.

Correct, that is my understanding too. Good point also re udev.

Dirk

--
Three out of two people have difficulties with fractions.
Re: [OMPI devel] [OMPI users] Open MPI Linux Expectations
FWIW, I always build for the version of Linux that I'm currently running.

On May 22, 2008, at 5:33 PM, Don Kerr wrote:

> Can anyone set my expectations with their real world experiences regarding building Open MPI on one release of Linux and running on another. If I were to...
>
> Build OMPI on Redhat 4, will it run on later releases of Redhat, e.g. Redhat 5?
> Build OMPI on Suse 9, will it run on later releases of Suse, e.g. Suse 10?
> Build OMPI on Redhat, will it run on Suse?
> Build OMPI on Suse, will it run on Redhat?
>
> Thanks in advance for your insights.
>
> -DON
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Question about priority
We may not have this uniform throughout the code base -- this is one of the things we wanted to talk about in the Bay area meeting.

I believe that the allowable range for priorities should be [0, 100], and that if you don't want to be selected, you should return NULL (or use some other mechanism to indicate that you didn't want to be selected). That was the original intent of the MCA selection mechanisms, at least.

Josh -- is this consistent with what you found when you consolidated a lot of this stuff?

On May 22, 2008, at 11:30 AM, Rolf vandeVaart wrote:

> I know there was some recent discussion about priority of components, but I wanted to double check. I am trying to understand what priority = 0 means. My assumption is the following:
>
> priority >= 0 means the component is selectable
> priority < 0 means the component is not selectable
>
> I ask this because in some of the collective code it looks like a priority = 0 means not selectable. Not a big deal, but I am trying to fix a memory leak and I need to get this piece right.
>
> And I assume that priority < 0 will give one the same behavior as ^component but the code paths within Open MPI would be different.
>
> Rolf
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
Cisco Systems
[OMPI devel] Memory hooks stuff
Brian and I were chatting the other day about random OMPI stuff and the topic of the memory hooks came up again. Brian was wondering if we should [finally] revisit this topic -- there's a few things that could be done to make life "better". Two things jump to mind:

- using mallopt on Linux
- doing *something* on Solaris

It would probably be worthwhile to have a teleconf about this in the near future for anyone who is interested. I propose any time before 4pm US Eastern on Wednesday, 28 May, 2008.

Who would be interested in discussing this stuff? (me, Brian, ? someone from Sun?, ...?)

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Question about priority
Yeah (Sorry I didn't reply earlier).

Each component is asked for at least two items of information: priority (int), and module (struct *). The priority can range over [INT_MIN, INT_MAX], with the highest priority selected, even if that priority is negative. If the component does not want to be selected then it should return NULL for the module value. This indicates to the selection logic that, no matter what the priority is set to, the component should not be a candidate for selection.

So a component is selectable if it returns a non-NULL value for the module struct, and is not selectable if it returns NULL. The priority only indicates relative rank between all available components.

Does that make sense?

I should probably add this comment to the mca_base_select function to preserve it. I'll make a bug for it so it doesn't get lost.

-- Josh

On May 23, 2008, at 7:14 AM, Jeff Squyres wrote:

> We may not have this uniform throughout the code base -- this is one of the things we wanted to talk about in the Bay area meeting.
>
> I believe that the allowable range for priorities should be [0, 100], and that if you don't want to be selected, you should return NULL (or use some other mechanism to indicate that you didn't want to be selected). That was the original intent of the MCA selection mechanisms, at least.
>
> Josh -- is this consistent with what you found when you consolidated a lot of this stuff?
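The rule Josh describes (a NULL module means "not a candidate", and the priority is only a relative rank, even when negative) can be sketched as follows. This is a toy model and not the actual mca_base_select code: the function name, the tuple layout, and the tie-breaking choice are all assumptions made for illustration, and Python's `None` stands in for a NULL module pointer.

```python
def select_component(candidates):
    """Pick the highest-priority candidate that returned a module.

    `candidates` is an iterable of (name, priority, module) tuples, where
    module is None when the component declined to be selected.
    Returns the winning tuple, or None if nothing is selectable.
    """
    best = None
    for name, priority, module in candidates:
        # A missing module removes the component from consideration,
        # whatever its priority says.
        if module is None:
            continue
        # Priority only ranks the remaining candidates; negative values
        # are legal. Ties keep the earlier candidate, which is an
        # arbitrary choice of this sketch.
        if best is None or priority > best[1]:
            best = (name, priority, module)
    return best
```

With this model, a component at priority -5 that returns a module beats a component at priority 100 that returns no module, which is the point of the explanation above.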
Re: [OMPI devel] Question about priority
This mostly makes sense. But let me probe a little more. Can a component return NULL if it looks at its priority and the priority is less than or equal to 0? For example, currently the hierarch component returns NULL when its priority is less than or equal to 0. This means that as a user, when I set the priority to 0, I am indicating that I do not want the hierarch component selected at all.

Or is the priority only used to specify relative behavior, so that it is not to be used to completely deselect a component? To deselect, you would need to use the ^component format. That is where I am confused.

Rolf

Josh Hursey wrote:

> Yeah (Sorry I didn't reply earlier).
>
> Each component is asked for at least two items of information: priority (int), and module (struct *). The priority can range from [INT_MIN | INT_MAX] with the highest priority selected, even if that priority is negative. If the component does not want to be selected then it should return NULL for the module value. This indicates to the selection logic that no matter what the priority is set to the component should not be a candidate for selection.
>
> So a component is selectable if it returns a non-NULL value for the module struct, and is not selectable if it returns NULL. The priority only indicates relative rank between all available components.
>
> Does that make sense?
>
> I should probably add this comment to the mca_base_select function to preserve it. I'll make a bug for it so it doesn't get lost.
>
> -- Josh

--
=
rolf.vandeva...@sun.com
781-442-3043
=
Re: [OMPI devel] Memory hooks stuff
Jeff Squyres wrote:
> Brian and I were chatting the other day about random OMPI stuff and the topic of the memory hooks came up again. Brian was wondering if we should [finally] revisit this topic -- there's a few things that could be done to make life "better". Two things jump to mind:
>
> - using mallopt on Linux
> - doing *something* on Solaris
>
> It would probably be worthwhile to have a teleconf about this in the near future for anyone who is interested. I propose any time before 4pm US Eastern on Wednesday, 28 May, 2008.
>
> Who would be interested in discussing this stuff? (me, Brian, ? someone from Sun?, ...?)

I would be interested in attending this discussion for Sun. Note, I am *not* available at 11am-12pm and 1pm-2pm ET this Wednesday.

--td
Re: [OMPI devel] Memory hooks stuff
On Fri, May 23, 2008 at 07:19:01AM -0400, Jeff Squyres wrote:
> Brian and I were chatting the other day about random OMPI stuff and
> the topic of the memory hooks came up again. Brian was wondering if
> we should [finally] revisit this topic -- there's a few things that
> could be done to make life "better". Two things jump to mind:
>
> - using mallopt on Linux
> - doing *something* on Solaris
>
> It would probably be worthwhile to have a teleconf about this in the
> near future for anyone who is interested. I propose any time before
> 4pm US Eastern on Wednesday, 28 May, 2008.
>
> Who would be interested in discussing this stuff? (me, Brian, ?
> someone from Sun?, ...?)

Me.

--
Gleb.
Re: [OMPI devel] Question about priority
I think that technically, the component can do whatever it wants (e.g., look at its priority, see 0, and decide to return NULL).

However, to be consistent, we should decide on a specific behavior and make it uniform across all components. I'd opt for the ^foo notation to disable a component.

On May 23, 2008, at 8:14 AM, Rolf Vandevaart wrote:

> This mostly makes sense. But let me probe a little more. Can a component return NULL if it looks at its priority and the priority is less than or equal to 0? For example, currently the hierarch component returns NULL when its priority is equal or less than 0. This means that as a user when I set the priority to 0 I am indicating that I do not want the hierarch component selected at all.
>
> Or, is the priority only used to specify relative behavior. So, it is not to be used to completely deselect a component. To deselect, you would need to use the ^component format. That is where I am confused.
>
> Rolf

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Question about priority
Unfortunately, as Jeff pointed out, the behavior of frameworks and components in determining component selection is not consistent in the codebase. The mca_base_select() commit made things much better, but there are still frameworks that do not (or cannot) use it, and there are some behaviors that are just not well defined. Consistency issues lead to user (and developer) confusion and degrade the image of the project.

For exactly those reasons I want to talk about a number of such issues in one of our technical meetings this summer (this issue is currently scheduled for the July meeting). The goal is to come out of that meeting with a coding standard for component behavior during open/selection/close. Frameworks and components can diverge from this base standard, but then it is the responsibility of the component writer to make sure the resulting expectations are clearly communicated to users.

To answer your question though, an individual component can determine what to return for the {priority,module} pair based on anything it wishes. For instance, the SLURM PLM component will return NULL if it does not see the correct environment variables, and a working module if it does.

Collectives are a special type of framework, so the selection logic there is specialized: it does not use the mca_base_select function, but a more custom version of select.

If you supply "^component" then the component is never opened and thus never queried during selection.

If you specify 0 for the priority of the hierarch component, then the component is opened and will just return NULL during selection.

If you specify > 0 for the priority, then the hierarch component will return a module to the selection code. This module will be used if the hierarch component has the 'best' priority; otherwise the hierarch component should be closed [hierarch_component_close] at the end of the selection code. The coll/base select function determines the 'best' priority and whether or not the components are closed at the end of selection.

I think I may have just made things seem more complex than they probably are.

-- Josh

On May 23, 2008, at 8:28 AM, Jeff Squyres wrote:

> I think that technically, the component can do whatever it wants (e.g., look at its priority, see 0, and decide to return NULL).
>
> However, to be consistent, we should decide on a specific behavior and make it uniform to all components. I'd opt for the ^foo notation to disable a component.
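The difference between "^component" and a zero priority, as described in this message, can be illustrated with a small model. Everything here is hypothetical: `components_to_open` and `hierarch_query` are made-up names, a dictionary stands in for a module struct, and the real logic lives in the MCA base and the coll framework rather than in anything like this.

```python
def components_to_open(available, spec=None):
    """Return the components that would be opened for a selection string.

    A spec starting with '^' is an exclusion list; any other non-empty
    spec is treated as an inclusion list. An empty spec opens everything.
    """
    if not spec:
        return list(available)
    if spec.startswith("^"):
        excluded = set(spec[1:].split(","))
        return [name for name in available if name not in excluded]
    included = set(spec.split(","))
    return [name for name in available if name in included]


def hierarch_query(priority):
    """Model of the hierarch behavior described above.

    The component is opened and queried, but declines (returns None)
    when its priority is 0 or lower.
    """
    if priority <= 0:
        return None
    return {"name": "hierarch", "priority": priority}
```

In this model, `components_to_open(["basic", "hierarch", "tuned"], "^hierarch")` never hands hierarch to the selection code at all, while a priority of 0 still goes through `hierarch_query` and gets `None` back. The outcome for the user is the same, but the code path differs, which matches the distinction Rolf asked about.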
Re: [OMPI devel] RFC: Linuxes shipping libibverbs
> OFED is one distribution of the OpenFabrics software. It can be
> bundled up and packaged differently, too. I suspect that Debian does
> not include OFED directly, because OFED is pretty heavily dependent
> upon RPM. So the OpenFabrics kernel bits must be there somewhere
> (libibverbs would be useless, otherwise); it would be nice to
> understand how they are activated: either manually or automatically.

"OpenFabrics kernel bits" doesn't really make sense. Debian just ships a Linux kernel, which has InfiniBand/RDMA drivers. Debian doesn't load the ib_uverbs module by default, nor should it, since the vast majority of users don't have RDMA hardware. So libibverbs and Open MPI should act sanely when no kernel drivers are loaded, /sys/class/infiniband_verbs doesn't exist, etc.

There is already a Debian bug open about this for libibverbs: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=418014

I've been meaning to work on this but sadly I have not been able to put much time into it.

- R.
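What "acting sanely" could look like is sketched below, using the sysfs directory named in this message (spelled /sys/class/infiniband_verbs on real systems). This is not how libibverbs itself enumerates devices; the function name is made up, and filtering on the "uverbs" prefix is an assumption of this sketch.

```python
import os


def list_uverbs_devices(sysfs_dir="/sys/class/infiniband_verbs"):
    """Return the uverbs device names found in sysfs, or [] if none.

    A missing directory is treated as the normal "no RDMA drivers loaded"
    case, not as an error worth a warning.
    """
    try:
        entries = os.listdir(sysfs_dir)
    except (FileNotFoundError, NotADirectoryError):
        # ib_uverbs is not loaded: report zero devices and carry on.
        return []
    # Keep only device entries; other files in the directory are skipped.
    return sorted(e for e in entries if e.startswith("uverbs"))
```

A caller that gets an empty list back can simply skip the InfiniBand transport instead of printing an error, which is the behavior asked for on machines without RDMA hardware.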
Re: [OMPI devel] RFC: Linuxes shipping libibverbs
> Either that or udev is not configured properly.

Debian has a correct udev configuration, modulo http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=449081

> ib_core/mthca/mlx4 should be loaded automatically by hotplug if HW is
> present. No need for any additional configuration.

Yes (although only mlx4_core and not mlx4_ib will be loaded based on PCI IDs), but nothing loads ib_uverbs automatically, and systems that have no RDMA hardware will obviously not have any RDMA drivers autoloaded.

- R.
Re: [OMPI devel] Question about priority
On May 23, 2008, at 9:56 AM, Josh Hursey wrote:

> Unfortunately, as Jeff pointed out, the behavior of frameworks and components in determining component selection is not consistent in the codebase. The mca_base_select() commit made things much better, but there are still frameworks that do not (or cannot) use it, and there are some behaviors that are just not well defined. Consistency issues lead to user (and developer) confusion and degrade the image of the project.
>
> For exactly those reasons I want to talk about a number of such issues in one of our technical meeting this summer (this issue is currently scheduled for the July meeting). The goal is to come out of that meeting with a coding standard behavior for components during open/selection/close. Frameworks and components can diverge from this base standard, but then it is the responsibility of the component writer to make sure this is clearly communicated to users about expectations.

This is a pretty strong statement and some examples would be welcome. Anyway, we already have a coding standard for component manipulation, and apparently there are cases where we need hand-crafted selection logic (such as the collectives, as Josh pointed out). The ^component notation is handled at the bottom layer, where we create the list of components to be opened, so this is consistent across the board.

> To answer your question though, an individual component can determine what to return for the {priority,module} pair based on anything it wishes. For instance the SLURM PLM component will return NULL if it does not see the correct environment variables, and a working module if it does.
>
> Collectives are a special type of framework so the selection logic there is specialized, meaning it does not use the mca_base_select function, but uses a more custom version of select.
>
> If you supply "^component" then the component is never opened and thus never queried during selection.
>
> If you specify 0 for the priority of the hierarch component then the component is opened, and will just return NULL during selection.
>
> If you specify > 0 for the priority then the hierarch component will return a module to the selection code. This module will be used if the hierarch component has the 'best' priority, otherwise the hierarch component should be closed [hierarch_component_close] at the end of the selection code. Determining the 'best' priority and whether or not the components are closed at the end of selection is determined by the coll/base select function.
>
> I think I may have just made things seem more complex than they probably are.

I don't think so. For me the process is straightforward. Here are the possible scenarios:

1) ^component behaves as if the corresponding file (i.e. shared library) is not available.

2) init returning a NULL module means that this component does not want to be selected. There is no need to clarify the reason why; the outcome is that the component has chosen to be ignored.

3) returning a non-NULL module and a priority allows the selection logic to include the specified module in the selection process.

Of course the selection process is different for some frameworks, but this is to be expected. Keep in mind that while there are one-to-one frameworks (such as the IO subsystem and the PML) and many-to-one frameworks (such as the BTLs and the collectives), the priority always allows the selector to order the modules by decreasing priority. Then, based on the type of the framework (one-to-one or many-to-one), the selector picks the first or all modules from the list and closes the others.

As I said ... straightforward :)

george.

> -- Josh
>
> On May 23, 2008, at 8:28 AM, Jeff Squyres wrote:
>
> > I think that technically, the component can do whatever it wants (e.g., look at its priority, see 0, and decide to return NULL).
> >
> > However, to be consistent, we should decide on a specific behavior and make it uniform to all components. I'd opt for the ^foo notation to disable a component.
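George's description of ordering by decreasing priority and then taking either the first module or all of them can be written down as a short sketch. It is a simplified model only: the function name, the `many_to_one` flag, and the tuple layout are invented for illustration, and the real selectors in each framework are more involved.

```python
def select_modules(candidates, many_to_one=False):
    """Order usable candidates by decreasing priority and pick winners.

    `candidates` is a list of (name, priority, module) tuples; a module of
    None means the component asked to be ignored. Returns a pair
    (selected_names, closed_names).
    """
    # Scenario 2: components without a module never enter the ordering.
    usable = [c for c in candidates if c[2] is not None]
    # Scenario 3: the priority orders whatever remains, highest first.
    ordered = sorted(usable, key=lambda c: c[1], reverse=True)
    # One-to-one frameworks keep the first entry, many-to-one keep all.
    chosen = ordered if many_to_one else ordered[:1]
    selected = [name for name, _, _ in chosen]
    # Everything not selected is closed, in its original order.
    closed = [name for name, _, _ in candidates if name not in selected]
    return selected, closed
```

For example, with three components where the highest-priority one returns no module, the one-to-one case selects the best of the remaining two and closes the rest, while the many-to-one case keeps both usable components in priority order and closes only the one that declined.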
Re: [OMPI devel] [OMPI users] Open MPI Linux Expectations
I build on Debian 4.0 and run on Suse 10 and Fedora Core 6. The only thing I had to enforce is the availability of the corresponding libc library (the one I build with) on the target OS. Moreover, as my nodes have different processors, I have to enforce strict x86 code.

george.

On May 22, 2008, at 5:33 PM, Don Kerr wrote:

> Can anyone set my expectations with their real world experiences regarding building Open MPI on one release of Linux and running on another. If I were to...
>
> Build OMPI on Redhat 4, will it run on later releases of Redhat, e.g. Redhat 5?
> Build OMPI on Suse 9, will it run on later releases of Suse, e.g. Suse 10?
> Build OMPI on Redhat, will it run on Suse?
> Build OMPI on Suse, will it run on Redhat?
>
> Thanks in advance for your insights.
>
> -DON