Re: [OMPI devel] [RFC] Default hostfile MCA param
Tim Prins wrote:

> We have used '^' elsewhere to indicate not, so maybe just have the syntax be if you put '^' at the beginning of a line, that node is not used.
>
> So we could have:
> n0
> n1
> ^headnode
> n3

This sounds fine to me.

> I understand the idea of having a flag to indicate that all nodes below a certain point should be ignored, but I think this might get confusing, and I'm unsure how useful it would be. I just see the usefulness of this to block out a couple of nodes by default. Besides, if you do want to block out many nodes, any reasonable text editor allows you to insert '^' in front of any number of lines easily.
>
> Alternatively, for the particular situation that Edgar mentions, it may be good enough just to set rmaps_base_no_schedule_local in the mca params default file.

Hm, OK, here is another flag that I was not aware of. Anyway, I can think of other scenarios where this feature could be useful, e.g. when hunting down performance problems on a cluster, where you would like to avoid having to get a new allocation or do a major rewrite of the hostfile every time. Another is including an I/O node in an allocation (in order to have it exclusively) while making sure that no MPI process gets scheduled onto that node.

Thanks
Edgar

> One question though: If I am in a slurm allocation which contains n1, and there is a default hostfile that contains "^n1", will I run on 'n1'?
>
> I'm not sure what the answer is, I know we talked about the precedence earlier...
>
> Tim
>
> Ralph H Castain wrote:
>> I personally have no objection, but I would ask then that the wiki be modified to cover this case. All I require is that someone define the syntax to be used to indicate "this is a node I do -not- want used", or alternatively a flag that indicates "all nodes below are -not- to be used".
>>
>> Implementation isn't too hard once I have that...
>>
>> On 3/3/08 9:44 AM, "Edgar Gabriel" wrote:
>>> Ralph,
>>>
>>> could this mechanism be used also to exclude a node, indicating to never run a job there? Here is the problem that I face quite often: students working on the homework forget to allocate a partition on the cluster, and just type mpirun. Because of that, all jobs end up running on the front-end node.
>>>
>>> If we would have now the ability to specify in a default hostfile, to never run a job on a specified node (e.g. the front end node), users would get an error message when trying to do that. I am aware that that's a little ugly...
>>>
>>> Thanks
>>> edgar
>>>
>>> Ralph Castain wrote:
>>>> I forget all the formatting we are supposed to use, so I hope you'll all just bear with me.
>>>>
>>>> George brought up the fact that we used to have an MCA param to specify a hostfile to use for a job. The hostfile behavior described on the wiki, however, doesn't provide for that option. It associates a hostfile with a specific app_context, and provides a detailed hierarchical layout of how mpirun is to interpret that information.
>>>>
>>>> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile" to replace the deprecated capability. If found, the system's behavior will be:
>>>>
>>>> 1. in a managed environment, the default hostfile will be used to filter the discovered nodes to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to further filter the node pool to define the specific nodes for use by that app_context. Thus, nodes in the hostfile and dash host options given to an app_context -must- also be in the default hostfile in order to be available for use by that app_context - any nodes in the app_context options that are not in the default hostfile will be ignored.
>>>>
>>>> 2. in an unmanaged environment, the default hostfile will be used to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to filter the node pool to define the specific nodes for use by that app_context, subject to the previous caveat. However, add-hostfile and add-host options will add nodes to the node pool for use -only- by the associated app_context.
>>>>
>>>> I believe this proposed behavior is consistent with that described on the wiki, and would be relatively easy to implement. If nobody objects, I will do so by end-of-day 3/6.
>>>>
>>>> Comments, suggestions, objections - all are welcome!
>>>> Ralph
>>>>
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science, University of Houston
Philip G. Hoffman Hall, Room 524, Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
Re: [OMPI devel] [RFC] Default hostfile MCA param
On 3/4/08 5:51 AM, "Tim Prins" wrote:

> We have used '^' elsewhere to indicate not, so maybe just have the syntax be if you put '^' at the beginning of a line, that node is not used.
>
> So we could have:
> n0
> n1
> ^headnode
> n3

That works for me and sounds like the right solution.

> I understand the idea of having a flag to indicate that all nodes below a certain point should be ignored, but I think this might get confusing, and I'm unsure how useful it would be. I just see the usefulness of this to block out a couple of nodes by default. Besides, if you do want to block out many nodes, any reasonable text editor allows you to insert '^' in front of any number of lines easily.
>
> Alternatively, for the particular situation that Edgar mentions, it may be good enough just to set rmaps_base_no_schedule_local in the mca params default file.
>
> One question though: If I am in a slurm allocation which contains n1, and there is a default hostfile that contains "^n1", will I run on 'n1'?

According to the precedence rules in the wiki, you would -not- run on n1.

> I'm not sure what the answer is, I know we talked about the precedence earlier...
>
> Tim
>
> Ralph H Castain wrote:
>> I personally have no objection, but I would ask then that the wiki be modified to cover this case. All I require is that someone define the syntax to be used to indicate "this is a node I do -not- want used", or alternatively a flag that indicates "all nodes below are -not- to be used".
>>
>> Implementation isn't too hard once I have that...
>>
>> On 3/3/08 9:44 AM, "Edgar Gabriel" wrote:
>>> Ralph,
>>>
>>> could this mechanism be used also to exclude a node, indicating to never run a job there? Here is the problem that I face quite often: students working on the homework forget to allocate a partition on the cluster, and just type mpirun. Because of that, all jobs end up running on the front-end node.
>>>
>>> If we would have now the ability to specify in a default hostfile, to never run a job on a specified node (e.g. the front end node), users would get an error message when trying to do that. I am aware that that's a little ugly...
>>>
>>> Thanks
>>> edgar
>>>
>>> Ralph Castain wrote:
>>>> I forget all the formatting we are supposed to use, so I hope you'll all just bear with me.
>>>>
>>>> George brought up the fact that we used to have an MCA param to specify a hostfile to use for a job. The hostfile behavior described on the wiki, however, doesn't provide for that option. It associates a hostfile with a specific app_context, and provides a detailed hierarchical layout of how mpirun is to interpret that information.
>>>>
>>>> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile" to replace the deprecated capability. If found, the system's behavior will be:
>>>>
>>>> 1. in a managed environment, the default hostfile will be used to filter the discovered nodes to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to further filter the node pool to define the specific nodes for use by that app_context. Thus, nodes in the hostfile and dash host options given to an app_context -must- also be in the default hostfile in order to be available for use by that app_context - any nodes in the app_context options that are not in the default hostfile will be ignored.
>>>>
>>>> 2. in an unmanaged environment, the default hostfile will be used to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to filter the node pool to define the specific nodes for use by that app_context, subject to the previous caveat. However, add-hostfile and add-host options will add nodes to the node pool for use -only- by the associated app_context.
>>>>
>>>> I believe this proposed behavior is consistent with that described on the wiki, and would be relatively easy to implement. If nobody objects, I will do so by end-of-day 3/6.
>>>>
>>>> Comments, suggestions, objections - all are welcome!
>>>> Ralph
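Ralph's answer to Tim's question (a slurm allocation containing n1, plus a default hostfile containing "^n1") can be illustrated with a few lines of Python. This is only a sketch of the stated outcome, not Open MPI code: the function name and its arguments are hypothetical, and the handling of a default hostfile that lists no positive entries is an assumption, since the thread does not define that case.

```python
def apply_default_hostfile(allocated, listed, excluded):
    """Filter an allocation through a default hostfile.

    allocated: node names handed out by the resource manager (e.g. slurm).
    listed:    node names that appear without '^' in the default hostfile.
    excluded:  node names that appear with a leading '^'.
    """
    # Assumption: if the file names no positive entries, it does not
    # restrict the allocation to an explicit list; it only removes nodes.
    pool = [node for node in allocated if not listed or node in listed]
    # Exclusions win over the allocation, matching the answer in this message.
    return [node for node in pool if node not in excluded]
```

With an allocation of n0, n1, n2 and a default hostfile holding only "^n1", the call `apply_default_hostfile(["n0", "n1", "n2"], [], {"n1"})` leaves n0 and n2, so nothing is placed on n1.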
Re: [OMPI devel] [RFC] Default hostfile MCA param
We have used '^' elsewhere to indicate not, so maybe just have the syntax be if you put '^' at the beginning of a line, that node is not used.

So we could have:
n0
n1
^headnode
n3

I understand the idea of having a flag to indicate that all nodes below a certain point should be ignored, but I think this might get confusing, and I'm unsure how useful it would be. I just see the usefulness of this to block out a couple of nodes by default. Besides, if you do want to block out many nodes, any reasonable text editor allows you to insert '^' in front of any number of lines easily.

Alternatively, for the particular situation that Edgar mentions, it may be good enough just to set rmaps_base_no_schedule_local in the mca params default file.

One question though: If I am in a slurm allocation which contains n1, and there is a default hostfile that contains "^n1", will I run on 'n1'?

I'm not sure what the answer is, I know we talked about the precedence earlier...

Tim

Ralph H Castain wrote:
> I personally have no objection, but I would ask then that the wiki be modified to cover this case. All I require is that someone define the syntax to be used to indicate "this is a node I do -not- want used", or alternatively a flag that indicates "all nodes below are -not- to be used".
>
> Implementation isn't too hard once I have that...
>
> On 3/3/08 9:44 AM, "Edgar Gabriel" wrote:
>> Ralph,
>>
>> could this mechanism be used also to exclude a node, indicating to never run a job there? Here is the problem that I face quite often: students working on the homework forget to allocate a partition on the cluster, and just type mpirun. Because of that, all jobs end up running on the front-end node.
>>
>> If we would have now the ability to specify in a default hostfile, to never run a job on a specified node (e.g. the front end node), users would get an error message when trying to do that. I am aware that that's a little ugly...
>>
>> Thanks
>> edgar
>>
>> Ralph Castain wrote:
>>> I forget all the formatting we are supposed to use, so I hope you'll all just bear with me.
>>>
>>> George brought up the fact that we used to have an MCA param to specify a hostfile to use for a job. The hostfile behavior described on the wiki, however, doesn't provide for that option. It associates a hostfile with a specific app_context, and provides a detailed hierarchical layout of how mpirun is to interpret that information.
>>>
>>> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile" to replace the deprecated capability. If found, the system's behavior will be:
>>>
>>> 1. in a managed environment, the default hostfile will be used to filter the discovered nodes to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to further filter the node pool to define the specific nodes for use by that app_context. Thus, nodes in the hostfile and dash host options given to an app_context -must- also be in the default hostfile in order to be available for use by that app_context - any nodes in the app_context options that are not in the default hostfile will be ignored.
>>>
>>> 2. in an unmanaged environment, the default hostfile will be used to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to filter the node pool to define the specific nodes for use by that app_context, subject to the previous caveat. However, add-hostfile and add-host options will add nodes to the node pool for use -only- by the associated app_context.
>>>
>>> I believe this proposed behavior is consistent with that described on the wiki, and would be relatively easy to implement. If nobody objects, I will do so by end-of-day 3/6.
>>>
>>> Comments, suggestions, objections - all are welcome!
>>> Ralph
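The '^' syntax Tim proposes at the top of this message could be parsed roughly as follows. This is a minimal Python sketch under stated assumptions, not the Open MPI hostfile parser: the function name is hypothetical, one node per line is assumed, and the skipping of blank lines, '#' comments, and anything after the node name is a simplification that the thread does not specify.

```python
def parse_default_hostfile(text):
    """Split hostfile text into (included, excluded) lists of node names.

    A leading '^' marks a node that must not be used, as proposed in
    this thread. Order of appearance is preserved in both lists.
    """
    included, excluded = [], []
    for raw_line in text.splitlines():
        # Assumption: '#' starts a comment; drop it and surrounding space.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # Assumption: only the first token is the node name.
        name = line.split()[0]
        if name.startswith("^"):
            name = name[1:]
            if name:  # ignore a bare '^' with no node name
                excluded.append(name)
        else:
            included.append(name)
    return included, excluded
```

Applied to Tim's four-line example, the sketch reports n0, n1 and n3 as usable and headnode as excluded.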
Re: [OMPI devel] [RFC] Default hostfile MCA param
I personally have no objection, but I would ask then that the wiki be modified to cover this case. All I require is that someone define the syntax to be used to indicate "this is a node I do -not- want used", or alternatively a flag that indicates "all nodes below are -not- to be used".

Implementation isn't too hard once I have that...

On 3/3/08 9:44 AM, "Edgar Gabriel" wrote:

> Ralph,
>
> could this mechanism be used also to exclude a node, indicating to never run a job there? Here is the problem that I face quite often: students working on the homework forget to allocate a partition on the cluster, and just type mpirun. Because of that, all jobs end up running on the front-end node.
>
> If we would have now the ability to specify in a default hostfile, to never run a job on a specified node (e.g. the front end node), users would get an error message when trying to do that. I am aware that that's a little ugly...
>
> Thanks
> edgar
>
> Ralph Castain wrote:
>> I forget all the formatting we are supposed to use, so I hope you'll all just bear with me.
>>
>> George brought up the fact that we used to have an MCA param to specify a hostfile to use for a job. The hostfile behavior described on the wiki, however, doesn't provide for that option. It associates a hostfile with a specific app_context, and provides a detailed hierarchical layout of how mpirun is to interpret that information.
>>
>> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile" to replace the deprecated capability. If found, the system's behavior will be:
>>
>> 1. in a managed environment, the default hostfile will be used to filter the discovered nodes to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to further filter the node pool to define the specific nodes for use by that app_context. Thus, nodes in the hostfile and dash host options given to an app_context -must- also be in the default hostfile in order to be available for use by that app_context - any nodes in the app_context options that are not in the default hostfile will be ignored.
>>
>> 2. in an unmanaged environment, the default hostfile will be used to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to filter the node pool to define the specific nodes for use by that app_context, subject to the previous caveat. However, add-hostfile and add-host options will add nodes to the node pool for use -only- by the associated app_context.
>>
>> I believe this proposed behavior is consistent with that described on the wiki, and would be relatively easy to implement. If nobody objects, I will do so by end-of-day 3/6.
>>
>> Comments, suggestions, objections - all are welcome!
>> Ralph
Re: [OMPI devel] [RFC] Default hostfile MCA param
Ralph,

could this mechanism be used also to exclude a node, indicating to never run a job there? Here is the problem that I face quite often: students working on the homework forget to allocate a partition on the cluster, and just type mpirun. Because of that, all jobs end up running on the front-end node.

If we would have now the ability to specify in a default hostfile, to never run a job on a specified node (e.g. the front end node), users would get an error message when trying to do that. I am aware that that's a little ugly...

Thanks
edgar

Ralph Castain wrote:
> I forget all the formatting we are supposed to use, so I hope you'll all just bear with me.
>
> George brought up the fact that we used to have an MCA param to specify a hostfile to use for a job. The hostfile behavior described on the wiki, however, doesn't provide for that option. It associates a hostfile with a specific app_context, and provides a detailed hierarchical layout of how mpirun is to interpret that information.
>
> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile" to replace the deprecated capability. If found, the system's behavior will be:
>
> 1. in a managed environment, the default hostfile will be used to filter the discovered nodes to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to further filter the node pool to define the specific nodes for use by that app_context. Thus, nodes in the hostfile and dash host options given to an app_context -must- also be in the default hostfile in order to be available for use by that app_context - any nodes in the app_context options that are not in the default hostfile will be ignored.
>
> 2. in an unmanaged environment, the default hostfile will be used to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to filter the node pool to define the specific nodes for use by that app_context, subject to the previous caveat. However, add-hostfile and add-host options will add nodes to the node pool for use -only- by the associated app_context.
>
> I believe this proposed behavior is consistent with that described on the wiki, and would be relatively easy to implement. If nobody objects, I will do so by end-of-day 3/6.
>
> Comments, suggestions, objections - all are welcome!
> Ralph

--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science, University of Houston
Philip G. Hoffman Hall, Room 524, Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
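The behavior Edgar asks for in this message, an error instead of a silent launch on the front-end node, might look like the following Python sketch. Everything here is hypothetical: the exception type, the function name, and the idea that a launch without an allocation has only the local node as a candidate are illustrative assumptions, not Open MPI behavior.

```python
class NoUsableNodesError(RuntimeError):
    """Hypothetical error raised when every candidate node is excluded."""


def usable_nodes(candidates, excluded):
    """Return the candidates that are not excluded, or fail loudly.

    candidates: nodes the launcher would otherwise use.
    excluded:   nodes a default hostfile marks as never to be used.
    """
    usable = [node for node in candidates if node not in excluded]
    if not usable:
        # Assumption: failing is preferable to falling back to an
        # excluded node such as the cluster front end.
        raise NoUsableNodesError(
            "no usable nodes: all of %s are excluded" % ", ".join(candidates)
        )
    return usable
```

In the homework scenario, a student who forgot to allocate a partition would have only the front-end node as a candidate, so `usable_nodes(["headnode"], {"headnode"})` raises rather than starting the job there.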
[OMPI devel] [RFC] Default hostfile MCA param
I forget all the formatting we are supposed to use, so I hope you'll all just bear with me.

George brought up the fact that we used to have an MCA param to specify a hostfile to use for a job. The hostfile behavior described on the wiki, however, doesn't provide for that option. It associates a hostfile with a specific app_context, and provides a detailed hierarchical layout of how mpirun is to interpret that information.

What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile" to replace the deprecated capability. If found, the system's behavior will be:

1. in a managed environment, the default hostfile will be used to filter the discovered nodes to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to further filter the node pool to define the specific nodes for use by that app_context. Thus, nodes in the hostfile and dash host options given to an app_context -must- also be in the default hostfile in order to be available for use by that app_context - any nodes in the app_context options that are not in the default hostfile will be ignored.

2. in an unmanaged environment, the default hostfile will be used to define the available node pool. Any hostfile and/or dash host options provided to an app_context will be used to filter the node pool to define the specific nodes for use by that app_context, subject to the previous caveat. However, add-hostfile and add-host options will add nodes to the node pool for use -only- by the associated app_context.

I believe this proposed behavior is consistent with that described on the wiki, and would be relatively easy to implement. If nobody objects, I will do so by end-of-day 3/6.

Comments, suggestions, objections - all are welcome!
Ralph
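The two rules in this RFC can be read as a pair of set-filtering steps. The Python sketch below is one possible reading, not the implementation: both function names are hypothetical, node lists stand in for parsed hostfiles, and the treatment of add-hostfile/add-host nodes (appended after filtering, visible only to that app_context) is an assumption about text the RFC leaves open.

```python
def available_pool(default_hosts, discovered=None):
    """Build the available node pool from the default hostfile.

    discovered: nodes reported by a resource manager (managed
    environment), or None in an unmanaged environment.
    """
    if discovered is None:
        # Rule 2: unmanaged, the default hostfile defines the pool.
        return list(default_hosts)
    # Rule 1: managed, the default hostfile filters the discovered nodes.
    allowed = set(default_hosts)
    return [node for node in discovered if node in allowed]


def nodes_for_app_context(pool, requested=None, added=None):
    """Select the nodes one app_context may use.

    requested: nodes from that app_context's hostfile / dash host options.
    added:     nodes from its add-hostfile / add-host options.
    """
    if requested is None:
        nodes = list(pool)
    else:
        wanted = set(requested)
        # Requested nodes that are not in the pool are ignored.
        nodes = [node for node in pool if node in wanted]
    # Assumption: added nodes skip the filter and apply only to this call.
    for node in added or []:
        if node not in nodes:
            nodes.append(node)
    return nodes
```

For example, with a default hostfile listing n0, n1, n3 and a resource manager that discovered n1, n2, n3, the pool is n1 and n3; an app_context asking for n3 and n7 then gets only n3, because n7 is not in the pool.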