Re: [OMPI devel] -display-map behavior in 1.3
Easier than I thought... done in r21147. Let me know if that meets your needs.

Ralph
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] -display-map behavior in 1.3
Should be doable.

Since the output was going to stderr, we just let it continue to do so and tagged it. I think I can redirect it when doing xml tagging, as that is handled as a separate case - shouldn't be too hard to do.
Re: [OMPI devel] -display-map behavior in 1.3
Ralph,

I did find another issue in 1.3 though. It looks like with the -xml option you're sending tagged output to stderr, whereas it would probably be better if everything were sent to stdout. Otherwise it's necessary to parse the stderr stream separately.

Greg
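Greg's complaint boils down to a two-stream parsing burden: when part of the machine-readable output lands on stderr, a consumer has to capture and inspect both streams. A minimal Python sketch of that shape, using a stand-in child process (this is not mpirun's actual output or tag names, which the archive stripped):

```python
import subprocess
import sys

# Stand-in for a tool like "mpirun --xml" that splits its output across
# streams: XML on stdout, diagnostics on stderr.
child = [sys.executable, "-c",
         "import sys; print('<map/>'); print('diagnostic', file=sys.stderr)"]

result = subprocess.run(child, capture_output=True, text=True)

# The consumer must parse two streams instead of one; if the XML all went
# to stdout, only result.stdout would need to be examined.
assert "<map/>" in result.stdout
assert "diagnostic" in result.stderr
```

Routing everything to stdout, as Greg suggests, lets the consumer ignore stderr entirely when parsing.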
Re: [OMPI devel] -display-map behavior in 1.3
Arrgh! Sorry, my bad. I must have been linked against an old version or something. When I recompiled, the output went away.

Greg
Re: [OMPI devel] -display-map behavior in 1.3
Interesting - I'm not seeing this behavior:

graywolf54:trunk rhc$ mpirun -n 3 --xml --display-map hostname
graywolf54.lanl.gov
graywolf54.lanl.gov
graywolf54.lanl.gov
graywolf54:trunk rhc$

Can you tell me more about when you see this? Note that the display-map output should always appear on stderr, because that is our default output device.
Re: [OMPI devel] -display-map behavior in 1.3
Hmmm... no, that's a bug. I'll fix it.

Thanks!
[OMPI devel] -display-map behavior in 1.3
Ralf,

I've just noticed that if I use '-xml -display-map', I get the xml version of the map and then the non-xml version is sent to stderr (wrapped in xml tags). Was this by design? In my view it would be better to suppress the non-xml map altogether if the -xml option is given. 1.4 seems to do the same.

Greg
Re: [OMPI devel] -display-map
Looks good now. Thanks!

Greg
Re: [OMPI devel] -display-map
I'm embarrassed to admit that I never actually implemented the xml option for tag-output... this has been rectified with r20302. Let me know if that works for you - sorry for the confusion.

Ralph
Re: [OMPI devel] -display-map
Ralph,

The encapsulation is not quite right yet. I'm seeing this:

[1,0]n = 0
[1,1]n = 0

but it should be:

n = 0
n = 0

Thanks,
Greg
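The [jobid,rank] prefix format in Greg's sample can be picked apart with a small parser. This is a generic sketch, not OMPI code, and the XML encapsulation Greg expected is omitted here because the archive stripped the actual tag names:

```python
import re

# Split a tag-output line like "[1,0]n = 0" into (jobid, rank, payload).
# The bracketed prefix format is taken from the sample quoted above; the
# helper name and behavior on untagged lines are this sketch's invention.
TAG = re.compile(r"^\[(\d+),(\d+)\](.*)$")

def split_tag(line):
    m = TAG.match(line)
    if not m:
        return None, None, line  # untagged line passes through unchanged
    job, rank, rest = m.groups()
    return int(job), int(rank), rest

assert split_tag("[1,0]n = 0") == (1, 0, "n = 0")
assert split_tag("[1,1]n = 0") == (1, 1, "n = 0")
```

With proper XML encapsulation the rank would instead arrive as structured markup, and no ad hoc prefix parsing would be needed.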
Re: [OMPI devel] -display-map
I don't think there's any reason we'd want stdout/err not to be encapsulated, so forcing tag-output makes sense.

Greg
Re: [OMPI devel] -display-map
You need to add --tag-output - this is a separate option, as it applies to both xml and non-xml situations. If you like, I can force tag-output "on" by default whenever -xml is specified.

Ralph
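The implication Ralph proposes - having -xml force tag-output on - is a simple option dependency. A hypothetical sketch of the idea, not OMPI's actual option-handling code:

```python
# Hypothetical option resolution: --xml implies --tag-output, since an XML
# consumer always wants the per-process tagging. Flag names follow the
# thread; the function itself is this sketch's invention.
def resolve_options(argv):
    opts = {"xml": "--xml" in argv, "tag_output": "--tag-output" in argv}
    if opts["xml"]:
        opts["tag_output"] = True  # force tag-output on, per Ralph's suggestion
    return opts

assert resolve_options(["--xml"]) == {"xml": True, "tag_output": True}
assert resolve_options(["--tag-output"])["xml"] is False
assert resolve_options([])["tag_output"] is False
```

The benefit is that users asking for XML can never end up in the half-configured state Greg hit, where the map is XML but the stdout/err streams are not encapsulated.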
Re: [OMPI devel] -display-map
Ralph, Is there something I need to do to enable stdout/err encapsulation (apart from -xml)? Here's what I see:

$ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 5 /Users/greg/Documents/workspace1/testMPI/Debug/testMPI
n = 0
n = 0
n = 0
n = 0
n = 0
Re: [OMPI devel] -display-map
Fixed in r20288. Thanks for the catch.
Re: [OMPI devel] -display-map
FYI, if I configure with --with-platform=contrib/platform/lanl/macosx-dynamic the build succeeds.

Greg
Re: [OMPI devel] -display-map
Er... whoops. This looks like my mistake (I just recently added MPI_REDUCE_LOCAL to the trunk -- not v1.3). I could have sworn that I tested this on a Mac, multiple times. I'll test again...
Re: [OMPI devel] -display-map
When I try to build trunk, it fails with:

i_f77.lax/libmpi_f77_pmpi.a/pwin_unlock_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwin_wait_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwtick_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwtime_f.o ../../../ompi/.libs/libmpi.0.0.0.dylib /usr/local/openmpi-1.4-devel/lib/libopen-rte.0.0.0.dylib /usr/local/openmpi-1.4-devel/lib/libopen-pal.0.0.0.dylib -install_name /usr/local/openmpi-1.4-devel/lib/libmpi_f77.0.dylib -compatibility_version 1 -current_version 1.0
ld: duplicate symbol _mpi_reduce_local_f in .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/preduce_local_f.o and .libs/reduce_local_f.o
collect2: ld returned 1 exit status
make[3]: *** [libmpi_f77.la] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

I'm using the default configure command (./configure --prefix=xxx) on Mac OS X 10.5. This works fine on the 1.3 branch.

Greg
Re: [OMPI devel] -display-map
Okay, it is in the trunk as of r20284 - I'll file the request to have it moved to 1.3.1. Let me know if you get a chance to test the stdout/err stuff in the trunk - we should try and iterate it so any changes can make 1.3.1 as well.

Thanks!
Ralph
Re: [OMPI devel] -display-map
Ralph, I think the second form would be ideal and would simplify things greatly.

Greg
Re: [OMPI devel] -display-map
Here is what I was able to do - note that the resolve messages are associated with the specific hostname, not the overall map: Will that work for you? If you like, I can remove the name= field from the noderesolve element since the info is specific to the host element that contains it. In other words, I can make it look like this: if that would help. Ralph On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote: We -may- be able to do a more formal XML output at some point. The problem will be the natural interleaving of stdout/err from the various procs due to the async behavior of MPI. Mpirun receives fragmented output in the forwarding system, limited by the buffer sizes and the amount of data we can read at any one "bite" from the pipes connecting us to the procs. So even though the user -thinks- they output a single large line of stuff, it may show up at mpirun as a series of fragments. Hence, it gets tricky to know how to put appropriate XML brackets around it. Given this input about when you actually want resolved name info, I can at least do something about that area. Won't be in 1.3.0, but should make 1.3.1. As for XML-tagged stdout/err: the OMPI community asked me not to turn that feature "on" for 1.3.0 as they felt it hasn't been adequately tested yet. The code is present, but cannot be activated in 1.3.0. However, I believe it is activated on the trunk when you do --xml --tagged-output, so perhaps some testing will help us debug and validate it adequately for 1.3.1? Thanks Ralph On Jan 14, 2009, at 7:02 AM, Greg Watson wrote: Ralph, The only time we use the resolved names is when we get a map, so we consider them part of the map output. If quasi-XML is all that will ever be possible with 1.3, then you may as well leave as-is and we will attempt to clean it up in Eclipse. It would be nice if a future version of ompi could output correct XML (including stdout) as this would vastly simplify the parsing we need to do. 
Regards, Greg On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote: Hmmm...well, I can't do either for 1.3.0 as it is departing this afternoon. The first option would be very hard to do. I would have to expose the display-map option across the code base and check it prior to printing anything about resolving node names. I guess I should ask: do you only want noderesolve statements when we are displaying the map? Right now, I will output them regardless. The second option could be done. I could check if any "display" option has been specified, and output the root at that time (likewise for the end). Anything we output in-between would be encapsulated between the two, but that would include any user output to stdout and/or stderr - which for 1.3.0 is not in xml. Any thoughts? Ralph PS. Guess I should clarify that I was not striving for true XML interaction here, but rather a quasi-XML format that would help you to filter the output. I have no problem trying to get to something more formally correct, but it could be tricky in some places to achieve it due to the inherent async nature of the beast. On Jan 13, 2009, at 12:17 PM, Greg Watson wrote: Ralph, The XML is looking better now, but there is still one problem. To be valid, there needs to be only one root element, but currently you don't have any (or many). So rather than: the XML should be: or: Would either of these be possible? Thanks, Greg On Dec 8, 2008, at 2:18 PM, Greg Watson wrote: Ok thanks. I'll test from trunk in future. Greg On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote: Working its way around the CMR process now. Might be easier in the future if we could test/debug this in the trunk, though. Otherwise, the CMR procedure will fall behind and a fix might miss a release window. Anyway, hopefully this one will make the 1.3.0 release cutoff. Thanks Ralph On Dec 8, 2008, at 9:56 AM, Greg Watson wrote: Hi Ralph, This is now in 1.3rc2, thanks. However there are a couple of problems. 
Here is what I see: [Jarrah.watson.ibm.com:58957] resolved="Jarrah.watson.ibm.com"> For some reason each line is prefixed with "[...]", any idea why this is? Also the end tag should be "/>" not ">".

Thanks, Greg

On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:

Great, thanks. I'll take a look once it comes over to
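Greg's earlier point about needing exactly one root element can be checked mechanically: any conforming XML parser rejects a document with more than one top-level element. A small sketch using Python's standard parser; the `mpirun`, `map`, and `noderesolve` element names are stand-ins, not necessarily the exact tags OMPI emits:

```python
import xml.etree.ElementTree as ET

def parses(doc):
    """Return True if doc is well-formed XML, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Two top-level elements: rejected ("junk after document element").
multi_root = "<map></map><noderesolve></noderesolve>"
print(parses(multi_root))   # False

# Same content inside one enclosing root element: well-formed.
wrapped = "<mpirun><map></map><noderesolve></noderesolve></mpirun>"
print(parses(wrapped))      # True
```

This is why the thread converges on emitting a single opening root tag when any "display" option is active and closing it at the end of the run.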
Re: [OMPI devel] -display-map
Hi Ralph, This is now in 1.3rc2, thanks. However there are a couple of problems. Here is what I see: [Jarrah.watson.ibm.com:58957] resolved="Jarrah.watson.ibm.com"> For some reason each line is prefixed with "[...]", any idea why this is? Also the end tag should be "/>" not ">".

Thanks, Greg

On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:

Great, thanks. I'll take a look once it comes over to 1.3. Cheers, Greg

On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:

Yo Greg

This is in the trunk as of r20032. I'll bring it over to 1.3 in a few days. I implemented it as another MCA param "orte_show_resolved_nodenames" so you can actually get the info as you execute the job, if you want. The xml tag is "noderesolve" - let me know if you need any changes.

Ralph

On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:

Ralph, I guess the issue for us is that we will have to run two commands to get the information we need. One to get the configuration information, such as version and MCA parameters, and one to get the host information, whereas it would seem more logical that this should all be available via some kind of "configuration discovery" command. I understand the issue with supplying the hostfile though, so maybe this just points at the need for us to separate configuration information from the host information. In any case, we'll work with what you think is best. Greg

On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:

Hmmm...just to be sure we are all clear on this. The reason we proposed to use mpirun is that "hostfile" has no meaning outside of mpirun. That's why ompi_info can't do anything in this regard. We have no idea what hostfile the user may specify until we actually get the mpirun cmd line. They may have specified a default-hostfile, but they could also specify hostfiles for the individual app_contexts. These may or may not include the node upon which mpirun is executing.
So the only way to provide you with a separate command to get a hostfile<->nodename mapping would require you to provide us with the default-hostfile and/or hostfile cmd line options just as if you were issuing the mpirun cmd. We just wouldn't launch - but it would be the exact equivalent of doing "mpirun --do-not-launch". Am I missing something? If so, please do correct me - I would be happy to provide a tool if that would make it easier. Just not sure what that tool would do.

Thanks
Ralph

On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:

Ralph, It seems a little strange to be using mpirun for this, but barring providing a separate command, or using ompi_info, I think this would solve our problem. Thanks, Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:

Sorry for delay - had to ponder this one for awhile. Jeff and I agree that adding something to ompi_info would not be a good idea. Ompi_info has no knowledge or understanding of hostfiles, and adding that capability to it would be a major distortion of its intended use. However, we think we can offer an alternative that might better solve the problem.

Remember, we now treat hostfiles in a very different manner than before - see the wiki page for a complete description, or "man orte_hosts". So the problem is that, to provide you with what you want, we need to "dump" the information from whatever default-hostfile was provided, and, if no default-hostfile was provided, then the information from each hostfile that was provided with an app_context. The best way we could think of to do this is to add another mpirun cmd line option --dump-hostfiles that would output the line-by-line name from the hostfile plus the name we resolved it to. Of course, --xml would cause it to be in xml format. Would that meet your needs?

Ralph

On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:

Hi Ralph, We've been discussing this back and forth a bit internally and don't really see an easy solution.
Our problem is that Eclipse is not running on the head node, so gethostbyname will not necessarily resolve to the same address. For example, the hostfile might refer to the head node by an internal network address that is not visible to the outside world. Since gethostbyname also looks in /etc/hosts, it may resolve locally but not on a remote system. The only thing I can think of would be, rather than us reading the hostfile directly as we do now, to provide an option to ompi_info that would dump the hostfile using the same rules that you apply when you're using the hostfile. Would that be feasible? Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for delay - was on vacation and am now trying to work my way back to the surface. I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for the node. However, the problem is that it needs to be consistent. In this case, ORTE has already use
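Greg's point - that the same hostfile name can resolve to different addresses depending on which machine does the lookup - can be illustrated with a toy per-machine lookup table standing in for each host's /etc/hosts. The hostnames and addresses below are invented for the sketch:

```python
def resolve(name, etc_hosts):
    """Return the address 'name' maps to in one machine's hosts table,
    or 'NXDOMAIN' if that machine cannot resolve the name at all."""
    return etc_hosts.get(name, "NXDOMAIN")

# Hypothetical tables: the head node knows the cluster-internal name,
# while the machine running Eclipse has no entry for it.
head_node_hosts = {"node0": "10.0.0.1"}   # internal cluster address
eclipse_hosts = {}                         # outside world: no entry

print(resolve("node0", head_node_hosts))  # 10.0.0.1
print(resolve("node0", eclipse_hosts))    # NXDOMAIN
```

This asymmetry is why the thread settles on having mpirun itself report the mapping (the noderesolve output): only the environment that read the hostfile can resolve its names consistently.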
Re: [OMPI devel] -display-map
It slipped thru the cracks - will be in rc2. Thanks for the reminder! Ralph On Dec 2, 2008, at 2:03 PM, Greg Watson wrote: Ralph, will this be in 1.3rc1? Thanks, Greg On Nov 24, 2008, at 3:06 PM, Greg Watson wrote: Great, thanks. I'll take a look once it comes over to 1.3. Cheers, Greg On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote: Yo Greg This is in the trunk as of r20032. I'll bring it over to 1.3 in a few days. I implemented it as another MCA param "orte_show_resolved_nodenames" so you can actually get the info as you execute the job, if you want. The xml tag is "noderesolve" - let me know if you need any changes. Ralph On Oct 22, 2008, at 11:55 AM, Greg Watson wrote: Ralph, I guess the issue for us is that we will have to run two commands to get the information we need. One to get the configuration information, such as version and MCA parameters, and one to get the host information, whereas it would seem more logical that this should all be available via some kind of "configuration discovery" command. I understand the issue with supplying the hostfile though, so maybe this just points at the need for us to separate configuration information from the host information. In any case, we'll work with what you think is best. Greg On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote: Hmmm...just to be sure we are all clear on this. The reason we proposed to use mpirun is that "hostfile" has no meaning outside of mpirun. That's why ompi_info can't do anything in this regard. We have no idea what hostfile the user may specify until we actually get the mpirun cmd line. They may have specified a default-hostfile, but they could also specify hostfiles for the individual app_contexts. These may or may not include the node upon which mpirun is executing. 
So the only way to provide you with a separate command to get a hostfile<->nodename mapping would require you to provide us with the default-hostfile and/or hostfile cmd line options just as if you were issuing the mpirun cmd. We just wouldn't launch - but it would be the exact equivalent of doing "mpirun --do-not-launch". Am I missing something? If so, please do correct me - I would be happy to provide a tool if that would make it easier. Just not sure what that tool would do. Thanks Ralph On Oct 19, 2008, at 1:59 PM, Greg Watson wrote: Ralph, It seems a little strange to be using mpirun for this, but barring providing a separate command, or using ompi_info, I think this would solve our problem. Thanks, Greg On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote: Sorry for delay - had to ponder this one for awhile. Jeff and I agree that adding something to ompi_info would not be a good idea. Ompi_info has no knowledge or understanding of hostfiles, and adding that capability to it would be a major distortion of its intended use. However, we think we can offer an alternative that might better solve the problem. Remember, we now treat hostfiles in a very different manner than before - see the wiki page for a complete description, or "man orte_hosts". So the problem is that, to provide you with what you want, we need to "dump" the information from whatever default-hostfile was provided, and, if no default-hostfile was provided, then the information from each hostfile that was provided with an app_context. The best way we could think of to do this is to add another mpirun cmd line option --dump-hostfiles that would output the line-by-line name from the hostfile plus the name we resolved it to. Of course, --xml would cause it to be in xml format. Would that meet your needs? Ralph On Oct 15, 2008, at 3:12 PM, Greg Watson wrote: Hi Ralph, We've been discussing this back and forth a bit internally and don't really see an easy solution. 
Our problem is that Eclipse is not running on the head node, so gethostbyname will not necessarily resolve to the same address. For example, the hostfile might refer to the head node by an internal network address that is not visible to the outside world. Since gethostbyname also looks in /etc/hosts, it may resolve locally but not on a remote system. The only thing I can think of would be, rather than us reading the hostfile directly as we do now, to provide an option to ompi_info that would dump the hostfile using the same rules that you apply when you're using the hostfile. Would that be feasible? Greg On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote: Sorry for delay - was on vacation and am now trying to work my way back to the surface. I'm not sure I can fix this one for two reasons: 1. In general, OMPI doesn't really care what name is used for the node. However, the problem is that it needs to be consistent. In this case, ORTE has already used the name returned by gethostname to create its session directory structure long before mpirun reads a hostfile. This is why
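The XML output Ralph describes adds a "noderesolve" element alongside the map. A consumer like Eclipse could collect the hostfile-name-to-resolved-name pairs roughly as sketched below; note that only the "noderesolve" tag name comes from the thread, while the attribute names here are invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of mpirun --xml output. Only the "noderesolve"
# tag name is taken from the mailing-list thread; the attribute names
# are assumptions made for this sketch.
SAMPLE = """<map>
  <noderesolve hostfile_name="192.168.1.1" resolved_name="head.cluster.local"/>
  <noderesolve hostfile_name="node01" resolved_name="node01.cluster.local"/>
</map>"""

def resolved_pairs(xml_text):
    """Return (name as written in the hostfile, resolved name) pairs."""
    root = ET.fromstring(xml_text)
    return [(n.get("hostfile_name"), n.get("resolved_name"))
            for n in root.iter("noderesolve")]
```

With the sample above, `resolved_pairs(SAMPLE)` yields the two mappings a tool would need to reconcile its own view of the hostfile with ORTE's resolved node names.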
Re: [OMPI devel] -display-map
Ralph, It seems a little strange to be using mpirun for this, but barring providing a separate command, or using ompi_info, I think this would solve our problem. Thanks, Greg On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote: Sorry for delay - had to ponder this one for awhile. Jeff and I agree that adding something to ompi_info would not be a good idea. Ompi_info has no knowledge or understanding of hostfiles, and adding that capability to it would be a major distortion of its intended use. However, we think we can offer an alternative that might better solve the problem. Remember, we now treat hostfiles in a very different manner than before - see the wiki page for a complete description, or "man orte_hosts". So the problem is that, to provide you with what you want, we need to "dump" the information from whatever default-hostfile was provided, and, if no default-hostfile was provided, then the information from each hostfile that was provided with an app_context. The best way we could think of to do this is to add another mpirun cmd line option --dump-hostfiles that would output the line-by-line name from the hostfile plus the name we resolved it to. Of course, --xml would cause it to be in xml format. Would that meet your needs? Ralph On Oct 15, 2008, at 3:12 PM, Greg Watson wrote: Hi Ralph, We've been discussing this back and forth a bit internally and don't really see an easy solution. Our problem is that Eclipse is not running on the head node, so gethostbyname will not necessarily resolve to the same address. For example, the hostfile might refer to the head node by an internal network address that is not visible to the outside world. Since gethostbyname also looks in /etc/hosts, it may resolve locally but not on a remote system. The only thing I can think of would be, rather than us reading the hostfile directly as we do now, to provide an option to ompi_info that would dump the hostfile using the same rules that you apply when you're using the hostfile. 
Would that be feasible? Greg On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote: Sorry for delay - was on vacation and am now trying to work my way back to the surface. I'm not sure I can fix this one for two reasons: 1. In general, OMPI doesn't really care what name is used for the node. However, the problem is that it needs to be consistent. In this case, ORTE has already used the name returned by gethostname to create its session directory structure long before mpirun reads a hostfile. This is why we retain the value from gethostname instead of allowing it to be overwritten by the name in whatever allocation we are given. Using the name in hostfile would require that I either find some way to remember any prior name, or that I tear down and rebuild the session directory tree - neither seems attractive nor simple (e.g., what happens when the user provides multiple entries in the hostfile for the node, each with a different IP address based on another interface in that node? Sounds crazy, but we have already seen it done - which one do I use?). 2. We don't actually store the hostfile info anywhere - we just use it and forget it. For us to add an XML attribute containing any hostfile-related info would therefore require us to re-read the hostfile. I could have it do that -only- in the case of "XML output required", but it seems rather ugly. An alternative might be for you to simply do a "gethostbyname" lookup of the IP address or hostname to see if it matches instead of just doing a strcmp. This is what we have to do internally as we frequently have problems with FQDN vs. non-FQDN vs. IP addresses etc. If the local OS hasn't cached the IP address for the node in question it can take a little time to DNS resolve it, but otherwise works fine. I can point you to the code in OPAL that we use - I would think something similar would be easy to implement in your code and would readily solve the problem. 
Ralph On Sep 19, 2008, at 7:18 AM, Greg Watson wrote: Ralph, The problem we're seeing is just with the head node. If I specify a particular IP address for the head node in the hostfile, it gets changed to the FQDN when displayed in the map. This is a problem for us as we need to be able to match the two, and since we're not necessarily running on the head node, we can't always do the same resolution you're doing. Would it be possible to use the same address that is specified in the hostfile, or alternatively provide an XML attribute that contains this information? Thanks, Greg On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote: Not in that regard, depending upon what you mean by "recently". The only changes I am aware of wrt nodes consisted of some changes to the order in which we use the nodes when specified by hostfile or -host, and a little #if protectionism needed by Brian for the Cray port
Re: [OMPI devel] -display-map
Sorry for the delay - had to ponder this one for a while. Jeff and I agree that adding something to ompi_info would not be a good idea. Ompi_info has no knowledge or understanding of hostfiles, and adding that capability to it would be a major distortion of its intended use.

However, we think we can offer an alternative that might better solve the problem. Remember, we now treat hostfiles in a very different manner than before - see the wiki page for a complete description, or "man orte_hosts". So the problem is that, to provide you with what you want, we need to "dump" the information from whatever default-hostfile was provided, and, if no default-hostfile was provided, then the information from each hostfile that was provided with an app_context.

The best way we could think of to do this is to add another mpirun cmd line option, --dump-hostfiles, that would output the line-by-line name from the hostfile plus the name we resolved it to. Of course, --xml would cause it to be in xml format. Would that meet your needs?

Ralph

On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:

> Hi Ralph,
>
> We've been discussing this back and forth a bit internally and don't really see an easy solution. Our problem is that Eclipse is not running on the head node, so gethostbyname will not necessarily resolve to the same address. For example, the hostfile might refer to the head node by an internal network address that is not visible to the outside world. Since gethostname also looks in /etc/hosts, it may resolve locally but not on a remote system. The only thing I can think of would be, rather than us reading the hostfile directly as we do now, to provide an option to ompi_info that would dump the hostfile using the same rules that you apply when you're using the hostfile. Would that be feasible?
>
> Greg
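In spirit, the --dump-hostfiles behavior Ralph proposes (print each hostfile name alongside what it resolves to) could look like the following minimal C sketch. This is purely illustrative: `resolve_ipv4` and the output format are invented here, and mpirun's actual implementation would go through ORTE's hostfile parser rather than a hard-coded list.

```c
/* Illustrative sketch only: print each "hostfile" name plus the
 * address it resolves to, roughly what --dump-hostfiles might show.
 * resolve_ipv4() is an invented helper, not mpirun's actual code. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <arpa/inet.h>

/* Resolve a host name to a dotted-quad IPv4 string; 0 on success. */
static int resolve_ipv4(const char *name, char *buf, size_t len)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;       /* IPv4 only, for brevity */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(name, NULL, &hints, &res) != 0) {
        return -1;
    }
    struct sockaddr_in *sin = (struct sockaddr_in *)res->ai_addr;
    inet_ntop(AF_INET, &sin->sin_addr, buf, len);
    freeaddrinfo(res);
    return 0;
}

int main(void)
{
    /* Stand-in for lines read from a real hostfile. */
    const char *lines[] = { "localhost" };
    char addr[INET_ADDRSTRLEN];

    for (size_t i = 0; i < sizeof(lines) / sizeof(lines[0]); i++) {
        if (resolve_ipv4(lines[i], addr, sizeof(addr)) == 0) {
            printf("%s -> %s\n", lines[i], addr);
        } else {
            printf("%s -> unresolved\n", lines[i]);
        }
    }
    return 0;
}
```

A dump in this shape would let a tool like Eclipse match hostfile entries to resolved names without re-implementing the resolution rules itself.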
Re: [OMPI devel] -display-map
Hi Ralph,

We've been discussing this back and forth a bit internally and don't really see an easy solution. Our problem is that Eclipse is not running on the head node, so gethostbyname will not necessarily resolve to the same address. For example, the hostfile might refer to the head node by an internal network address that is not visible to the outside world. Since gethostname also looks in /etc/hosts, it may resolve locally but not on a remote system.

The only thing I can think of would be, rather than us reading the hostfile directly as we do now, to provide an option to ompi_info that would dump the hostfile using the same rules that you apply when you're using the hostfile. Would that be feasible?

Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

> Sorry for the delay - was on vacation and am now trying to work my way back to the surface. I'm not sure I can fix this one, for two reasons:
>
> 1. In general, OMPI doesn't really care what name is used for the node. However, the problem is that it needs to be consistent. In this case, ORTE has already used the name returned by gethostname to create its session directory structure long before mpirun reads a hostfile. This is why we retain the value from gethostname instead of allowing it to be overwritten by the name in whatever allocation we are given. Using the name in the hostfile would require that I either find some way to remember any prior name, or that I tear down and rebuild the session directory tree - neither seems attractive nor simple (e.g., what happens when the user provides multiple entries in the hostfile for the node, each with a different IP address based on another interface in that node? Sounds crazy, but we have already seen it done - which one do I use?).
>
> 2. We don't actually store the hostfile info anywhere - we just use it and forget it. For us to add an XML attribute containing any hostfile-related info would therefore require us to re-read the hostfile. I could have it do that -only- in the case of "XML output required", but it seems rather ugly.
>
> An alternative might be for you to simply do a "gethostbyname" lookup of the IP address or hostname to see if it matches, instead of just doing a strcmp. This is what we have to do internally, as we frequently have problems with FQDN vs. non-FQDN vs. IP addresses, etc. If the local OS hasn't cached the IP address for the node in question it can take a little time to DNS-resolve it, but otherwise it works fine. I can point you to the code in OPAL that we use - I would think something similar would be easy to implement in your code and would readily solve the problem.
>
> Ralph
Re: [OMPI devel] -display-map
Sorry for the delay - was on vacation and am now trying to work my way back to the surface. I'm not sure I can fix this one, for two reasons:

1. In general, OMPI doesn't really care what name is used for the node. However, the problem is that it needs to be consistent. In this case, ORTE has already used the name returned by gethostname to create its session directory structure long before mpirun reads a hostfile. This is why we retain the value from gethostname instead of allowing it to be overwritten by the name in whatever allocation we are given. Using the name in the hostfile would require that I either find some way to remember any prior name, or that I tear down and rebuild the session directory tree - neither seems attractive nor simple (e.g., what happens when the user provides multiple entries in the hostfile for the node, each with a different IP address based on another interface in that node? Sounds crazy, but we have already seen it done - which one do I use?).

2. We don't actually store the hostfile info anywhere - we just use it and forget it. For us to add an XML attribute containing any hostfile-related info would therefore require us to re-read the hostfile. I could have it do that -only- in the case of "XML output required", but it seems rather ugly.

An alternative might be for you to simply do a "gethostbyname" lookup of the IP address or hostname to see if it matches, instead of just doing a strcmp. This is what we have to do internally, as we frequently have problems with FQDN vs. non-FQDN vs. IP addresses, etc. If the local OS hasn't cached the IP address for the node in question it can take a little time to DNS-resolve it, but otherwise it works fine. I can point you to the code in OPAL that we use - I would think something similar would be easy to implement in your code and would readily solve the problem.

Ralph

On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:

> Ralph,
>
> The problem we're seeing is just with the head node. If I specify a particular IP address for the head node in the hostfile, it gets changed to the FQDN when displayed in the map. This is a problem for us as we need to be able to match the two, and since we're not necessarily running on the head node, we can't always do the same resolution you're doing. Would it be possible to use the same address that is specified in the hostfile, or alternatively provide an XML attribute that contains this information?
>
> Thanks,
> Greg

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
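Ralph's "resolve, then compare" suggestion can be sketched in C as below. This is an illustrative comparison only, assuming IPv4 and a working resolver; `hosts_match` is invented here, and the real OPAL code Ralph refers to is more thorough (interfaces, caching, IPv6, etc.).

```c
/* Sketch of the "gethostbyname lookup instead of strcmp" idea:
 * two host strings match if any of their resolved IPv4 addresses
 * coincide. Not the actual OPAL implementation. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

static int hosts_match(const char *a, const char *b)
{
    struct addrinfo hints, *ra = NULL, *rb = NULL, *pa, *pb;
    int match = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;          /* IPv4 only, for brevity */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(a, NULL, &hints, &ra) != 0) {
        return 0;
    }
    if (getaddrinfo(b, NULL, &hints, &rb) != 0) {
        freeaddrinfo(ra);
        return 0;
    }

    /* Compare every resolved address of a against every one of b. */
    for (pa = ra; pa != NULL && !match; pa = pa->ai_next) {
        for (pb = rb; pb != NULL && !match; pb = pb->ai_next) {
            struct sockaddr_in *sa = (struct sockaddr_in *)pa->ai_addr;
            struct sockaddr_in *sb = (struct sockaddr_in *)pb->ai_addr;
            if (sa->sin_addr.s_addr == sb->sin_addr.s_addr) {
                match = 1;
            }
        }
    }
    freeaddrinfo(ra);
    freeaddrinfo(rb);
    return match;
}

int main(void)
{
    /* "localhost" and "127.0.0.1" name the same address, so they can
     * match here even though a plain strcmp says they differ. */
    printf("hosts_match: %d\n", hosts_match("localhost", "127.0.0.1"));
    printf("strcmp equal: %d\n", strcmp("localhost", "127.0.0.1") == 0);
    return 0;
}
```

This is the kind of check a client like Eclipse could use to match a hostfile entry against the FQDN reported in the map, at the cost of a possible DNS round trip per comparison.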
Re: [OMPI devel] -display-map and mpi_spawn
We always output the entire map, so you'll see the parent procs as well as the child.

On Sep 16, 2008, at 12:52 PM, Greg Watson wrote:

> Hi Ralph,
>
> No, I'm happy to get a map at the beginning and at every spawn. Do you send the whole map again, or only an update?
>
> Regards,
> Greg
Re: [OMPI devel] -display-map
Ralph,

The problem we're seeing is just with the head node. If I specify a particular IP address for the head node in the hostfile, it gets changed to the FQDN when displayed in the map. This is a problem for us as we need to be able to match the two, and since we're not necessarily running on the head node, we can't always do the same resolution you're doing. Would it be possible to use the same address that is specified in the hostfile, or alternatively provide an XML attribute that contains this information?

Thanks,
Greg

On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:

> Not in that regard, depending upon what you mean by "recently". The only changes I am aware of wrt nodes consisted of some changes to the order in which we use the nodes when specified by hostfile or -host, and a little #if protectionism needed by Brian for the Cray port.
>
> Are you seeing this for every node? Reason I ask: I can't offhand think of anything in the code base that would replace a host name with the FQDN, because we don't get that info for remote nodes. The only exception is the head node (where mpirun sits) - in that lone case, we default to the name returned to us by gethostname(). We do that because the head node is frequently accessible on a more global basis than the compute nodes - thus, the FQDN is required to ensure that there is no address confusion on the network. If the user refers to compute nodes in a hostfile or -host (or in an allocation from a resource manager) by non-FQDN, we just assume they know what they are doing and the name will correctly resolve to a unique address.
Re: [OMPI devel] -display-map and mpi_spawn
> thanks, applied

oops, replied to the wrong message ;)
Re: [OMPI devel] -display-map and mpi_spawn
thanks, applied
Re: [OMPI devel] -display-map and mpi_spawn
Hi Ralph,

No, I'm happy to get a map at the beginning and at every spawn. Do you send the whole map again, or only an update?

Regards,
Greg

On Sep 11, 2008, at 9:09 AM, Ralph Castain wrote:

> It already somewhat does. If you use --display-map at mpirun, you automatically get display-map whenever MPI_Spawn is called. We didn't provide a mechanism by which you could only display-map for MPI_Spawn (and not for the original mpirun), but it would be trivial to do so - just have to define an info key for that purpose. Is that what you need?
Re: [OMPI devel] -display-map and mpi_spawn
It already somewhat does. If you use --display-map at mpirun, you automatically get display-map whenever MPI_Spawn is called. We didn't provide a mechanism by which you could only display-map for MPI_Spawn (and not for the original mpirun), but it would be trivial to do so - just have to define an info key for that purpose. Is that what you need?

On Sep 11, 2008, at 5:35 AM, Greg Watson wrote:

> Ralph,
>
> At the moment -display-map shows the process mapping when mpirun first starts, but I'm wondering about processes created dynamically. Would it be possible to trigger a map update when MPI_Spawn is called?
>
> Regards,
> Greg
Re: [OMPI devel] -display-map
Not in that regard, depending upon what you mean by "recently". The only changes I am aware of wrt nodes consisted of some changes to the order in which we use the nodes when specified by hostfile or -host, and a little #if protectionism needed by Brian for the Cray port.

Are you seeing this for every node? Reason I ask: I can't offhand think of anything in the code base that would replace a host name with the FQDN, because we don't get that info for remote nodes. The only exception is the head node (where mpirun sits) - in that lone case, we default to the name returned to us by gethostname(). We do that because the head node is frequently accessible on a more global basis than the compute nodes - thus, the FQDN is required to ensure that there is no address confusion on the network. If the user refers to compute nodes in a hostfile or -host (or in an allocation from a resource manager) by non-FQDN, we just assume they know what they are doing and the name will correctly resolve to a unique address.

On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:

> Hi,
>
> Has the behavior of the -display-map option changed recently in the 1.3 branch? We're now seeing the host name as a fully resolved DN rather than the entry that was specified in the hostfile. Is there any particular reason for this? If so, would it be possible to add the hostfile entry to the output since we need to be able to match the two?
>
> Thanks,
> Greg
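The head-node behavior described here - keeping the gethostname() value, which the resolver may then expand to an FQDN - can be illustrated with a small sketch. This assumes a configured resolver, and the actual output depends entirely on the local host setup, so no particular names are claimed:

```c
/* Illustration of head-node naming: gethostname() returns the local
 * name, and resolving it with AI_CANONNAME may expand it to an FQDN.
 * Output is host-specific; this only demonstrates the mechanism. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netdb.h>

int main(void)
{
    char name[256];
    struct addrinfo hints, *res = NULL;

    if (gethostname(name, sizeof(name)) != 0) {
        return 1;
    }
    printf("gethostname: %s\n", name);

    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = AI_CANONNAME;  /* ask for the canonical name */
    if (getaddrinfo(name, NULL, &hints, &res) == 0) {
        if (res->ai_canonname != NULL) {
            /* Often, but not always, the fully qualified form. */
            printf("canonical:   %s\n", res->ai_canonname);
        }
        freeaddrinfo(res);
    }
    return 0;
}
```

Running this on a head node versus a workstation shows why the map can report an FQDN even when the hostfile used a short name or an IP address.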
[OMPI devel] -display-map and mpi_spawn
Ralph,

At the moment -display-map shows the process mapping when mpirun first starts, but I'm wondering about processes created dynamically. Would it be possible to trigger a map update when MPI_Spawn is called?

Regards,
Greg
[OMPI devel] -display-map
Hi,

Has the behavior of the -display-map option changed recently in the 1.3 branch? We're now seeing the host name as a fully resolved DN rather than the entry that was specified in the hostfile. Is there any particular reason for this? If so, would it be possible to add the hostfile entry to the output, since we need to be able to match the two?

Thanks,
Greg
[OMPI devel] Display map and allocation
Hi folks

I am giving a series of talks here about OMPI 1.3, beginning with a description of the user-oriented features - i.e., cmd line options, etc. In working on the presentation, and showing a draft to some users, questions arose about two options: --display-map and --display-allocation. To be fair, Greg Watson had raised similar questions before.

The questions revolve around the fact that the data provided by those options contains a lot of stuff that, while immensely useful to an OMPI developer, is of no use to a user and actually causes confusion. What we propose, therefore, is to revise these options:

--display-map: displays a list of nodes, to include node name and state, and a list of the procs on that node. For each proc, show the MPI rank, local and node ranks, any slot list for that proc (if given), and state.

--display-allocation: displays a list of nodes, to include node name, slot info, username (if given), and state ("unknown" if not known).

We would then add two new options that show the broader output we have today: --debug-display-map and --debug-display-allocation.

Anybody have heartburn and/or comments on this? If not, I plan to make the change by the end of the week.

Ralph