On Jul 30, 2012, at 2:37 AM, George Bosilca wrote:

> I think that as long as there is a single home area per cluster, the 
> difference between the approaches might seem irrelevant to most 
> people.

Yeah, I agree - after thinking about it, it probably didn't accomplish much.

> 
> My problem is twofold. First, I have a common home area across several 
> different development clusters. Thus I have direct access through ssh to any 
> machine. If I create a single large machinefile, it turns out that every 
> mpirun will spawn a daemon on every single node, even if I only run a 
> ping-pong test.

That shouldn't happen if you specify the hosts you want to use, either via 
-host or -hostfile. I assume you are specifying nothing and so you get that 
behavior?

> Second, while I usually run my apps on the same set of resources, I 
> regularly need to switch nodes for a few tests.
> 
> What I was hoping to achieve is a machinefile containing the "default" 
> development cluster (i.e., the cluster where I'm almost alone, so my daemons 
> have minimal chance of disturbing other people's work), and then use a 
> separate machinefile to sporadically change the cluster where I run for 
> smaller tests. Unfortunately, this doesn't work due to the filtering behavior 
> described in my original email.

Why not just set the default hostfile to point to the new machinefile, either 
via the "--default-hostfile foo" option to mpirun or via the corresponding MCA 
param (orte_default_hostfile)?

I'm not trying to re-open the hostfile discussion, but I would be interested to 
hear how you feel -hostfile should work. I kinda gather you feel it should 
override the default hostfile instead of filter it, yes? My point being that I 
don't particularly know if anyone would disagree with that behavior, so we 
might decide to modify things if you want to propose it.
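
For concreteness, the three candidate semantics (filter, override, and the 
remove-then-add logic quoted further down) can be sketched roughly as follows. 
This is only an illustration in Python, not ORTE code: the function names and 
the plain list-of-hostnames representation are made up for the example, and 
slot counts are ignored.

```python
def filter_hosts(baseline, requested):
    """Filter semantics: every requested host must already be in the baseline
    (default hostfile or RM allocation); anything else is an error."""
    unknown = [h for h in requested if h not in baseline]
    if unknown:
        raise ValueError(f"hosts not in the default hostfile: {unknown}")
    # Keep baseline order, restricted to the requested hosts.
    return [h for h in baseline if h in requested]


def override_hosts(baseline, requested):
    """Override semantics: a given hostfile replaces the baseline entirely;
    the baseline is used only when nothing was requested."""
    return list(requested) if requested else list(baseline)


def merge_hosts(baseline, requested):
    """Remove-then-add semantics, following the three quoted steps."""
    if not requested:
        return list(baseline)                            # step 1: baseline only
    kept = [h for h in baseline if h in requested]       # step 2: drop the rest
    added = [h for h in requested if h not in baseline]  # step 3: add new hosts
    return kept + added


if __name__ == "__main__":
    default = ["a1", "a2", "a3"]   # hypothetical default hostfile contents
    given = ["a2", "b1"]           # hypothetical -hostfile contents
    print(override_hosts(default, given))  # ['a2', 'b1']
    print(merge_hosts(default, given))     # ['a2', 'b1']
```

With plain hostnames, override and remove-then-add select the same set of 
hosts; they would differ only in ordering and in whatever per-host information 
is carried over from the baseline. Under filter semantics the same input is 
rejected, because b1 is not in the default list.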

Ralph


> 
>  george.
> 
> 
> On Jul 28, 2012, at 19:24 , Ralph Castain wrote:
> 
>> It's been a while, but I vaguely remember the discussion. IIRC, the rationale 
>> was that the default hostfile was equivalent to an RM allocation and should 
>> be treated the same. So hostfile and -host become filters in that case.
>> 
>> FWIW, I believe the discussion was split on that question. I added a "none" 
>> option to the default hostfile MCA param so it would be ignored in the case 
>> where (a) the sys admin has given a default hostfile, but (b) someone wants 
>> to use hosts outside of it.
>> 
>>   MCA orte: parameter "orte_default_hostfile" (current value: <none>, data source: default value)
>>             Name of the default hostfile (relative or absolute path, "none" to ignore environmental or default MCA param setting)
>> 
>> That said, I can see a use-case argument for behaving somewhat differently. 
>> We've even had cases where users have gotten an allocation from an RM, but 
>> want to add hosts that are external to the cluster to the job.
>> 
>> It would be rather trivial to modify the logic:
>> 
>> 1. read the default hostfile or RM allocation for our baseline
>> 
>> 2. remove any hosts on that list that are *not* in the given hostfile
>> 
>> 3. add any hosts that are in the given hostfile, but weren't in the default 
>> hostfile
>> 
>> And subsequently do the same for -host. I think that would retain the spirit 
>> of the discussion, but provide more flexibility and provide a tad more 
>> "expected" behavior.
>> 
>> I don't have an iron in this fire as I don't use hostfiles, so I'm happy to 
>> implement whatever the community would like to see.
>> Ralph
>> 
>> On Jul 27, 2012, at 6:30 PM, George Bosilca wrote:
>> 
>>> I'm somewhat puzzled by the behavior of the -hostfile option in Open MPI. 
>>> Based on the FAQ, it is supposed to provide a list of resources to be used 
>>> by the launcher (in my case ssh) to start the processes. Makes sense so far.
>>> 
>>> However, if the configuration file contains a value for 
>>> orte_default_hostfile, then the behavior of the hostfile option changes 
>>> drastically, and the option becomes a filter (the machines must be on the 
>>> original list or a cryptic error message is displayed).
>>> 
>>> Overall, we have a well-defined, [mostly] consistent behavior for parameters 
>>> in Open MPI. We have a clearly defined order of precedence among the sources 
>>> of MCA parameters, which makes it straightforward to understand where a 
>>> value comes from. I'm absolutely certain there was a group discussion about 
>>> this unique "eccentricity" regarding the hostfile option, but I fail to 
>>> remember why we decided to go this way. Can I have a quick refresher, 
>>> please?
>>> 
>>> Thanks,
>>> george.
>>> 
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

