[ 
https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nate Woody updated HADOOP-5441:
-------------------------------

        Fix Version/s: 0.19.1
    Affects Version/s: 0.19.1
         Release Note: Allows dynamic loading of nodePool objects and moves 
remote start (pbsdsh) functionality out of Scheduler objects.
               Status: Patch Available  (was: Open)

This patch removes the pbsdsh command from Schedulers/torque and moves it into 
a new module.  The NodePool parent object was given a new method that selects 
the appropriate remote start object at runtime based on a configuration 
option.  Common/desc was modified to provide access to the remote-start 
config-file option and to set pbsdsh as the default.  Common/nodepoolutil was 
modified to allow dynamic loading of nodePool objects based on the naming 
scheme used for the TorquePool class.
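
As a rough illustration of the dynamic loading described above, here is a 
minimal Python 3 sketch.  It is not the code in the patch.  The package name 
demo_nodepools, the helper names, and the assumed naming scheme (module 
"torque" maps to class "TorquePool", so module "sge" maps to "SgePool") are 
all hypothetical.

```python
import importlib
import sys
import types


def pool_class_name(scheduler_id):
    # Assumed naming scheme: 'torque' -> 'TorquePool', 'sge' -> 'SgePool'.
    return scheduler_id.capitalize() + "Pool"


def load_node_pool_class(scheduler_id, package="demo_nodepools"):
    # Import <package>.<scheduler_id> by name, then look up the class in it.
    module = importlib.import_module(package + "." + scheduler_id)
    class_name = pool_class_name(scheduler_id)
    if not hasattr(module, class_name):
        raise ImportError("%s does not define %s" % (module.__name__, class_name))
    return getattr(module, class_name)


# Stand-in for a module file that a site would add alongside torque.py.
# It is registered in sys.modules only so this sketch runs without files on disk.
class SgePool(object):
    pass


_pkg = types.ModuleType("demo_nodepools")
_pkg.__path__ = []
_sge = types.ModuleType("demo_nodepools.sge")
_sge.SgePool = SgePool
sys.modules["demo_nodepools"] = _pkg
sys.modules["demo_nodepools.sge"] = _sge

pool_cls = load_node_pool_class("sge")
print(pool_cls.__name__)
```

With a lookup of this kind, supporting a new resource manager means adding one 
module that follows the naming scheme, without editing the loader itself.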

> HOD refactoring to ease integration with scheduler/resource managers other 
> than torque
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>    Affects Versions: 0.19.1
>         Environment: All
>            Reporter: Nate Woody
>             Fix For: 0.19.1
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Situation: HOD currently uses the pbsdsh command (a distributed shell that 
> uses Torque's TM interface to start remote processes) to start processes 
> on all nodes in the job.  This call is provided as part of a torqueInterface 
> class that is meant to abstract interactions with the Torque resource 
> manager (RM).  However, other RMs typically do not provide this 
> functionality; it is instead usually performed by a distributed command 
> available on the HPC system, such as mpiexec, ssh, or site-specific scripts.  
> The specificity of pbsdsh to Torque makes writing HOD interfaces to other RMs 
> somewhat difficult, as it forces the implementer to tie the choice of remote 
> start method to the RM, which is a somewhat faulty basis for that choice.
> Proposal: Refactor the torqueInterface and nodePool classes so that the 
> choice of remote start method is available as a configuration option in 
> hodrc.  This involves fairly simple changes: removing the pbsdsh command from 
> the Scheduler class and adding a configuration step that starts the 
> appropriate remote start wrapper.  The selection of the nodePool class will 
> be altered to allow dynamic loading of classes, so that new interfaces people 
> choose to write will not require altering HOD code.  Provide remote start 
> classes for pbsdsh, mpiexec, ssh, as well as custom scripts (sites often 
> provide mpiexec wrappers that ensure proper selection of network interfaces, 
> etc.).  Provide interface classes for SGE and Moab, as well as an updated 
> Torque class.
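
The remote start selection proposed in the description above might look 
roughly like the following Python 3 sketch.  It is not the code in the patch.  
The class names, the REMOTE_START registry, and the "remote-start" dictionary 
key standing in for the hodrc option are assumptions for illustration.  Only 
pbsdsh and ssh are shown, and only in their plainest invocation forms.

```python
class RemoteStart(object):
    """Builds the argv lists needed to start one program on every node."""

    def commands(self, nodes, program, args):
        raise NotImplementedError


class PbsdshStart(RemoteStart):
    # A single pbsdsh call starts the program on all nodes of the Torque job.
    def commands(self, nodes, program, args):
        return [["pbsdsh", program] + list(args)]


class SshStart(RemoteStart):
    # ssh starts the program on one host per call, so emit one command per node.
    def commands(self, nodes, program, args):
        return [["ssh", node, program] + list(args) for node in nodes]


# Hypothetical registry: configuration value -> remote start class.
REMOTE_START = {"pbsdsh": PbsdshStart, "ssh": SshStart}


def select_remote_start(config):
    # Fall back to pbsdsh when the option is absent, matching the stated default.
    name = config.get("remote-start", "pbsdsh")
    if name not in REMOTE_START:
        raise ValueError("unknown remote start method: %r" % name)
    return REMOTE_START[name]()


starter = select_remote_start({"remote-start": "ssh"})
for argv in starter.commands(["node1", "node2"], "/bin/hostname", []):
    print(" ".join(argv))
```

Keeping the remote start choice in its own small class hierarchy lets a site 
combine any resource manager interface with any start method, rather than 
fixing one method per RM.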

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
