[ 
https://issues.apache.org/jira/browse/AIRAVATA-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431989#comment-13431989
 ] 

Milinda Lakmal Pathirage commented on AIRAVATA-477:
---------------------------------------------------

Proposed changes to GFac
====================


Handler Based Architecture
----------------------------------------

Execution of a GFac job includes steps such as input file staging,
initializing the execution environment (cluster setup, etc.) and output file
staging. In the current trunk implementation some of these steps (input data
movement) happen inside provider initialization, which limits the flexibility
to support different execution sequences. For example, in the Hadoop cloud
bursting scenario we may have to set up the HDFS cluster first, then move the
data to HDFS, and only then execute the Hadoop job. So in this case we need the
flexibility to set up the HDFS cluster before doing any data movement. If we
follow the current implementation pattern, we would have to implement support
for these different scenarios in each provider's initialization method, which
would make the provider logic complex.

A handler-based architecture also allows us to reuse existing handlers in
different execution flows. For example, we can have a GridFTP to Amazon S3 data
movement handler which can be used with different providers (Hadoop, etc.).
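
To make the idea concrete, here is a minimal sketch of a handler chain for the
Hadoop cloud bursting case. It is not the code in the sandbox: the Handler
interface, the run method and the in-flow/out-flow split are all hypothetical
names chosen for illustration, and the provider is modelled as just another
handler to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these names are illustrative, not the GFac API.
final class HandlerChainSketch {

    /** One step of a job execution, such as cluster setup or data staging. */
    interface Handler {
        void invoke(List<String> executionLog);
    }

    /** Runs the in-flow handlers, then the provider, then the out-flow handlers. */
    static List<String> run(List<Handler> inFlow, Handler provider, List<Handler> outFlow) {
        List<String> executionLog = new ArrayList<>();
        for (Handler handler : inFlow) {
            handler.invoke(executionLog);
        }
        provider.invoke(executionLog);
        for (Handler handler : outFlow) {
            handler.invoke(executionLog);
        }
        return executionLog;
    }

    public static void main(String[] args) {
        // Hadoop cloud bursting: the cluster must exist before data is moved.
        List<Handler> inFlow = Arrays.<Handler>asList(
                log -> log.add("set up HDFS cluster"),
                log -> log.add("move input data to HDFS"));
        Handler hadoopProvider = log -> log.add("execute Hadoop job");
        List<Handler> outFlow = Arrays.<Handler>asList(
                log -> log.add("stage output files"));

        for (String step : run(inFlow, hadoopProvider, outFlow)) {
            System.out.println(step);
        }
    }
}
```

Because the order lives in the lists rather than in the provider, a different
scenario only needs a different list, and the same data movement handler can
appear in several flows.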

Another improvement we can make on top of the handler architecture is to
change the handler execution sequence dynamically based on the application
description. The initial implementation will come with a static handler
sequence, but I am planning to add a scheduler which decides the handler
sequence based on the application description.
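
As a rough illustration of what such a scheduler might do (it does not exist
yet, so everything here is an assumption), the sketch below derives the
in-flow handler sequence from an application description given as key/value
pairs. The key "requiresHdfsCluster" and the handler names are invented for
the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: handler names and description keys are invented.
final class HandlerSchedulerSketch {

    /** Picks the in-flow handler sequence from an application description. */
    static List<String> inFlowSequenceFor(Map<String, String> applicationDescription) {
        List<String> sequence = new ArrayList<>();
        // Cluster setup has to come before any data movement into HDFS.
        if ("true".equals(applicationDescription.get("requiresHdfsCluster"))) {
            sequence.add("HdfsClusterSetupHandler");
        }
        sequence.add("InputDataStagingHandler");
        return sequence;
    }

    public static void main(String[] args) {
        Map<String, String> hadoopApplication = new HashMap<>();
        hadoopApplication.put("requiresHdfsCluster", "true");
        System.out.println(inFlowSequenceFor(hadoopApplication));
    }
}
```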

Information flows between handlers via parameters. The handler implementor
should document the parameters the handler requires and the parameters it
makes available to other handlers.
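
One possible benefit of documenting parameters this way is that a handler
sequence can be checked before it runs. The sketch below assumes each handler
declares its required and provided parameter names; HandlerContract,
firstUnsatisfied and the parameter names are hypothetical and not part of the
sandbox code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: a handler's documented parameter contract.
final class HandlerParametersSketch {

    static final class HandlerContract {
        final String name;
        final Set<String> required;
        final Set<String> provided;

        HandlerContract(String name, Set<String> required, Set<String> provided) {
            this.name = name;
            this.required = required;
            this.provided = provided;
        }
    }

    /** Reports the first required parameter that no earlier handler provides. */
    static Optional<String> firstUnsatisfied(List<HandlerContract> sequence) {
        Set<String> available = new HashSet<>();
        for (HandlerContract handler : sequence) {
            for (String parameter : handler.required) {
                if (!available.contains(parameter)) {
                    return Optional.of(handler.name + " needs " + parameter);
                }
            }
            available.addAll(handler.provided);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        HandlerContract clusterSetup = new HandlerContract("clusterSetup",
                Collections.<String>emptySet(),
                Collections.singleton("hdfs.namenode.uri"));
        HandlerContract inputStaging = new HandlerContract("inputStaging",
                Collections.singleton("hdfs.namenode.uri"),
                Collections.singleton("input.hdfs.path"));

        // Valid order: cluster setup provides what input staging requires.
        System.out.println(firstUnsatisfied(Arrays.asList(clusterSetup, inputStaging)));
        // Invalid order: input staging runs before its parameter is available.
        System.out.println(firstUnsatisfied(Arrays.asList(inputStaging, clusterSetup)));
    }
}
```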

The attached "GFac Improvements.png" illustrates the new architecture.


Changes to context hierarchy
------------------------------------------

JobExecutionContext carries the data and configuration information related to
a single job execution. JobExecutionContext will contain input and output
MessageContext instances which carry the data related to input and output.
There will be a SecurityContext which can be used to set security-related
properties on each message context. ApplicationContext contains information
related to the application, and there will be an ApplicationContext associated
with each JobExecutionContext.
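
The relationships described above could be sketched as follows. Only the four
class names come from the description; every field and method name is an
assumption, as is the choice to give each MessageContext its own
SecurityContext. The code at [1] may be organised differently.

```java
import java.util.HashMap;
import java.util.Map;

// Class names follow the description; fields and methods are assumptions.
final class ContextHierarchySketch {

    static final class SecurityContext {
        private final Map<String, String> properties = new HashMap<>();
        void setProperty(String key, String value) { properties.put(key, value); }
        String getProperty(String key) { return properties.get(key); }
    }

    static final class MessageContext {
        private final Map<String, Object> parameters = new HashMap<>();
        // Assumption: one SecurityContext per message context.
        private final SecurityContext securityContext = new SecurityContext();
        void addParameter(String name, Object value) { parameters.put(name, value); }
        Object getParameter(String name) { return parameters.get(name); }
        SecurityContext getSecurityContext() { return securityContext; }
    }

    static final class ApplicationContext {
        private final String applicationName;
        ApplicationContext(String applicationName) { this.applicationName = applicationName; }
        String getApplicationName() { return applicationName; }
    }

    /** Everything related to a single job execution. */
    static final class JobExecutionContext {
        private final ApplicationContext applicationContext;
        private final MessageContext inMessageContext = new MessageContext();
        private final MessageContext outMessageContext = new MessageContext();

        JobExecutionContext(ApplicationContext applicationContext) {
            this.applicationContext = applicationContext;
        }
        ApplicationContext getApplicationContext() { return applicationContext; }
        MessageContext getInMessageContext() { return inMessageContext; }
        MessageContext getOutMessageContext() { return outMessageContext; }
    }

    public static void main(String[] args) {
        JobExecutionContext job = new JobExecutionContext(new ApplicationContext("wordcount"));
        job.getInMessageContext().addParameter("input.path", "/data/in");
        job.getInMessageContext().getSecurityContext().setProperty("credential.type", "x509");
        System.out.println(job.getApplicationContext().getApplicationName());
        System.out.println(job.getInMessageContext().getParameter("input.path"));
    }
}
```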


New GFac code can be found at [1].

[1] 
https://svn.codespot.com/a/apache-extras.org/airavata-gsoc-sandbox/trunk/milinda


                
> Refactoring GFac
> ----------------
>
>                 Key: AIRAVATA-477
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-477
>             Project: Airavata
>          Issue Type: Improvement
>          Components: GFac
>    Affects Versions: WISHLIST
>            Reporter: Milinda Lakmal Pathirage
>             Fix For: WISHLIST
>
>         Attachments: GFac Improvements.png
>
>
> While implementing the Hadoop provider for GFac we identified several issues
> with the current implementation, mainly in data transfer and the GFac API.
> The plan is to branch out the current GFac implementation and merge once the
> refactoring is done. Sub-issues will be created for each task involved in
> this process, and this issue will act as the main task.


        
