[
https://issues.apache.org/jira/browse/AIRAVATA-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431989#comment-13431989
]
Milinda Lakmal Pathirage commented on AIRAVATA-477:
---------------------------------------------------
Proposed changes to GFac
====================
Handler Based Architecture
----------------------------------------
Executing a GFac job involves steps such as input file staging, initializing
the execution environment (cluster setup, etc.), and output file staging. In
the current trunk implementation, some of these steps (input data movement)
happen inside provider initialization, which limits the flexibility to support
different execution sequences. For example, in a Hadoop cloud bursting
scenario, we may have to set up the HDFS cluster first, then move the data to
HDFS, and then execute the Hadoop job. In this case we need the flexibility to
set up the HDFS cluster before doing any data movement. If we follow the
current implementation pattern, we would have to implement support for these
different scenarios in each provider's initialization method, which would make
the provider logic complex.
A handler-based architecture also allows us to reuse existing handlers in
different execution flows. For example, we can have a GridFTP to Amazon S3 data
movement handler that can be used with different providers (Hadoop, etc.).
Another improvement we can build on top of the handler architecture is
dynamically changing the handler execution sequence based on the application
description. The initial implementation will come with a static handler
sequence, but I am planning to add a scheduler that decides the handler
sequence based on the application description.
Information flows between handlers via parameters. Each handler implementor
should document the parameters the handler requires and the parameters it
makes available to other handlers.
The attached "GFac Improvements.png" illustrates the new architecture.
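To make the handler idea concrete, here is a minimal Java sketch of a static
handler sequence for the Hadoop cloud bursting example. It is only an
illustration under stated assumptions: the Handler interface, the handler class
names, the parameter names ("hdfs.uri", "input.location") and the URI values
are all hypothetical and are not taken from the GFac code. The
JobExecutionContext here is reduced to a parameter map plus a trace list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names below are assumptions, not the GFac API.
final class HandlerChainSketch {

    /** Simplified context: handlers exchange information via named parameters. */
    static final class JobExecutionContext {
        private final Map<String, Object> parameters = new HashMap<>();
        private final List<String> trace = new ArrayList<>();

        void setParameter(String name, Object value) {
            parameters.put(name, value);
        }

        /** Fails fast when an earlier handler did not provide the parameter. */
        Object requireParameter(String name) {
            Object value = parameters.get(name);
            if (value == null) {
                throw new IllegalStateException("Missing required parameter: " + name);
            }
            return value;
        }

        List<String> trace() {
            return trace;
        }
    }

    /** One step in the execution flow. */
    interface Handler {
        void invoke(JobExecutionContext context);
    }

    /** Requires: nothing. Provides: "hdfs.uri" (placeholder value). */
    static final class HdfsClusterSetupHandler implements Handler {
        @Override
        public void invoke(JobExecutionContext context) {
            context.setParameter("hdfs.uri", "hdfs://demo-cluster");
            context.trace().add("cluster-setup");
        }
    }

    /** Requires: "hdfs.uri". Provides: "input.location". */
    static final class InputStagingHandler implements Handler {
        @Override
        public void invoke(JobExecutionContext context) {
            String hdfsUri = (String) context.requireParameter("hdfs.uri");
            context.setParameter("input.location", hdfsUri + "/input");
            context.trace().add("input-staging");
        }
    }

    /** Requires: "input.location". Stands in for the provider's job execution. */
    static final class JobExecutionHandler implements Handler {
        @Override
        public void invoke(JobExecutionContext context) {
            context.requireParameter("input.location");
            context.trace().add("job-execution");
        }
    }

    /** Runs a static handler sequence in the given order. */
    static JobExecutionContext run(List<Handler> sequence) {
        JobExecutionContext context = new JobExecutionContext();
        for (Handler handler : sequence) {
            handler.invoke(context);
        }
        return context;
    }

    public static void main(String[] args) {
        JobExecutionContext context = run(Arrays.<Handler>asList(
                new HdfsClusterSetupHandler(),
                new InputStagingHandler(),
                new JobExecutionHandler()));
        System.out.println(context.trace());
    }
}
```

Because the sequence is just a list, a scheduler could later build that list
from the application description instead of hard-coding it. The "Requires" and
"Provides" comments show one way a handler implementor might document its
parameters.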
Changes to context hierarchy
------------------------------------------
JobExecutionContext carries the data and configuration information related to
a single job execution. JobExecutionContext will contain input and output
MessageContext instances, which carry the data related to input and output.
There will be a SecurityContext that can be used to set security-related
properties on each message context. ApplicationContext contains information
related to the application, and there will be an ApplicationContext associated
with each JobExecutionContext.
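The relationships between these contexts could be sketched in Java roughly as
follows. The class names come from the description above, but every field,
method, and value is an assumption made for illustration. In particular,
attaching a SecurityContext to each MessageContext is only one possible reading
of the description. The actual classes in the sandbox code may look different.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: fields and methods are assumptions, not the GFac API.
final class ContextHierarchySketch {

    /** Holds security-related properties (for example, credentials). */
    static final class SecurityContext {
        private final Map<String, String> properties = new HashMap<>();

        void setProperty(String name, String value) {
            properties.put(name, value);
        }

        String getProperty(String name) {
            return properties.get(name);
        }
    }

    /** Carries input or output data; assumed here to own a SecurityContext. */
    static final class MessageContext {
        private final Map<String, Object> parameters = new HashMap<>();
        private final SecurityContext securityContext = new SecurityContext();

        void setParameter(String name, Object value) {
            parameters.put(name, value);
        }

        Object getParameter(String name) {
            return parameters.get(name);
        }

        SecurityContext getSecurityContext() {
            return securityContext;
        }
    }

    /** Information about the application being executed. */
    static final class ApplicationContext {
        private final String applicationName;

        ApplicationContext(String applicationName) {
            this.applicationName = applicationName;
        }

        String getApplicationName() {
            return applicationName;
        }
    }

    /** Root context for a single job execution. */
    static final class JobExecutionContext {
        private final ApplicationContext applicationContext;
        private final MessageContext inMessageContext = new MessageContext();
        private final MessageContext outMessageContext = new MessageContext();

        JobExecutionContext(ApplicationContext applicationContext) {
            this.applicationContext = applicationContext;
        }

        ApplicationContext getApplicationContext() {
            return applicationContext;
        }

        MessageContext getInMessageContext() {
            return inMessageContext;
        }

        MessageContext getOutMessageContext() {
            return outMessageContext;
        }
    }

    /** Builds a small example hierarchy with placeholder values. */
    static JobExecutionContext buildDemo() {
        JobExecutionContext job =
                new JobExecutionContext(new ApplicationContext("hadoop-wordcount"));
        MessageContext in = job.getInMessageContext();
        in.setParameter("input.uri", "s3://demo-bucket/input");
        in.getSecurityContext().setProperty("access.key", "demo-key");
        return job;
    }

    public static void main(String[] args) {
        JobExecutionContext job = buildDemo();
        System.out.println(job.getApplicationContext().getApplicationName());
        System.out.println(job.getInMessageContext().getParameter("input.uri"));
    }
}
```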
New GFac code can be found at [1].
[1]
https://svn.codespot.com/a/apache-extras.org/airavata-gsoc-sandbox/trunk/milinda
> Refactoring GFac
> ----------------
>
> Key: AIRAVATA-477
> URL: https://issues.apache.org/jira/browse/AIRAVATA-477
> Project: Airavata
> Issue Type: Improvement
> Components: GFac
> Affects Versions: WISHLIST
> Reporter: Milinda Lakmal Pathirage
> Fix For: WISHLIST
>
> Attachments: GFac Improvements.png
>
>
> While implementing the Hadoop provider for GFac we identified several issues
> with the current implementation, mainly in data transfer and the GFac API.
> The plan is to branch out the current GFac implementation and merge once the
> refactoring is done. Sub-issues will be created for each task involved in
> this process, and this issue will act as the main task.