Thanks, Lahiru, for your input. I see your point on loose coupling. If we could clearly separate the task into IN and OUT, then the current mechanism would be ideal. But in my case it is actually a wrapper around task execution.

I think I have not explained in detail what I am trying to implement. Basically, I am trying to integrate with OODT for input file staging and for ingesting the output back into the OODT file manager server and metadata catalog. For both of these tasks I need to maintain a connection with the OODT file manager server. In my case the ideal implementation would be a wrapper for task execution. PGETaskInstance is not a heavyweight object, and adding it to JobExecutionContext won't be a good solution in my case (if we added it to JobExecutionContext, it would remain in memory throughout the execution as well). Actually, PGETaskInstance is a similar wrapper implementation for OODT workflow execution, and I have seen a similar kind of implementation in Taverna, which is based on interceptors.

I have added my changes as a review request and added the Airavata reviewers to the reviewers group. You could have a look at ApacheAiravataWorkfloeInstanceImpl for what I am trying to achieve.

As you mentioned, I could implement an OUT handler for the post-processing part, and it just needs to reinitialize the connection/configuration with OODT (I just feel that this is unnecessary, and we could have avoided it with a wrapper kind of solution). I'll go ahead and complete the integration with the OUT wrapper.
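To make the wrapper idea a bit more concrete, here is a rough sketch. All names in it (FileManagerConnection, TaskWrapper, OodtTaskWrapper, WrappedExecution) are hypothetical placeholders for illustration only; this is not the actual GFac handler interface or the OODT PGETaskInstance API. It only shows one wrapper object holding a single connection across the pre and post steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the connection to the OODT file manager server.
class FileManagerConnection {
    private final List<String> trace;

    FileManagerConnection(List<String> trace) {
        this.trace = trace;
        trace.add("connect"); // opened once per task execution
    }

    void stage(String file)  { trace.add("stage:" + file); }
    void ingest(String file) { trace.add("ingest:" + file); }
    void close()             { trace.add("close"); }
}

// Hypothetical wrapper contract: one object surrounds the task execution.
interface TaskWrapper {
    void preInvoke(String inputFile);
    void postInvoke(String outputFile);
}

// Reuses the same connection for staging (pre) and ingesting (post).
class OodtTaskWrapper implements TaskWrapper {
    private final FileManagerConnection connection;

    OodtTaskWrapper(FileManagerConnection connection) {
        this.connection = connection;
    }

    @Override
    public void preInvoke(String inputFile) {
        connection.stage(inputFile);
    }

    @Override
    public void postInvoke(String outputFile) {
        connection.ingest(outputFile);
        connection.close();
    }
}

class WrappedExecution {
    // Runs pre step, a placeholder for the task itself, then the post step.
    static List<String> run(String inputFile, String outputFile) {
        List<String> trace = new ArrayList<>();
        TaskWrapper wrapper = new OodtTaskWrapper(new FileManagerConnection(trace));
        wrapper.preInvoke(inputFile);
        trace.add("execute"); // the actual job would run here
        wrapper.postInvoke(outputFile);
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run("input.dat", "output.dat"));
    }
}
```

With separate IN and OUT handlers, the "connect" step would happen twice, once in each handler, which is the reinitialization I would like to avoid.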

Best Regards,
Sanjaya

On Tue, Jun 11, 2013 at 6:47 PM, Lahiru Gunathilake <[email protected]> wrote:

> Hi Sanjaya,
>
> Please see my inline comments. I am proposing a solution for your issue
> which looks more efficient than writing two handlers. You can set this
> PGETaskInstance into JobExecutionContext (see the AbstractContext class)
> and use it in your OUT handler; you really don't have to create two
> instances if you want to reuse it.
>
> On Tue, Jun 11, 2013 at 8:22 AM, Sanjaya Medonsa <[email protected]> wrote:
>
> > Hi,
> > As per the current design of Handlers, there are two types of handlers.
> > 1. IN Handlers
> > 2. OUT Handlers
> >
> > Basically the IN handler does the pre-processing and the OUT handler does
> > the post-processing. With the Airavata OODT integration, I am planning to
> > implement an IN handler to perform file staging and an OUT handler to
> > perform output ingesting into CAS. That means two handler instances to
> > handle pre/post-processing. In my scenario, this approach seems a bit
> > inefficient. Both IN/OUT handlers are based on OODT PGETaskInstance. Due
> > to the current handler architecture, I need to create two instances of
> > PGETaskInstance (one for the IN handler and one for the OUT handler). I
> > guess we could have avoided this situation by having just GFac handlers
> > which could be either IN, OUT or IN/OUT. In my case, I actually need to
> > implement an IN/OUT handler. At a high level, I am proposing the
> > following approach.
> > 1. At configuration level, no differentiation between IN/OUT handlers
> >
> Even now there's no difference between IN/OUT handlers; a handler becomes
> IN or OUT based on how you configure it. It's the same interface, and you
> can use one handler as IN in one configuration and OUT in another
> configuration, or in the same configuration.
>
> > 2. Instead, the GFacHandler interface should contain two methods
> > (preInvoke/postInvoke). Depending on the type of handler, either the pre
> > or post method should be implemented.
> > PRE - preInvoke
> > POST - postInvoke
> > PRE/POST - Both preInvoke/postInvoke
> >
> IMHO, the current approach is the cleaner one: it handles loose coupling
> with one task at a time. If we do everything in a single handler, we need
> to keep this data during the whole time of execution. Job execution time
> could be huge, and we would be keeping all the handler configuration in
> memory during the whole execution time. I am not sure what PGETaskInstance
> is and whether it's efficient to keep it loaded during the whole execution
> period, rather than loading it on demand by configuration.
>
> > 3. Either we could instantiate all the handlers initially and invoke all
> > pre methods prior to task execution and invoke all post methods after
> > task execution.
> > If this approach is a bit inefficient, then we could introduce a type
> > into handlers (PRE/POST/PREPOST). Prior to task execution we could
> > instantiate PRE/PREPOST handlers and invoke the pre-execution method.
> > After task execution we could instantiate POST handlers and invoke the
> > postExecution method for both POST/PREPOST handlers.
> >
> > I guess Handler may not be the correct name here; we could name these
> > handlers task wrappers, as they are called in OODT. Let me know your
> > feedback.
> >
> > Cheers,
> > Sanjaya
> >
>
> --
> System Analyst Programmer
> PTI Lab
> Indiana University
