+1 for real use cases first. We have at least 3. But I'm sure we will
want to make it as easy as possible for developers to pass back the
correct, created experiment ID when invoking launchExperiment.


Marlon

On 1/17/14 2:57 PM, Saminda Wijeratne wrote:
> Marlon, I think until we put this to real use we won't get much feedback on
> which aspects we should focus on more and which features we should expand
> or prioritize. So how about having a test plan for the Orchestrator?
> Expose it to real use cases and see how it survives. WDYT?
>
> It might be a little confusing to return a "JobRequest" object from the
> Orchestrator (since it's a response). Or perhaps it should be renamed?
>
> Sachith, I think we should have a Google hangout or a separate mail thread
> (or both) to discuss multi-threaded support. Could you organize this please?
>
>
> On Fri, Jan 17, 2014 at 10:29 AM, Amila Jayasekara
> <thejaka.am...@gmail.com> wrote:
>
>>
>>
>> On Fri, Jan 17, 2014 at 10:32 AM, Saminda Wijeratne
>> <samin...@gmail.com> wrote:
>>
>>> Following are a few thoughts I had during my review of the component,
>>>
>>> *Multi-threaded vs single threaded*
>>> If we are going to have multi-threaded job submission, the implementation
>>> should handle race conditions. Essentially, a JobSubmitter should be able
>>> to "lock" an experiment request before continuing to process that request,
>>> so that other JobSubmitters accessing the experiment requests at the same
>>> time would skip it.
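>>>
>>> Just to illustrate the "lock before processing" idea, a minimal sketch
>>> (the registry methods and statuses below are hypothetical, not the actual
>>> SPI):
>>>
>>>     // Inside a JobSubmitter loop: claim each queued request atomically so
>>>     // that other JobSubmitters polling the same table simply skip it.
>>>     for (String experimentId : registry.getQueuedExperimentIds()) {
>>>         // claimExperiment() would return true only for the one submitter
>>>         // that flips the status, e.g. via a conditional
>>>         // UPDATE ... SET status='LOCKED' WHERE status='QUEUED'.
>>>         if (!registry.claimExperiment(experimentId)) {
>>>             continue; // another JobSubmitter already owns this request
>>>         }
>>>         try {
>>>             submit(experimentId);
>>>         } finally {
>>>             registry.releaseExperiment(experimentId);
>>>         }
>>>     }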
>>>
>> +1. These are implementation details.
>>
>>
>>> *Orchestrator service*
>>> We might want to think about the possibility that in the future we will be
>>> running multiple deployments of an Airavata service. This could particularly
>>> be true for SciGaP. We may have to think about how some of the internal data
>>> structures/SPIs should be updated to accommodate such requirements in the future.
>>>
>> +1.
>>
>>
>>> *Orchestrator Component configurations*
>>> I see a lot of places where the orchestrator can have configurations. I
>>> think it's too early to finalize them, but I think we can start refactoring
>>> them out, perhaps to airavata-server.properties. I'm also seeing that the
>>> orchestrator is currently hardcoded to use the default/admin gateway and
>>> username. I think those should come from the request itself.
>>>
>> +1. But overall we may need to change the way we handle configurations
>> within Airavata. Currently we have multiple configuration files and
>> multiple places where we read configurations. IMO we should have a separate
>> module to handle configurations. Only this module should be aware of how to
>> interpret the configurations in the file, and it should provide a component
>> interface to access those configuration values.
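>>
>> To illustrate, a rough sketch only (the class and property names here are
>> made up, not existing Airavata code):
>>
>>     import java.io.FileInputStream;
>>     import java.io.IOException;
>>     import java.io.InputStream;
>>     import java.util.Properties;
>>
>>     // Hypothetical central settings module; components would go through it
>>     // instead of opening property files themselves.
>>     public final class AiravataSettings {
>>         private static final Properties props = new Properties();
>>         static {
>>             try (InputStream in = new FileInputStream("airavata-server.properties")) {
>>                 props.load(in);
>>             } catch (IOException e) {
>>                 throw new ExceptionInInitializerError(e);
>>             }
>>         }
>>         private AiravataSettings() { }
>>         public static String get(String key, String defaultValue) {
>>             return props.getProperty(key, defaultValue);
>>         }
>>     }
>>
>>     // e.g. the orchestrator would then read its own keys through it:
>>     //   String gateway = AiravataSettings.get("orchestrator.gateway", "default");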
>>
> +1. We tried this once with "ServerSettings" and "ApplicationSettings", but
> apparently more configuration files seem to have spawned again. So far,
> however, they seem to be localized to their components.
>
>>
>>> *Visibility of API functions*
>>> I think the initialize(), shutdown() and startJobSubmitter() functions should
>>> not be part of the API, because I don't see a scenario where the gateway
>>> developer would be responsible for using them. They serve the more internal
>>> purpose of managing the orchestrator component, IMO. As Amila pointed out so
>>> long ago (wink), functions that do not concern outside parties should not be
>>> exposed as part of the API.
>>>
>> +1
>>
>>
>>> *Return values of Orchestrator API*
>>> IMO, unless it is specifically required, the functions do not necessarily
>>> need to return anything; they can instead throw exceptions when needed. For
>>> example, launchExperiment can simply return void if all is successful and
>>> throw an exception if something fails. Handling issues with a try/catch is
>>> not only simpler, but the explanations are also readily available for the
>>> user.
>>>
>> +1. Also try to have different exceptions for different scenarios. For
>> example, if persistence (hypothetically) fails, a DatabasePersistenceException;
>> if validation fails, a ValidationFailedException, etc. Then the developer
>> who uses the API can catch these different exceptions and act on them
>> appropriately.
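>>
>> In other words, something along these lines (class names are only a sketch
>> of the shape, each class in its own file):
>>
>>     // Gateway-facing exception hierarchy
>>     public class OrchestratorException extends Exception {
>>         public OrchestratorException(String message) { super(message); }
>>     }
>>     public class ValidationFailedException extends OrchestratorException {
>>         public ValidationFailedException(String message) { super(message); }
>>     }
>>     public class DatabasePersistenceException extends OrchestratorException {
>>         public DatabasePersistenceException(String message) { super(message); }
>>     }
>>
>>     // On the gateway developer's side:
>>     try {
>>         orchestrator.launchExperiment(experimentId); // returns void
>>     } catch (ValidationFailedException e) {
>>         // bad input - the message should explain what to fix
>>     } catch (OrchestratorException e) {
>>         // anything else that went wrong inside Airavata
>>     }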
>>
> +1. What needs to be understood here is that the exception should be a
> gateway-friendly exception, i.e. it should not expose internal details of
> Airavata in the top-level exception, and the exception message should be
> self-explanatory enough that the gateway developer is not left scratching
> his/her head after reading it. Feedback from Sudhakar some time back was to
> provide suggestions in the exception message on how to resolve the issue.
>
>>
>>> *Data persisted in registry*
>>> ExperimentRequest.getUsername(): I think we should clarify what this
>>> username denotes. In the current API, for experiment submission we consider
>>> two types of users: the submission user (the user who submits the experiment
>>> to the Airavata Server - this is inferred from the request itself) and the
>>> execution user (the user who correlates to the application executions of the
>>> gateway - thus this can be a different user for different gateways, e.g. a
>>> community user or a gateway user).
>>> I think we should persist the date/time of the experiment request as
>>> well.
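>>>
>>> For instance, the persisted request could look roughly like this (field
>>> names are hypothetical, just to make the two roles explicit):
>>>
>>>     // Sketch of the persisted experiment request
>>>     public class ExperimentRequest {
>>>         private String submissionUser;    // who submitted the request to the Airavata Server
>>>         private String executionUser;     // community/gateway user the executions correlate to
>>>         private String gatewayId;
>>>         private long requestedTimeMillis; // date/time the experiment was requested
>>>         // getters/setters omitted
>>>     }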
>>>
>> +1
>>
>>>  Also, when retrying API functions after a failure in a previous attempt,
>>> there should be a way to not repeat already performed steps, or to
>>> gracefully roll back and redo the required steps as necessary. While such
>>> actions could be transparent to the user, sometimes it might make sense to
>>> allow the user to be notified of the success/failure of a retry. However,
>>> this might mean keeping additional records at the registry level.
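>>>
>>> One rough way to picture this (the step and status names are made up):
>>>
>>>     // Idempotent retry: consult the persisted status of each step before
>>>     // redoing it, so a retry only repeats what actually failed.
>>>     if (registry.getStepStatus(experimentId, Step.VALIDATED) != Status.DONE) {
>>>         validate(experimentId);
>>>         registry.setStepStatus(experimentId, Step.VALIDATED, Status.DONE);
>>>     }
>>>     if (registry.getStepStatus(experimentId, Step.SUBMITTED) != Status.DONE) {
>>>         submit(experimentId);
>>>         registry.setStepStatus(experimentId, Step.SUBMITTED, Status.DONE);
>>>     }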
>>>
>> In addition, we should also have a way of cleaning up unsubmitted
>> experiment IDs. (But I'm not sure whether you want to address this right now.)
>> The way I see this is to have a periodic thread which goes through the
>> table and clears up experiments which have not been submitted within a
>> defined time.
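>>
>> Roughly (the registry methods here are hypothetical):
>>
>>     import java.util.concurrent.Executors;
>>     import java.util.concurrent.ScheduledExecutorService;
>>     import java.util.concurrent.TimeUnit;
>>
>>     // Periodic cleanup of experiments that were created but never submitted.
>>     ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
>>     cleaner.scheduleAtFixedRate(new Runnable() {
>>         public void run() {
>>             long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7);
>>             for (String id : registry.getUnsubmittedExperimentIds(cutoff)) {
>>                 registry.removeExperiment(id);
>>             }
>>         }
>>     }, 1, 24, TimeUnit.HOURS);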
>>
> +1. Something else we may have to think of later is data archiving
> capabilities. We keep running into performance issues when the database
> grows with experiment results. Unless we become experts in distributed
> database management, we need a better way to manage our db performance
> issues.
>
>
>> BTW, nice review notes, Saminda.
>>
>> Thanks
>> Amila
>>
>>
>>
