Hi Everyone,

Over the past few days, I’ve been implementing the tasks related to data
parsing. As a heads-up, the following image depicts the top-level
architecture of the implementation.

The following main task components have been identified:

*1. DataParsing Task*

This task takes the stored output, finds the matching parser (Gaussian,
Lammps, QChem, etc.), and passes the output through the selected parser to
produce well-structured JSON.


*2. Validating Task*

This task validates that the desired JSON output has been produced; that
is, the JSON output should match the respective schema (Gaussian Schema,
Lammps Schema, QChem Schema, etc.).


*3. Persisting Task*

This task persists the validated JSON outputs.

The successfully stored outputs will be exposed to the outside world.
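
To make the three stages concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions: the class and method names are hypothetical, the "parsers" just count lines, the "schemas" are simple predicates standing in for real JSON Schema checks, and a list stands in for the database. None of this is Airavata or Helix code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of the three stages; all names are illustrative only.
class ParsingPipelineSketch {

    // Assumption: a parser maps raw application output to a JSON string.
    static final Map<String, Function<String, String>> PARSERS = Map.of(
        "gaussian", raw -> "{\"application\":\"gaussian\",\"lines\":" + raw.split("\n").length + "}",
        "lammps", raw -> "{\"application\":\"lammps\",\"lines\":" + raw.split("\n").length + "}");

    // Assumption: a "schema" is reduced to a predicate over the JSON string.
    static final Map<String, Predicate<String>> SCHEMAS = Map.of(
        "gaussian", json -> json.contains("\"application\":\"gaussian\""),
        "lammps", json -> json.contains("\"application\":\"lammps\""));

    // Stand-in for the database used by the persisting stage.
    static final List<String> STORE = new ArrayList<>();

    // 1. DataParsing: find the matching parser and produce JSON.
    static Optional<String> parse(String application, String rawOutput) {
        Function<String, String> parser = PARSERS.get(application);
        return parser == null ? Optional.empty() : Optional.of(parser.apply(rawOutput));
    }

    // 2. Validating: check the JSON against the schema for that application.
    static boolean validate(String application, String json) {
        Predicate<String> schema = SCHEMAS.get(application);
        return schema != null && schema.test(json);
    }

    // 3. Persisting: store only JSON that passed validation.
    static void persist(String json) {
        STORE.add(json);
    }

    // Runs the stages in order; returns false if parsing or validation fails.
    static boolean run(String application, String rawOutput) {
        Optional<String> json = parse(application, rawOutput);
        if (json.isEmpty() || !validate(application, json.get())) {
            return false;
        }
        persist(json.get());
        return true;
    }
}
```

Here `run` chains the stages inside one process, so the JSON is simply passed along as a value. In the actual design each stage is a separate task, which is why sharing the JSON between them becomes a question.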


According to the diagram, the generated JSON should be shared between the
tasks (DataParsing, Validating, and Persisting). Neither the DataParsing
task nor the Validating task persists the JSON; therefore, the Helix task
framework must share the content between the tasks.

This Helix tutorial [1] describes how to share content between Helix
tasks. The problem is that the method provided [2] is only capable of
sharing String-typed key-value data.
However, I can come up with an implementation that shares all the values
in the JSON output. That involves calling this method [2] many times. I
believe that is not very efficient, because the Helix task framework has
to call this method [3] many times (considering that the generated JSON
output can be large).
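
One possible alternative, sketched here purely as an assumption: since a JSON document is itself text, the whole serialized document could be stored as a single String value under one key, so the put method is called once rather than once per field. The `StringContentStore` interface below is a hypothetical in-memory stand-in for a String-only key-value store, not the Helix class, and the key name is made up. Whether the real store handles large values well is not confirmed here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: share one serialized JSON document under a single key.
class SharedJsonSketch {

    // Assumption: a String-only key-value store, similar in shape to the
    // put/get pair discussed above. This is NOT the Helix API.
    interface StringContentStore {
        void put(String key, String value);
        String get(String key);
    }

    // In-memory stand-in that also counts how many put calls were made.
    static class InMemoryStore implements StringContentStore {
        private final Map<String, String> data = new HashMap<>();
        int putCalls = 0;

        @Override
        public void put(String key, String value) {
            putCalls++;
            data.put(key, value);
        }

        @Override
        public String get(String key) {
            return data.get(key);
        }
    }

    // Hypothetical key name agreed on by the producing and consuming tasks.
    static final String PARSED_JSON_KEY = "parsed-output-json";

    // DataParsing side: one put call for the whole document.
    static void publish(StringContentStore store, String json) {
        store.put(PARSED_JSON_KEY, json);
    }

    // Validating / Persisting side: one get call returns the whole document.
    static String consume(StringContentStore store) {
        return store.get(PARSED_JSON_KEY);
    }
}
```

With this shape, the number of store calls does not grow with the number of fields in the JSON, although the size of the single value does grow with the document.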

I have already sent an email to the Helix mailing list to ask whether
there is another way, and whether calling this method [2] multiple times
to get the work done would be efficient.

Am I on the right track? Your suggestions would be very helpful, and please
add anything that is missing.


[1] http://helix.apache.org/0.8.0-docs/tutorial_task_framework.html#Share_Content_Across_Tasks_and_Jobs
[2] https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/task/UserContentStore.java#L75
[3] https://github.com/apache/helix/blob/helix-0.6.x/helix-core/src/main/java/org/apache/helix/task/TaskUtil.java#L361

Thanks,
Lahiru

On 26 March 2018 at 19:44, Lahiru Jayathilake <[email protected]>
wrote:

> Hi Dimuthu, Suresh,
>
> Thanks a lot for the feedback. I will update the proposal accordingly.
>
> Regards,
> Lahiru
>
> On 26 March 2018 at 08:48, Suresh Marru <[email protected]> wrote:
>
>> Hi Lahiru,
>>
>> I echo Dimuthu’s comment. You have a good starting point, it will be nice
>> if you can cover how users can interact with the parsed data. Essentially
>> adding API access to the parsed metadata database and having proof of
>> concept UI’s. This task could be challenging as the queries are very data
>> specific and generalizing API access and building custom UI’s can be
>> exploratory (less defined) portions of your proposal.
>>
>> Cheers,
>> Suresh
>>
>>
>> On Mar 25, 2018, at 8:12 PM, DImuthu Upeksha <[email protected]>
>> wrote:
>>
>> Hi Lahiru,
>>
>> Nice document. And I like how you illustrate the systems through
>> diagrams. However try to address how you are going to expose parsed data to
>> outside through thrift APIs and how to design those data APIs in
>> application specific manner. And in the persisting task, you have to make
>> sure data integrity is preserved. For example in a Gaussian parsed output,
>> you might have to validate the parsed output using a schema before
>> persisting them in the database.
>>
>> Thanks
>> Dimuthu
>>
>> On Sun, Mar 25, 2018 at 5:05 PM, Lahiru Jayathilake <
>> [email protected]> wrote:
>>
>>> Hi Everyone,
>>>
>>> I have shared a draft proposal [1] for the GSoC project, AIRAVATA-2718
>>> [2]. Any comments would be very helpful to improve it.
>>>
>>> [1] https://docs.google.com/document/d/1xhgL1w9Yn_c1d5PpabxJJNNLTbkgggasMBM-GsBjVHM/edit?usp=sharing
>>> [2] https://issues.apache.org/jira/browse/AIRAVATA-2718
>>>
>>> Thanks & Regards,
>>> --
>>> Lahiru Jayathilake
>>> Department of Computer Science and Engineering,
>>> Faculty of Engineering,
>>> University of Moratuwa
>>>
>>> <https://lk.linkedin.com/in/lahirujayathilake>
>>>
>>
>>
>>
>
>
> --
> Lahiru Jayathilake
> Department of Computer Science and Engineering,
> Faculty of Engineering,
> University of Moratuwa
>
> <https://lk.linkedin.com/in/lahirujayathilake>
>



-- 
Lahiru Jayathilake
Department of Computer Science and Engineering,
Faculty of Engineering,
University of Moratuwa

<https://lk.linkedin.com/in/lahirujayathilake>
