Re: Implementation of JPA

2017-08-01 Thread Apoorv Palkar
After talking to Marcus/Marlon, I've decided to go with just the JDBC
implementation. I only need the ability to read/write to the MySQL DB, and I
didn't want to spend too much time learning the material required for JPA.
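
For reference, a minimal sketch of the kind of plain-JDBC access I have in
mind; the table and column names are hypothetical placeholders, not the actual
registry schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ExperimentDao {

    private static final String URL = "jdbc:mysql://localhost:3306/monitoring";

    // Insert or update an experiment's status (hypothetical EXPERIMENT table).
    public void writeStatus(String experimentId, String status) throws SQLException {
        try (Connection conn = DriverManager.getConnection(URL, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO EXPERIMENT (EXPERIMENT_ID, STATUS) VALUES (?, ?) "
                             + "ON DUPLICATE KEY UPDATE STATUS = ?")) {
            ps.setString(1, experimentId);
            ps.setString(2, status);
            ps.setString(3, status);
            ps.executeUpdate();
        }
    }

    // Read an experiment's status back, or null if it is not present.
    public String readStatus(String experimentId) throws SQLException {
        try (Connection conn = DriverManager.getConnection(URL, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT STATUS FROM EXPERIMENT WHERE EXPERIMENT_ID = ?")) {
            ps.setString(1, experimentId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("STATUS") : null;
            }
        }
    }
}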


Palkar.



--shoutout Marcus 



-Original Message-
From: Shenoy, Gourav Ganesh 
To: dev 
Sent: Tue, Aug 1, 2017 10:31 am
Subject: Re: Implementation of JPA



Hi Apoorv,
 
Well, it's difficult to say that one is absolutely better than the other. But 
yes, Hibernate is generally considered to be more optimized for 
persistence/retrieval, on average, for a large number of entities. Hibernate also 
offers more utility methods, which at times simplify the extra code you would 
have to write in OpenJPA. But I have used OpenJPA for a long enough time that, 
once you get beyond learning the functionality, you realize it's easier 
to deal with a minimal set of annotations in OpenJPA, whereas Hibernate has 
some extra wrapper annotations.
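
For illustration, a sketch of an entity that sticks to the portable
javax.persistence annotations, and so should behave the same under OpenJPA,
Hibernate, or EclipseLink; the entity and column names are made up:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

// Only standard JPA annotations: no org.hibernate.* or org.apache.openjpa.*
// imports, so this entity stays portable across implementations.
@Entity
@Table(name = "EXPERIMENT")
public class Experiment {

    @Id
    @Column(name = "EXPERIMENT_ID")
    private String experimentId;

    @Column(name = "NAME")
    private String name;

    @Column(name = "STATUS")
    private String status;

    // getters/setters omitted
}

The moment you import something like org.hibernate.annotations.*, you are tied
to that implementation.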
 
An important consideration is compatibility – OpenJPA annotations are certain 
to work with most JPA implementations, but not vice versa. This plays a big 
role when you want to switch JPA implementations (which generally does not 
happen). Having said that, Hibernate has far more documentation and helpful 
resources online if you're facing any issues, etc.
 
Thanks and Regards,
Gourav Shenoy
 

From: Apoorv Palkar 
Reply-To: "dev@airavata.apache.org" 
Date: Monday, July 31, 2017 at 10:03 AM
To: "dev@airavata.apache.org" 
Subject: Implementation of JPA

 

Dear Dev,

 

I'm currently developing the code for the registry used by the monitoring 
system in Airavata. I'm looking at the pros/cons of each JPA implementation 
and was wondering if anyone has any recommendations. I'm choosing between 
Hibernate, OpenJPA, and EclipseLink. I understand Hibernate is the most 
mature, widely used technology, so I was trying to determine Hibernate's 
cons. Does anybody have previous experience with Hibernate? My use case for 
the database (most likely MySQL) is to read/write/store data about experiment 
IDs, names, and statuses.

 

 

Thanks,

 

A. Palkar 

 

--shoutout marcus





Implementation of JPA

2017-07-31 Thread Apoorv Palkar
Dear Dev,


I'm currently developing the code for the registry used by the monitoring 
system in Airavata. I'm looking at the pros/cons of each JPA implementation 
and was wondering if anyone has any recommendations. I'm choosing between 
Hibernate, OpenJPA, and EclipseLink. I understand Hibernate is the most 
mature, widely used technology, so I was trying to determine Hibernate's 
cons. Does anybody have previous experience with Hibernate? My use case for 
the database (most likely MySQL) is to read/write/store data about experiment 
IDs, names, and statuses.




Thanks,


A. Palkar 


--shoutout marcus


Re: Monitoring System Diagram

2017-07-25 Thread Apoorv Palkar
Yes, Gourav,


Currently, I'm working on making the monitoring aspect of Airavata as 
independent as possible. I'm approaching the problem as if it doesn't matter 
whether the current architecture or Helix is being used. After coming up with 
a good design and implementation, we can see how to connect the pieces. From 
there we can probably split the DAG into two parts instead of having one DAG 
that includes the monitoring system.


- A Palkar.




shout out Marcus --



-Original Message-
From: Shenoy, Gourav Ganesh 
To: dev 
Sent: Tue, Jul 25, 2017 11:04 am
Subject: Re: Monitoring System Diagram



Apoorv,
 
Good work with the architecture. But I think the "request to monitor via helix 
mechanism" and "output to helix orchestration" are very crucial pieces here – 
they need to be detailed. The rest of it is fairly straightforward, but what is 
important is how this system blends into the Helix design. Do you think you can 
add more details to the above statements?
 
Once again, you are heading in the right direction – good job!
 
Thanks and Regards,
Gourav Shenoy
 

From: Apoorv Palkar 
Reply-To: "dev@airavata.apache.org" 
Date: Monday, July 24, 2017 at 1:14 PM
To: "dev@airavata.apache.org" 
Subject: Monitoring System Diagram

 

K Dev,

 

I have attached an architecture diagram for the monitoring system. Currently, 
the challenge we are facing is that GFac is heavily tied to the monitoring 
system via task execution. The ultimate goal is to separate this from the 
current GFac. I understand Marlon doesn't want me looking at the code too much 
to avoid bias; I have only glanced at a couple of lines/specifics to get an 
idea of how monitoring is currently implemented in Airavata.

 

Previously, I had been working on parsing the particular emails: PBS, SLURM, 
UGE, etc. Over the weekend, I ran some performance metric tests on the current 
parsing code, as Shameera suggested. The current code is quite balanced in 
terms of large-scale processing: it is able to parse the emails quickly while 
still maintaining a high degree of simplicity. I improved on a couple of lines 
without using regex; however, that code proved to be highly unmaintainable. 
As Shameera/Marlon pointed out, these emails change relatively frequently as 
servers/machines are upgraded/replaced, so it is important for this code to be 
highly maintainable.

 

In addition to this, I have been working with Supun to develop a new 
architecture for the mailing list. At first, there was a debate on whether to 
use ZooKeeper and/or Redis for the global state. I did some research to 
identify the pros and cons of each technology. As Suresh/Gourav suggested, 
Airavata currently uses ZooKeeper. Also, ZooKeeper would add less overhead 
than a database such as Redis. A big problem with this development strategy is 
the complexity of the code we will have to write. In the scenario of multiple 
GFacs, a global ZooKeeper makes some sense. However, a problem arises if a job 
is cancelled. This can cause edge cases where, say, GFac A accidentally 
processes GFac B's emails. Therefore, we would have to design, at a low level, 
a clever implementation of locks governing who may access which data. This 
could prove to be a hassle.
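
To make the locking concern concrete, here is a minimal sketch of
per-experiment locking with Curator's InterProcessMutex; the lock path
convention is hypothetical, not a settled design:

import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class EmailClaim {

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // One lock node per experiment, so GFac A cannot process an email
        // that GFac B has already claimed.
        InterProcessMutex lock = new InterProcessMutex(
                client, "/airavata/monitoring/locks/experiment-123");
        if (lock.acquire(5, TimeUnit.SECONDS)) {
            try {
                // safe to parse and record this experiment's email here
            } finally {
                lock.release();
            }
        }
        client.close();
    }
}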

 

 

Another potential solution is to implement a work queue similar to our job 
submission in Airavata. The work queue delegates the work of parsing/reading 
emails to multiple GFacs. This could avoid the locking/threading hazards. If a 
GFac fails somehow, there needs to be a mechanism in place to handle the 
particular emails that GFac was handed. We still have to decide on the right 
design before the code can be written. I've also been working on the 
Thrift/RabbitMQ scenario, where data is parsed, serialized, and then sent over 
the network. I will upload the code by today/tomorrow.
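
A rough sketch of the work-queue idea using the RabbitMQ Java client, assuming
a hypothetical "email-monitoring" queue; because a message is acknowledged
only after parsing succeeds, the emails handed to a failed GFac would be
redelivered to another worker:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class EmailWorkQueue {

    private static final String QUEUE = "email-monitoring"; // hypothetical name

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Durable queue shared by all GFac workers.
        channel.queueDeclare(QUEUE, true, false, false, null);
        // Give each worker only one unacknowledged email at a time.
        channel.basicQos(1);

        DeliverCallback onEmail = (consumerTag, delivery) -> {
            String rawEmail = new String(delivery.getBody(), "UTF-8");
            // ... hand rawEmail to the PBS/SLURM/etc. parser here ...
            // Ack only after parsing succeeds; if this GFac dies first,
            // RabbitMQ redelivers the message to another worker.
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        channel.basicConsume(QUEUE, false, onEmail, consumerTag -> { });
    }
}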

 

 

SHOUT OUT @Marcus !





Monitoring System Diagram

2017-07-24 Thread Apoorv Palkar
K Dev,




I have attached an architecture diagram for the monitoring system. Currently, 
the challenge we are facing is that GFac is heavily tied to the monitoring 
system via task execution. The ultimate goal is to separate this from the 
current GFac. I understand Marlon doesn't want me looking at the code too much 
to avoid bias; I have only glanced at a couple of lines/specifics to get an 
idea of how monitoring is currently implemented in Airavata.


Previously, I had been working on parsing the particular emails: PBS, SLURM, 
UGE, etc. Over the weekend, I ran some performance metric tests on the current 
parsing code, as Shameera suggested. The current code is quite balanced in 
terms of large-scale processing: it is able to parse the emails quickly while 
still maintaining a high degree of simplicity. I improved on a couple of lines 
without using regex; however, that code proved to be highly unmaintainable. 
As Shameera/Marlon pointed out, these emails change relatively frequently as 
servers/machines are upgraded/replaced, so it is important for this code to be 
highly maintainable.


In addition to this, I have been working with Supun to develop a new 
architecture for the mailing list. At first, there was a debate on whether to 
use ZooKeeper and/or Redis for the global state. I did some research to 
identify the pros and cons of each technology. As Suresh/Gourav suggested, 
Airavata currently uses ZooKeeper. Also, ZooKeeper would add less overhead 
than a database such as Redis. A big problem with this development strategy is 
the complexity of the code we will have to write. In the scenario of multiple 
GFacs, a global ZooKeeper makes some sense. However, a problem arises if a job 
is cancelled. This can cause edge cases where, say, GFac A accidentally 
processes GFac B's emails. Therefore, we would have to design, at a low level, 
a clever implementation of locks governing who may access which data. This 
could prove to be a hassle.




Another potential solution is to implement a work queue similar to our job 
submission in Airavata. The work queue delegates the work of parsing/reading 
emails to multiple GFacs. This could avoid the locking/threading hazards. If a 
GFac fails somehow, there needs to be a mechanism in place to handle the 
particular emails that GFac was handed. We still have to decide on the right 
design before the code can be written. I've also been working on the 
Thrift/RabbitMQ scenario, where data is parsed, serialized, and then sent over 
the network. I will upload the code by today/tomorrow.




SHOUT OUT @Marcus !


Using Redis or Zookeeper for Email Monitoring

2017-07-19 Thread Apoorv Palkar
Hey Dev,


I'm working on the email monitoring system for Airavata. Currently I'm trying 
to solve the problem of "done" emails arriving before "start" emails, and to 
make the system scalable. Today, we have only one GFac that handles the email 
monitoring system. As we move from our monolithic code towards a microservices 
approach, this email monitoring system also needs to adapt. Currently, in the 
GFac code, a concurrent hashmap is kept to track the start/end emails using 
their respective experiment IDs. Instead of keeping the hashmap locally, we 
should keep it in a global state so that in the future multiple GFacs can 
handle the map. Supun has suggested using ZooKeeper for this, as it offers 
high availability and reliability. I was also thinking that since these 
experiment IDs are key-value pairs, Redis would be a good option for such a 
use case. What do you think about each one? I understand Airavata currently 
uses ZooKeeper, so development-wise there seems to be an edge toward it. Would 
Redis be a good fit for such a use case?
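
To show why the key-value shape seems to fit, a tiny sketch of the shared map
in Redis via the Jedis client; the key naming is hypothetical:

import redis.clients.jedis.Jedis;

public class ExperimentStateStore {

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Record that the "start" email for an experiment has arrived.
            jedis.set("experiment:exp-123:start", "true");

            // When a "done" email arrives, check whether "start" was seen;
            // if not, park the "done" email and retry later.
            String started = jedis.get("experiment:exp-123:start");
            System.out.println("start seen: " + "true".equals(started));
        }
    }
}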




-- shoutout Marcus.


Re: Helix + Mailing System

2017-07-17 Thread Apoorv Palkar
My bad. The code didn't save properly onto NetBeans/GitHub. Let me update the link.



-Original Message-
From: Apoorv Palkar 
To: dev 
Sent: Mon, Jul 17, 2017 11:31 am
Subject: Helix + Mailing System


Hey Dev,


For the past 3-3.5 weeks, I've been investigating the use of Helix in Airavata 
and working on the email monitoring problem. I went through the 
Curator/Zookeeper code to test out the internal workings of Helix. A particular 
question I had was: what is the difference between the external view and the 
current state? I understand that Helix uses the resource model to maintain both 
the ideal state and the current state, so why is it necessary to have an 
external view? In addition to this, what is the purpose of a spectator node? 
The documentation states that a "spectator" reacts to changes in a distributed 
system. Why give that node limited abilities when you could give it full 
access? These questions may be highly important to consider when writing the 
Helix paper for submission.

As for the mailing/monitoring system, I have decided to move forward with the 
JavaMail API + IMAP implementation. I used the gw155j...@scigap.org (Gmail) 
address as a basis for running my test code. For this particular use case, I 
didn't use the Gmail API because it had limited capabilities in terms of 
function/library use. I played around with the Gmail API; however, I was 
unsuccessful in getting it to work in a clean and efficient manner. As such, I 
decided to use the JavaMail API provided via imported libraries. I chose IMAP 
because it has greater capabilities than POP3, which was inefficient when 
fetching the emails.

In terms of reading the emails, the first challenge was to set up the code 
correctly to read from Gmail. Previously, the issue was that the emails were 
being read every time the read() function was called in the Inbox class. This 
meant that every message would be pulled even if only one email was unread. 
This proved to be highly costly in time, as the scigap email address has 1+ 
emails at any given time. I set up boolean flags for messages that were read 
and ones that were unread. As a result, not all messages have to be pulled; 
only the ones with a "false" flag need to be read. These messages are pulled 
and then put into a Message[] array. This array is then sorted using a lambda 
expression, since JavaMail retrieves the most recent message last. After these 
messages are put into the array and dealt with, they are marked as "read" to 
avoid reading them again.

Currently, I'm working on improving the implementations of all four email 
parsers. It is highly important that these parsers run efficiently, as many 
emails will be read. I didn't want to use regex, as it is slightly slower than 
string operations. For my demo code, I have used string operations to parse 
the subject title/content. In reality, an array or the StringBuilder class 
should be used in a production implementation to improve speed. Currently, I'm 
refactoring the PBS code to run a bit more optimally and running test cases 
for the other two email types. Below is a link to the Gmail implementation + 
SLURM interpreter. Basically, the idea is to have four classes, one handling 
each email type, that parse the messages from the Message[] array. The idea is 
then to take the common data collected, such as job_id, name, status, and 
time, and put it into a Thrift data model file. From this Thrift definition, 
create a Java Thrift object to send over an AMQP message queue (RabbitMQ), to 
then potentially be used in a MySQL database. As of now, the database part is 
not settled, but it would most likely be a registry that needs to be updated 
via a Java JPA library/SQL queries.
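
For reference, here is a condensed sketch of the unread-only fetch described
above, using stock JavaMail IMAP calls; the real code is at the link below,
and the host/credentials here are placeholders:

import java.util.Arrays;
import java.util.Comparator;
import java.util.Properties;
import javax.mail.Flags;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Store;
import javax.mail.search.FlagTerm;

public class Inbox {

    public static void main(String[] args) throws Exception {
        Session session = Session.getInstance(new Properties());
        Store store = session.getStore("imaps");
        store.connect("imap.gmail.com", "user@scigap.org", "password");

        Folder inbox = store.getFolder("INBOX");
        inbox.open(Folder.READ_WRITE);

        // Fetch only messages whose SEEN flag is false, instead of pulling
        // every message on each read() call.
        Message[] unread = inbox.search(
                new FlagTerm(new Flags(Flags.Flag.SEEN), false));

        // JavaMail returns the most recent message last; order by number.
        Arrays.sort(unread, Comparator.comparingInt(Message::getMessageNumber));

        for (Message message : unread) {
            String subject = message.getSubject();
            // ... dispatch to the PBS/SLURM/etc. parser based on the subject ...
            // Mark as read so this message is skipped on the next poll.
            message.setFlag(Flags.Flag.SEEN, true);
        }
        inbox.close(false);
        store.close();
    }
}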


https://github.com/chessman179/gmailtestinged  <<<<<<<<<<<<< 
code.
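
And a possible shape for the Thrift data model mentioned above, covering just
the fields common to all four email types; the field names and namespace are
my guesses, not a settled schema:

// jobstatus.thrift - hypothetical common model for parsed status emails
namespace java org.apache.airavata.monitoring

struct JobStatusEmail {
  1: required string jobId,     // e.g. the PBS/SLURM job identifier
  2: optional string jobName,
  3: required string status,    // e.g. STARTED, COMPLETED, FAILED
  4: optional string timestamp,
}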





** big shout out to Marcus --



Helix + Mailing System

2017-07-17 Thread Apoorv Palkar
Hey Dev,


For the past 3-3.5 weeks, I've been investigating the use of Helix in Airavata 
and working on the email monitoring problem. I went through the 
Curator/Zookeeper code to test out the internal workings of Helix. A particular 
question I had was: what is the difference between the external view and the 
current state? I understand that Helix uses the resource model to maintain both 
the ideal state and the current state, so why is it necessary to have an 
external view? In addition to this, what is the purpose of a spectator node? 
The documentation states that a "spectator" reacts to changes in a distributed 
system. Why give that node limited abilities when you could give it full 
access? These questions may be highly important to consider when writing the 
Helix paper for submission.

As for the mailing/monitoring system, I have decided to move forward with the 
JavaMail API + IMAP implementation. I used the gw155j...@scigap.org (Gmail) 
address as a basis for running my test code. For this particular use case, I 
didn't use the Gmail API because it had limited capabilities in terms of 
function/library use. I played around with the Gmail API; however, I was 
unsuccessful in getting it to work in a clean and efficient manner. As such, I 
decided to use the JavaMail API provided via imported libraries. I chose IMAP 
because it has greater capabilities than POP3, which was inefficient when 
fetching the emails.

In terms of reading the emails, the first challenge was to set up the code 
correctly to read from Gmail. Previously, the issue was that the emails were 
being read every time the read() function was called in the Inbox class. This 
meant that every message would be pulled even if only one email was unread. 
This proved to be highly costly in time, as the scigap email address has 1+ 
emails at any given time. I set up boolean flags for messages that were read 
and ones that were unread. As a result, not all messages have to be pulled; 
only the ones with a "false" flag need to be read. These messages are pulled 
and then put into a Message[] array. This array is then sorted using a lambda 
expression, since JavaMail retrieves the most recent message last. After these 
messages are put into the array and dealt with, they are marked as "read" to 
avoid reading them again.

Currently, I'm working on improving the implementations of all four email 
parsers. It is highly important that these parsers run efficiently, as many 
emails will be read. I didn't want to use regex, as it is slightly slower than 
string operations. For my demo code, I have used string operations to parse 
the subject title/content. In reality, an array or the StringBuilder class 
should be used in a production implementation to improve speed. Currently, I'm 
refactoring the PBS code to run a bit more optimally and running test cases 
for the other two email types. Below is a link to the Gmail implementation + 
SLURM interpreter. Basically, the idea is to have four classes, one handling 
each email type, that parse the messages from the Message[] array. The idea is 
then to take the common data collected, such as job_id, name, status, and 
time, and put it into a Thrift data model file. From this Thrift definition, 
create a Java Thrift object to send over an AMQP message queue (RabbitMQ), to 
then potentially be used in a MySQL database. As of now, the database part is 
not settled, but it would most likely be a registry that needs to be updated 
via a Java JPA library/SQL queries.


https://github.com/chessman179/gmailtestinged  <<<<<<<<<<<<< 
code.





** big shout out to Marcus --


Mailing System

2017-07-05 Thread Apoorv Palkar
Hey Dev,


To anyone who worked on the mailing system for the Gmail emails: what were 
some problems with the Gmail API versus the IMAP implementation? I'm currently 
tasked with building this system, and I would like some input from previous 
developers who worked on it.




Thanks.


Storm Mock Services

2017-06-20 Thread Apoorv Palkar

Hey Dev,


I've been working on the mock services examples for Apache Storm. The three 
mock services I looked into were file transfer, job submission, and 
replication. I got all three to work successfully, with each service in its 
own dummy class. Though the programs technically executed, that doesn't mean 
Storm is the correct fit for Airavata. A problem I see with Storm is that it 
is fundamentally built to process a stream of data tuples in real time. As 
such, its internal function nextTuple() is called indefinitely during the 
execution of the topology. Judging from the actual Airavata tasks, it seems 
that we won't receive a continuous stream of data input. This is a major 
drawback of using Storm. There needs to be extra implementation added to fit 
the Airavata use case; we should aim to fit Storm in as cleanly as possible, 
so this seems to be a smaller con. Another problem I'm having is keeping track 
of DAGs that consist of internal DAGs. Currently, we can run a complex/simple 
DAG in Storm as long as it isn't layered. As Suresh mentioned, if we want to 
have multiple layers of DAGs, then we might have to implement this 
functionality ourselves. Lastly, there isn't an internal mechanism for 
handling errors. Storm handles basic error cases, but we would have to 
implement more complicated methods for our use case. Other than these three 
aspects, Storm seems to be doing a good job of execution. We will try to pick 
between Storm and Helix by the end of this week. You can see the Storm 
execution code via my previous post on GitHub.
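
To make the nextTuple() concern concrete, here is a stripped-down spout
sketch; everything in it is illustrative. Storm polls nextTuple() in a tight
loop for the lifetime of the topology, so a finite batch of Airavata tasks has
to be shoehorned into a streaming interface:

import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class TaskSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;
    private boolean emitted = false;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    // Storm calls this indefinitely, even when there is no new work:
    // a finite batch of Airavata tasks does not fit this model naturally.
    @Override
    public void nextTuple() {
        if (!emitted) {
            collector.emit(new Values("file-transfer-task"));
            emitted = true;
        } else {
            Utils.sleep(100); // nothing to emit; back off instead of spinning
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("task"));
    }
}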


* shout out @Marcus for explaining generics/enum use case in Java.







Re: Apache Flink Execution

2017-06-12 Thread Apoorv Palkar
continuous streaming data and therefore the 
fight is for offering low latency processing; these might not necessarily be 
that important for the Airavata use-case (tasks may take time to complete).
 
Thanks and Regards,
Gourav Shenoy
 

From: "Pierce, Marlon" 
Reply-To: 
Date: Wednesday, May 24, 2017 at 11:36 AM
To: "dev@airavata.apache.org" 
Subject: Re: Apache Flink Execution

 

Thanks, Apoorv.  Note for everyone else: request access if you’d like to leave 
a comment or make a suggestion.
 
Marlon
 

From: Apoorv Palkar 
Reply-To: "dev@airavata.apache.org" 
Date: Wednesday, May 24, 2017 at 11:32 AM
To: "dev@airavata.apache.org" 
Subject: Apache Flink Execution

 

https://docs.google.com/document/d/1GDh8kEbAXVY9Gv1mmFvq__zLN_JP6m2_KbfN-9C0uO0/edit?usp=sharing

 

Link for Flink use/fundamentals




Storm code example

2017-06-09 Thread Apoorv Palkar
Here is the Storm demo using math topologies:


https://github.com/hista25/storm-example









Re: Using Docker images to run Thrift

2017-06-07 Thread Apoorv Palkar
+1. IDK what Thrift and/or Docker are (Gourav explained them to me); shoutout 
Marcus for being smart.



-Original Message-
From: Christie, Marcus Aaron 
To: dev 
Sent: Wed, Jun 7, 2017 3:02 pm
Subject: Using Docker images to run Thrift


Dev,


After running into difficulties getting Thrift to build on my laptop I started 
exploring the possibility of using Docker images to run Thrift.  I’ve created a 
pull request of my changes here: https://github.com/apache/airavata/pull/112


One question: I opted to just switch the scripts to using Docker, but I 
thought perhaps there could be a command-line flag for whether to use Docker 
or not. My hope is that using Docker images to run Thrift will be a lot more 
convenient than requiring developers to install Thrift.


Your feedback is welcome.


Thanks,


Marcus



log4j2.x

2017-06-06 Thread Apoorv Palkar
Hey Dev,


I posted on the Storm user list about log4j + Storm use. Is anyone here 
familiar with log4j 2.x, in particular with controlling the output using an XML file?
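
For context, this is roughly the kind of log4j2.xml I'm experimenting with; a
configuration like this controls output per logger, though the appender, names,
and levels below are just an example:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <!-- Quiet Storm internals; keep our own package at the default level. -->
    <Logger name="org.apache.storm" level="warn"/>
    <Root level="info">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>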


thanks


XML tips

2017-06-02 Thread Apoorv Palkar
Hey Dev,


I've completed my spouts/bolts for the Storm demo of the distributed workflow 
manager. I'm now putting together the pieces (i.e., creating the Maven 
project, editing the XML, adding dependencies, getting the config file to 
work). Is anybody very familiar with XML? If so, what resources should I use?


Thanks


wireless/openwhisk

2017-05-31 Thread Apoorv Palkar
https://docs.google.com/document/d/1xF_p4FUEK0rXJxC2i_fiBtTydAL5rcKD74PuSV9oyTo/edit?usp=sharing




From my analysis, I think Storm and Flink are better than Spark/OpenWhisk, so 
I created an analysis of OpenWhisk above. The documentation is not as good 
because the technology is new, which could be a problem for actual 
prototyping/development.


thanks.


OpenWhisk

2017-05-30 Thread Apoorv Palkar
Hey dev,


Suresh has told me to also look into OpenWhisk this week as a potential 
candidate, in addition to Spark, Storm, and Flink. I will be writing a 1-2 
page detailed report on its inner workings, relevant use cases, and potential 
uses in Airavata. Does anybody have any background in serverless architecture 
or functions-as-a-service?






Re: Spark context diagrams

2017-05-26 Thread Apoorv Palkar
ok will do.



-Original Message-
From: Pamidighantam, Sudhakar V 
To: dev 
Sent: Fri, May 26, 2017 11:36 am
Subject: Re: Spark context diagrams


Apoorv:


Can you create these diagrams with Creately or some other software and 
annotate them better?


It is a bit difficult for old eyes to read them.


Thanks,
Sudhakar.



On May 26, 2017, at 11:25 AM, Apoorv Palkar  wrote:


Hey, I've been working on the Spark details and posted two diagrams on Google 
Docs at the link below. Hopefully I can get into the groove and have it 
working with/as the potential orchestrator.






https://docs.google.com/document/d/1kjIBC0ianDVJlSuPs8FanCTO8ili1VETA5xKeFqo1gY/edit?usp=sharing












Spark context diagrams

2017-05-26 Thread Apoorv Palkar
Hey, I've been working on the Spark details and posted two diagrams on Google 
Docs at the link below. Hopefully I can get into the groove and have it 
working with/as the potential orchestrator.






https://docs.google.com/document/d/1kjIBC0ianDVJlSuPs8FanCTO8ili1VETA5xKeFqo1gY/edit?usp=sharing








Dissecting Apache Spark

2017-05-24 Thread Apoorv Palkar
Hey Dev,


Now I'm diving deep into the internals (code implementation) of Spark, after 
looking at Spark, Storm, and Flink. As Suresh pointed out, I want to see which 
is easier to take apart piece by piece for use in Airavata. Also, does anybody 
have experience with Apache Falcon? If anyone has used it, I'd be interested 
in discussing its use case. I am now working on a report to identify key 
aspects of the core engine for our use. I'll try to include as many specifics 
in the report as possible.


As always, any suggestions are nice. kk


Thanks.


Apache Flink Execution

2017-05-24 Thread Apoorv Palkar
https://docs.google.com/document/d/1GDh8kEbAXVY9Gv1mmFvq__zLN_JP6m2_KbfN-9C0uO0/edit?usp=sharing


Link for Flink use/fundamentals


Storm + Spark Analysis

2017-05-23 Thread Apoorv Palkar
https://docs.google.com/document/d/1ZyybQg3UoxTXP23lKtMw0lX3Zo3_30Bz7biNoATRyLY/edit?usp=sharing




STORM ABOVE




https://docs.google.com/document/d/1ekUE-nderDkt4-CG6ILnF9JJHBPU6k1Fe3kyzxyVAiM/edit?usp=sharing





SPARK ABOVE


Storm Analysis

2017-05-22 Thread Apoorv Palkar
Uploaded the basics of Apache Storm and its uses for Airavata. I will now work 
on Flink and write a similar one-page paper on how it works and its pros/cons.






Untitleddocument.docx
Description: MS-Word 2007 document


Storm Flink.

2017-05-22 Thread Apoorv Palkar
Currently writing a report on Storm and its potential uses in Airavata; I will 
finish by today. Has anybody used Apache Flink before?


Recommendations on Flink would be nice.


Thanks,


Apoorv Palkar


Attached paper

2017-05-22 Thread Apoorv Palkar
I have attached the Spark use case paper. I can go into more detail; what is 
required in terms of the paper? Should I also write a comparison with Storm? I 
went over the Storm architecture/functionality over the weekend, so this can 
be done quickly.

paper.docx
Description: MS-Word 2007 document


[GSoC Plan of Attack] Choosing Apache Spark

2017-05-17 Thread Apoorv Palkar
Hey Dev,


I have started my GSoC here at Indiana University. I have chosen to 
investigate Spark over Storm/Flink for our distributed model, because 
Storm/Flink are generally better suited for live event streaming. We are 
analyzing the batch processing case first and then potentially considering 
live streaming. Spark is best suited for this because it allows for batch 
processing through the core engine and live processing through the Spark 
Streaming library. Over the past 4 days I configured the Spark standalone 
cluster manager to work with worker-node virtual machines on AWS EC2. As 
Amazon is paid, we have decided to switch to the Jetstream/OpenStack API. As 
of now, I am using Spark Standalone as the cluster manager between the core 
engine and workers. In addition to this, I'm investigating the use of 
Mesos/YARN via Hadoop for future Airavata cluster managers.


Any suggestions would be good.


Apoorv Palkar


Docker and AWS

2017-04-05 Thread Apoorv Palkar
What aspects of the Airavata project are using Docker and AWS? I'm interested 
in these technologies.

Re: [GSoC] Due Date

2017-04-02 Thread Apoorv Palkar
OK, so one particular issue I have is adding additional solutions. I am 
familiar with the solutions/improvements currently posted. In addition, taking 
Gourav's advice, I have started exploring the use of new technologies such as 
Kafka, Akka, Cassandra, MongoDB, and Kubernetes. I have a preliminary model, 
but I'm still working out the details in full. Should I include this as part 
of the GSoC proposal? There is a very good chance I may end up implementing 
these technologies, but they are not mentioned in detail in my GSoC proposal.



-Original Message-
From: Suresh Marru 
To: Airavata Dev 
Sent: Sun, Apr 2, 2017 5:31 pm
Subject: Re: [GSoC] Due Date


On Apr 2, 2017, at 6:22 PM, Apoorv Palkar  wrote:



Also, who else should we ask for input about our proposal? How do we know it's 
adequate for GSoC standards in terms of explanation, diagrams, etc.?




You will know whether it's adequate or otherwise based on your acceptance on 
May 4th :)


You are asking in the right place (dev list). The application should be clear 
in terms of your goals and how you plan to accomplish them. Diagrams and other 
explanations help but are not required. At this point, I suggest you make sure 
to catch up with this thread and double-check that you addressed any issues 
raised in this discussion - http://markmail.org/thread/wfbvewfb6gmlsgmf


Gourav recently posted some possible proposal ideas, so you may want to sync 
up with him (on this list). 


Irrespective of the feedback, you should make sure to submit the application 
well before the deadline. Neither we nor Google can do anything beyond 12 pm 
Eastern Time tomorrow (April 3rd).


Suresh





-Original Message-
From: Suresh Marru 
To: Airavata Dev 
Sent: Sun, Apr 2, 2017 5:20 pm
Subject: Re: [GSoC] Due Date


You should follow the official deadlines  - https://summerofcode.withgoogle.com/


Note that it will require your student status verification and so on, so I 
suggest finishing the application early (today) and then modifying it, instead 
of waiting for the deadline.


Suresh



On Apr 2, 2017, at 6:18 PM, Apoorv Palkar  wrote:


When is the GSoC proposal due? What other materials do we need?


Thanks











Re: [GSoC] Due Date

2017-04-02 Thread Apoorv Palkar
Also, who else should we ask for input about our proposal? How do we know it's 
adequate for GSoC standards in terms of explanation, diagrams, etc.?



-Original Message-
From: Suresh Marru 
To: Airavata Dev 
Sent: Sun, Apr 2, 2017 5:20 pm
Subject: Re: [GSoC] Due Date


You should follow the official deadlines  - https://summerofcode.withgoogle.com/


Note that it will require your student status verification and so on, so I 
suggest finishing the application early (today) and then modifying it, instead 
of waiting for the deadline.


Suresh



On Apr 2, 2017, at 6:18 PM, Apoorv Palkar  wrote:


When is the GSoC proposal due? What other materials do we need?


Thanks







[GSoC] Due Date

2017-04-02 Thread Apoorv Palkar
When is the GSoC proposal due? What other materials do we need?


Thanks


Re: [GSoC] Rough-Draft Proposal; Want Feedback

2017-04-02 Thread Apoorv Palkar

Is it detailed enough in your opinion? I wanted to add an extra solution, but 
I'm not 100% familiar with it, as I'm still working out the details. Should I 
include it? It involves some technologies I'm not completely familiar with, so 
I didn't want to add it. I followed your two suggestions, and I will be adding 
a "HackIllinois" section. Is there more you recommend? Thanks 



-Original Message-
From: Suresh Marru 
To: Airavata Dev 
Sent: Sun, Apr 2, 2017 12:47 pm
Subject: Re: [GSoC] Rough-Draft Proposal; Want Feedback


Hi Apoorv,


This is good. I added a couple of suggestions on the Google doc. Since you 
were introduced to Airavata at HackIllinois and already spent some time with 
all of us there, do mention your hackathon experience in the application; that 
should be a plus.


Suresh



On Apr 2, 2017, at 11:24 AM, Apoorv Palkar  wrote:


Here is my proposal:


https://docs.google.com/document/d/1NcsEAUPOUtggtscmhNeUDTHpGFqmDiXvn40B2tf0G4k/edit?usp=sharing




I'd like some feedback regarding what should be fixed and what should be added. 




Thanks







[GSoC] Rough-Draft Proposal; Want Feedback

2017-04-02 Thread Apoorv Palkar
Here is my proposal:


https://docs.google.com/document/d/1NcsEAUPOUtggtscmhNeUDTHpGFqmDiXvn40B2tf0G4k/edit?usp=sharing




I'd like some feedback regarding what should be fixed and what should be added. 




Thanks


[GSoC] Adding Workflow Parallel to Proposal

2017-04-01 Thread Apoorv Palkar

Do you think it's doable to deliver code for making certain jobs run in 
parallel and to work on the workflow editor, in addition to completing the 
distributed workload management? Or would this be too ambitious a goal to 
complete in a summer? Thanks






Re: [GSoC] Number of Deliverables

2017-03-29 Thread Apoorv Palkar
If you promise to do things a certain way, but you find a better solution when 
actually working on the project, can you implement new ideas and scrap old ones?



-Original Message-
From: Supun Nakandala 
To: dev 
Sent: Wed, Mar 29, 2017 4:43 pm
Subject: Re: [GSoC] Number of Deliverables



Hi Apoorv,


As a sample project proposal, I would recommend you refer to this.


From my experience as a past GSoC student, I think having specific and 
challenging goals should make your proposal more attractive and increase the 
chances of getting accepted.


However, committing to a set of goals which you think is overly unrealistic 
will also hinder your success later, because you will be judged based on what 
you promise (the proposal). It is completely OK to not achieve everything you 
promise, but you will have to show that you put significant effort into 
achieving your goals (which can get tricky).


-Supun



On Wed, Mar 29, 2017 at 5:23 PM, Apoorv Palkar  wrote:

How many goals should we aim to put in our proposal? Is it better to put in 
small goals and over-deliver?


Thanks 






-- 

Thank you
Supun Nakandala
Dept. Computer Science and Engineering
University of Moratuwa





[GSoC] Number of Deliverables

2017-03-29 Thread Apoorv Palkar
How many goals should we aim to put in our proposal? Is it better to put in 
small goals and over-deliver?


Thanks 


Re: [GSoC] Proposal Topics

2017-03-28 Thread Apoorv Palkar
I haven't used Kafka or Cassandra. I would be interested in developing a 
solution using these technologies to avoid redundancies.



-Original Message-
From: Shenoy, Gourav Ganesh 
To: dev 
Sent: Mon, Mar 27, 2017 7:52 pm
Subject: [GSoC] Proposal Topics



Hello dev,
 
I am interested in participating in GSoC this season. There are a couple of 
topics on my mind which could make good proposals.
 
1. Distributed Task Execution (Workload Management) for Apache Airavata
   · Apoorv has already shown interest in this, and has a fair idea of the problem.
   · I have been working on building a prototype to solve this problem, as 
     part of the Science Gateways course [see: https://goo.gl/CZcIIn]
   · There are other possible approaches, like using Akka, Cassandra, Kafka 
     [see: https://youtu.be/s3GfXTnzG_Y]
 
2. Workflow Editor/Builder for Apache Airavata
   · Ajinkya had started on this topic, and I can use his inputs.
   · The idea is to allow modelling multiple Airavata job submissions into a 
     workflow, using tools such as CWL (Common Workflow Language).
   · In addition, to integrate a workflow editor UI with the processing logic, 
     and to manage dependencies (whether two jobs can run in parallel vs. 
     waiting for one to complete because it depends on the output of another).
 
I would love to hear from you all on any suggestions or inclusions to make.
 
Thanks and Regards,
Gourav Shenoy




[GSoC] Possible Solution for Question 2 of Workload.

2017-03-27 Thread Apoorv Palkar

https://github.com/airavata-courses/spring17-workload-management/wiki/%5BFinal%5D-Centralized-architecture-for-workload-management

So, from the design presented in this link: "How do we upgrade a worker, say 
with a new task 'E' implementation, in such a manner that if something goes 
wrong with the code for 'E', the entire worker node should not fail? In short, 
avoid regression testing the entire worker module."



I was thinking that we can create a queue in the worker class. It can keep 
track of which jobs are entering, which are currently being processed, which 
have failed, and which are finished. Once a job is finished, we don't have to 
report to the scheduler. If the job does fail, we can tell the scheduler to 
put it back in the queue. However, another issue that can arise is that if 
that particular machine is the only one that handles that type of job, the job 
can keep looping in a circle. For that, I'm thinking of some sort of unique 
key for every job. What am I missing, and are there any recommendations?
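
Here is a minimal sketch of the worker-side tracking I have in mind, with a
unique key per job so a repeatedly failing job can be spotted rather than
looping forever; the class names and retry limit are hypothetical:

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class Worker {

    enum JobState { QUEUED, RUNNING, FAILED, FINISHED }

    static class Job {
        final String key = UUID.randomUUID().toString(); // unique per job
        int attempts = 0;
    }

    private final BlockingQueue<Job> incoming = new LinkedBlockingQueue<>();
    private final Map<String, JobState> states = new ConcurrentHashMap<>();

    void process() throws InterruptedException {
        Job job = incoming.take();
        states.put(job.key, JobState.RUNNING);
        try {
            // ... run the task 'E' implementation here ...
            states.put(job.key, JobState.FINISHED); // no need to report back
        } catch (RuntimeException e) {
            job.attempts++;
            states.put(job.key, JobState.FAILED);
            if (job.attempts < 3) {
                // Tell the scheduler to requeue; the key lets it spot a job
                // that keeps bouncing back instead of looping it forever.
                requeueWithScheduler(job);
            }
        }
    }

    void requeueWithScheduler(Job job) {
        // placeholder: hand the failed job back to the scheduler
    }
}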




Re: [GSoC] Distributed Workload Management.

2017-03-27 Thread Apoorv Palkar
https://github.com/airavata-courses/spring17-workload-management/wiki/%5BFinal%5D-Centralized-architecture-for-workload-management


The above is the link to the data. I was told during HackIllinois by Gourav 
that the workers should pull from the queue rather than the scheduler pushing 
to the workers. I worked on implementing Serf and the gossip protocol, but I 
ran out of time. This is something I'm looking to do as part of my GSoC project.



-Original Message-
From: Shameera Rathnayaka 
To: dev 
Sent: Sun, Mar 26, 2017 10:18 pm
Subject: Re: [GSoC] Distributed Workload Management.



Hi Apoorv, 


Why do you want to remove all of the RabbitMQ implementation and replace it 
with the gossip protocol? 
Where did you find the article you mentioned above? (Any web link?)


Regards, 
Shameera.


On Sat, Mar 25, 2017 at 4:21 PM Apoorv Palkar  wrote:

I am creating a GSoC proposal for the distributed workload management part of 
Airavata. I read the article titled "A Centralized, Apache Mesos Inspired 
Architecture". I was wondering if anybody had actually coded the idea proposed 
in the article? It seems very interesting. In addition to this, I wanted to 
remove all of the RabbitMQ implementation and replace it with the gossip 
protocol via Serf.


Thanks,
Apoorv Palkar

-- 


Shameera Rathnayaka




[GSoC] Distributed Workload Management.

2017-03-25 Thread Apoorv Palkar
I am creating a GSoC proposal for the distributed workload management part of 
Airavata. I read the article titled "A Centralized, Apache Mesos Inspired 
Architecture". I was wondering if anybody had actually coded the idea proposed 
in the article? It seems very interesting. In addition to this, I wanted to 
remove all of the RabbitMQ implementation and replace it with the gossip 
protocol via Serf.


Thanks,
Apoorv Palkar


Interested in GSoC competition

2017-03-01 Thread Apoorv Palkar
Dear Developers,


I was interested in working on Apache Airavata for the GSoC competition. I am 
looking for a project. During HackIllinois 2017, I worked on/helped build a 
prototype distributed workload system with my teammates from the University of 
Illinois, Urbana-Champaign. Where do I go from here?


Thank you,


Apoorv Palkar
apoor...@illinois.edu
(925) 849-7847