Re: Implementation of JPA
After talking to Marcus/Marlon, I've decided to go with just the JDBC implementation. I just need the ability to read/write to the MySQL DB. I didn't want to spend too much time learning the material required for JPA. Palkar. --shoutout Marcus -Original Message- From: Shenoy, Gourav Ganesh To: dev Sent: Tue, Aug 1, 2017 10:31 am Subject: Re: Implementation of JPA Hi Apoorv, Well, it’s difficult to say which one is absolutely better than the other. But yes, generally Hibernate is considered to be more optimized for persistence/retrieval on average for a large number of entities. Hibernate also offers more utility methods, which at times simplifies the extra code you would have to write in OpenJPA. But I have used OpenJPA for a long enough time, and once you get beyond learning the functionalities you realize that it’s easier to deal with a minimal set of annotations in OpenJPA, whereas Hibernate has some extra wrapper annotations. An important consideration is compatibility – OpenJPA annotations are certain to work with most JPA implementations, but not vice versa. This plays a big role when you want to switch your JPA implementation (which generally does not happen). Having said that, Hibernate has way more documentation and helpful sources online if you’re facing any issues, etc. Thanks and Regards, Gourav Shenoy From: Apoorv Palkar Reply-To: "dev@airavata.apache.org" Date: Monday, July 31, 2017 at 10:03 AM To: "dev@airavata.apache.org" Subject: Implementation of JPA Dear Dev, I'm currently developing the code for the registry to be used for the monitoring system in Airavata. I'm looking at the pros/cons of each JPA implementation and was wondering if anyone has any recommendations. I'm choosing between Hibernate, OpenJPA, and EclipseLink. I understand Hibernate is the most mature, most widely used technology. I was trying to determine Hibernate's cons. Does anybody have previous knowledge about Hibernate?
My use case for the database (most likely MySQL) is to read/write/store data about experiment IDs, names, and statuses. Thanks, A. Palkar --shoutout marcus
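Since the thread lands on plain JDBC for this read/write use case, here is a minimal sketch of what that could look like. The table and column names (experiment, experiment_id, name, status) and the class name ExperimentDao are illustrative assumptions, not the actual Airavata schema:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hedged sketch of the plain-JDBC approach; table/column names are assumptions.
class ExperimentDao {
    static final String INSERT_SQL =
            "INSERT INTO experiment (experiment_id, name, status) VALUES (?, ?, ?)";
    static final String SELECT_STATUS_SQL =
            "SELECT status FROM experiment WHERE experiment_id = ?";

    private final Connection conn;

    ExperimentDao(Connection conn) { this.conn = conn; }

    // Write one experiment row using a parameterized statement.
    void insert(String id, String name, String status) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, id);
            ps.setString(2, name);
            ps.setString(3, status);
            ps.executeUpdate();
        }
    }

    // Read back the status for a given experiment ID, or null if absent.
    String readStatus(String id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SELECT_STATUS_SQL)) {
            ps.setString(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("status") : null;
            }
        }
    }
}
```

The Connection would come from DriverManager.getConnection(...) with the MySQL JDBC driver on the classpath; prepared statements keep the experiment values safely parameterized without any JPA machinery.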
Implementation of JPA
Dear Dev, I'm currently developing the code for the registry to be used for the monitoring system in Airavata. I'm looking at the pros/cons of each JPA implementation and was wondering if anyone has any recommendations. I'm choosing between Hibernate, OpenJPA, and EclipseLink. I understand Hibernate is the most mature, most widely used technology. I was trying to determine Hibernate's cons. Does anybody have previous knowledge about Hibernate? My use case for the database (most likely MySQL) is to read/write/store data about experiment IDs, names, and statuses. Thanks, A. Palkar --shoutout marcus
Re: Monitoring System Diagram
Yes Gourav, Currently, I'm working on making the monitoring aspect of Airavata as independent as possible. I'm approaching the problem as if it doesn't matter whether the current architecture is being used or Helix is being used. After coming up with a good design and implementation code, we can proceed to see how to connect the pieces. From there we can probably split the DAG into two parts instead of having one DAG with the monitoring system. - A Palkar. shout out Marcus -- -Original Message- From: Shenoy, Gourav Ganesh To: dev Sent: Tue, Jul 25, 2017 11:04 am Subject: Re: Monitoring System Diagram Apoorv, Good work with the architecture. But I think the "request to monitor via helix mechanism" and "output to helix orchestration" are very crucial pieces here – which need to be detailed. The rest of it is kind of trivial, but what is important is how this system blends into the Helix design. Do you think you can add more details to the above statements? Once again, you are heading in the right direction – good job! Thanks and Regards, Gourav Shenoy From: Apoorv Palkar Reply-To: "dev@airavata.apache.org" Date: Monday, July 24, 2017 at 1:14 PM To: "dev@airavata.apache.org" Subject: Monitoring System Diagram K Dev, I have attached an architecture diagram for the monitoring system. Currently the challenge we are facing is that GFAC is heavily tied to the monitoring system via task execution. The ultimate goal is to separate this from the current GFAC. I understand Marlon doesn't want me looking at the code too much to avoid bias. I have glanced at some specifics/a couple of lines to get an idea of how monitoring is currently implemented in Airavata. Previously, I had been working on the parsing of the particular emails: PBS, SLURM, UGE, etc. Over the weekend, I ran some performance metric tests on the current parsing code as Shameera suggested. The current code is quite balanced in terms of large-scale processing.
It is able to quickly parse the emails and still maintain a high degree of simplicity. I improved on a couple of lines without using regex; however, the code proved to be highly unmaintainable. As Shameera/Marlon pointed out, these emails change relatively frequently as servers/machines are upgraded/replaced. It is important for this code to be highly maintainable. In addition to this, I have been working with Supun to develop a new architecture for the mailing list. At first, there was a debate on whether to use Zookeeper and/or Redis in a global state. I conducted some research to identify the pros and cons of each technology. As Suresh/Gourav suggested, Airavata currently uses Zookeeper. Also, Zookeeper would provide less overhead than a database such as Redis. A big problem with this development strategy is the complexity of the code we will have to write. In the scenario of multiple GFACs, a global Zookeeper makes some sense. However, the problem comes if a job is cancelled. This can potentially cause edge-case problems where, say, GFAC A accidentally processes GFAC B's emails. Therefore, we have to design, at a low level, a careful locking scheme governing who needs to access data and who doesn't. This can prove to be a hassle. Another potential solution is to implement a work queue similar to our job submission in Airavata. The work queue delegates the work of parsing/reading emails to multiple GFACs. This could potentially avoid locking/threading hazards. If a GFAC fails somehow, there needs to be a mechanism in place to handle the particular emails that GFAC was handed. We still have to decide on the correct design before the code can be implemented. I've also been working on the Thrift/RabbitMQ scenario, where data is parsed, serialized, and then sent over the network. I will upload the code by today/tomorrow. SHOUT OUT @Marcus !
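The work-queue alternative described above can be sketched in plain Java. The names here (EmailWorkQueue, lease/requeue) are hypothetical, and a real deployment would sit behind a broker like RabbitMQ rather than an in-memory queue, but the idea is the same: each email is leased by exactly one GFAC, and a failed GFAC's emails go back on the queue, so no cross-GFAC locking is needed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory sketch of the work-queue idea: one shared queue hands each
// email to exactly one worker, avoiding the shared-lock problem.
class EmailWorkQueue {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    void submit(String emailId) { queue.offer(emailId); }

    // A GFAC worker takes (leases) the next email, or null if none pending.
    String lease() { return queue.poll(); }

    // If the worker dies before finishing, its email goes back on the queue.
    void requeue(String emailId) { queue.offer(emailId); }

    int pending() { return queue.size(); }
}
```

Detecting that a worker died (so its leased emails can be requeued) is the part a broker or Zookeeper ephemeral nodes would still have to provide.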
Monitoring System Diagram
K Dev, I have attached an architecture diagram for the monitoring system. Currently the challenge we are facing is that GFAC is heavily tied to the monitoring system via task execution. The ultimate goal is to separate this from the current GFAC. I understand Marlon doesn't want me looking at the code too much to avoid bias. I have glanced at some specifics/a couple of lines to get an idea of how monitoring is currently implemented in Airavata. Previously, I had been working on the parsing of the particular emails: PBS, SLURM, UGE, etc. Over the weekend, I ran some performance metric tests on the current parsing code as Shameera suggested. The current code is quite balanced in terms of large-scale processing. It is able to quickly parse the emails and still maintain a high degree of simplicity. I improved on a couple of lines without using regex; however, the code proved to be highly unmaintainable. As Shameera/Marlon pointed out, these emails change relatively frequently as servers/machines are upgraded/replaced. It is important for this code to be highly maintainable. In addition to this, I have been working with Supun to develop a new architecture for the mailing list. At first, there was a debate on whether to use Zookeeper and/or Redis in a global state. I conducted some research to identify the pros and cons of each technology. As Suresh/Gourav suggested, Airavata currently uses Zookeeper. Also, Zookeeper would provide less overhead than a database such as Redis. A big problem with this development strategy is the complexity of the code we will have to write. In the scenario of multiple GFACs, a global Zookeeper makes some sense. However, the problem comes if a job is cancelled. This can potentially cause edge-case problems where, say, GFAC A accidentally processes GFAC B's emails. Therefore, we have to design, at a low level, a careful locking scheme governing who needs to access data and who doesn't. This can prove to be a hassle.
Another potential solution is to implement a work queue similar to our job submission in Airavata. The work queue delegates the work of parsing/reading emails to multiple GFACs. This could potentially avoid locking/threading hazards. If a GFAC fails somehow, there needs to be a mechanism in place to handle the particular emails that GFAC was handed. We still have to decide on the correct design before the code can be implemented. I've also been working on the Thrift/RabbitMQ scenario, where data is parsed, serialized, and then sent over the network. I will upload the code by today/tomorrow. SHOUT OUT @Marcus !
Using Redis or Zookeeper for Email Monitoring
Hey Dev, I'm working on the email monitoring system for Airavata. Currently I'm trying to solve the problem of done emails arriving before start emails, and to make the system scalable. Today, we have only one GFAC that handles the email monitoring system. As we are moving towards a microservices approach from the monolithic code we have, this email monitoring system also needs to adapt to these changes. Currently, in the GFAC code, a concurrent hashmap is kept to track the start/end of emails using their respective experiment IDs. Instead of keeping the hashmap locally, we should keep it in a global state so that in the future multiple GFACs can handle the map. Supun has suggested using Zookeeper for this, as it has high availability and reliability. I was also thinking that since these experiment IDs are key-value pairs, Redis would be a good option for such a use case. What do you all think about each one? I understand Airavata currently uses Zookeeper, so development-wise there seems to be an edge toward it. Would Redis be a good fit for such a use case? -- shoutout Marcus.
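The done-before-start problem above can be sketched independently of where the map lives (a local ConcurrentHashMap today, Zookeeper or Redis later). A minimal sketch, with illustrative status names ("COMPLETED" is an assumption, not the actual Airavata constant): once a terminal status is recorded for an experiment ID, a late-arriving start email must not overwrite it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the per-experiment status map; status names are assumptions.
class JobStatusTracker {
    private final ConcurrentMap<String, String> statuses = new ConcurrentHashMap<>();

    // Record a status email atomically; a terminal status is never
    // overwritten by a late-arriving non-terminal one.
    String update(String experimentId, String status) {
        return statuses.merge(experimentId, status,
                (current, incoming) -> "COMPLETED".equals(current) ? current : incoming);
    }

    String get(String experimentId) { return statuses.get(experimentId); }
}
```

The same compare-before-write rule would have to be reproduced with Zookeeper versioned writes or a Redis Lua script once the map moves to a global store.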
Re: Helix + Mailing System
My B. The code didn't save properly onto Netbeans/GitHub. Let me update the link. -Original Message- From: Apoorv Palkar To: dev Sent: Mon, Jul 17, 2017 11:31 am Subject: Helix + Mailing System Hey Dev, For the past 3-3.5 weeks, I've been investigating the use of Helix in Airavata and been working on the email monitoring problem. I went through the Curator/Zookeeper code to test out the internal workings of Helix. A particular question I had was: what is the difference between the external view and the current state? I understood that Helix uses the resource model to maintain both the ideal state and the current state. Why is it necessary to have an external view? In addition to this, what is the purpose of a spectator node? In the documentation, it states that a "spectator" reacts to changes in a distributed system. Why give that particular node limited abilities when you can give it full access? These questions may be highly important to consider when writing the Helix paper for submission. As for the mailing/monitoring system, I have decided to move forward with the JavaMail API + IMAP implementation. I used the gw155j...@scigap.org (gmail) address as a basis for running my test code. For this particular use case, I didn't use the Gmail API because it had limited capabilities in terms of function/library uses. I played around with the Gmail API; however, I was unsuccessful in getting it to work in a clean and efficient manner. As such, I decided to use the JavaMail API provided via imported libraries. IMAP was chosen because it has greater capabilities than POP3. POP3 was inefficient when fetching the emails. In terms of first reading the emails, the first challenge was to set up the code correctly to read from Gmail. Previously the issue was that the emails were being read every time the read() function was called in the Inbox class. This meant that every message would be pulled even if only one email was unread.
This proved to be highly time-costly, as the scigap email address has 1+ emails at any given time. I set up boolean flags for messages that were read and ones that were unread. As a result, all messages don't have to be pulled; only the ones with a "false" flag need to be read. These messages were pulled and then put into a Message[] array. This array was then sorted using a lambda expression, as JavaMail retrieves the most recent message last. After these messages are put into the array and dealt with, they are marked as "read" to avoid reading them again. Currently, I'm working on improving the implementations of all four email parsers. It is highly important to make sure these parsers run efficiently, as many emails will be read. I didn't want to use regex, as it is slightly slower than string operations. For my demo code, I have used string operations to parse the subject title/content. In reality, an array or the StringBuilder class should be used when implemented professionally to improve speed. Currently, I'm refactoring the PBS code to run a bit more optimally and running test cases for the other two email types. Below is a link for the Gmail implementation + SLURM interpreter. Basically the idea is to have 4 classes that handle each type and then proceed to parse the messages from the Message[] array. The idea is then to take the COMMON data collected, such as job_id, name, status, and time, and put it into a Thrift data model file. Using this Thrift model, we then create a Java Thrift object to send over an AMQP message queue, RabbitMQ, to then potentially be used in a MySQL database. As of now, the database part is not clear, but it would most likely be a registry that needs to be updated via the Java JPA library/SQL queries. https://github.com/chessman179/gmailtestinged (code) ** big shout out to Marcus --
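The regex-free subject parsing described above can be illustrated with a small sketch. The subject shape ("SLURM Job_id=1593 Name=test.sh Ended, Run time ...") and the helper name are assumptions for illustration; real SLURM notification subjects vary by site configuration, which is exactly the maintainability trade-off noted in the thread.

```java
// Sketch of regex-free key=value extraction from a notification subject.
// Assumed subject shape: "SLURM Job_id=1593 Name=test.sh Ended, Run time 00:00:05".
class SlurmSubjectParser {
    // Returns the value following "key=", up to the next space, or null if absent.
    static String field(String subject, String key) {
        int start = subject.indexOf(key + "=");
        if (start < 0) return null;
        start += key.length() + 1;
        int end = subject.indexOf(' ', start);
        return end < 0 ? subject.substring(start) : subject.substring(start, end);
    }
}
```

Plain indexOf/substring keeps the hot path allocation-light compared to a regex; the cost is that any change to the email format means editing this logic by hand.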
Helix + Mailing System
Hey Dev, For the past 3-3.5 weeks, I've been investigating the use of Helix in Airavata and been working on the email monitoring problem. I went through the Curator/Zookeeper code to test out the internal workings of Helix. A particular question I had was: what is the difference between the external view and the current state? I understood that Helix uses the resource model to maintain both the ideal state and the current state. Why is it necessary to have an external view? In addition to this, what is the purpose of a spectator node? In the documentation, it states that a "spectator" reacts to changes in a distributed system. Why give that particular node limited abilities when you can give it full access? These questions may be highly important to consider when writing the Helix paper for submission. As for the mailing/monitoring system, I have decided to move forward with the JavaMail API + IMAP implementation. I used the gw155j...@scigap.org (gmail) address as a basis for running my test code. For this particular use case, I didn't use the Gmail API because it had limited capabilities in terms of function/library uses. I played around with the Gmail API; however, I was unsuccessful in getting it to work in a clean and efficient manner. As such, I decided to use the JavaMail API provided via imported libraries. IMAP was chosen because it has greater capabilities than POP3. POP3 was inefficient when fetching the emails. In terms of first reading the emails, the first challenge was to set up the code correctly to read from Gmail. Previously the issue was that the emails were being read every time the read() function was called in the Inbox class. This meant that every message would be pulled even if only one email was unread. This proved to be highly time-costly, as the scigap email address has 1+ emails at any given time. I set up boolean flags for messages that were read and ones that were unread.
As a result, all messages don't have to be pulled; only the ones with a "false" flag need to be read. These messages were pulled and then put into a Message[] array. This array was then sorted using a lambda expression, as JavaMail retrieves the most recent message last. After these messages are put into the array and dealt with, they are marked as "read" to avoid reading them again. Currently, I'm working on improving the implementations of all four email parsers. It is highly important to make sure these parsers run efficiently, as many emails will be read. I didn't want to use regex, as it is slightly slower than string operations. For my demo code, I have used string operations to parse the subject title/content. In reality, an array or the StringBuilder class should be used when implemented professionally to improve speed. Currently, I'm refactoring the PBS code to run a bit more optimally and running test cases for the other two email types. Below is a link for the Gmail implementation + SLURM interpreter. Basically the idea is to have 4 classes that handle each type and then proceed to parse the messages from the Message[] array. The idea is then to take the COMMON data collected, such as job_id, name, status, and time, and put it into a Thrift data model file. Using this Thrift model, we then create a Java Thrift object to send over an AMQP message queue, RabbitMQ, to then potentially be used in a MySQL database. As of now, the database part is not clear, but it would most likely be a registry that needs to be updated via the Java JPA library/SQL queries. https://github.com/chessman179/gmailtestinged (code) ** big shout out to Marcus --
Mailing System
Hey Dev, For anyone who worked on the mailing system for the Gmail emails: what were some problems with the Gmail API versus the IMAP implementation? I'm currently tasked with building this system and I would like some input from previous developers who worked on it. Thanks.
Storm Mock Services
Hey Dev, I've been working on the mock service examples for Apache Storm. The three mock services I looked into were: file transfer, job submission, and replication. I got all three to work successfully, with each service in its own dummy class. Though the program technically executed, it doesn't mean that Storm is the correct fit for Airavata. A problem that I see with Storm is that it is fundamentally built to process a stream of data tuples in real time. As such, its internal function nextTuple() is called indefinitely during the execution of the topology. Judging from the actual Airavata tasks, it seems that we won't receive a continuous stream of data input. This is a major drawback of using Storm. Extra implementation needs to be added to fit the Airavata use case. We should aim to fit Storm in as cleanly as possible. This seems to be a small con. Another problem I seem to be having is keeping track of DAGs that consist of internal DAGs. Currently, we can run a complex/simple DAG in Storm as long as it isn't layered. As Suresh mentioned, if we want to have multiple layers of DAGs, then we might have to implement this functionality ourselves. Lastly, there isn't an internal mechanism for handling errors. Storm handles basic error cases, but we have to implement more complicated methods for our use case. Other than these three aspects, Storm seems to be doing a good job of execution. We will try to pick between Storm and Helix by the end of this week. You can see the execution code of Storm via my previous post on GitHub. * shout out @Marcus for explaining the generics/enum use case in Java.
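The nextTuple() concern above can be illustrated without a Storm dependency: Storm invokes a spout's nextTuple() in a tight loop, so when there is no continuous input the method must return immediately and emit nothing rather than block. A pure-Java sketch of that polling contract, with hypothetical names (this is not the actual Storm API):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Pure-Java stand-in for a Storm spout's polling contract (no Storm
// dependency): the framework calls nextTuple() repeatedly, so it must
// be non-blocking even when there is no work.
class TaskSpout {
    private final Queue<String> pending = new ArrayDeque<>();
    private int emitted = 0;

    void enqueue(String task) { pending.add(task); }

    // Emit the next task if one exists; return null immediately when idle
    // (a real spout would emit nothing and let Storm sleep briefly).
    String nextTuple() {
        String task = pending.poll();
        if (task != null) emitted++;
        return task;
    }

    int emittedCount() { return emitted; }
}
```

For Airavata's bursty task submissions, most nextTuple() calls would hit the idle branch, which is the mismatch the email describes.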
Re: Apache Flink Execution
continuous streaming data and therefore the fight is for offering low latency processing; these might not necessarily be that important for the Airavata use-case (tasks may take time to complete). Thanks and Regards, Gourav Shenoy From: "Pierce, Marlon" Reply-To: Date: Wednesday, May 24, 2017 at 11:36 AM To: "dev@airavata.apache.org" Subject: Re: Apache Flink Execution Thanks, Apoorv. Note for everyone else: request access if you’d like to leave a comment or make a suggestion. Marlon From: Apoorv Palkar Reply-To: "dev@airavata.apache.org" Date: Wednesday, May 24, 2017 at 11:32 AM To: "dev@airavata.apache.org" Subject: Apache Flink Execution https://docs.google.com/document/d/1GDh8kEbAXVY9Gv1mmFvq__zLN_JP6m2_KbfN-9C0uO0/edit?usp=sharing LINK for Flink Use/fundamental
Storm code example
Here is the Storm demo using math topologies: https://github.com/hista25/storm-example
Re: Using Docker images to run Thrift
+1, IDK what Thrift and/or Docker are (Gourav explained them to me); shoutout Marcus for being smart. -Original Message- From: Christie, Marcus Aaron To: dev Sent: Wed, Jun 7, 2017 3:02 pm Subject: Using Docker images to run Thrift Dev, After running into difficulties getting Thrift to build on my laptop, I started exploring the possibility of using Docker images to run Thrift. I’ve created a pull request of my changes here: https://github.com/apache/airavata/pull/112 One question: I opted to just switch the scripts to using Docker, but I thought perhaps that could be a command-line flag whether to use Docker or not. My hope is that using Docker images to run Thrift will be a lot more convenient than requiring developers to install Thrift. Your feedback is welcome. Thanks, Marcus
log4j2.x
Hey Dev, I posted on the Storm user list about log4j + Storm use. Is anyone here familiar with log4j 2.x, in particular output control using an XML file? Thanks
XML tips
Hey Dev, I've completed my spouts/bolts for the Storm demo of the distributed workflow manager. I'm now putting together the pieces (i.e. creating the Maven project, editing the XML, adding dependencies, getting the config file to work). Is anybody very familiar with XML? If so, what resources should I use? Thanks
wireless/openwhisk
https://docs.google.com/document/d/1xF_p4FUEK0rXJxC2i_fiBtTydAL5rcKD74PuSV9oyTo/edit?usp=sharing From my analysis, I think Storm && Flink are better than Spark/OpenWhisk, so I created an analysis for OpenWhisk above. The documentation is not as good, as the technology is new. This could be a problem for actual prototyping/development. Thanks.
OpenWhisk
Hey dev, Suresh has told me to also look into OpenWhisk this week as a potential candidate in addition to Spark, Storm, and Flink. I will be writing a 1-2 page detailed report on its inner workings/ relevant use case/ potential uses in Airavata. Does anybody have any background in serverless architecture or framework as a service?
Re: Spark context diagrams
OK, will do. -Original Message- From: Pamidighantam, Sudhakar V To: dev Sent: Fri, May 26, 2017 11:36 am Subject: Re: Spark context diagrams Apoorv: Can you create these diagrams with Creately or some software and annotate them better? It is a bit difficult for old eyes to read them. Thanks, Sudhakar. On May 26, 2017, at 11:25 AM, Apoorv Palkar wrote: Hey, I've been working on the Spark details and posted 2 diagrams on Google Docs in the link below. Hopefully I can get it working with/as the potential orchestrator. https://docs.google.com/document/d/1kjIBC0ianDVJlSuPs8FanCTO8ili1VETA5xKeFqo1gY/edit?usp=sharing
Spark context diagrams
Hey, I've been working on the Spark details and posted 2 diagrams on Google Docs in the link below. Hopefully I can get it working with/as the potential orchestrator. https://docs.google.com/document/d/1kjIBC0ianDVJlSuPs8FanCTO8ili1VETA5xKeFqo1gY/edit?usp=sharing
Dissecting Apache Spark
Hey Dev, I'm now diving deep into the internals (code implementation) of Spark after looking at Spark, Storm, and Flink. As Suresh pointed out, I want to see which is easier to take apart piece by piece for use in Airavata. Also, does anybody have experience with Apache Falcon? If anyone has used it, I'd be interested in discussing its use case. I am now working on a report to identify key aspects of the core engine for use. I'll try to include as many specifics in the report as possible. As always, any suggestions are nice. Thanks.
Apache Flink Execution
https://docs.google.com/document/d/1GDh8kEbAXVY9Gv1mmFvq__zLN_JP6m2_KbfN-9C0uO0/edit?usp=sharing LINK for Flink Use/fundamental
Storm + Spark Analysis
https://docs.google.com/document/d/1ZyybQg3UoxTXP23lKtMw0lX3Zo3_30Bz7biNoATRyLY/edit?usp=sharing STORM ABOVE https://docs.google.com/document/d/1ekUE-nderDkt4-CG6ILnF9JJHBPU6k1Fe3kyzxyVAiM/edit?usp=sharing SPARK ABOVE
Storm Analysis
Uploaded the basics of Apache Storm / uses for Airavata. I will now work on Flink and write a similar 1-page paper on how it works / pros/cons. Untitleddocument.docx Description: MS-Word 2007 document
Storm Flink.
Currently writing a report on Storm and its potential uses in Airavata. Will finish by today. Has anybody used Apache Flink before? Recommendations on Flink would be nice. Thanks, Apoorv Palkar
Attached paper
I have attached the Spark use case paper. I can go into more detail; what is required in terms of the paper? Should I also write a comparison to Storm? I went over the Storm architecture/functionalities over the weekend, so this can be done quickly. paper.docx Description: MS-Word 2007 document
[GSoC Plan of Attack] Choosing Apache Spark
Hey Dev, I have started my GSoC here @ Indiana University. I have chosen to investigate Spark over Storm/Flink for our distributed model. This is because Storm/Flink are generally better suited for live event streaming. We are analyzing the batch processing case first and then potentially considering live streaming. Spark is best suited for this because it allows for batch processing through the core engine and live processing through the Spark Streaming library. Over the past 4 days I configured the Spark standalone cluster manager to work with worker-node virtual machines on AWS EC2. As Amazon is a paid service, we have decided to switch to the JetStream/OpenStack API. As of now, I am using Spark Standalone as the cluster manager between the core engine and workers. In addition to this, I'm investigating the use of Mesos/YARN via Hadoop for future Airavata cluster managers. Any suggestions would be good. Apoorv Palkar
Docker and AWS
What aspects of the Airavata project are using Docker and AWS? I'm interested in these technologies.
Re: [GSoC] Due Date
OK, so one particular issue I have is adding additional solutions. I am familiar with the solutions/improvements posted currently. In addition, taking Gourav's advice, I have started exploring the use of new technologies such as Kafka, Akka, Cassandra, MongoDB, and Kubernetes. I have a preliminary model, but I'm still working out the details in full. Should I include this as part of the GSoC? There is a very good chance I may end up implementing these technologies, but they are not mentioned in detail in my GSoC proposal. -Original Message- From: Suresh Marru To: Airavata Dev Sent: Sun, Apr 2, 2017 5:31 pm Subject: Re: [GSoC] Due Date On Apr 2, 2017, at 6:22 PM, Apoorv Palkar wrote: Also, who else should we ask for input about our proposal? How do we know it's adequate for GSoC standards in terms of explanation, diagrams, etc.? You will know if it's adequate or otherwise based on your acceptance on May 4th :) You are asking in the right place (dev list). The application should be clear in terms of your goals and how you plan to accomplish them. Diagrams and other explanations help but are not required. At this point, I suggest you make sure to catch up with this thread and double-check that you addressed any issues raised in this discussion - http://markmail.org/thread/wfbvewfb6gmlsgmf Gourav recently posted some possible proposal ideas, so you may want to sync up with him (on this list). Irrespective of the feedback, you should make sure to submit the application well before the deadline. Neither we nor Google can do anything beyond 12 pm Eastern Time tomorrow (April 3rd). Suresh -Original Message- From: Suresh Marru To: Airavata Dev Sent: Sun, Apr 2, 2017 5:20 pm Subject: Re: [GSoC] Due Date You should follow the official deadlines - https://summerofcode.withgoogle.com/ Note that it will require your student status verification and so on, so I suggest finishing the application early (today) and keep modifying it instead of waiting for the deadlines.
Suresh On Apr 2, 2017, at 6:18 PM, Apoorv Palkar wrote: When is the GSoC proposal due? What other materials do we need? Thanks
Re: [GSoC] Due Date
Also, who else should we ask for input about our proposal? How do we know it's adequate for GSoC standards in terms of explanation, diagrams, etc.? -Original Message- From: Suresh Marru To: Airavata Dev Sent: Sun, Apr 2, 2017 5:20 pm Subject: Re: [GSoC] Due Date You should follow the official deadlines - https://summerofcode.withgoogle.com/ Note that it will require your student status verification and so on, so I suggest finishing the application early (today) and keep modifying it instead of waiting for the deadlines. Suresh On Apr 2, 2017, at 6:18 PM, Apoorv Palkar wrote: When is the GSoC proposal due? What other materials do we need? Thanks
[GSoC] Due Date
When is the GSoC proposal due? What other materials do we need? Thanks
Re: [GSoC] Rough-Draft Proposal; Want Feedback
Is it detailed enough in your opinion? I wanted to add an extra solution, but I'm not 100% familiar with it as I'm still working out the details. Should I include it? It involves some technologies that I'm not completely familiar with, so I didn't want to add it. I followed your two suggestions and I will be adding a "HackIllinois" section. Is there more you recommend? Thanks -Original Message- From: Suresh Marru To: Airavata Dev Sent: Sun, Apr 2, 2017 12:47 pm Subject: Re: [GSoC] Rough-Draft Proposal; Want Feedback Hi Apoorv, This is good. I added a couple of suggestions on the Google Doc. Since you were introduced to Airavata at HackIllinois and already spent some time with all of us there, do mention your hackathon experiences in the application; that should be a plus. Suresh On Apr 2, 2017, at 11:24 AM, Apoorv Palkar wrote: Here is my proposal: https://docs.google.com/document/d/1NcsEAUPOUtggtscmhNeUDTHpGFqmDiXvn40B2tf0G4k/edit?usp=sharing I'd like some feedback regarding what should be fixed and what should be added. Thanks
[GSoC] Rough-Draft Proposal; Want Feedback
Here is my proposal: https://docs.google.com/document/d/1NcsEAUPOUtggtscmhNeUDTHpGFqmDiXvn40B2tf0G4k/edit?usp=sharing I'd like some feedback regarding what should be fixed and what should be added. Thanks
[GSoC] Adding Workflow Parallel to Proposal
Do you think it's doable to deliver code for making certain jobs run in parallel and work on the workflow editor, in addition to completing the distributed workload management? Or would this be too ambitious a goal to complete in a summer? Thanks
Re: [GSoC] Number of Deliverables
If you promise to do things a certain way, but you find a better solution when actually working on the project, can you implement new ideas and scrap old ones? -Original Message- From: Supun Nakandala To: dev Sent: Wed, Mar 29, 2017 4:43 pm Subject: Re: [GSoC] Number of Deliverables Hi Apoorv, As a sample project proposal, I would recommend you refer to this. From my experience as a past GSoC student, I think having specific and challenging goals should make your proposal more attractive and increase the chances of getting accepted. However, trying to come up with a set of goals which you think is overly unrealistic will also hinder your success later, because you will be judged based on what you promise (the proposal). It is completely OK to not achieve everything you promise. But you will have to prove that you put significant effort into achieving your goals (which can get tricky). -Supun On Wed, Mar 29, 2017 at 5:23 PM, Apoorv Palkar wrote: How many goals should we aim to put in our proposal? Is it better to put in small goals and over-deliver? Thanks -- Thank you Supun Nakandala Dept. Computer Science and Engineering University of Moratuwa
[GSoC] Number of Deliverables
How many goals should we aim to put in our proposal? Is it better to put in small goals and over-deliver? Thanks
Re: [GSoC] Proposal Topics
I haven't used Kafka or Cassandra, but I would be interested in developing a solution using these technologies to avoid redundancies. -Original Message- From: Shenoy, Gourav Ganesh To: dev Sent: Mon, Mar 27, 2017 7:52 pm Subject: [GSoC] Proposal Topics Hello dev, I am interested in participating in GSoC this season. There are a couple of topics on my mind which could make good proposals.
1. Distributed Task Execution (Workload Management) for Apache Airavata
· Apoorv has already shown interest in this, and has a fair idea of the problem.
· I have been working on building a prototype to solve this problem, as part of the Science Gateways course [see: https://goo.gl/CZcIIn]
· There are other possible approaches, like using Akka, Cassandra, Kafka [see: https://youtu.be/s3GfXTnzG_Y]
2. Workflow Editor/Builder for Apache Airavata
· Ajinkya had started on this topic, and I can use his inputs.
· The idea is to allow modelling multiple Airavata job submissions into a workflow, using tools such as CWL (Common Workflow Language).
· In addition, to integrate a workflow editor UI with the processing logic, and manage dependencies (whether two jobs can run in parallel vs. one waiting for another to complete because it depends on its output).
I would love to hear any suggestions or additions from you all. Thanks and Regards, Gourav Shenoy
[GSoC] Possible Solution for Question 2 of Workload.
https://github.com/airavata-courses/spring17-workload-management/wiki/%5BFinal%5D-Centralized-architecture-for-workload-management So, from the design presented in this link: "How do we upgrade a worker, say with a new task ‘E’ implementation, in such a manner that if something goes wrong with code for ‘E’, the entire worker node should not fail? In short, avoid regression testing the entire worker module." I was thinking that we can create a queue in the worker class. It can keep track of which jobs are entering, which are currently being processed, which have failed, and which are finished. Once a job finishes, we don't have to report to the scheduler. If a job does fail, we can tell the scheduler to put it back in the queue. However, another issue that can arise is that if that particular machine is the only one that runs that type of job, the job can keep looping in a circle. As a solution for that, I'm thinking of some sort of unique key for every job. What am I missing, and do you have any recommendations?
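The queue idea above could be sketched roughly like this. This is a hypothetical Python illustration, not Airavata code; the `Worker` class, state names, and retry cap are all made up, and a `max_attempts` limit stands in for a smarter fix to the "looping in a circle" problem, with a UUID as the unique key per job:

```python
import queue
import uuid
from enum import Enum

class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    FAILED = "failed"
    FINISHED = "finished"

class Worker:
    """Hypothetical worker that tracks job states and requeues failures."""

    def __init__(self, max_attempts=3):
        self.jobs = queue.Queue()   # jobs waiting to run
        self.states = {}            # job_id -> JobState
        self.attempts = {}          # job_id -> number of tries so far
        self.max_attempts = max_attempts

    def submit(self, task):
        job_id = str(uuid.uuid4())  # unique key for every job
        self.states[job_id] = JobState.QUEUED
        self.attempts[job_id] = 0
        self.jobs.put((job_id, task))
        return job_id

    def run_next(self):
        job_id, task = self.jobs.get()
        self.states[job_id] = JobState.RUNNING
        self.attempts[job_id] += 1
        try:
            task()
            # Finished: no need to report back to the scheduler.
            self.states[job_id] = JobState.FINISHED
        except Exception:
            if self.attempts[job_id] < self.max_attempts:
                # Failed: put it back in the queue for a retry.
                self.states[job_id] = JobState.QUEUED
                self.jobs.put((job_id, task))
            else:
                # Retry cap reached: stop looping and mark it failed,
                # so the scheduler can route it elsewhere.
                self.states[job_id] = JobState.FAILED
        return job_id
```

The unique key is what lets the scheduler distinguish "this same job failed again on the same node" from a fresh submission, which is what breaks the infinite loop.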
Re: [GSoC] Distributed Workload Management.
https://github.com/airavata-courses/spring17-workload-management/wiki/%5BFinal%5D-Centralized-architecture-for-workload-management The above is the link to the data. I was told by Gourav during HackIllinois that the workers should pull from the queue rather than the scheduler pushing to the workers. I worked on implementing Serf and the gossip protocol, but I ran out of time. This is something I'm looking to do as part of my GSoC project. -Original Message- From: Shameera Rathnayaka To: dev Sent: Sun, Mar 26, 2017 10:18 pm Subject: Re: [GSoC] Distributed Workload Management. Hi Apoorv, Why do you want to remove all of the RabbitMQ implementation and replace it with the gossip protocol? Where did you find the article you mentioned above? (Any web link?) Regards, Shameera. On Sat, Mar 25, 2017 at 4:21 PM Apoorv Palkar wrote: I am creating a GSoC proposal for the distributed workload management part of Airavata. I read the article titled "A Centralized, Apache Mesos Inspired Architecture". I was wondering if anybody had actually coded the idea proposed in the article? It seems very interesting. In addition to this, I wanted to remove all of the RabbitMQ implementation and replace it with the gossip protocol via Serf. Thanks, Apoorv Palkar -- Shameera Rathnayaka
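The pull model Gourav suggested can be sketched minimally as follows. This is a hypothetical illustration only: an in-memory `queue.Queue` stands in for the real broker (RabbitMQ, or a Serf-coordinated queue), and the workers decide when to take work instead of the scheduler assigning it to them:

```python
import queue
import threading

task_queue = queue.Queue()    # stands in for the shared broker/queue
results = []
results_lock = threading.Lock()

def worker_loop():
    # Each worker pulls tasks at its own pace; the scheduler never
    # pushes work to a specific node, it only enqueues tasks.
    while True:
        task = task_queue.get()
        if task is None:          # poison pill: shut this worker down
            task_queue.task_done()
            break
        with results_lock:
            results.append(task * 2)   # "process" the task
        task_queue.task_done()

workers = [threading.Thread(target=worker_loop) for _ in range(3)]
for w in workers:
    w.start()
for i in range(10):               # scheduler side: just enqueue
    task_queue.put(i)
for _ in workers:
    task_queue.put(None)          # one pill per worker
for w in workers:
    w.join()
```

The appeal of pulling is natural load balancing: a busy worker simply stops taking tasks, and no central component needs to track per-node capacity.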
[GSoC] Distributed Workload Management.
I am creating a GSoC proposal for the distributed workload management part of Airavata. I read the article titled "A Centralized, Apache Mesos Inspired Architecture". I was wondering if anybody had actually coded the idea proposed in the article? It seems very interesting. In addition to this, I wanted to remove all of the RabbitMQ implementation and replace it with the gossip protocol via Serf. Thanks, Apoorv Palkar
Interested in GSoC competition
Dear Developers, I am interested in working on Apache Airavata for the GSoC competition, and I am looking for a project. During HackIllinois 2017, I worked on/helped build a prototype distributed workload system with my teammates from the University of Illinois at Urbana-Champaign. Where do I go from here? Thank you, Apoorv Palkar apoor...@illinois.edu (925) 849-7847