Re: [kepler-users] dataflow using Kepler on Amazon EC2

Jianwu Wang Mon, 28 Mar 2011 14:09:30 -0700

Hi Luqman,

    Please see my comments below.


On 3/25/2011 2:17 PM, Luqman Hodgkinson wrote:




Hi Jianwu,

After playing with Kepler, there are some questions I have.

1. All my Java classes are in the same project. There is only a single Main 
class. In order to use Kepler, must each class be converted to a .jar file 
individually? If so, the disadvantage is that only command-line 
parameters can be passed in. How does the data flow between the classes? Must 
each class read input from files and write output to files? That is, what is the 
nature of your type system for passing data between components? Can data be 
passed directly through RAM or must it go through the file system?

Is there only ONE Main class for all of your classes? Do you want Kepler to invoke that main class, or do you want to build a dataflow that connects your classes? In Kepler, actors are the elements that are connected in a workflow, so I think you need actors that access your classes. One way is to build actors based on your classes and connect them. Another way is to build Web services based on your classes and use the Kepler Web Service actor to connect them. There might be easier ways, but I can't think of any right now.

Kepler supports both file-system-based data passing (the input of a downstream actor can be the URL of a file generated by an upstream actor) and memory-based data passing (data contents are passed from one actor to another via RAM).
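
To make the two styles concrete, here is a minimal sketch in plain Java. It does not use any Kepler or Ptolemy II classes; the class name DataPassingSketch, its methods, and the sample strings are all hypothetical and only illustrate the idea. A real Kepler actor would exchange this data through its ports instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain Java only, no Kepler or Ptolemy II API is used.
class DataPassingSketch {

    // Memory-based passing: the upstream step returns the data object itself.
    static List<String> produceInMemory() {
        List<String> records = new ArrayList<>();
        records.add("record-1"); // placeholder data
        records.add("record-2");
        return records;
    }

    // The downstream step receives the object directly (no file system involved).
    static int consumeInMemory(List<String> records) {
        return records.size();
    }

    // File-based passing: the upstream step writes the data to a file
    // and hands only the file location downstream.
    static Path produceToFile() {
        try {
            Path file = Files.createTempFile("upstream-output", ".txt");
            Files.write(file, produceInMemory(), StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The downstream step re-reads the data from the location it was given.
    static int consumeFromFile(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("in memory: " + consumeInMemory(produceInMemory()));
        Path file = produceToFile();
        System.out.println("via file:  " + consumeFromFile(file));
        Files.deleteIfExists(file); // clean up the temporary file
    }
}
```

In-memory passing avoids serialization but keeps everything in one process; passing a file location costs disk I/O but leaves an intermediate result on disk that can be inspected or reused later.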

2. Provenance is very important for my workflow. The workflow will be run 
multiple times and a large number of versions will be created. These should be 
organized somewhere on the file system with timestamps and descriptions of the 
versions of the workflows that were used. How much support does Kepler have for 
this?

Provenance is supported in Kepler. Please check the Provenance Documentation section at https://kepler-project.org/users/documentation. In addition, the Workflow Run Manager module in Kepler shows the different executions of the same workflow. You should be able to use both in Kepler 2.1: https://kepler-project.org/users/whats-new/reporting-workflow-run-manager-provenance-and-tagging-add-on-module-suites-released-sept-30-2010

3. Have you seen the new Conveyor paper? 
http://www.ncbi.nlm.nih.gov/pubmed?term=21278189 My requirements are very 
similar to those addressed in this paper. However, the current version of 
Conveyor does not seem very stable: I was even unable to get their graphical 
user interface running from their Java files. What are the capabilities of 
Kepler for this use case?

I haven't looked at the details of the paper, so I can't answer your question.

Sincerely, with best wishes,
Luqman




On Mar 22, 2011, at 3:51 PM, Jianwu Wang wrote:

Hi Luqman,

    Your goal is still not clear to me. Please break it into subtasks so 
that we can help more efficiently. Alternatively, you can try Kepler first and 
then come back with more specific questions.

    About Kepler workflow execution on EC2, I did some experiments on it and 
don't think it is hard to execute Kepler workflows on EC2.

Best wishes

Sincerely yours

Jianwu Wang
[email protected]
http://users.sdsc.edu/~jianwu/

Assistant Project Scientist
Scientific Workflow Automation Technologies (SWAT) Laboratory
San Diego Supercomputer Center
University of California, San Diego
San Diego, CA, U.S.A.


On 3/21/2011 5:10 PM, Luqman Hodgkinson wrote:



Dear Kepler developers,
I have a collection of Java classes linked by a custom dataflow architecture. 
All classes are in a single project but some of these classes call executables 
written in languages other than Java. I am investigating the possibility of 
transitioning to Kepler. Essentially my desires are to link these Java classes 
in a DAG representing the dataflow and to execute the dataflow in Amazon EC2. 
The data flowing along the edges are arbitrary custom Java classes. 
Additionally it is important to cache intermediate results. The data is 
acquired from a few web services: iRefIndex, IntAct, UniProt, and Gene 
Ontology. There are complex software dependencies so after setting up the 
dataflow I would like to save the entire system as an Amazon Machine Image 
(AMI). How difficult would this transition be, and would it be worth the 
effort? I would appreciate your comments and advice.
                Sincerely, with best wishes,
                Luqman Hodgkinson,
                Ph.D. student, UC-Berkeley
_______________________________________________
Kepler-users mailing list
[email protected]
http://lists.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-users
