Anything that's an alternative to Oozie is welcome. This also comes with a full-blown workflow designer, which is nice.
On Tue, Sep 23, 2014 at 5:47 AM, Srikanth Sundarrajan <[email protected]> wrote:
> What do you guys think of this? A viable alternative for the scheduler?
>
> Sent from my iPhone
>
> Begin forwarded message:
>
> > From: Stian Soiland-Reyes <[email protected]>
> > Date: 23 September 2014 6:13:21 pm IST
> > To: [email protected]
> > Cc: List for general discussion and hacking of the Taverna project <[email protected]>
> > Subject: [Proposal] Taverna workflow
> > Reply-To: [email protected]
> >
> > I hereby present the Apache Incubator proposal for the project Taverna.
> >
> > Also available in rich text in the Taverna wiki (with more hyperlinks!):
> >
> > http://dev.mygrid.org.uk/wiki/display/developer/Taverna+incubator+proposal
> >
> > (Could someone grant me access to edit the Incubator wiki pages? My
> > wiki username is soilandreyes)
> >
> > # Abstract
> >
> > Taverna is an open source and domain-independent suite of tools used
> > to design and execute data-driven workflows.
> >
> > # Proposal
> >
> > The Taverna suite includes:
> >
> > * Taverna Workbench, a Java-based desktop application for graphically
> > composing, editing and executing workflows of distributed web services
> > and local tools
> > * Taverna Commandline Tool which allows repeated execution of
> > parameterized workflow definitions
> > * Taverna Server provides a REST and SOAP API for executing workflows
> > * Taverna Player is a Ruby-based web interface towards the Server,
> > providing a high-level view of workflow executions and their results,
> > and allows further integrations with Ruby on Rails applications.
> >
> > Taverna can browse and combine different service types, allowing
> > workflows to integrate steps of arbitrary REST and SOAP web services
> > with command line tools (local and SSH), scripts (Beanshell, R,
> > Jython) and finally visualize the results.
> >
> > The goal of the Taverna suite is to help researchers to access
> > distributed datasets and processing capabilities by the construction
> > of pipelines, and also to simplify the execution of these pipelines
> > in various environments.
> >
> > The Taverna suite of products is already successful and in wide use
> > across different domains. The software is currently licensed as LGPL
> > 2.1, with copyright owned by University of Manchester. External
> > contributors have all signed Apache-like CLAs.
> >
> > # Background
> >
> > Taverna workflows coordinate inputs and outputs between computational
> > processes and Web Services. The workflow is designed in a graphical
> > interface which shows the workflow as a series of boxes and arrows,
> > representing processes and their data connections. The different
> > processes in a workflow can be command line tools, REST and WSDL Web
> > Services, which are used for combining steps such as data acquisition,
> > filtering, cleaning, integrating, analysis and visualization. Taverna
> > calls these processes "services", as they generally are provided by
> > remote (third-party) servers.
> >
> > These kinds of computational workflows, also known as pipelines and
> > dataflows, focus on the movement of data rather than the execution
> > order of the underlying processes. Features such as implicit
> > iterations (where an input list of values causes multiple process
> > executions) and parallel invocations (independent processes are
> > executed as soon as their data is available) are intrinsic to a
> > dataflow system, not requiring any particular constructs by the
> > workflow designer.
> >
> > As a visual programming environment, Taverna aids collaboration and
> > reuse of workflows. At the highest level, a workflow represents the
> > conceptual level of an analysis, allowing understanding, discussion
> > and communication of the overall analysis protocol.
> > More detail can be
> > revealed and modified for individual steps. At the individual process
> > level, the workflow defines execution specifics such as operations,
> > parameters and command line tools.
> >
> > Sharing of the workflow definitions allows re-use and re-purposing of
> > the computational analysis. During workflow execution, provenance can
> > be collected from every step, allowing deep inspection of intermediate
> > values for the purpose of debugging and validation.
> >
> > # Rationale
> >
> > There is a strong need to lower the barrier of entry to datasets and
> > computational resources widely available on the Internet, to increase
> > their use by researchers who understand the computational steps needed
> > to produce their results, but who are not necessarily expert
> > programmers. Taverna has already shown its success and popularity in a
> > wide range of scientific disciplines.
> >
> > # Initial Goals
> >
> > * Transition mailing lists to Apache (keep existing subscribers, but
> > invite more)
> > * Taverna developer workshop (2014-10-30)
> > * Prepare git repositories for move:
> >   * Update headers/metadata to indicate Apache License 2.0
> >   * Restructure git repositories
> >   * Rename Maven groupIds to org.apache.taverna.*
> >   * Rename packages to org.apache.taverna.*
> >
> > * Move Github repositories to Apache git
> > * Automated builds in Apache's Jenkins
> > * Update to latest releases of Apache dependencies
> > * Propose updated release & testing procedure under Apache
> > * Move website and documentation
> >
> > We intend to only release the current development version Taverna 3.x
> > http://www.taverna.org.uk/developers/work-in-progress/taverna-3/ under
> > the Apache umbrella. 3.0 is not yet officially released - however
> > the Taverna 3.0 Command Line can be released almost "as-is" after
> > migration.
> > The Taverna 3.0 Server is at beta quality, while the
> > Taverna 3.0 Workbench is at alpha stage and would need to be
> > stabilized to an initial beta release.
> >
> > * Before first release: Maven Central releases of Taverna support
> > libraries (e.g. taverna-scufl2 and taverna-databundle)
> > * First release: Apache Taverna Command Line 3.0 (OSGi-based)
> > * Release: Apache Taverna Server 3.0
> > * Release: Apache Taverna Workbench 3.0 beta
> > * Provenance exchange with relevant Apache products (e.g. Apache
> > CXF->Taverna->CouchDB)
> > * Release: Apache Taverna Workbench 3.0
> >
> > It is not yet decided if the current Workbench Editions
> > http://www.taverna.org.uk/download/workbench/2-5/ will be carried over
> > to Taverna 3, or if this can be solved by having an "Install extra
> > plugin" step on first start-up of Apache Taverna. In any case, we
> > imagine that some of these specializing editions will be maintained
> > outside (but in collaboration with) the Apache project. This is
> > particularly the case for the Astronomy edition as it depends on
> > several LGPL/GPL libraries and is maintained by the AstroTaverna team.
> >
> > # Current Status
> >
> > ## Meritocracy
> >
> > Taverna was initially created by the myGrid consortium in 2003. Since
> > 2006, the majority of contributions to Taverna's core code-base, its
> > architecture and direction have been led by staff at The University of
> > Manchester and The European Bioinformatics Institute (EMBL-EBI).
> >
> > The project has benefited from a high degree of extensions and
> > integrations by other developers - but mainly in the form of plugins
> > (http://www.taverna.org.uk/documentation/taverna-2-x/taverna-2-x-plugins/)
> > and integrations
> > (http://www.taverna.org.uk/developers/work-in-progress/taverna-online/
> > http://www.taverna.org.uk/download/associated-tools/).
> >
> > Taverna's developer community has unfortunately not had a culture of
> > submitting patches that would warrant later commit access - perhaps
> > due to its background in the science community. However, contributors
> > have been added as committers when their plugin becomes a part of the
> > core distribution (e.g. External Tool plugin by Möller and Krabbenhöft
> > and AstroTaverna by Garrido), or when their development has required
> > patches to the existing code base.
> >
> > ## Community
> >
> > Taverna has an active community of plug-in developers and users. The
> > developer mailing list ([email protected]) has 248
> > members, the user mailing list ([email protected])
> > has 370 members.
> >
> > 1500 users have registered as of 19 August 2014. Total downloads of
> > all products since version 2.1 (released December 2009) is 35000.
> >
> > A Taverna Developer workshop is being arranged for 30 October 2014 to
> > bring together developers and integrators of Taverna. We want to
> > encourage plug-in developers to participate further also in the core
> > development of Taverna, by introducing them to the code base and how
> > to contribute.
> > http://dev.mygrid.org.uk/wiki/display/developer/Taverna+Open+Development+Workshop
> >
> > Active steps are being taken to grow the communities of users and
> > developers by targeting specific research domains, such as the work by
> > Kevin Benson on Taverna's use in the Heliophysics and Astrophysics
> > community. Susheel Varma is increasing usage of Taverna within the
> > Biomedical domain. Julián Garrido and his work on AstroTaverna is
> > promoting Taverna within the IVOA Virtual Astronomy community. Sonja
> > Holl and Björn Hagemeier are targeting high performance computing.
> >
> > ## Core Developers
> >
> > What we currently consider to be the core Taverna Team is (in
> > alphabetical order):
> >
> > Christian Brenninkmeijer (University of Manchester)
> > Donal Fellows (University of Manchester)
> > Robert Haines (University of Manchester)
> > Aleksandra Nenadic (University of Manchester)
> > Dmitry Repchevsky (Barcelona Supercomputing Center)
> > Stian Soiland-Reyes (University of Manchester)
> > Shoaib Sufi (University of Manchester)
> > Vadim Surpin (Institute for Information Transmission Problems in Moscow)
> > Alan Williams (University of Manchester)
> >
> > The team consists of experienced developers who have worked on a
> > multitude of projects, particularly in writing software for supporting
> > scientists. The committers list (see below) additionally includes
> > plugin developers whose contributions have become part of Taverna.
> > Part of our desire to join the Apache Foundation is to recognise their
> > effort and promote them into also being "core developers".
> >
> > ## Alignment
> >
> > Taverna dependencies include Apache Commons, Axis, Abdera, Batik, CXF,
> > Derby, Felix, HttpComponents, Jena, log4j, Maven, POI, Velocity,
> > Xerces, XMLBeans, Xalan. We use Tomcat for testing and deployment of
> > the Taverna Server.
> > As part of moving to Apache-compatible dependencies, Taverna will
> > probably adopt OpenJPA to replace (LGPL) Hibernate.
> >
> > # Known Risks
> >
> > ## Orphaned products
> >
> > Most of the core developers are from the myGrid team at University of
> > Manchester, but are funded through a series of projects - see
> > http://www.mygrid.org.uk/projects/. Many of these projects incorporate
> > Taverna, so the effort from Manchester is partially based on direct
> > project requirements, but also partially a volunteer effort for
> > project maintenance and general development. The myGrid team has
> > guaranteed funding until 2017.
> >
> > The developers that are outside Manchester are generally funded for
> > other activities, and so their effort to Taverna is to a greater
> > extent a volunteer effort - although again project-specific
> > requirements steer their effort (e.g. for a new Taverna plugin).
> >
> > One of the reasons for our desire to move to the Apache Foundation is
> > to formalise this volunteering/contribution effort so that it becomes
> > obvious that it is not just University of Manchester that is
> > contributing to the core code base - thereby reducing the
> > impression that Taverna is vulnerable to Manchester’s future funding
> > and projects.
> >
> > ## Inexperience with Open Source
> >
> > Taverna has been an open-source project since its first release in
> > 2003. Most of the contributors also have experience with working with
> > and contributing to other open source projects (e.g. TCL, CXF, Jena),
> > particularly as Taverna strongly relies on other open source tools.
> > Most of the research projects which the myGrid members have
> > participated in produce open-source software.
> >
> > ## Homogeneous Developers
> >
> > The committers list includes many people from myGrid, University of
> > Manchester in United Kingdom - but these developers have been working
> > on a range of distributed and European projects in the field of
> > scientific software - see http://www.mygrid.org.uk/projects/
> >
> > The other developers on the committers list come from many different
> > projects and institutions across the world, from Russia, Canada,
> > Germany and Spain.
> >
> > ## Reliance on Salaried Developers
> >
> > Development for Taverna is mainly performed as part of the developers'
> > salaried work, but funded through many different projects at several
> > institutions (see above).
> > These projects don't generally have
> > "contribute to Taverna" as their main goals - so therefore in many
> > ways the effort is still volunteer-based - contributing to Taverna as
> > a way to support one's own work.
> >
> > From our experience of running Taverna over the last 10 years, new
> > contributors will continue to join as Taverna becomes an ingredient in
> > new projects, while existing contributors more slowly fade out of
> > their involvement. Often existing contributors and users provide the
> > personal link to the new contributors.
> >
> > ## Relationships with Other Apache Products
> >
> > Apache already contains projects that seem relevant to Taverna.
> >
> > Apache Pig https://pig.apache.org/ is a high-level language for
> > creating Map-Reduce programs for Apache Hadoop. There already exist
> > third-party efforts to convert Taverna Workflows to Hadoop and Pig -
> > https://github.com/umaqsud/taverna-to-pig
> > https://github.com/schenck/taverna-to-hadoop (thus making a graphical
> > interface for building Apache Pig workflows) - and part of the Apache
> > Taverna effort would be to invite these to join the project.
> >
> > Apache Airavata http://airavata.apache.org/ is a software framework
> > for executing and managing computational jobs and workflows on
> > distributed computing resources. Taverna's concern is not as much job
> > coordination, but more of a data flow between services. Airavata's
> > XBaya Workflow Suite can export workflows in Taverna 1 format SCUFL,
> > but could be updated to work with Taverna 3's SCUFL2 format.
> >
> > Apache ODE https://ode.apache.org/ is a WS-BPEL workflow engine. BPEL
> > as a workflow language is quite verbose compared to dataflow languages
> > like Taverna, and is additionally bound to a particular protocol
> > (SOAP).
> > Nevertheless, a sub-section of Taverna workflows could in
> > theory run on the Apache ODE engine - and the Taverna 3 Platform API
> > has facilities for plugging in alternative workflow engines. We have
> > previously considered Apache Hadoop as one such alternate engine for
> > executing a different subset of workflows with local command line
> > tools.
> >
> > Apache Storm http://storm.incubator.apache.org/ is a distributed
> > realtime computation framework. Experiments are under development to
> > use Taverna as a front-end for creating Apache Storm workflows -
> > http://markmail.org/message/zg5ylo2aucpwfc5j
> >
> > Apache has several popular frameworks for building REST/SOAP web
> > services (Apache CXF, Apache Clerezza), data services (Apache Jena,
> > Apache Hive, Apache CouchDB) and specific workflow engines (Apache
> > Oozie for Hadoop, Apache ODE for WS-BPEL). Taverna as a general REST
> > and SOAP service client can be used for combining, testing and
> > demonstrating such services.
> >
> > ## An Excessive Fascination with the Apache Brand
> >
> > Taverna is a long-running project (since 2003) with an existing user-
> > and developer base across the academic world. Our main motivation for
> > moving to Apache is to further encourage an open development process
> > and engage existing and new developers to contribute to the core code
> > base. We also want to ensure long-term continuity of the Taverna
> > products, and for its future directions to be decided by the whole
> > Taverna community rather than one of the parties involved.
> >
> > # Documentation
> >
> > Taverna's documentation is available from
> > http://www.taverna.org.uk/documentation/taverna-2-x/, including an
> > extensive user manual at
> > http://www.mygrid.org.uk/dev/wiki/display/taverna/User+Manual and
> > tutorials http://www.taverna.org.uk/documentation/taverna-2-x/tutorials/
> > and videos http://www.taverna.org.uk/documentation/taverna-2-x/videos/.
> >
> > The developer documentation
> > http://dev.mygrid.org.uk/wiki/display/developer/Developers+Guide
> > includes tutorials
> > http://dev.mygrid.org.uk/wiki/display/developer/Tutorials for working
> > with Taverna's source code and creating plugins.
> >
> > # Initial Source
> >
> > Taverna's source code is available from the 'taverna' github team
> > account: https://github.com/taverna/. These 85 git repositories
> > reflect the current modules of Taverna's plugin system after recently
> > transitioning from Google Code SVN at
> > http://taverna.googlecode.com/svn/taverna/. The history of Taverna's
> > code base goes back to being hosted in CVS at SourceForge
> > http://taverna.cvs.sourceforge.net/, transitioned as of
> > http://taverna.googlecode.com/svn/archived/cvs2svn-2008-09-25/. Note
> > that although reasonable steps have been made to preserve commit
> > history when moving between version control systems, this has not
> > always been achieved when moving between modules and refactoring
> > larger Java packages. Some source files might therefore in git have
> > initial commits like "Moved from /taverna/utils/trunk" referring to
> > SVN paths.
> >
> > One of the reasons for the many repositories is that we rely on Apache
> > Maven and a plugin system (since Taverna 3 OSGi-based) where different
> > modules have different version numbers and release cycles (e.g.
> > tags/branches). This is essential for the plug-in support of Taverna
> > as the plug-ins depend on the semantic versioning of the APIs and
> > required implementations.
> >
> > It is however in our current plans to merge repositories that have
> > similar release cycles and greatly reduce the number of repositories.
> >
> > Taverna source code uses the package names (and child packages):
> >
> > net.sf.taverna - since Taverna 2
> > uk.org.taverna - new from Taverna 3
> > org.taverna (sic) - Taverna Server
> >
> > Some contributed code uses package names depending on their
> > originating projects:
> >
> > org.purl.wf4ever.provtaverna
> > org.biomart.martservice
> >
> > We intend to release only the upcoming Taverna 3.0 version under the
> > Apache umbrella (not 2.x) - therefore, according to semantic
> > versioning rules http://semver.org/, the transition period of the
> > Apache Incubator would be the best (and possibly only) chance to
> > rename Java packages and Maven groupIDs to org.apache.taverna.*. Under
> > OSGi the packaging and JAR go hand in hand (several JARs don't
> > normally provide the same package), and therefore any package rename
> > would be done together with the repository restructuring.
> >
> > # Source and Intellectual Property Submission Plan
> >
> > Taverna source code from http://github.com/taverna/
> >
> > (c) University of Manchester.
> > Signed Apache-like CLAs for all external contributors.
> > Current license is LGPL 2.1 (and GPL3 for one domain-specific
> > download); as copyright holder, Manchester can change this to Apache
> > License 2.0
> >
> > taverna.org.uk domain - registrant University of Manchester
> > http://www.taverna.org.uk/ content (c) University of Manchester
> > http://dev.mygrid.org.uk/wiki/display/tav250/ Confluence wiki content
> > (c) University of Manchester
> > http://dev.mygrid.org.uk/wiki/display/developer Confluence wiki
> > content (c) University of Manchester
> >
> > The details of intellectual property submission will be worked out
> > together with myGrid project manager Shoaib Sufi and the University of
> > Manchester's Contracts Office.
> >
> > # External Dependencies
> >
> > Taverna, as an integrating workflow system, has a fairly large number
> > of dependencies - the latest 2.5.0 Core Workbench distribution has 517
> > JARs (although many of those are duplicates in different versions).
> >
> > We are intending for our first Apache-based release to be Taverna 3,
> > which has already reduced this dependency list.
> >
> > We have performed an analysis of our dependencies of Taverna 3 at
> > http://dev.mygrid.org.uk/wiki/display/developer/Taverna+Dependencies -
> > but this is not yet a complete list.
> >
> > A second analysis looks at the license of those dependencies at
> > http://dev.mygrid.org.uk/wiki/display/developer/Third-party+licenses -
> > where we have some incompatible (LGPL) dependencies. Most of these are
> > resolvable as they are part of optional plugins to Taverna (e.g. R
> > support, BioMart). The dependency on Hibernate requires some developer
> > effort to be replaced with either Apache OpenJPA or a "No-SQL"
> > solution.
> >
> > # Cryptography
> >
> > Taverna uses these cryptography dependencies:
> >
> > BouncyCastle
> > OpenJDK builds with the default JCE full encryption policy (bundled in
> > installer)
> >
> > Taverna utilises these to form an encrypted keystore (storing
> > username/password and client certificates for third-party services
> > accessed by the designed workflow) with corresponding user interface,
> > and additionally binds to Java's SSL support to provide UI and command
> > line options for security interactions, e.g. accepting new server
> > certificates, or asking for username/passwords for HTTP Basic
> > authentication (which can then be stored in the keystore).
> >
> > # Required Resources
> >
> > Taverna currently relies on a mixture of infrastructure hosted for
> > free by third-parties (e.g.
> > Github, SourceForge, GoogleCode,
> > Launchpad, Bitbucket) and infrastructure hosted by myGrid at
> > University of Manchester (Jenkins, Jira, Confluence, Wordpress).
> >
> > ## Mailing lists
> >
> > Existing mailing lists for Taverna are hosted at Sourceforge with
> > archives at markmail. See http://www.taverna.org.uk/about/
> >
> > [email protected] (replacing
> > [email protected])
> > [email protected] (replacing [email protected]
> > - to a lesser degree as we would want to encourage openness)
> > [email protected] (replacing
> > [email protected], 240 members)
> > [email protected] (replacing
> > [email protected], 370 members)
> >
> > ## Git repositories
> >
> > The Taverna community would prefer to keep using git and Github, and
> > we would request experimental writable git repositories
> > http://www.apache.org/dev/writable-git with mirroring to Github.
> >
> > The repositories would be named taverna-*, as the current repositories
> > on the github team: https://github.com/taverna/. This repository
> > organization is styled equivalent to the git repositories of cordova-*
> > and couchdb-*.
> >
> > Exactly how repositories are split/merged is open for discussion - it
> > is part of our current plan to reduce the number of repositories by
> > merging common modules with a similar release cycle - this could be
> > done at an early phase of the incubation period.
> >
> > ## Issue Tracking
> >
> > JIRA Taverna (TAV)
> >
> > Existing issues in Taverna 3's current JIRA -
> > http://dev.mygrid.org.uk/issues/browse/T3 - should be imported - but
> > its current list of Modules should be further agreed.
> >
> > ## Other Resources
> >
> > * Wiki spaces in Confluence https://cwiki.apache.org/confluence -
> > importing the most recent Taverna-related spaces and documentation
> > from
> > http://dev.mygrid.org.uk/wiki/spacedirectory/view.action?startIndex=24
> > * Jenkins - replacing myGrid Jenkins at http://build.mygrid.org.uk/ci/
> > * Maven repository at https://repository.apache.org/ - replacing myGrid
> > artifactory http://repository.mygrid.org.uk/
> > * File-based web space for Plugin Update Site - replacing
> > http://updates.taverna.org.uk/ and
> > http://www.mygrid.org.uk/taverna/updates/
> > * Home pages - to be transitioned from http://www.taverna.org.uk/
> > (Wordpress)
> > * Binary distribution download hosting, about 8 GB per release,
> > replacing http://www.taverna.org.uk/download/ (currently downloads are
> > hosted by http://launchpad.net/ and https://bitbucket.org/)
> >
> > # Initial Committers
> >
> > The initial list of committers reflects the current list of active
> > developers at the Github team: https://github.com/orgs/taverna/people
> > (Note that not all of these have made their membership public on
> > Github)
> >
> > Alan R [email protected]
> > Aleksandra [email protected]
> > Christian Y. [email protected]
> > David [email protected]
> > Dmitriy Repchevsky [email protected]
> > Donal K. [email protected]
> > Finn [email protected]
> > Hajo Nils Krabbenhö[email protected]
> > Ian [email protected]
> > Ingo [email protected]
> > Julián [email protected]
> > Mark [email protected]
> > Luke [email protected]
> > Robert [email protected]
> > Shoaib [email protected]
> > Steffen Mö[email protected]
> > Stian [email protected] (Apache CLA Signed)
> > Stuart [email protected]
> >
> > In addition to the Core Team (mentioned earlier), this list also
> > reflects Taverna's existing meritocracy as it includes plugin
> > developers whose contributions have been merged into the main code
> > base.
> > We acknowledge that not all of these are likely to continue as
> > "Core" developers, but would like to encourage that during the
> > Incubating process.
> >
> > # Affiliations
> >
> > The majority of the initial committers are employed by University of
> > Manchester as part of the myGrid team, including responsibilities for
> > contributing to and supporting Taverna.
> > http://www.mygrid.org.uk/about-us/people/core-mygrid-team/.
> >
> > Dmitriy Repchevsky is employed by the Barcelona Supercomputing Center,
> > including responsibilities for contributing to Taverna. Steffen Möller
> > is employed by University of Lübeck. Julián Garrido is employed by
> > Instituto de Astrofísica de Andalucía.
> >
> > # Sponsor Champion
> >
> > Andy Seaborne
> >
> > # Nominated Mentors
> >
> > * Andy Seaborne
> >
> > # Sponsoring Entity
> >
> > The Incubator.
> >
> > Your feedback is very much welcome!
> >
> > --
> > Stian Soiland-Reyes, myGrid team
> > School of Computer Science
> > The University of Manchester
> > http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >

--
Regards,
Venkatesh

“Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away.”
- Antoine de Saint-Exupéry
