Probably way off-topic but just a question: should a generic interface target
something like DRMAA?
http://en.wikipedia.org/wiki/DRMAA
That would work across most clusters as it’s a single unified API.
(there is a DRMAA module, Schedule::DRMAAc, but I believe it’s XS-based and way
out of date; at least I could never get it to install)
chris
On Sep 5, 2014, at 9:12 AM, John Macdonald <[email protected]> wrote:
> Dana, I may be wrong here, but I think that Hadoop is one form of compute
> cluster management software, just as SGE is. I'm aiming to provide a generic
> interface layer that you can use for writing code to be distributed across a
> cluster. By changing one parameter, cluster=>'Hadoop' instead of
> cluster=>'SGE' your same code would run on a different type of cluster.
> There would be limitations if you used cluster-specific capabilities, just as
> there are the same limitations converting a database connection that uses DBI
> to replace the underlying database platform, but *most* of the code would be
> unaffected. (Assuming that I get a good enough generic interface definition
> that balances the requirements and capabilities of different clusters well
> enough in a single consistent form. :-)
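
A minimal sketch of the "change one parameter" idea described above, written
in Python purely for illustration. It is not the proposed Perl API: every name
here (ClusterDriver, SGEDriver, HadoopDriver, connect, run_pipeline, and the
align_reads.sh script) is hypothetical, and the drivers only return a
description string rather than talking to a real scheduler.

```python
from abc import ABC, abstractmethod


class ClusterDriver(ABC):
    """Common interface that every (hypothetical) cluster driver implements."""

    @abstractmethod
    def describe_submit(self, script: str) -> str:
        """Return a description of how this driver would submit a job."""


class SGEDriver(ClusterDriver):
    # Placeholder only: a real driver would translate to SGE's own interface.
    def describe_submit(self, script: str) -> str:
        return f"SGE driver would submit {script}"


class HadoopDriver(ClusterDriver):
    # Placeholder only: a real driver would translate to Hadoop's own interface.
    def describe_submit(self, script: str) -> str:
        return f"Hadoop driver would submit {script}"


# Registry mapping the cluster parameter to a driver class (illustrative names).
DRIVERS = {"SGE": SGEDriver, "Hadoop": HadoopDriver}


def connect(cluster: str) -> ClusterDriver:
    """Switching function: build the driver object for the named cluster type."""
    try:
        return DRIVERS[cluster]()
    except KeyError:
        raise ValueError(f"no driver for cluster type {cluster!r}") from None


def run_pipeline(cluster: str) -> str:
    """Caller code is identical for every cluster type; only the name changes."""
    driver = connect(cluster)
    return driver.describe_submit("align_reads.sh")


print(run_pipeline("SGE"))
print(run_pipeline("Hadoop"))
```

The caller in run_pipeline never names a driver class, which is the DBI-style
property the paragraph above is after. Anything cluster-specific would have to
live behind the driver and would not carry over when the parameter changes.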
>
> John Macdonald
> Software Engineer
>
> Ontario Institute for Cancer Research
> MaRS Centre
>
> 661 University Avenue
>
> Suite 510
> Toronto, Ontario
>
> Canada M5G 0A3
>
>
> Tel:
>
> Email: [email protected]
>
> Toll-free: 1-866-678-6427
> Twitter: @OICR_news
>
>
> www.oicr.on.ca
>
> This message and any attachments may contain confidential and/or privileged
> information for the sole use of the intended recipient. Any review or
> distribution by anyone other than the person for whom it was originally
> intended is strictly prohibited. If you have received this message in error,
> please contact the sender and delete all copies. Opinions, conclusions or
> other information contained in this message may not be that of the
> organization.
>
> ________________________________________
> From: Dana Hudes [[email protected]]
> Sent: September 5, 2014 10:03 AM
> To: John Macdonald
> Cc: [email protected]
> Subject: Re: Top level name proposal - ComputeCluster
>
> So you intend to develop a new pure Perl compute cluster? Because if you just
> need to get the job done, why would you not use Hadoop, whether on a private
> cluster or AWS? It has a Perl API and it will cheerfully run Perl jobs.
> Hadoop is an Apache project, open source free software with a large installed
> base.
>
> -----Original Message-----
> From: John Macdonald <[email protected]>
> Date: Fri, 5 Sep 2014 13:57:47
> To: Fields, Christopher J<[email protected]>
> Cc: James E Keenan<[email protected]>;
> [email protected]<[email protected]>
> Subject: RE: Top level name proposal - ComputeCluster
>
> I'm intending that ComputeCluster (or whatever the final name turns out to
> be) will be domain-agnostic at the top level interface at least. However, my
> lab will be using it for genome analysis pipelines, and I suspect a
> significant proportion of the potential other users will also be in this
> field (as shown by the responses on this discussion already) so there could
> be domain-specific submodules - either within this namespace or in other
> namespaces simply using this module set.
>
> Chris, Alex, and anyone else who is interested as a potential future
> user/contributor, feel free to email me outside of this module-authors
> discussion about how the actual module will develop.
>
> John Macdonald
>
> ________________________________________
> From: Fields, Christopher J [[email protected]]
> Sent: September 5, 2014 9:47 AM
> To: John Macdonald
> Cc: James E Keenan; [email protected]
> Subject: Re: Top level name proposal - ComputeCluster
>
> Yup, I agree. I think Cluster is too generic and can mean a lot of things (I
> think of cluster analysis myself). Maybe something more distinctive? Is it
> application- or domain-specific (bioinformatics, etc)?
>
> There are a few tools with similar functionality that come to mind. Most of
> them have catchy names; one written in Perl is Clusterflow (not on CPAN but
> here: https://github.com/ewels/clusterflow/). Another is the (completely
> unmaintained, likely broken, but possibly useful for something) biopipe
> project: https://github.com/bioperl/bioperl-pipeline. I have thought about
> retooling the latter to be less reliant on bioperl and more a stand-alone
> tool.
>
> There are a couple of Java tools also: bpipe (https://code.google.com/p/bpipe/)
> and nextflow (https://github.com/nextflow-io/nextflow).
>
> And I agree with Alex; as you might guess based on my comment on biopipe, our
> group would be very interested in helping out on this, even if it’s simply at
> the testing phase (we run PBS/Torque locally).
>
> chris
>
> On Sep 5, 2014, at 8:00 AM, John Macdonald <[email protected]> wrote:
>
>> Cluster was my first thought for a name, but when I did a search to see what
>> modules already existed (both in case someone had already written a generic
>> cluster module saving me the bother of starting a new one, and to see what
>> types of cluster had cluster-specific modules written for them) the word
>> cluster came up in a large number of contexts. Any tightly connected group
>> of "things" is a cluster (e.g. nodes in a graph) - so I didn't think that
>> the simple name would be clear enough. The name Cluster leaves the reader
>> with the immediate question "Cluster of what?".
>>
>> John Macdonald
>>
>> ________________________________________
>> From: James E Keenan [[email protected]]
>> Sent: September 5, 2014 7:25 AM
>> To: [email protected]
>> Subject: Re: Top level name proposal - ComputeCluster
>>
>> On 09/04/2014 10:23 AM, John Macdonald wrote:
>>> Hi,
>>>
>>> I wanted to get general comment/consensus about a top level name that I
>>> am proposing.
>>>
>>> I'm starting to organize a set of modules for managing jobs on a
>>> computer cluster. I intend it to work much like DBI - with a top level
>>> abstract interface that programs can use, actually implemented by
>>> drivers that translate the common interface into the interface used by
>>> the particular type of compute cluster that is being accessed.
>>> Initially, I will provide a driver for SGE, since that is what we have
>>> and use in our lab (but after I have that running, my PI can get me
>>> access to a couple of other types of compute cluster to add some more).
>>>
>>> For naming, I am planning to use:
>>>
>>> ComputeCluster - top level name
>>> - will provide switching functions to create a class of object
>>> for a particular cluster type
>>>
>>
>> Could that be shortened to simply: Cluster ?
>