Re: Top level name proposal - ComputeCluster

Fields, Christopher J Fri, 05 Sep 2014 07:23:50 -0700

Probably way off-topic, but just a question: should a generic interface target 
something like DRMAA?


    http://en.wikipedia.org/wiki/DRMAA

That would work across most clusters as it’s a single unified API.

(there is a DRMAA module, Schedule::DRMAAc, but I believe it’s XS-based and way 
out of date; at least I could never get it to install)

chris

On Sep 5, 2014, at 9:12 AM, John Macdonald <[email protected]> wrote:

> Dana, I may be wrong here, but I think that Hadoop is one form of compute 
> cluster management software, just as SGE is.  I'm aiming to provide a generic 
> interface layer that you can use for writing code to be distributed across a 
> cluster.  By changing one parameter, cluster=>'Hadoop' instead of 
> cluster=>'SGE', your same code would run on a different type of cluster.  
> There would be limitations if you used cluster-specific capabilities, just as 
> there are when you swap the database platform underneath code that uses DBI, 
> but *most* of the code would be unaffected.  (Assuming, that is, that I get a 
> generic interface definition that balances the requirements and capabilities 
> of different clusters well enough in a single consistent form. :-)
> 
> John Macdonald
> Software Engineer
> 
> Ontario Institute for Cancer Research
> MaRS Centre
> 661 University Avenue
> Suite 510
> Toronto, Ontario
> Canada M5G 0A3
> 
> Tel:
> Email: [email protected]
> Toll-free: 1-866-678-6427
> Twitter: @OICR_news
> www.oicr.on.ca
> 
> This message and any attachments may contain confidential and/or privileged 
> information for the sole use of the intended recipient. Any review or 
> distribution by anyone other than the person for whom it was originally 
> intended is strictly prohibited. If you have received this message in error, 
> please contact the sender and delete all copies. Opinions, conclusions or 
> other information contained in this message may not be that of the 
> organization.
> 
> ________________________________________
> From: Dana Hudes [[email protected]]
> Sent: September 5, 2014 10:03 AM
> To: John Macdonald
> Cc: [email protected]
> Subject: Re: Top level name proposal - ComputeCluster
> 
> So you intend to develop a new pure Perl compute cluster? Because if you just 
> need to get the job done, why would you not use Hadoop, whether on a private 
> cluster or on AWS? It has a Perl API and it will cheerfully run Perl jobs.
> Hadoop is an Apache project, open source free software with a large installed 
> base.
> 
> -----Original Message-----
> From: John Macdonald <[email protected]>
> Date: Fri, 5 Sep 2014 13:57:47
> To: Fields, Christopher J<[email protected]>
> Cc: James E Keenan<[email protected]>; 
> [email protected]<[email protected]>
> Subject: RE: Top level name proposal - ComputeCluster
> 
> I intend ComputeCluster (or whatever the final name turns out to be) to be 
> domain-agnostic, at least at the top-level interface.  However, my lab will 
> be using it for genome analysis pipelines, and I suspect that a significant 
> proportion of the other potential users will also be in this field (as the 
> responses in this discussion already show), so there could be domain-specific 
> submodules - either within this namespace or in other namespaces that simply 
> use this module set.
> 
> Chris, Alex, and anyone else who is interested as a potential future 
> user/contributor, feel free to email me outside of this module-authors 
> discussion about how the actual module will develop.
> 
> John Macdonald
> 
> ________________________________________
> From: Fields, Christopher J [[email protected]]
> Sent: September 5, 2014 9:47 AM
> To: John Macdonald
> Cc: James E Keenan; [email protected]
> Subject: Re: Top level name proposal - ComputeCluster
> 
> Yup, I agree.  I think Cluster is too generic and can mean a lot of things (I 
> think of cluster analysis myself).  Maybe something more distinctive?  Is it 
> application- or domain-specific (bioinformatics, etc)?
> 
> There are a few tools with similar functionality that come to mind.  Most of 
> them have catchy names; one written in Perl is Clusterflow (not on CPAN but 
> here: https://github.com/ewels/clusterflow/).  Another is the (completely 
> unmaintained, likely broken, but possibly useful for something) biopipe 
> project: https://github.com/bioperl/bioperl-pipeline.  I have thought about 
> retooling the latter to be less reliant on bioperl and more a stand-alone 
> tool.
> 
> There are also a couple of Java tools: bpipe (https://code.google.com/p/bpipe/) 
> and nextflow (https://github.com/nextflow-io/nextflow).
> 
> And I agree with Alex; as you might guess from my comment on biopipe, our 
> group would be very interested in helping out on this, even if it’s simply at 
> the testing phase (we run PBS/Torque locally).
> 
> chris
> 
> On Sep 5, 2014, at 8:00 AM, John Macdonald <[email protected]> wrote:
> 
>> Cluster was my first thought for a name, but when I searched to see what 
>> modules already existed (both in case someone had already written a generic 
>> cluster module, saving me the bother of starting a new one, and to see what 
>> types of cluster had cluster-specific modules written for them), the word 
>> cluster came up in a large number of contexts.  A tightly connected group 
>> of "things" is a cluster (e.g. nodes in a graph), so I didn't think that 
>> the simple name would be clear enough.  The name Cluster leaves the reader 
>> with the immediate question "Cluster of what?".
>> 
>> John Macdonald
>> 
>> ________________________________________
>> From: James E Keenan [[email protected]]
>> Sent: September 5, 2014 7:25 AM
>> To: [email protected]
>> Subject: Re: Top level name proposal - ComputeCluster
>> 
>> On 09/04/2014 10:23 AM, John Macdonald wrote:
>>> Hi,
>>> 
>>> I wanted to get general comment/consensus about a top level name that I
>>> am proposing.
>>> 
>>> I'm starting to organize a set of modules for managing jobs on a
>>> computer cluster.  I intend it to work much like DBI - with a top level
>>> abstract interface that programs can use, actually implemented by
>>> drivers that translate the common interface into the interface used by
>>> the particular type of compute cluster that is being accessed.
>>> Initially, I will provide a driver for SGE, since that is what we have
>>> and use in our lab (but after I have that running, my PI can get me
>>> access to a couple of other types of compute cluster to add some more).
>>> 
>>> For naming, I am planning to use:
>>> 
>>>    ComputeCluster - top level name
>>>      - will provide switching functions to create a class of object
>>> for a particular cluster type
>>> 
>> 
>> Could that be shortened to simply: Cluster?
>
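The DBI-style design John describes in this thread (one abstract interface, a top-level switching function that creates a driver object for a particular cluster type, and calling code that changes only in its cluster parameter) can be sketched roughly as follows. This is a minimal Python sketch of the pattern, not the proposed Perl module: every name in it (connect, ClusterDriver, SGEDriver, LocalDriver, submit, submit_alignment) is hypothetical, and LocalDriver is an assumed stand-in for a second cluster type such as Hadoop. Only the SGE qsub options -N (job name) and -b y (treat the command as a binary) are real; the drivers build a native command line but never execute it.

```python
"""A minimal sketch of a DBI-style front end with pluggable cluster drivers.

All names here are hypothetical and illustrate the design pattern only;
they are not the API of the proposed Perl ComputeCluster modules.
"""
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class ClusterDriver(ABC):
    """Common interface that every cluster-specific driver implements."""

    @abstractmethod
    def submit(self, name: str, command: List[str]) -> List[str]:
        """Return the native command line that would submit this job."""


class SGEDriver(ClusterDriver):
    def submit(self, name: str, command: List[str]) -> List[str]:
        # SGE qsub: -N sets the job name, -b y submits the command as a
        # binary instead of a job script. We only build argv; nothing runs.
        return ["qsub", "-N", name, "-b", "y", *command]


class LocalDriver(ClusterDriver):
    def submit(self, name: str, command: List[str]) -> List[str]:
        # Hypothetical stand-in for a second cluster type: run the command as-is.
        return list(command)


# Registry mapping the cluster parameter to a driver class.
_DRIVERS: Dict[str, Type[ClusterDriver]] = {
    "SGE": SGEDriver,
    "Local": LocalDriver,
}


def connect(cluster: str) -> ClusterDriver:
    """Create a driver object for the requested cluster type."""
    try:
        return _DRIVERS[cluster]()
    except KeyError:
        raise ValueError(f"no driver registered for cluster type {cluster!r}") from None


def submit_alignment(cluster: str) -> List[str]:
    # The calling code is identical for every cluster; only the parameter changes.
    driver = connect(cluster)
    return driver.submit("align", ["bwa", "mem", "ref.fa", "reads.fq"])
```

Calling submit_alignment("SGE") yields a qsub command line, while submit_alignment("Local") yields the bare command; an unregistered type raises ValueError. As with DBI, cluster-specific features would live in the individual drivers, and portable code would stick to the shared interface.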
