Re: [Beowulf] HPC in the cloud question

Prentice Bisbal Fri, 08 May 2015 06:41:37 -0700

Mike,

What are the characteristics of your cluster workloads? Are they tightly coupled jobs, or are they embarrassingly parallel or serial jobs? I find it hard to believe that a virtualized, shared Ethernet network infrastructure can compete with FDR IB for performance on tightly coupled jobs. AWS HPC representatives came to my school to give a presentation on their offerings, and even they admitted as much.

If your workloads are communication-intensive, I'd think harder about using the cloud, or find a cloud provider that offers IB for HPC (there are a few that do, but I can't remember their names). If your workloads are loosely coupled jobs or many serial jobs, AWS or similar might be fine. AWS does not provide IB, and in fact shares very little information about its network architecture, making it hard to compare to other offerings without actually running benchmarks.

If your users primarily interact with the cluster through command-line logins, using the cloud shouldn't be noticeably different. The hostname(s) they have to SSH to will be different, and moving data in and out might be different, but compiling and submitting jobs should be the same if you make the same tools available in the cloud that you have on your local clusters.


Prentice



On 05/07/2015 06:28 PM, Hutcheson, Mike wrote:

Hi.  We are working on refreshing the centralized HPC cluster resources
that our university researchers use.  I have been asked by our
administration to look into HPC in the cloud offerings as an alternative to
purchasing or running a cluster on-site.

We currently run a 173-node, CentOS-based cluster with ~120TB (soon to
increase to 300+TB) in our datacenter.  It's a standard cluster
configuration:  IB network, distributed file system (BeeGFS.  I really
like it), Torque/Maui batch.  Our users run a varied workload, from
fine-grained, MPI-based parallel apps scaling to 100s of cores to
coarse-grained, high-throughput jobs (we're a CMS Tier-3 site) with high
I/O requirements.

Whatever we transition to, whether it be a new in-house cluster or
something "out there", I want to minimize the amount of change or learning
curve our users would have to experience.  They should be able to focus on
their research and not have to spend a lot of their time learning a new
system or trying to spin one up each time they have a job to run.

If you have worked with HPC in the cloud, either as an admin and/or
someone who has used cloud resources for research computing purposes, I
would appreciate learning your experience.

Even if you haven't used the cloud for HPC computing, please feel free to
share your thoughts or concerns on the matter.

Sort of along those same lines, what are your thoughts about leasing a
cluster and running it on-site?

Thanks for your time,

Mike Hutcheson
Assistant Director of Academic and Research Computing Services
Baylor University


_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
