Hi. We are working on refreshing the centralized HPC cluster resources that our university researchers use. I have been asked by our administration to look into HPC-in-the-cloud offerings as an alternative to purchasing and running a cluster on-site.
We currently run a 173-node, CentOS-based cluster with ~120TB of storage (soon to increase to 300+TB) in our datacenter. It's a standard cluster configuration: IB network, distributed file system (BeeGFS; I really like it), and Torque/Maui batch scheduling. Our users run a varied workload, from fine-grained, MPI-based parallel apps scaling to 100s of cores to coarse-grained, high-throughput jobs (we're a CMS Tier-3 site) with high I/O requirements.

Whatever we transition to, whether it be a new in-house cluster or something "out there", I want to minimize the amount of change and the learning curve our users would have to experience. They should be able to focus on their research and not have to spend a lot of their time learning a new system or trying to spin one up each time they have a job to run.

If you have worked with HPC in the cloud, either as an admin or as someone who has used cloud resources for research computing, I would appreciate hearing about your experience. Even if you haven't used the cloud for HPC, please feel free to share your thoughts or concerns on the matter.

Along those same lines, what are your thoughts about leasing a cluster and running it on-site?

Thanks for your time,

Mike Hutcheson
Assistant Director of Academic and Research Computing Services
Baylor University
