Hi Cole:
That link you posted refers to our work at ISI. We're currently running LXC as
the hypervisor on our SGI UV. Aside from performance, one of the issues with
KVM is that it currently has a hard-coded limit on how many vCPUs you can run
in a single instance, so we can't run, say, a 256-vCPU instance.
Some of the LXC-related issues we've run into:
- The CPU affinity issue on LXC you mention. Running LXC with OpenStack, you
don't get proper space sharing out of the box; each instance actually sees
all of the available CPUs. It's possible to restrict this, but that
functionality doesn't seem to be exposed through libvirt, so it would have to
be implemented in nova (see the first sketch after this list).
- LXC doesn't currently support volume attachment through libvirt. We were able
to work around this by invoking lxc-attach from OpenStack instead (e.g., see
https://github.com/usc-isi/nova/blob/hpc-testing/nova/virt/libvirt/connection.py#L482;
the second sketch after this list shows the basic idea). But to be able to use
lxc-attach, we had to upgrade the Linux kernel in RHEL 6.1 from 2.6.32 to
2.6.38. That kernel isn't supported by SGI, which means we aren't able to load
the SGI NUMA-related kernel modules.
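To illustrate the affinity point: the restriction ultimately comes down to the
container's cpuset cgroup, and something along these lines is roughly what
would have to end up in nova. This is just a sketch for illustration; the
cgroup mount point and instance name are assumptions, not code from our branch:

    import os

    def pin_instance_cpus(instance_name, cpu_range, mem_nodes="0",
                          cgroup_root="/cgroup/cpuset/libvirt/lxc"):
        """Restrict an LXC instance to the given CPUs and NUMA memory nodes.

        cpu_range is a cpuset string such as "0-15" or "0,2,4,6".
        """
        cgroup_dir = os.path.join(cgroup_root, instance_name)
        # cpuset.cpus limits which CPUs the container's tasks may run on.
        with open(os.path.join(cgroup_dir, "cpuset.cpus"), "w") as f:
            f.write(cpu_range)
        # cpuset.mems limits which NUMA nodes the container may allocate from.
        with open(os.path.join(cgroup_dir, "cpuset.mems"), "w") as f:
            f.write(mem_nodes)

    # e.g., confine instance-00000001 to CPUs 0-15 on NUMA node 0:
    # pin_instance_cpus("instance-00000001", "0-15", "0")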
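The volume-attachment workaround boils down to using lxc-attach to create the
block device node inside the container's namespaces. Very roughly (again, a
sketch of the idea, not the actual code from the branch linked above; the
device paths are made up):

    import os
    from subprocess import check_call

    def attach_volume_to_lxc(container_name, host_device, container_device):
        """Make a host block device visible inside an LXC container."""
        # Look up the device's major/minor numbers on the host.
        st = os.stat(host_device)
        major, minor = os.major(st.st_rdev), os.minor(st.st_rdev)
        # lxc-attach runs mknod inside the container's namespaces; it needs
        # a 2.6.38+ kernel, which is why we had to upgrade from 2.6.32.
        check_call(["lxc-attach", "-n", container_name, "--",
                    "mknod", "-m", "660", container_device,
                    "b", str(major), str(minor)])

    # e.g., expose an iSCSI-backed volume as /dev/vdb inside the instance:
    # attach_volume_to_lxc("instance-00000001", "/dev/sdc", "/dev/vdb")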
Take care,
Lorin
--
Lorin Hochstein, Computer Scientist
USC Information Sciences Institute
703.812.3710
http://www.east.isi.edu/~lorin
On Dec 3, 2011, at 5:08 PM, Cole wrote:
First and foremost:
http://wiki.openstack.org/HeterogeneousSgiUltraVioletSupport
With NUMA and lightweight container technology (LXC / OpenVZ) you can get
very close to bare-metal performance for certain HPC applications. The
problem with technologies like LXC is that there isn't much logic to handle
CPU affinity the way other hypervisors do (and what they offer generally
isn't ideal for HPC anyway).
On the interconnect side, there are plenty of Open-MX
(http://open-mx.gforge.inria.fr/) HPC applications running on everything from
single-channel 1 GbE to bonded 10 GbE.
This is an area I'm personally interested in; I've done some testing and
will be doing more. If you are going to try HPC over Ethernet, Arista makes
the lowest-latency switches in the business.
Cole
Nebula
On Sat, Dec 3, 2011 at 11:11 AM, Tim Bell tim.b...@cern.ch wrote:
At CERN, we are facing similar questions as we look to the cloud: how to
reconcile VM creation performance (typically O(minutes)) with the rates a
batch job system requires for a single program (O(sub-second)).
Data locality, so that a job runs close to its source data, makes this
harder, as does fair share, which aligns job priorities to meet the agreed
quotas between competing requests for a limited, shared resource. The classic
IaaS model of 'have credit card, will compute' does not apply to some private
cloud use cases and users.
We would be interested in discussing this further with other sites. There is
more background from OpenStack Boston at http://vimeo.com/31678577.
Tim
tim.b...@cern.ch
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp