Colin Freas wrote:
I've wondered about doing this with single or dual quad-core machines with one
spindle per core, partitioning them into 2, 4, or 8 virtual machines and
possibly marking each physical box as a "rack".

Or just host VM images with multi-CPU support and make each one 'machine' with the appropriate number of task trackers.
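
For reference, the slot count per TaskTracker is just configuration, so one fat TaskTracker per physical box can stand in for the VM split. Something along these lines in mapred-site.xml (the 8/4 numbers below are only an example; size them to your cores and spindles):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>   <!-- map slots this TaskTracker will run concurrently -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>   <!-- reduce slots this TaskTracker will run concurrently -->
  </property>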


There would be some initial and ongoing sysadmin costs.  But could this
increase throughput on a small cluster of 2 or 3 boxes with 16 or 24 cores,
running many jobs, by limiting the number of cores each job runs on to, say, 8?
Has anyone tried such a setup?


I can see reasons for virtualisation (better isolation and security) but don't think performance would improve. More likely, anything clock-sensitive will get confused if, under load, some VMs get less CPU time than they expect.

A better approach could be to improve the schedulers in Hadoop to put work where it runs best, maybe even move it if things are taking too long, etc.
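
On that note, if the immediate goal is just capping how many task slots a single job or user can occupy at once, the contrib fair scheduler's per-pool limits may already be close enough. A rough sketch of an allocation file (the pool name and caps are made up, and the exact elements depend on the scheduler version you're running):

  <?xml version="1.0"?>
  <allocations>
    <pool name="smalljobs">
      <maxMaps>8</maxMaps>        <!-- cap concurrent map tasks for jobs in this pool -->
      <maxReduces>4</maxReduces>  <!-- cap concurrent reduce tasks -->
    </pool>
  </allocations>

You'd then set mapred.jobtracker.taskScheduler to org.apache.hadoop.mapred.FairScheduler and point mapred.fairscheduler.allocation.file at that file in the JobTracker's config.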
