Edward Capriolo wrote:


> I have not used it much, but I think HOD is pretty cool. I guess most people
> who are looking to (spin up, run job, transfer off, spin down) are using
> EC2. HOD does something like make private Hadoop clouds on your hardware, and
> many probably do not have that use case. As schedulers advance and get
> better, HOD becomes less attractive, but I can always see a place for it.

I don't know who is using it or maintaining it; we've been bringing up short-lived Hadoop clusters differently.

I think I should write a little article on the topic; I presented on it at Berlin Buzzwords last week.

Short-lived Hadoop clusters on VMs are fine if you don't have enough data or CPU load to justify a set of dedicated physical machines, and they are a good way of experimenting with Hadoop at scale. You may also be able to lock down the network better, though that depends on your VM infrastructure.

Where VMs are weak is in disk IO performance, but there's no reason why the VM infrastructure can't take a list of filenames/directories as a hint for VM placement (placement is the new scheduling, incidentally), and virtualized IO can only improve. If you can run Hadoop MapReduce directly against SAN-mounted storage, then you can stop worrying about data locality and still gain from parallelising the operations.


-steve

