The MR jobs I'm running are not CPU-intensive, so I've always assumed that they're more IO-bound. Maybe that's an exceptional situation, but I'm not really sure.
A good motherboard with a local IO channel per disk, feeding individual cores, with memory partitioned up between them... and I've heard good things about Intel's next tock vis-a-vis internal system throughput. And yes, this would be a task for a paravirtualization system like Xen.

Again, it's just a thought, but with low-end quad-core procs running about $300, and the potential to cut the number of machines you need to physically set up by 75%, I'm not sure I'd say it'd only be good for a proof of concept.

Also, I just set up a dozen-odd boxes that are two generations behind modern boxes, and promptly blew a fuse. The TDP on the Xeon 3.06 GHz chips I'm using is 89W. The TDP on an Intel Q6600 is 65W, and it represents 4 cores.

It's a simple experiment, but I don't have the resources on hand to run it. I'm curious if anyone has seen the performance impact from the different setups we're talking about. I also think you could come close to faking it with Hadoop config changes.

-Colin

On Fri, Jun 6, 2008 at 12:41 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> I once asked a wise man in charge of a rather large multi-datacenter
> service, "Have you ever considered virtualization?" He replied, "All
> the CPUs here are pegged at 100%"
>
> There may be applications for this type of processing. I have thought
> about systems like this from time to time. This thinking goes in
> circles. Hadoop is designed for storing and processing on different
> hardware. Virtualization lets you split a system into sub-systems.
>
> Virtualization is great for proof of concept.
> For example, I have deployed this: I installed VMware with two Linux
> systems on my Windows host, I followed a Hadoop multi-system-tutorial
> running on two VMware nodes. I was able to get the word count
> application working, I also confirmed that blocks were indeed being
> stored on both virtual systems and that processing was being shared
> via MAP/REDUCE.
>
> The processing however was slow; of course this is the fault of
> VMware. VMware has a very high emulation overhead. Xen has less
> overhead. Linux-VServer and OpenVZ use software virtualization (they
> have very little (almost no) overhead). Regardless of how much
> overhead, overhead is overhead. Personally I find that VMware falls
> short of its promises
>