100 nodes is certainly overkill for 500 MB of data a day, but if you have the resources, you might as well use them, I suppose (assuming you're already paying for power, network, cooling, etc.).
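To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the day's logs arrive as one large file (or a few), that each HDFS block becomes one map task, and that the block size is 64 MB, which I believe is the default in current releases but which you should check against your own configuration. The function name `estimate_map_tasks` is made up for illustration; it is not part of Hadoop.

```python
import math


def estimate_map_tasks(input_mb: float, block_mb: float = 64.0) -> int:
    """Rough number of map tasks, assuming one task per HDFS block.

    block_mb=64 is an assumed default block size; adjust it to match
    the cluster's actual setting.
    """
    return math.ceil(input_mb / block_mb)


daily_input_mb = 500  # figure from the original question
nodes = 100           # figure from the original question

tasks = estimate_map_tasks(daily_input_mb)
idle_nodes = max(0, nodes - tasks)
per_node_mb = daily_input_mb / nodes

print(f"estimated map tasks: {tasks}")
print(f"nodes with no map task: {idle_nodes}")
print(f"data per node if spread evenly: {per_node_mb} MB")
```

Under those assumptions, only a handful of nodes would get any map work at all. Many small log files would change the picture, since each file produces at least one split.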
As for your idea of virtualization, it makes sense. I don't know of anyone running a Hadoop cluster on Windows. It may not be impossible, but it will be hard to get help from the list, since most people run Hadoop on Linux. For this reason, virtualization is probably a good idea, though it will slow your cluster down: a virtualized environment is generally slower than running natively.

Hope this helps.

Alex

On Tue, Jun 9, 2009 at 9:23 AM, PORTO aLET <portoa...@gmail.com> wrote:

> Hi,
> I am trying to setup a hadoop cluster to process our apache log (about 500MB a day).
> I am just not sure what kind of pc configuration I should use?
> We have a few windows xp machines (about 100+, too many to process 'just' 500MB ?) that I am thinking of using sparingly (during the night) to process the logs.
> So I am thinking of installing Linux in VirtualBox in them to run hadoop?
> Please share your knowledge and wisdom,
> Regards.