100 nodes is certainly overkill for 500 MB of data, but if you have the
resources, you might as well use them, I suppose (assuming you're already
paying for power, network, cooling, etc.).
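
To put a rough number on "overkill", here is a back-of-the-envelope sketch.
It assumes the day's logs land in HDFS as a single 500 MB file, that the
block size is 64 MB (the default in Hadoop releases of this era), and that
the job gets roughly one map task per block.  The function name is made up
for illustration; your actual split count depends on how many files you
have and how the job is configured.

```python
import math


def estimated_map_tasks(input_mb, block_mb=64):
    """Rough map-task count for one input file: about one task per HDFS block.

    block_mb=64 is an assumption (the default block size in older Hadoop
    releases); adjust it to match your cluster's configuration.
    """
    return math.ceil(input_mb / block_mb)


daily_log_mb = 500  # daily log volume from the original question
nodes = 100         # approximate machine count from the original question

tasks = estimated_map_tasks(daily_log_mb)
busy_nodes = min(tasks, nodes)  # nodes that could run a map task at once

print("map tasks:", tasks)
print("nodes with map work:", busy_nodes)
```

Under those assumptions only a handful of the machines would have map work
at any one time, so a much smaller cluster would likely do the same job.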

As for your idea of virtualization, it makes sense.  I don't know of anyone
running a Hadoop cluster on Windows.  It may not be impossible, but it will
be hard to get help from the list, as most people run Hadoop on Linux.  For
this reason, virtualization is probably a good idea, though it will slow
your cluster down: a virtualized environment is generally slower than
running natively.

Hope this helps.

Alex

On Tue, Jun 9, 2009 at 9:23 AM, PORTO aLET <portoa...@gmail.com> wrote:

> Hi,
> I am trying to setup a hadoop cluster to process our apache log (about
> 500MB a day).
> I am just not sure what kind of pc configuration I should use?
> We have a few windows xp machines (about 100+, too many to process 'just'
> 500MB ?) that I am thinking of using sparingly (during the night) to
> process the logs.
> So I am thinking of installing Linux in VirtualBox in them to run hadoop?
> Please share your knowledge and wisdom,
> Regards.
>
