I have some old data from a personal experiment. A year ago, CBayes
model generation from a subset of Wikipedia (3 GB out of 17 GB) on a
cluster of 6 Pentium HT 3.0 GHz machines with 100 Mbps switched Ethernet
took 15 minutes. An additional 5 minutes was used to generate the 3 GB
dataset from the 17 GB, bringing the total time to approximately 20 minutes.
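
For a rough sense of scale, those figures can be turned into an average
throughput. This is only a back-of-the-envelope sketch: it assumes
1 GB = 1024 MB, treats the timings as exact although they are approximate,
and the function name is made up for illustration.

```python
# Back-of-the-envelope throughput for the CBayes run described above.
# Assumptions (not measured): 1 GB = 1024 MB, timings taken at face value.

def throughput_mb_per_s(data_gb: float, minutes: float) -> float:
    """Average throughput in MB/s for processing data_gb in the given minutes."""
    return data_gb * 1024 / (minutes * 60)

NODES = 6  # machines in the cluster

training = throughput_mb_per_s(3, 15)    # model generation only
end_to_end = throughput_mb_per_s(3, 20)  # including building the 3 GB subset

print(f"training:   {training:.2f} MB/s total, {training / NODES:.2f} MB/s per node")
print(f"end to end: {end_to_end:.2f} MB/s total")
```

The end-to-end number is relative to the 3 GB subset, not the 17 GB that
had to be read to produce it.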

Note that Hadoop sorted 1 TB using 4000 quad-core/dual-core systems over
gigabit/multi-gigabit connections, so there is no comparison.

I hope this info helps.

Robin



On Sun, Nov 22, 2009 at 12:26 PM, Jeff Zhang <[email protected]> wrote:
> Maybe a benchmark is what I would like to know about more precisely.
>
> Just as Hadoop has a benchmark showing that it can sort 1 TB of data in
> 62 seconds, how much time will it take Mahout's Bayes algorithms to train a
> model on, say, 1 GB of data?
>
>
> Thank you
>
> Jeff Zhang
>
>
> ---------- Forwarded message ----------
> From: Sean Owen <[email protected]>
> Date: Sat, Nov 21, 2009 at 10:44 PM
> Subject: Re: Is there performance comparison document ?
> To: [email protected]
>
>
> I think we can already state the answer though: it's going to take
> much more CPU time and resources to run a computation via Hadoop than
> run it completely on one machine (non-parallelized). Hadoop is a lot
> of overhead.
>
> However some problems are too big to fit on one machine, so you have
> to parallelize with Hadoop. In that case, there is no comparison --
> you can't run it without Hadoop.
>
> Also, parallelizing means you can finish the computation in fewer
> wall-clock seconds. It'll take more CPU-seconds though. But then the
> Hadoop runtime is just a function of how many machines you throw at it
> and how parallelizable it is, so it's not much of a comparison.
>
> Are you wondering how much the overhead is, of a framework like Hadoop?
>
> On Sun, Nov 22, 2009 at 6:30 AM, Jeff Zhang <[email protected]> wrote:
>> Hi all,
>>
>> Since Mahout is built upon Hadoop, is there any performance comparison
>> between the algorithms running with Hadoop and without it?
>>
>> Thank you.
>>
>> Jeff Zhang
>>
>
