Right.

Realistic benchmarks are what we really don't have yet.

Care to write a simple clustering benchmark?
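Here is a minimal sketch of what a simple clustering benchmark could look like. It runs plain single-machine Java with Lloyd's k-means on synthetic data, so it gives the non-Hadoop baseline. It does not use any Mahout or Hadoop API. The class name, data sizes, iteration count and seed are all hypothetical choices.

```java
import java.util.Random;

/**
 * Hypothetical single-machine k-means benchmark sketch.
 * Plain Java only: no Mahout or Hadoop APIs are used.
 */
public class KMeansBenchmark {

    /** Generates n points in `dim` dimensions around k well-separated centers. */
    static double[][] syntheticData(int n, int dim, int k, long seed) {
        Random rnd = new Random(seed);
        double[][] points = new double[n][dim];
        for (int i = 0; i < n; i++) {
            int cluster = i % k;
            for (int d = 0; d < dim; d++) {
                // Center of cluster c sits at 10 * c on every axis; noise is unit Gaussian.
                points[i][d] = 10.0 * cluster + rnd.nextGaussian();
            }
        }
        return points;
    }

    /** Index of the centroid closest to p by squared Euclidean distance. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Runs a fixed number of Lloyd iterations and returns the final centroids. */
    static double[][] kMeans(double[][] points, int k, int iterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        // Deterministic init from the first k points (copied so points are not mutated).
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();
        int[] assignment = new int[points.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: each point goes to its nearest centroid.
            for (int i = 0; i < points.length; i++) assignment[i] = nearest(points[i], centroids);
            // Update step: each centroid becomes the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        int n = 100_000, dim = 10, k = 5, iterations = 10; // hypothetical sizes
        double[][] data = syntheticData(n, dim, k, 42L);
        long start = System.nanoTime();
        kMeans(data, k, iterations);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("k-means: n=%d dim=%d k=%d iters=%d took %d ms%n",
                n, dim, k, iterations, elapsedMs);
    }
}
```

To get the Hadoop side of the comparison, you would run the same data sizes through a parallel implementation and compare the timings. Wall-clock time alone hides the extra total CPU time a distributed run spends.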

On Sat, Nov 21, 2009 at 10:56 PM, Jeff Zhang <[email protected]> wrote:

> Maybe a benchmark is what I'd really like to know about.
>
> Just as Hadoop has a benchmark showing it can sort 1TB of data in 62 seconds,
> how much time would it take Mahout's Bayes algorithms to train a model
> on, say, 1GB of data?
>
>
> Thank you
>
> Jeff Zhang
>
>
> ---------- Forwarded message ----------
> From: Sean Owen <[email protected]>
> Date: Sat, Nov 21, 2009 at 10:44 PM
> Subject: Re: Is there performance comparison document ?
> To: [email protected]
>
>
> I think we can already state the answer though: it's going to take
> much more CPU time and resources to run a computation via Hadoop than
> run it completely on one machine (non-parallelized). Hadoop adds a lot
> of overhead.
>
> However some problems are too big to fit on one machine, so you have
> to parallelize with Hadoop. In that case, there is no comparison --
> you can't run it without Hadoop.
>
> Also, parallelizing means you can finish the computation in fewer
> wall-clock seconds. It'll take more CPU-seconds though. But then the
> Hadoop runtime is just a function of how many machines you throw at it
> and how parallelizable it is, so it's not much of a comparison.
>
> Are you wondering how much overhead a framework like Hadoop adds?
>
> On Sun, Nov 22, 2009 at 6:30 AM, Jeff Zhang <[email protected]> wrote:
> > Hi all,
> >
> > Since Mahout is built on Hadoop, is there any performance comparison
> > between the algorithms run with Hadoop and without it?
> >
> > Thank you.
> >
> > Jeff Zhang
> >
>



-- 
Ted Dunning, CTO
DeepDyve
