Right. Realistic benchmarks are what we really don't have yet.
Care to write a simple clustering benchmark?

On Sat, Nov 21, 2009 at 10:56 PM, Jeff Zhang <[email protected]> wrote:

> Maybe benchmark is what I like to know accurately,
>
> Just like hadoop has a benchmark that it can sort 1TB data in 62 seconds, so
> the same, how much time will it take mahout's bayes algorithms to train a
> model using data like 1GB?
>
>
> Thank you
>
> Jeff Zhang
>
>
> ---------- Forwarded message ----------
> From: Sean Owen <[email protected]>
> Date: Sat, Nov 21, 2009 at 10:44 PM
> Subject: Re: Is there performance comparison document ?
> To: [email protected]
>
>
> I think we can already state the answer though: it's going to take
> much more CPU time and resources to run a computation via Hadoop than
> run it completely on one machine (non-parallelized). Hadoop is a lot
> of overhead.
>
> However some problems are too big to fit on one machine, so you have
> to parallelize with Hadoop. In that case, there is no comparison --
> you can't run it without Hadoop.
>
> Also, parallelizing means you can finish the computation in fewer
> wall-clock seconds. It'll take more CPU-seconds though. But then the
> Hadoop runtime is just a function of how many machines you throw at it
> and how parallelizable it is, so it's not much of a comparison.
>
> Are you wondering how much the overhead is, of a framework like Hadoop?
>
> On Sun, Nov 22, 2009 at 6:30 AM, Jeff Zhang <[email protected]> wrote:
> > Hi all,,
> >
> > Since mahout is build upon hadoop, so is there any performance comparison
> > between the algorithms using hadoop and without using hadoop. ?
> >
> > Thank you.
> >
> > Jeff Zhang
> >
>

--
Ted Dunning, CTO
DeepDyve
