Re: Server sizing Hadoop + Mahout

2013-02-02 Thread Sean Owen
The problem with this POV is that it assumes it's obvious what the
right outcome is. With a transaction test or a disk write test or big
sort, it's obvious and you can make a benchmark. With ML, it's not
even close.

For example, I can make you a recommender that is literally as fast as
you like by picking any random set of items. Classifiers can likewise
do so by randomly picking a class. Specifying even a desired answer
isn't useful, since then you are just selecting a process that picks a
particular answer on a particular data set.
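
A minimal Python sketch of the point above: a "recommender" that returns random items runs in almost no time, so a speed-only benchmark would rate it highly even though its output is useless. The function names, the catalogue size, and the set of relevant items are all hypothetical and chosen only for illustration; this is not Mahout code.

```python
import random


def random_recommender(item_ids, k, seed=None):
    """Return k distinct items chosen uniformly at random (no model, no data)."""
    rng = random.Random(seed)
    return rng.sample(item_ids, k)


def precision_at_k(recommended, relevant):
    """Fraction of recommended items that are actually relevant to the user."""
    if not recommended:
        return 0.0
    hits = sum(1 for item in recommended if item in relevant)
    return hits / len(recommended)


# Hypothetical catalogue of 10,000 items, of which 10 are relevant to this user.
items = list(range(10_000))
relevant = set(range(10))

recs = random_recommender(items, k=10, seed=42)
print("recommended:", recs)
print("precision@10:", precision_at_k(recs, relevant))
```

Timing this function says nothing about whether the recommendations are any good, which is why throughput alone does not define a benchmark for ML workloads.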

I don't think that works, since the classic idea of a benchmark is not
well-defined here, but you're welcome to go create and run whatever
tests you like.

On Sat, Feb 2, 2013 at 3:19 PM, jordi wrote:
> Hi Sean! First of all, thanks for your reply!
> I do agree that it's very complicated to size an environment, since
> there are many variables that should be considered. You have mentioned some of
> them: the algorithm, the distribution of data, the amount of data, type of
> hardware, etc.
> But I don't agree that it's impossible to give a baseline.
> Maybe it would be a great idea for the Mahout+Hadoop community to take a look at
> these guys (Standard Performance Evaluation Corporation, http://www.spec.org/).
> They run the same benchmark on different types of architectures, empirically
> establishing a baseline that can be used as a good starting point for capacity
> planning.
> They have a lot of benchmarks, depending on CPU, Java Client Server, etc.
> Obviously, that's only a starting point: before your software goes live in
> production, it's desirable to benchmark your software again by running a
> load test, adjusting your infrastructure depending on the performance results.
>
>


Re: Server sizing Hadoop + Mahout

2013-02-02 Thread jordi
Sean Owen writes:

> 
> You haven't even said what algorithm. It even depends on the distribution
> of your data, in addition to amount, not to mention the type of servers,
> configuration, etc. It's impossible to give a meaningful baseline. You can
> run your real data on a real cluster to get some notion. Run-time and
> requirements generally scale up linearly.
> 
> On Wed, May 30, 2012 at 10:32 AM, jcuencaa wrote:
> 
> > Hello!
> > I need to do capacity planning, or server sizing, for a Mahout + Hadoop
> > server; that is, plan how many servers and what hardware (CPU, memory, etc.)
> > I need to handle the maximum amount of work that my organization
> > requires in a given period.
> > I haven’t found documentation about this on the Mahout or Hadoop sites,
> > or at least about which things should be taken into account when doing the
> > server sizing. It’s obvious that sizing depends on many factors but, for
> > example, for application servers or web servers, sizing is normally done by
> > inferring hardware needs using some benchmarks as a baseline.
> > So I’d be grateful if someone could help me.
> > Thanks in advance.
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Server-sizing-Hadoop-Mahout-tp3986807.html
> > Sent from the Mahout User List mailing list archive at Nabble.com.
> >
> 

Hi Sean! First of all, thanks for your reply!
I do agree that it's very complicated to size an environment, since
there are many variables that should be considered. You have mentioned some of
them: the algorithm, the distribution of data, the amount of data, type of
hardware, etc.
But I don't agree that it's impossible to give a baseline.
Maybe it would be a great idea for the Mahout+Hadoop community to take a look at
these guys (Standard Performance Evaluation Corporation, http://www.spec.org/).
They run the same benchmark on different types of architectures, empirically
establishing a baseline that can be used as a good starting point for capacity
planning.
They have a lot of benchmarks, depending on CPU, Java Client Server, etc.
Obviously, that's only a starting point: before your software goes live in
production, it's desirable to benchmark your software again by running a
load test, adjusting your infrastructure depending on the performance results.




Re: Server sizing Hadoop + Mahout

2012-05-30 Thread Sean Owen
You haven't even said what algorithm. It even depends on the distribution
of your data, in addition to amount, not to mention the type of servers,
configuration, etc. It's impossible to give a meaningful baseline. You can
run your real data on a real cluster to get some notion. Run-time and
requirements generally scale up linearly.
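
If run-time does scale roughly linearly, as suggested above, a few timed runs on real data at increasing sizes can be extrapolated to the target volume. The Python sketch below fits a least-squares line to such measurements. The sizes and timings are made-up placeholders, not results from any real cluster, and the function names are hypothetical; linearity itself is an assumption that should be checked against your own runs.

```python
def fit_line(sizes, runtimes):
    """Ordinary least-squares fit of runtime = slope * size + intercept."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(runtimes) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, runtimes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_runtime(slope, intercept, size):
    """Extrapolate the fitted line to a new input size."""
    return slope * size + intercept


# Placeholder measurements: input size in GB, wall-clock time in minutes.
# Replace these with timings from your own algorithm, data, and cluster.
measured_sizes_gb = [10, 20, 40]
measured_minutes = [12.0, 22.0, 42.0]

slope, intercept = fit_line(measured_sizes_gb, measured_minutes)
print("estimated minutes for 400 GB:", predict_runtime(slope, intercept, 400))
```

The estimate only holds for the same algorithm, data distribution, and cluster configuration as the measured runs.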

On Wed, May 30, 2012 at 10:32 AM, jcuencaa wrote:

> Hello!
> I need to do capacity planning, or server sizing, for a Mahout + Hadoop
> server; that is, plan how many servers and what hardware (CPU, memory, etc.)
> I need to handle the maximum amount of work that my organization
> requires in a given period.
> I haven’t found documentation about this on the Mahout or Hadoop sites,
> or at least about which things should be taken into account when doing the
> server sizing. It’s obvious that sizing depends on many factors but, for
> example, for application servers or web servers, sizing is normally done by
> inferring hardware needs using some benchmarks as a baseline.
> So I’d be grateful if someone could help me.
> Thanks in advance.
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Server-sizing-Hadoop-Mahout-tp3986807.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>


Server sizing Hadoop + Mahout

2012-05-30 Thread jcuencaa
Hello!
I need to do capacity planning, or server sizing, for a Mahout + Hadoop
server; that is, plan how many servers and what hardware (CPU, memory, etc.)
I need to handle the maximum amount of work that my organization
requires in a given period.
I haven’t found documentation about this on the Mahout or Hadoop sites,
or at least about which things should be taken into account when doing the
server sizing. It’s obvious that sizing depends on many factors but, for
example, for application servers or web servers, sizing is normally done by
inferring hardware needs using some benchmarks as a baseline.
So I’d be grateful if someone could help me.
Thanks in advance.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-sizing-Hadoop-Mahout-tp3986807.html
Sent from the Mahout User List mailing list archive at Nabble.com.