Re: Hadoop Profiling!

2008-10-10 Thread Ariel Rabkin
That code is in; unfortunately, it doesn't quite solve the problem on its
own, and you'd need to do some more work. You'd have to write subclasses
that spit out the statistics you want, then set the appropriate options in
hadoop-site so that those classes get loaded.
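
Roughly, a subclass might look like the sketch below. Treat it as a sketch
only: the hook names (completeTask, timedoutTask), the one-arg constructor,
and the config property name are assumptions, so check the actual classes in
src/mapred/org/apache/hadoop/mapred for the exact signatures.

  // Logs wall-clock times for task completion and timeout events.
  // The base class may be package-private, hence this package declaration.
  package org.apache.hadoop.mapred;

  public class TimingTaskTrackerInstrumentation
      extends TaskTrackerInstrumentation {

    public TimingTaskTrackerInstrumentation(TaskTracker tracker) {
      super(tracker);
    }

    @Override
    public void completeTask(TaskAttemptID taskId) {
      System.out.println(System.currentTimeMillis() + " completed " + taskId);
    }

    @Override
    public void timedoutTask(TaskAttemptID taskId) {
      System.out.println(System.currentTimeMillis() + " timed out " + taskId);
    }
  }

Then point the TaskTracker at your subclass in hadoop-site.xml:

  <property>
    <name>mapred.tasktracker.instrumentation</name>
    <value>org.apache.hadoop.mapred.TimingTaskTrackerInstrumentation</value>
  </property>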

On Wed, Oct 8, 2008 at 12:30 PM, George Porter [EMAIL PROTECTED] wrote:
 Hi Ashish,

 I believe that Ari committed two instrumentation classes,
 TaskTrackerInstrumentation and JobTrackerInstrumentation (both in
 src/mapred/org/apache/hadoop/mapred), that can give you information on when
 components of your M/R jobs start and stop.  I'm in the process of writing
 some additional instrumentation APIs that collect timing information about
 the RPC and HDFS layers, and will hopefully be able to submit a patch in a
 few weeks.

 Thanks,
 George






-- 
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department


Hadoop Profiling!

2008-10-08 Thread Gerardo Velez
Hi!

I've developed a Map/Reduce algorithm to analyze some logs from a web
application.

We are basically ready to start the QA test phase, so now I would like to
know how efficient my application is from a performance point of view.

Is there any procedure I could use to do some profiling?


Basically I need basic data, like execution time or code bottlenecks.


Thanks in advance.

-- Gerardo Velez


Re: Hadoop Profiling!

2008-10-08 Thread Stefan Groschupf
Just run your map reduce job locally and connect your profiler. I use
YourKit.

Works great!
You can profile your map reduce job by running it in local mode, just
like any other Java app. We have also profiled on a grid: you just need
to install the YourKit agent into the JVM of the node you want to
profile, and then connect to that node while the job runs. You need to
time things well, though, since the task JVM is shut down as soon as
your job is done.
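
For the grid case, the usual way to get the agent into the task JVMs is
mapred.child.java.opts in hadoop-site.xml; the agent path and port below
are just placeholders for wherever YourKit lives on your nodes:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m -agentpath:/opt/yourkit/bin/libyjpagent.so=port=10001</value>
  </property>

(With several task slots per node the fixed port will collide, so profile
with a single slot.) For local mode you don't need any of this: set
mapred.job.tracker to "local" and the whole job runs in one JVM that your
profiler can attach to directly.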

Stefan

~~~
101tec Inc., Menlo Park, California
web:  http://www.101tec.com
blog: http://www.find23.net







Re: Hadoop Profiling!

2008-10-08 Thread Ashish Venugopal
Are you interested in simply profiling your own code (in which case you can
clearly use whatever Java profiler you want), or in profiling your
construction of the MapReduce job, i.e. how much time is being spent in the
Map vs. the sort vs. the shuffle vs. the Reduce? I am not aware of a good
solution to the second problem; can anyone comment?

Ashish



Re: Hadoop Profiling!

2008-10-08 Thread Ashish Venugopal
Great, thanks for this info. Is there any chance that this information
could also be exposed for streaming jobs?
(All of the jobs that we run in our lab go through streaming...)

Thanks!

Ashish

On Wed, Oct 8, 2008 at 12:30 PM, George Porter [EMAIL PROTECTED] wrote:

 Hi Ashish,

 I believe that Ari committed two instrumentation classes,
 TaskTrackerInstrumentation and JobTrackerInstrumentation (both in
 src/mapred/org/apache/hadoop/mapred), that can give you information on when
 components of your M/R jobs start and stop.  I'm in the process of writing
 some additional instrumentation APIs that collect timing information about
 the RPC and HDFS layers, and will hopefully be able to submit a patch in a
 few weeks.

 Thanks,
 George


 --
 George Porter, Sun Labs/CTO
 Sun Microsystems - San Diego, Calif.
 [EMAIL PROTECTED] 1.858.526.9328