Re: Profiling Hadoop Job

2012-04-18 Thread Leonardo Urbina
Sorry it took so long to respond, but that did solve it. Thanks!

On Thu, Mar 8, 2012 at 7:37 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 The JobClient is trying to download the profile output to the local
 directory. It seems like you don't have write permissions in the
 current working directory where you are running the JobClient. Please
 check that.

 HTH.

 +Vinod
 Hortonworks Inc.
 http://hortonworks.com/
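
A quick way to confirm the condition described above is to check whether the user running the JobClient can actually create a file in the JVM's current working directory. The following is a minimal standalone probe (the class name is made up for illustration), not part of the original thread:

    import java.io.File;
    import java.io.IOException;

    // Minimal probe: can the user running the JobClient create files in the
    // JVM's current working directory? A failure here matches the failure mode
    // of the .profile download ("Permission denied").
    public class CwdWriteCheck {
        public static void main(String[] args) {
            File cwd = new File(System.getProperty("user.dir"));
            System.out.println("User:              " + System.getProperty("user.name"));
            System.out.println("Working directory: " + cwd.getAbsolutePath());
            try {
                File probe = File.createTempFile("profile-write-check", ".tmp", cwd);
                probe.delete();
                System.out.println("Write permission:  OK");
            } catch (IOException e) {
                System.out.println("Write permission:  DENIED (" + e.getMessage() + ")");
            }
        }
    }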


Re: Profiling Hadoop Job

2012-03-08 Thread Leonardo Urbina
Does anyone have any idea how to solve this problem? Regardless of whether
I'm using plain HPROF or profiling through Starfish, I am getting the same
error:

Exception in thread "main" java.io.FileNotFoundException:
attempt_201203071311_0004_m_00_0.profile (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
        at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
        at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

But I can't find what permissions to change to fix this issue. Any ideas?
Thanks in advance,

Best,
-Leo


Re: Profiling Hadoop Job

2012-03-08 Thread Mohit Anchlia
Can you check which user you are running this process as and compare it
with the ownership of the directory?


Profiling Hadoop Job

2012-03-07 Thread Leonardo Urbina
Hello everyone,

I have a Hadoop job that I run on several GBs of data, and I am trying to
optimize it to reduce memory consumption and improve speed. I am following
the steps outlined in Tom White's "Hadoop: The Definitive Guide" for
profiling with HPROF (p. 161), by setting the following properties on the
JobConf:

    job.setProfileEnabled(true);
    job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6," +
        "force=n,thread=y,verbose=n,file=%s");
    job.setProfileTaskRange(true, "0-2");
    job.setProfileTaskRange(false, "0-2");
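
For reference, the same profiling switches can also be expressed as plain configuration properties. The sketch below uses the property names from Hadoop 0.20's old "mapred" API and is illustrative only (class and method names are invented), not taken from the job above:

    import org.apache.hadoop.mapred.JobConf;

    // Sketch: equivalent profiling setup via configuration properties
    // (Hadoop 0.20 / old "mapred" API property names).
    public class ProfilePropsSketch {
        public static JobConf profiledConf() {
            JobConf conf = new JobConf();
            conf.setBoolean("mapred.task.profile", true);
            conf.set("mapred.task.profile.params",
                     "-agentlib:hprof=cpu=samples,heap=sites,depth=6,"
                     + "force=n,thread=y,verbose=n,file=%s");
            conf.set("mapred.task.profile.maps", "0-2");     // map task IDs to profile
            conf.set("mapred.task.profile.reduces", "0-2");  // reduce task IDs to profile
            return conf;
        }
    }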

I am trying to run this locally on a single-node, pseudo-distributed install of
Hadoop (0.20.2), and it fails with the following error:

Exception in thread "main" java.io.FileNotFoundException:
attempt_201203071311_0004_m_00_0.profile (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
        at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
        at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

However, I can access these logs directly from the tasktracker's logs
(through the web UI). For the sake of running this locally, I could just
ignore this error; however, I want to be able to profile the job once it is
deployed to our Hadoop cluster, and I need to be able to retrieve these
logs automatically. Do I need to change the permissions in HDFS to allow
for this? Any ideas on how to fix this? Thanks in advance,

Best,
-Leo

-- 
Leo Urbina
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Department of Mathematics
lurb...@mit.edu
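
On automatically retrieving the profiles: judging from the stack trace, the JobClient tries to download one <attempt-id>.profile file per profiled task attempt into its current working directory, so once that directory is writable, a post-run step in the driver can simply pick the files up. A rough sketch (class and method names are invented for illustration):

    import java.io.File;
    import java.io.FilenameFilter;

    // Sketch: after JobClient.runJob() returns with profiling enabled, collect
    // the *.profile files the client has written into its working directory.
    public class ProfileCollector {
        public static File[] collectProfiles() {
            File cwd = new File(System.getProperty("user.dir"));
            File[] profiles = cwd.listFiles(new FilenameFilter() {
                public boolean accept(File dir, String name) {
                    return name.endsWith(".profile");
                }
            });
            return profiles != null ? profiles : new File[0];
        }

        public static void main(String[] args) {
            for (File p : collectProfiles()) {
                System.out.println("Found profile output: " + p.getAbsolutePath());
            }
        }
    }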


Re: Profiling Hadoop Job

2012-03-07 Thread Jie Li
Hi Leonardo,

You might want to try Starfish, which supports memory profiling as well
as CPU/disk/network profiling for performance tuning.

Jie
--
Starfish is an intelligent performance tuning tool for Hadoop.
Homepage: www.cs.duke.edu/starfish/
Mailing list: http://groups.google.com/group/hadoop-starfish



Re: Profiling Hadoop Job

2012-03-07 Thread Leonardo Urbina
Hi Jie,

According to the Starfish README, Hadoop programs must be written using
the new Hadoop API. That is not the case for my job (I am using MultipleInputs,
among other features that are only supported by the old API). Is there any way
around this? Thanks,

-Leo
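
For context, the old-API ("mapred") MultipleInputs usage in question looks roughly like the sketch below; the class name, paths, input formats, and mapper classes are placeholders rather than the actual job:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.lib.MultipleInputs;

    // Sketch of old-API ("mapred") MultipleInputs: each input path gets its own
    // InputFormat and Mapper. Paths and classes are placeholders.
    public class OldApiMultipleInputsSketch {

        public static class LineMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, Text> {
            public void map(LongWritable key, Text value,
                            OutputCollector<Text, Text> out, Reporter reporter)
                    throws IOException {
                out.collect(new Text("line"), value);
            }
        }

        public static class KeyValueMapper extends MapReduceBase
                implements Mapper<Text, Text, Text, Text> {
            public void map(Text key, Text value,
                            OutputCollector<Text, Text> out, Reporter reporter)
                    throws IOException {
                out.collect(key, value);
            }
        }

        public static void wireInputs(JobConf conf) {
            MultipleInputs.addInputPath(conf, new Path("/data/plain"),
                    TextInputFormat.class, LineMapper.class);
            MultipleInputs.addInputPath(conf, new Path("/data/keyvalue"),
                    KeyValueTextInputFormat.class, KeyValueMapper.class);
        }
    }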



Re: Profiling Hadoop Job

2012-03-07 Thread Jie Li
Hi Leo,

Thanks for pointing out the outdated README file. I'm glad to tell you that we
do support the old API in the latest version. See here:

http://www.cs.duke.edu/starfish/previous.html

You're welcome to join our mailing list, where your questions will reach more of
our group members.

Jie



Re: Profiling Hadoop Job

2012-03-07 Thread Leonardo Urbina
Thanks,
-Leo

-- 
Leo Urbina
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Department of Mathematics
lurb...@mit.edu