Re: Profiling Hadoop Job
Sorry it took so long to respond, but that did solve it. Thanks!

On Thu, Mar 8, 2012 at 7:37 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:

> The JobClient is trying to download the profile output to the local
> directory. It seems like you don't have write permissions in the current
> working directory where you are running the JobClient. Please check that.
>
> HTH,
> +Vinod
> Hortonworks Inc.
> http://hortonworks.com/
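In short, the JobClient writes the downloaded attempt_*.profile files into whatever directory the job was launched from, so launching from a directory the submitting user can write to resolves the error. A minimal sketch of a pre-flight check, assuming (as Vinod describes) that the profile output lands in the JVM's current working directory:

    // Sketch: fail fast if the JobClient's working directory is not writable,
    // before enabling profiling. Uses a temp-file probe rather than
    // File.canWrite(), which can be unreliable on some filesystems.
    import java.io.File;
    import java.io.IOException;

    public class ProfileDirCheck {
        public static void ensureCwdWritable() throws IOException {
            File cwd = new File(System.getProperty("user.dir"));
            File probe = File.createTempFile("profile-probe", ".tmp", cwd);
            if (!probe.delete()) {
                probe.deleteOnExit();
            }
        }
    }

Calling ensureCwdWritable() at the top of the driver's run() method turns the mid-job FileNotFoundException into an immediate, obvious IOException at submission time.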
Re: Profiling Hadoop Job
Does anyone have any idea how to solve this problem? Regardless of whether I'm using plain HPROF or profiling through Starfish, I am getting the same error:

    Exception in thread "main" java.io.FileNotFoundException: attempt_201203071311_0004_m_00_0.profile (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
        at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
        at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

But I can't find what permissions to change to fix this issue. Any ideas?

Thanks in advance,
Best,
-Leo
Re: Profiling Hadoop Job
Can you check which user you are running this process as and compare it with the ownership on the directory?
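A quick way to make that comparison is to print both from the same JVM that submits the job. This is a sketch only: the "." path is simply the current working directory (substitute the directory you launch from), and it uses java.nio.file, so it needs Java 7 or later:

    // Sketch: show the effective user of this process alongside the owner
    // and writability of the working directory the JobClient runs in.
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class OwnershipCheck {
        public static void main(String[] args) throws Exception {
            Path cwd = Paths.get(".").toAbsolutePath().normalize();
            System.out.println("Running as user: " + System.getProperty("user.name"));
            System.out.println("Directory:       " + cwd);
            System.out.println("Owner:           " + Files.getOwner(cwd));
            System.out.println("Writable:        " + Files.isWritable(cwd));
        }
    }

If the two names differ and the directory is not group- or world-writable, that mismatch would explain the "Permission denied".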
Profiling Hadoop Job
Hello everyone,

I have a Hadoop job that I run on several GBs of data, and I am trying to optimize it to reduce memory consumption and improve speed. I am following the steps outlined in Tom White's Hadoop: The Definitive Guide for profiling with HPROF (p. 161), by setting the following properties in the JobConf:

    job.setProfileEnabled(true);
    job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6,"
        + "force=n,thread=y,verbose=n,file=%s");
    job.setProfileTaskRange(true, "0-2");
    job.setProfileTaskRange(false, "0-2");

I am running this locally on a single pseudo-distributed install of Hadoop (0.20.2), and it gives the following error:

    Exception in thread "main" java.io.FileNotFoundException: attempt_201203071311_0004_m_00_0.profile (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
        at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
        at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

However, I can access these logs directly from the tasktracker's logs (through the web UI). For the sake of running this locally, I could just ignore this error, but I want to be able to profile the job once it is deployed to our Hadoop cluster, and I need to be able to retrieve these logs automatically. Do I need to change permissions in HDFS to allow for this? Any ideas on how to fix this?

Thanks in advance,
Best,
-Leo

--
Leo Urbina
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Department of Mathematics
lurb...@mit.edu
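For reference, a stripped-down sketch of a driver with these settings in place; the class name and input/output handling are illustrative placeholders, but the profiling calls are exactly the ones above:

    // Minimal old-API (0.20.x) Tool-based driver sketch with HPROF profiling
    // enabled for map and reduce task attempts 0-2. "ProfiledDriver" and the
    // bare args handling are illustrative placeholders.
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class ProfiledDriver extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            JobConf job = new JobConf(getConf(), ProfiledDriver.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setProfileEnabled(true);
            job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6,"
                + "force=n,thread=y,verbose=n,file=%s");
            job.setProfileTaskRange(true, "0-2");   // map attempts 0-2
            job.setProfileTaskRange(false, "0-2");  // reduce attempts 0-2

            JobClient.runJob(job);  // downloads *.profile files into the CWD
            return 0;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new ProfiledDriver(), args));
        }
    }

Note that runJob() is what triggers downloadProfile(), which is why the "Permission denied" surfaces in the submitting client rather than in the tasks themselves.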
Re: Profiling Hadoop Job
Hi Leonardo,

You might want to try Starfish, which supports memory profiling as well as cpu/disk/network profiling for performance tuning.

Jie

--
Starfish is an intelligent performance tuning tool for Hadoop.
Homepage: www.cs.duke.edu/starfish/
Mailing list: http://groups.google.com/group/hadoop-starfish
Re: Profiling Hadoop Job
Hi Jie,

According to the Starfish README, Hadoop programs must be written using the new Hadoop API. That is not my case: I am using MultipleInputs, among other features the new API does not support (see the sketch below). Is there any way around this?

Thanks,
-Leo
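For concreteness, the calls in question come from org.apache.hadoop.mapred.lib.MultipleInputs in the old API; a sketch of the pattern, where the two mapper classes and the paths are hypothetical placeholders rather than the actual job's classes:

    // Old-API MultipleInputs: different input paths get different input
    // formats and mappers. TextParsingMapper and SeqParsingMapper are
    // hypothetical placeholder classes, not from the actual job.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.lib.MultipleInputs;

    public class MultiInputSetup {
        public static void configure(JobConf job) {
            MultipleInputs.addInputPath(job, new Path("/input/text"),
                TextInputFormat.class, TextParsingMapper.class);
            MultipleInputs.addInputPath(job, new Path("/input/seq"),
                SequenceFileInputFormat.class, SeqParsingMapper.class);
        }
    }

Hadoop 0.20.2 shipped no org.apache.hadoop.mapreduce equivalent of MultipleInputs, which is why a new-API-only tool would be a blocker here.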
Re: Profiling Hadoop Job
Hi Leo,

Thanks for pointing out the outdated README file. I'm glad to tell you that we do support the old API in the latest version. See here: http://www.cs.duke.edu/starfish/previous.html

You are welcome to join our mailing list, where your questions will reach more of our group members.

Jie
Re: Profiling Hadoop Job
Thanks,
-Leo

--
Leo Urbina
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Department of Mathematics
lurb...@mit.edu