Re: Re: Inverse of a matrix using Map - Reduce
Hi, As far as I know, matrix inversion needs many iterative loops, which Hadoop MapReduce does not support well. Hadoop MapReduce works well with block algorithms, especially for simple operations such as addition, transposition and possibly multiplication. For inversion, however, I have not yet found an algorithm that supports blocking, i.e. working on small parts of the matrix and combining them into the final result. There are several algorithms such as Gaussian elimination (as you said) or Csanky's, but I think you would need a more complex implementation with ChainMapper/ChainReducer and/or multiple chained jobs, and I doubt that would be efficient or convenient. So I am developing another version of MapReduce which supports staging of reducers: 1 job = Mapper Reducer*. I have tested it with Csanky's algorithm and it works quite well, but I am still improving the scheduling mechanism.

From: aa...@buffalo.edu To: common-user@hadoop.apache.org; aa...@buffalo.edu; Ganesh Swami gan...@iamganesh.com Sent: Thursday, February 4, 2010 3:57:39 Subject: Re: Re: Inverse of a matrix using Map - Reduce

Hi, Any idea how this method will scale for dense matrices? The kind of matrices I am going to be working with are 500,000*500,000. Will this be a problem? Also, have you used this patch? Best Regards from Buffalo Abhishek Agrawal SUNY- Buffalo (716-435-7122)

On Wed 02/03/10 1:41 AM , Ganesh Swami gan...@iamganesh.com sent: What about the Moore-Penrose inverse? http://en.wikipedia.org/wiki/Moore-Penrose_pseudoinverse The pseudo-inverse coincides with the regular inverse when the matrix is non-singular. Moreover, it can be computed using the SVD. Here's a patch for a MapReduce version of the SVD: https://issues.apache.org/jira/browse/MAHOUT-180 Ganesh

On Tue, Feb 2, 2010 at 10:11 PM, aa...@buffalo.edu wrote: Hello People, My name is Abhishek Agrawal. For the last few days I have been trying to figure out how to calculate the inverse of a matrix using MapReduce. Matrix inversion has 2 common approaches: Gauss-Jordan and the cofactor-of-transpose method. But neither of them seems well suited to MapReduce. Gauss-Jordan involves blocking, and cofactoring a matrix requires repeated calculation of determinants. Can someone give me any pointers as to how to solve this problem? Best Regards from Buffalo Abhishek Agrawal SUNY- Buffalo (716-435-7122)
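As background for the pseudoinverse suggestion quoted above: once an SVD A = U * diag(sigma) * V^T is available (e.g. from the MAHOUT-180 job), the Moore-Penrose pseudoinverse is A+ = V * diag(1/sigma) * U^T with near-zero singular values dropped. A minimal single-node sketch of that last step; the array shapes, names and tolerance are illustrative assumptions, not part of any Mahout API:

    // Assumes a thin SVD A = U * diag(sigma) * V^T, with
    // U of size m x r, sigma of length r, V of size n x r.
    // Then A+ = V * diag(1/sigma_i) * U^T, where singular values
    // below the tolerance are treated as zero.
    public class PseudoInverse {
        public static double[][] pinv(double[][] U, double[] sigma, double[][] V, double tol) {
            int m = U.length, n = V.length, r = sigma.length;
            double[][] result = new double[n][m];   // A+ is n x m
            for (int k = 0; k < r; k++) {
                if (sigma[k] <= tol) continue;       // drop (near-)zero singular values
                double inv = 1.0 / sigma[k];
                for (int i = 0; i < n; i++) {
                    for (int j = 0; j < m; j++) {
                        result[i][j] += V[i][k] * inv * U[j][k];
                    }
                }
            }
            return result;
        }
    }

For a 500,000 x 500,000 dense matrix this last step is itself far too large for one node; the sketch only shows the algebra a distributed version would have to reproduce.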
Re: Job Tracker questions
Yes, currently I am using JobClient to read these counters. But we are not able to use *web services* because the jar that reads the counters from the running Hadoop job is itself a Hadoop program. If we had a pure Java API that could run without the hadoop command, we could return the counter values to a web service and show them in the UI. Any help or technique to show these counters in the UI would be appreciated (not necessarily using a web service). I am using web services because I have a .NET VB client. thanks

On Wed, Feb 3, 2010 at 8:33 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can use JobClient to get the counters in your web service. If you look at the shell script bin/hadoop, you will find that this shell actually uses the JobClient to get the counters.

On Wed, Feb 3, 2010 at 4:34 AM, Mark N nipen.m...@gmail.com wrote: We have a hadoop job running and have used custom counters to track a few counters (like the number of successfully processed documents matching certain conditions). Since we need to get these counters even while the Hadoop job is running, we wrote another Java program to read them. The *counter reader program* does the following: 1) List all the running jobs. 2) Get the running job using the job name. 3) Get all the counters for the individual running jobs. 4) Set these counters in variables. We could successfully read these counters, but since we need to show them in a custom UI, how can we do that? We looked into various options: 1. Dump these counters to a database, however this may be overhead. 2. Write a web service, and have the UI invoke functions from that service (however, since we need to run the *counter reader program* with the hadoop command, it might not be feasible to write a web service?). So the question is: can we achieve reading the counters using simple Java APIs? Does anyone have an idea how the default jobtracker JSP works? We wanted to build something similar to this. thanks -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark
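For reference, a minimal sketch of the counter reader described above, using the old org.apache.hadoop.mapred API from the 0.20 line; the job name, counter group and counter name are placeholders, and error handling is omitted:

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;
    import org.apache.hadoop.mapred.RunningJob;

    public class CounterReader {
        public static void main(String[] args) throws Exception {
            // JobConf picks up mapred.job.tracker etc. from the config on the classpath
            JobConf conf = new JobConf();
            JobClient client = new JobClient(conf);

            // 1) list all the running jobs
            for (JobStatus status : client.jobsToComplete()) {
                // 2) get the running job and match it by name
                RunningJob job = client.getJob(status.getJobID());
                if (job == null || !"my-job-name".equals(job.getJobName())) {
                    continue;
                }
                // 3) read a custom counter while the job is still running
                Counters counters = job.getCounters();
                long processed = counters
                        .findCounter("MyCounters", "PROCESSED_DOCUMENTS")
                        .getCounter();
                System.out.println(job.getID() + " processed=" + processed);
            }
        }
    }

Run with the Hadoop jars and a config pointing at the jobtracker on the classpath, this should need no bin/hadoop wrapper; the shell script mostly just assembles that classpath before calling java.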
Re: Job Tracker questions
Well, you can create a proxy of the JobTracker on the client side, and then use the JobTracker API to get information about jobs. The proxy takes responsibility for the communication with the master node. Reading the source code of JobClient can help you.

On Thu, Feb 4, 2010 at 6:59 PM, Mark N nipen.m...@gmail.com wrote: Yes, currently I am using JobClient to read these counters. But we are not able to use *web services* because the jar that reads the counters from the running Hadoop job is itself a Hadoop program. If we had a pure Java API that could run without the hadoop command, we could return the counter values to a web service and show them in the UI. Any help or technique to show these counters in the UI would be appreciated (not necessarily using a web service). I am using web services because I have a .NET VB client. thanks

On Wed, Feb 3, 2010 at 8:33 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can use JobClient to get the counters in your web service. If you look at the shell script bin/hadoop, you will find that this shell actually uses the JobClient to get the counters.

On Wed, Feb 3, 2010 at 4:34 AM, Mark N nipen.m...@gmail.com wrote: We have a hadoop job running and have used custom counters to track a few counters (like the number of successfully processed documents matching certain conditions). Since we need to get these counters even while the Hadoop job is running, we wrote another Java program to read them. The *counter reader program* does the following: 1) List all the running jobs. 2) Get the running job using the job name. 3) Get all the counters for the individual running jobs. 4) Set these counters in variables. We could successfully read these counters, but since we need to show them in a custom UI, how can we do that? We looked into various options: 1. Dump these counters to a database, however this may be overhead. 2. Write a web service, and have the UI invoke functions from that service (however, since we need to run the *counter reader program* with the hadoop command, it might not be feasible to write a web service?). So the question is: can we achieve reading the counters using simple Java APIs? Does anyone have an idea how the default jobtracker JSP works? We wanted to build something similar to this. thanks -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang
Re: configuration file
Hi, A shot in the dark: is the conf file in your classpath? If yes, are the parameters you are trying to override marked final? Amogh

On 2/4/10 3:18 AM, Gang Luo lgpub...@yahoo.com.cn wrote: Hi, I am writing a script to run a whole bunch of jobs automatically, but the configuration file doesn't seem to be working. I think there is something wrong in my command. The command in my script is like: bin/hadoop jar myJarFile myClass -conf myConfigurationFile.xml arg1 arg2 I use conf.get() to show the value of some parameters, but the values are not what I define in that xml file. Is there something wrong? Thanks. -Gang
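One frequent reason a -conf file is silently ignored: the generic options (-conf, -D, -files, ...) are only parsed if the job's main class goes through GenericOptionsParser, usually via ToolRunner. A minimal sketch of that pattern with the 0.20 mapred API; the class name and the printed property are just examples:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJob extends Configured implements Tool {
        public int run(String[] args) throws Exception {
            // getConf() already contains anything passed via -conf or -D,
            // because ToolRunner ran GenericOptionsParser before calling run()
            JobConf job = new JobConf(getConf(), MyJob.class);
            System.out.println("io.sort.mb = " + job.get("io.sort.mb"));
            // ... configure mapper/reducer, input and output paths from args,
            // then submit with JobClient.runJob(job);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int rc = ToolRunner.run(new Configuration(), new MyJob(), args);
            System.exit(rc);
        }
    }

With this in place, bin/hadoop jar myJarFile MyJob -conf myConfigurationFile.xml arg1 arg2 merges the file into the Configuration before run() sees the remaining arguments.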
Re: Job Tracker questions
could you please elaborate on this ( * hint to get started as am very new to hadoop? ) So far I could succesfully read all the default and custom counters. Currently we are having a .net client. thanks in advance. On Thu, Feb 4, 2010 at 4:53 PM, Jeff Zhang zjf...@gmail.com wrote: Well, you can create a proxy of JobTracker in client side, and then you can use the API of JobTracker to get the information of jobs. The Proxy take the responsibility of communication with the Master Node. Read the source code of JobClient can help you. On Thu, Feb 4, 2010 at 6:59 PM, Mark N nipen.m...@gmail.com wrote: Ye currently am using jobclient to read these counters. But We are not able to use *webservices *because the jar which is used to read the counters from running hadoop job is itself a Hadoop program If we could have pure Java Api which is run without hadoop command then we could return the counter variable into webservices and show in UI. Any help or technique to show thsese counters in the UI would be appreciated ( not necessarily using web service ) I am using webservices because I am having .net VB client thanks On Wed, Feb 3, 2010 at 8:33 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can use JobClient to get the counters in your web service. If you look at the shell script bin/hadoop, you will find that actually this shell use the JobClient to get the counters. On Wed, Feb 3, 2010 at 4:34 AM, Mark N nipen.m...@gmail.com wrote: We have a hadoop job running and have used custom counters to track few counters ( like no of successfully processed documents matching certain conditions) Since we need to get this counters even while the Hadoop job is running , we wrote another Java program to read these counters *Counter reader program *will do the following : 1) List all the running jobs. 2) Get the running job using Job name 2) Get all the counter for individual running jobs 3) Set this counters in variables. We could successfully read these counters , but since we need to show these counters to custom UI , how can we show these counters? we looked into various options to read these counters to show in UI as following : 1. Dump these counters to database , however this may be overhead 2. Write web service and UI will invoke the functions from these service to show in UI ( However since we need to run *Counter reader program *with Hadoop command it might not be feasible to write web service ? ) so the question is can we achive to read the counters using simple Java APIs ? Does anyone have idea how does the default jobtracker JSP works ? we wanted to built something similar to this thanks -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark
Re: Job Tracker questions
Do you mean want to connect the JobTracker using .Net ? If so, I'm afraid I have no idea how to this. The rpc of hadoop is language dependent. On Thu, Feb 4, 2010 at 7:18 PM, Mark N nipen.m...@gmail.com wrote: could you please elaborate on this ( * hint to get started as am very new to hadoop? ) So far I could succesfully read all the default and custom counters. Currently we are having a .net client. thanks in advance. On Thu, Feb 4, 2010 at 4:53 PM, Jeff Zhang zjf...@gmail.com wrote: Well, you can create a proxy of JobTracker in client side, and then you can use the API of JobTracker to get the information of jobs. The Proxy take the responsibility of communication with the Master Node. Read the source code of JobClient can help you. On Thu, Feb 4, 2010 at 6:59 PM, Mark N nipen.m...@gmail.com wrote: Ye currently am using jobclient to read these counters. But We are not able to use *webservices *because the jar which is used to read the counters from running hadoop job is itself a Hadoop program If we could have pure Java Api which is run without hadoop command then we could return the counter variable into webservices and show in UI. Any help or technique to show thsese counters in the UI would be appreciated ( not necessarily using web service ) I am using webservices because I am having .net VB client thanks On Wed, Feb 3, 2010 at 8:33 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can use JobClient to get the counters in your web service. If you look at the shell script bin/hadoop, you will find that actually this shell use the JobClient to get the counters. On Wed, Feb 3, 2010 at 4:34 AM, Mark N nipen.m...@gmail.com wrote: We have a hadoop job running and have used custom counters to track few counters ( like no of successfully processed documents matching certain conditions) Since we need to get this counters even while the Hadoop job is running , we wrote another Java program to read these counters *Counter reader program *will do the following : 1) List all the running jobs. 2) Get the running job using Job name 2) Get all the counter for individual running jobs 3) Set this counters in variables. We could successfully read these counters , but since we need to show these counters to custom UI , how can we show these counters? we looked into various options to read these counters to show in UI as following : 1. Dump these counters to database , however this may be overhead 2. Write web service and UI will invoke the functions from these service to show in UI ( However since we need to run *Counter reader program *with Hadoop command it might not be feasible to write web service ? ) so the question is can we achive to read the counters using simple Java APIs ? Does anyone have idea how does the default jobtracker JSP works ? we wanted to built something similar to this thanks -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang
Re: Job Tracker questions
I think you can create web service using Java, and then in .net using the web service to display the result. On Thu, Feb 4, 2010 at 7:21 PM, Jeff Zhang zjf...@gmail.com wrote: Do you mean want to connect the JobTracker using .Net ? If so, I'm afraid I have no idea how to this. The rpc of hadoop is language dependent. On Thu, Feb 4, 2010 at 7:18 PM, Mark N nipen.m...@gmail.com wrote: could you please elaborate on this ( * hint to get started as am very new to hadoop? ) So far I could succesfully read all the default and custom counters. Currently we are having a .net client. thanks in advance. On Thu, Feb 4, 2010 at 4:53 PM, Jeff Zhang zjf...@gmail.com wrote: Well, you can create a proxy of JobTracker in client side, and then you can use the API of JobTracker to get the information of jobs. The Proxy take the responsibility of communication with the Master Node. Read the source code of JobClient can help you. On Thu, Feb 4, 2010 at 6:59 PM, Mark N nipen.m...@gmail.com wrote: Ye currently am using jobclient to read these counters. But We are not able to use *webservices *because the jar which is used to read the counters from running hadoop job is itself a Hadoop program If we could have pure Java Api which is run without hadoop command then we could return the counter variable into webservices and show in UI. Any help or technique to show thsese counters in the UI would be appreciated ( not necessarily using web service ) I am using webservices because I am having .net VB client thanks On Wed, Feb 3, 2010 at 8:33 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can use JobClient to get the counters in your web service. If you look at the shell script bin/hadoop, you will find that actually this shell use the JobClient to get the counters. On Wed, Feb 3, 2010 at 4:34 AM, Mark N nipen.m...@gmail.com wrote: We have a hadoop job running and have used custom counters to track few counters ( like no of successfully processed documents matching certain conditions) Since we need to get this counters even while the Hadoop job is running , we wrote another Java program to read these counters *Counter reader program *will do the following : 1) List all the running jobs. 2) Get the running job using Job name 2) Get all the counter for individual running jobs 3) Set this counters in variables. We could successfully read these counters , but since we need to show these counters to custom UI , how can we show these counters? we looked into various options to read these counters to show in UI as following : 1. Dump these counters to database , however this may be overhead 2. Write web service and UI will invoke the functions from these service to show in UI ( However since we need to run *Counter reader program *with Hadoop command it might not be feasible to write web service ? ) so the question is can we achive to read the counters using simple Java APIs ? Does anyone have idea how does the default jobtracker JSP works ? we wanted to built something similar to this thanks -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang -- Best Regards Jeff Zhang
Re: Job Tracker questions
You can use org.apache.hadoop.ipc.RPC.getProxy() to initialize the proxy of JobTracker On Thu, Feb 4, 2010 at 7:23 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can create web service using Java, and then in .net using the web service to display the result. On Thu, Feb 4, 2010 at 7:21 PM, Jeff Zhang zjf...@gmail.com wrote: Do you mean want to connect the JobTracker using .Net ? If so, I'm afraid I have no idea how to this. The rpc of hadoop is language dependent. On Thu, Feb 4, 2010 at 7:18 PM, Mark N nipen.m...@gmail.com wrote: could you please elaborate on this ( * hint to get started as am very new to hadoop? ) So far I could succesfully read all the default and custom counters. Currently we are having a .net client. thanks in advance. On Thu, Feb 4, 2010 at 4:53 PM, Jeff Zhang zjf...@gmail.com wrote: Well, you can create a proxy of JobTracker in client side, and then you can use the API of JobTracker to get the information of jobs. The Proxy take the responsibility of communication with the Master Node. Read the source code of JobClient can help you. On Thu, Feb 4, 2010 at 6:59 PM, Mark N nipen.m...@gmail.com wrote: Ye currently am using jobclient to read these counters. But We are not able to use *webservices *because the jar which is used to read the counters from running hadoop job is itself a Hadoop program If we could have pure Java Api which is run without hadoop command then we could return the counter variable into webservices and show in UI. Any help or technique to show thsese counters in the UI would be appreciated ( not necessarily using web service ) I am using webservices because I am having .net VB client thanks On Wed, Feb 3, 2010 at 8:33 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can use JobClient to get the counters in your web service. If you look at the shell script bin/hadoop, you will find that actually this shell use the JobClient to get the counters. On Wed, Feb 3, 2010 at 4:34 AM, Mark N nipen.m...@gmail.com wrote: We have a hadoop job running and have used custom counters to track few counters ( like no of successfully processed documents matching certain conditions) Since we need to get this counters even while the Hadoop job is running , we wrote another Java program to read these counters *Counter reader program *will do the following : 1) List all the running jobs. 2) Get the running job using Job name 2) Get all the counter for individual running jobs 3) Set this counters in variables. We could successfully read these counters , but since we need to show these counters to custom UI , how can we show these counters? we looked into various options to read these counters to show in UI as following : 1. Dump these counters to database , however this may be overhead 2. Write web service and UI will invoke the functions from these service to show in UI ( However since we need to run *Counter reader program *with Hadoop command it might not be feasible to write web service ? ) so the question is can we achive to read the counters using simple Java APIs ? Does anyone have idea how does the default jobtracker JSP works ? we wanted to built something similar to this thanks -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang -- Best Regards Jeff Zhang -- Best Regards Jeff Zhang
Re: Job Tracker questions
yes we can create a webservice in java which would be called by .net to display these counters. But since the java code to read these counters needs use hadoop APIs ( job client ) , am not sure we can create a webservice to read the counters Question is how does the default hadoop task tracker display counter information in JSP pages ? does it read from the XML files ? thanks, On Thu, Feb 4, 2010 at 5:08 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can create web service using Java, and then in .net using the web service to display the result. On Thu, Feb 4, 2010 at 7:21 PM, Jeff Zhang zjf...@gmail.com wrote: Do you mean want to connect the JobTracker using .Net ? If so, I'm afraid I have no idea how to this. The rpc of hadoop is language dependent. On Thu, Feb 4, 2010 at 7:18 PM, Mark N nipen.m...@gmail.com wrote: could you please elaborate on this ( * hint to get started as am very new to hadoop? ) So far I could succesfully read all the default and custom counters. Currently we are having a .net client. thanks in advance. On Thu, Feb 4, 2010 at 4:53 PM, Jeff Zhang zjf...@gmail.com wrote: Well, you can create a proxy of JobTracker in client side, and then you can use the API of JobTracker to get the information of jobs. The Proxy take the responsibility of communication with the Master Node. Read the source code of JobClient can help you. On Thu, Feb 4, 2010 at 6:59 PM, Mark N nipen.m...@gmail.com wrote: Ye currently am using jobclient to read these counters. But We are not able to use *webservices *because the jar which is used to read the counters from running hadoop job is itself a Hadoop program If we could have pure Java Api which is run without hadoop command then we could return the counter variable into webservices and show in UI. Any help or technique to show thsese counters in the UI would be appreciated ( not necessarily using web service ) I am using webservices because I am having .net VB client thanks On Wed, Feb 3, 2010 at 8:33 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can use JobClient to get the counters in your web service. If you look at the shell script bin/hadoop, you will find that actually this shell use the JobClient to get the counters. On Wed, Feb 3, 2010 at 4:34 AM, Mark N nipen.m...@gmail.com wrote: We have a hadoop job running and have used custom counters to track few counters ( like no of successfully processed documents matching certain conditions) Since we need to get this counters even while the Hadoop job is running , we wrote another Java program to read these counters *Counter reader program *will do the following : 1) List all the running jobs. 2) Get the running job using Job name 2) Get all the counter for individual running jobs 3) Set this counters in variables. We could successfully read these counters , but since we need to show these counters to custom UI , how can we show these counters? we looked into various options to read these counters to show in UI as following : 1. Dump these counters to database , however this may be overhead 2. Write web service and UI will invoke the functions from these service to show in UI ( However since we need to run *Counter reader program *with Hadoop command it might not be feasible to write web service ? ) so the question is can we achive to read the counters using simple Java APIs ? Does anyone have idea how does the default jobtracker JSP works ? 
we wanted to built something similar to this thanks -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang -- Best Regards Jeff Zhang -- Nipen Mark
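On the web-service feasibility question above: the counter-reading code does not have to be launched with bin/hadoop; if the Hadoop jars and a config pointing at the jobtracker are on a servlet container's classpath, the same JobClient calls can run inside a plain servlet that the .NET client polls over HTTP. A rough sketch under those assumptions (the jobtracker address, counter group and counter name are made up):

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;
    import org.apache.hadoop.mapred.RunningJob;

    public class CounterServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            JobConf conf = new JobConf();
            conf.set("mapred.job.tracker", "jobtracker-host:9001"); // or ship mapred-site.xml
            JobClient client = new JobClient(conf);
            resp.setContentType("text/plain");
            for (JobStatus status : client.jobsToComplete()) {
                RunningJob job = client.getJob(status.getJobID());
                if (job == null) continue;
                Counters counters = job.getCounters();
                // one line per running job: id, name, custom counter value
                resp.getWriter().println(job.getID() + " " + job.getJobName() + " "
                        + counters.findCounter("MyCounters", "PROCESSED_DOCUMENTS").getCounter());
            }
        }
    }

The .NET VB client can then call this URL (or a SOAP/JAX-WS wrapper around the same code) instead of talking to Hadoop's Java-only RPC directly.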
Re: Job Tracker questions
I look at the source code, it seems the job tracker web ui also use the proxy of JobTracker to get the counter information rather the xml file. On Thu, Feb 4, 2010 at 7:29 PM, Mark N nipen.m...@gmail.com wrote: yes we can create a webservice in java which would be called by .net to display these counters. But since the java code to read these counters needs use hadoop APIs ( job client ) , am not sure we can create a webservice to read the counters Question is how does the default hadoop task tracker display counter information in JSP pages ? does it read from the XML files ? thanks, On Thu, Feb 4, 2010 at 5:08 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can create web service using Java, and then in .net using the web service to display the result. On Thu, Feb 4, 2010 at 7:21 PM, Jeff Zhang zjf...@gmail.com wrote: Do you mean want to connect the JobTracker using .Net ? If so, I'm afraid I have no idea how to this. The rpc of hadoop is language dependent. On Thu, Feb 4, 2010 at 7:18 PM, Mark N nipen.m...@gmail.com wrote: could you please elaborate on this ( * hint to get started as am very new to hadoop? ) So far I could succesfully read all the default and custom counters. Currently we are having a .net client. thanks in advance. On Thu, Feb 4, 2010 at 4:53 PM, Jeff Zhang zjf...@gmail.com wrote: Well, you can create a proxy of JobTracker in client side, and then you can use the API of JobTracker to get the information of jobs. The Proxy take the responsibility of communication with the Master Node. Read the source code of JobClient can help you. On Thu, Feb 4, 2010 at 6:59 PM, Mark N nipen.m...@gmail.com wrote: Ye currently am using jobclient to read these counters. But We are not able to use *webservices *because the jar which is used to read the counters from running hadoop job is itself a Hadoop program If we could have pure Java Api which is run without hadoop command then we could return the counter variable into webservices and show in UI. Any help or technique to show thsese counters in the UI would be appreciated ( not necessarily using web service ) I am using webservices because I am having .net VB client thanks On Wed, Feb 3, 2010 at 8:33 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can use JobClient to get the counters in your web service. If you look at the shell script bin/hadoop, you will find that actually this shell use the JobClient to get the counters. On Wed, Feb 3, 2010 at 4:34 AM, Mark N nipen.m...@gmail.com wrote: We have a hadoop job running and have used custom counters to track few counters ( like no of successfully processed documents matching certain conditions) Since we need to get this counters even while the Hadoop job is running , we wrote another Java program to read these counters *Counter reader program *will do the following : 1) List all the running jobs. 2) Get the running job using Job name 2) Get all the counter for individual running jobs 3) Set this counters in variables. We could successfully read these counters , but since we need to show these counters to custom UI , how can we show these counters? we looked into various options to read these counters to show in UI as following : 1. Dump these counters to database , however this may be overhead 2. Write web service and UI will invoke the functions from these service to show in UI ( However since we need to run *Counter reader program *with Hadoop command it might not be feasible to write web service ? ) so the question is can we achive to read the counters using simple Java APIs ? 
Does anyone have idea how does the default jobtracker JSP works ? we wanted to built something similar to this thanks -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang -- Best Regards Jeff Zhang -- Nipen Mark -- Best Regards Jeff Zhang
Re: Inverse of a matrix using Map - Reduce
Hey Abhishek, Why would you want to fully invert a matrix that large? How is it preconditioned? What is the condition number of the matrix? Why not just use ScaLAPACK? It's a hairy beast, but you should definitely consider it. Brian On Feb 3, 2010, at 9:57 PM, aa...@buffalo.edu wrote: Hi, Any idea how this method will scale for dense matrices ?The kind of matrices I am going to be working with are 500,000*500,000. Will this be a problem. Also have you used this patch ? Best Regards from Buffalo Abhishek Agrawal SUNY- Buffalo (716-435-7122) On Wed 02/03/10 1:41 AM , Ganesh Swami gan...@iamganesh.com sent: What about the Moore-Penrose inverse? http://en.wikipedia.org/wiki/Moore-Penrose_pseudoinverse The pseudo-inverse coincides with the regular inverse when the matrix is non-singular. Moreover, it can be computed using the SVD. Here's a patch for a MapReduce version of the SVD: https://issues.apache.org/jira/browse/MAHOUT-180 Ganesh On Tue, Feb 2, 2010 at 10:11 PM, aa...@buffa lo.edu wrote: Hello People, Â Â Â Â Â Â My name is Abhishek Agrawal. For the last few days I have been trying to figure out how to calculate the inverse of a matrix using Map Reduce. Matrix inversion has 2 common approaches. Gaussian- Jordan and the cofactor of transpose method. But both of them dont seem to be suited too well for Map- Reduce. Gaussian Jordan involves blocking co factoring a matrix requires repeated calculation of determinant. Can some one give me any pointers so as to how to solve this problem ? Best Regards from Buffalo Abhishek Agrawal SUNY- Buffalo (716-435-7122) smime.p7s Description: S/MIME cryptographic signature
Re: Maven and Mini MR Cluster
Ya, with the hadoop_home stuff I was grasping at straws. My mini MR cluster has a valid classpath, I assume, since my entire test runs (thru 3 mapreduce jobs via the localrunner) before it gets to the mini MR cluster portion. Is it possible to print out the classpath thru the JVMManager or anything else like that for debugging purposes? mb

On Feb 4, 2010, at 5:55 AM, Steve Loughran wrote: Michael Basnight wrote: I'm using maven to run all my unit tests, and I have a unit test that creates a mini MR cluster. When I create this cluster, I get class-not-found errors for the core hadoop libs (Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.Child). When I run the same test w/o creating the mini cluster, well.. it works fine. My HADOOP_HOME is set to the same version as my mvn repo, and points to a valid installation of hadoop. When I validate the classpath thru maven (dependency:build-classpath), it says that the core libs are on the classpath as well (sourced from my .m2 repository). I just can't figure out why hadoop's mini cluster can't find those jars. Running hadoop 0.20.0. Any suggestions?

the MiniMR cluster does everything in memory, and doesn't look at HADOOP_HOME, which is only for the shell scripts. It sounds like you need hadoop-mapreduce on your classpath: the Child class is the entry point used when creating new JVMs, and it is that classpath that isn't right, since the task runs in a JVM forked from the one the MiniMRCluster was created in.
Re: configuration file
I give the path to that xml file in that command. Do I need to add that path to the classpath? I tried giving a wrong path and no error was reported. Aren't those parameters all configurable? Like io.sort.mb, mapred.reduce.tasks, io.sort.factor, etc. Thanks. -Gang

----- Original Message ----- From: Amogh Vasekar am...@yahoo-inc.com To: common-user@hadoop.apache.org common-user@hadoop.apache.org Sent: 2010/2/4 (Thu) 6:09:04 AM Subject: Re: configuration file

Hi, A shot in the dark: is the conf file in your classpath? If yes, are the parameters you are trying to override marked final? Amogh

On 2/4/10 3:18 AM, Gang Luo lgpub...@yahoo.com.cn wrote: Hi, I am writing a script to run a whole bunch of jobs automatically, but the configuration file doesn't seem to be working. I think there is something wrong in my command. The command in my script is like: bin/hadoop jar myJarFile myClass -conf myConfigurationFile.xml arg1 arg2 I use conf.get() to show the value of some parameters, but the values are not what I define in that xml file. Is there something wrong? Thanks. -Gang
Re: Maven and Mini MR Cluster
Michael Basnight wrote: Ya with the hadoop_home stuff i was grasping at straws. My mini MR Cluster has a valid classpath i assume, since my entire test runs (thru 3 mapreduce jobs via the localrunner) before it gets to the mini MR cluster portion. Is it possible to print out the classpath thru the JVMManager or anything else like that for debugging purposes? probably, though I don't know what.
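For the classpath question: the simplest check, without going through JvmManager, is probably to dump java.class.path from inside the test, since (as far as I can tell from 0.20's TaskRunner) that property of the parent JVM is the basis of the classpath built for the forked Child JVM. A small sketch:

    // Prints the classpath of the current JVM; inside a MiniMRCluster-based
    // test this is the JVM that TaskRunner uses as the starting point for
    // the child task classpath.
    public class ClasspathDump {
        public static void dump() {
            String cp = System.getProperty("java.class.path");
            for (String entry : cp.split(java.io.File.pathSeparator)) {
                System.out.println(entry);
            }
        }
    }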
Re: Maven and Mini MR Cluster
On Thu, Feb 4, 2010 at 12:12 PM, Steve Loughran ste...@apache.org wrote: Michael Basnight wrote: Ya with the hadoop_home stuff i was grasping at straws. My mini MR Cluster has a valid classpath i assume, since my entire test runs (thru 3 mapreduce jobs via the localrunner) before it gets to the mini MR cluster portion. Is it possible to print out the classpath thru the JVMManager or anything else like that for debugging purposes? probably, though I don't know what.

Normally from a shell script, I do something like this to ensure I pick up hadoop.jar, hadoop-test.jar, and their dependents:

CPATH=""
for f in /opt/hadoop/lib/*.jar ; do
  CPATH=${CPATH}:$f
done
for f in /opt/hadoop/*.jar ; do
  CPATH=${CPATH}:$f
done
java -cp $CPATH ...

If you are in the build phase you should refer to build.xml and build-common, try to emulate that classpath, and add what you need.
[ANNOUNCE] Katta 0.6 released
Release 0.6 of Katta is now available. Katta - Lucene (or Hadoop mapfiles, or any content which can be split into shards) in the cloud. http://katta.sourceforge.net

The key changes of the 0.6 release, among dozens of bug fixes:
- upgrade Lucene to 3.0
- upgrade ZooKeeper to 3.2.2
- upgrade Hadoop to 0.20.1
- generalize Katta for serving shard-able content (Lucene is one implementation, Hadoop mapfiles another)
- basic Lucene field sort capability
- more robust ZooKeeper session expiration handling
- throttling of shard deployment (kb/sec configurable) to have a stable search while deploying
- load test facility
- monitoring facility
- alpha version of web GUI

The changes from the 0.6.rc1 release:
KATTA-120, fix listIndices for wrong file paths
KATTA-117, add command line option to print stacktrace on error
KATTA-116, fix distribution of shards does not take currently deploying shards into account
KATTA-107, fix katta execution on cygwin
KATTA-112, ship build.xml in core distribution
KATTA-110, use a released 0.1 version of zkclient instead of the snapshot

See the full list of changes at http://oss.101tec.com/jira/secure/ReleaseNote.jspa?projectId=1styleName=Htmlversion=10010 Binary distribution is available at https://sourceforge.net/projects/katta/ The Katta Team
Mapper Process Duration
Hello, I have a question about mapred.Child processes. Even though a mapper is finished, I see that the process (from ps) stays around longer than reported on the Hadoop MR webpage. What is the mapper process doing after it has reported that it is finished? To illustrate my question: I see that one mapper reports it finished in 9 seconds, but from logging ps output every second, I see it last for 24 seconds before exiting. I essentially see this for each mapper. Lastly, where can I find information on how exactly the MapReduce framework reuses JVMs? The reason I'm asking is because I see that with reuse on (mapred.job.reuse.jvm.num.tasks set to -1), the PIDs change for each new mapper. How can this be without starting a new JVM? Thanks! -- Navraj S. Chohan nlak...@gmail.com
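For reference, the standard knob for JVM reuse in the 0.20 API looks like this; it does not by itself explain the pid behaviour described above, and the value is only an example:

    import org.apache.hadoop.mapred.JobConf;

    public class JvmReuseExample {
        public static JobConf withJvmReuse(JobConf conf) {
            // -1 = reuse the JVM for an unlimited number of tasks of the same job;
            // 1 (the default) = one new JVM per task. This sets
            // mapred.job.reuse.jvm.num.tasks under the covers.
            conf.setNumTasksToExecutePerJvm(-1);
            return conf;
        }
    }

Note that reuse only applies to tasks of the same job; a new job always gets fresh child JVMs.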
Re: EOFException and BadLink, but file descriptors number is ok?
I wrote a hadoop job that checks for ulimits across the nodes, and every node is reporting:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 139264
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 139264
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Is anything in there telling about file number limits? From what I understand, a high open files limit like 65536 should be enough. I estimate only a couple thousand part-files on HDFS being written to at once, and around 200 on the filesystem per node.

On Wed, Feb 3, 2010 at 4:04 PM, Meng Mao meng...@gmail.com wrote: also, which is the ulimit that's important, the one for the user who is running the job, or the hadoop user that owns the Hadoop processes?

On Tue, Feb 2, 2010 at 7:29 PM, Meng Mao meng...@gmail.com wrote: I've been trying to run a fairly small input file (300MB) on Cloudera Hadoop 0.20.1. The job I'm using probably writes to on the order of over 1000 part-files at once, across the whole grid. The grid has 33 nodes in it. I get the following exception in the run logs:

10/01/30 17:24:25 INFO mapred.JobClient: map 100% reduce 12%
10/01/30 17:24:25 INFO mapred.JobClient: Task Id : attempt_201001261532_1137_r_13_0, Status : FAILED
java.io.EOFException
  at java.io.DataInputStream.readByte(DataInputStream.java:250)
  at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
  at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
  at org.apache.hadoop.io.Text.readString(Text.java:400)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
lots of EOFExceptions
10/01/30 17:24:25 INFO mapred.JobClient: Task Id : attempt_201001261532_1137_r_19_0, Status : FAILED
java.io.IOException: Bad connect ack with firstBadLink 10.2.19.1:50010
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
10/01/30 17:24:36 INFO mapred.JobClient: map 100% reduce 11%
10/01/30 17:24:42 INFO mapred.JobClient: map 100% reduce 12%
10/01/30 17:24:49 INFO mapred.JobClient: map 100% reduce 13%
10/01/30 17:24:55 INFO mapred.JobClient: map 100% reduce 14%
10/01/30 17:25:00 INFO mapred.JobClient: map 100% reduce 15%

From searching around, it seems like the most common cause of BadLink and EOFExceptions is when the nodes don't have enough file descriptors set. But across all the grid machines, the file-max has been set to 1573039. Furthermore, we set ulimit -n to 65536 using hadoop-env.sh. Where else should I be looking for what's causing this?
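If it helps to rule out the task side as well, a map task can print the limits and descriptor count of its own JVM; as far as I understand, it is the limits of the user running the tasktracker and datanode processes that matter here, not those of the submitting user's shell. A Linux-only sketch (the /proc paths are an assumption about the OS):

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    public class FdUsage {
        public static void report() throws IOException {
            // Per-process limits as the kernel sees them (includes "Max open files")
            BufferedReader r = new BufferedReader(new FileReader("/proc/self/limits"));
            try {
                String line;
                while ((line = r.readLine()) != null) {
                    System.out.println(line);
                }
            } finally {
                r.close();
            }
            // Number of file descriptors currently open in this JVM
            String[] fds = new File("/proc/self/fd").list();
            System.out.println("open fds: " + (fds == null ? "unknown" : fds.length));
        }
    }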
What framework Hadoop uses for daemonizing?
Hi. Just wondering - does anyone know what framework Hadoop uses for daemonizing? Any chance it's jsvc from Apache? Regards.
Re: What framework Hadoop uses for daemonizing?
Hi Stas, Hadoop doesn't daemonize itself. The shell scripts use nohup and a lot of bash code to achieve a similar idea. -Todd On Thu, Feb 4, 2010 at 1:03 PM, Stas Oskin stas.os...@gmail.com wrote: Hi. Just wondering - does anyone know what framework Hadoop uses for daemonizing? Any chance it's jsvc from Apache? Regards.
Re: What framework Hadoop uses for daemonizing?
Hi Todd. Hadoop doesn't daemonize itself. The shell scripts use nohup and a lot of bash code to achieve a similar idea. Was there any design decision behind this approach? I remember that I had to do the same, as any wrapper just caused the daemon to run at a lower priority than it should. Regards.
Re: What framework Hadoop uses for daemonizing?
On 2/4/10 1:21 PM, Stas Oskin stas.os...@gmail.com wrote: Was there any design decision behind this approach? Likely KISS. I remember that I had to do the same, as any wrapper just caused the daemon to run in lower priority then it should. ...which is also easily dealt with from the shell and gives you the flexibility to use OS-specific constructs. The other big benefit is that this also means you don't need to UNdaemonize code for those users that use something besides just pure init rc scripts. (djbtools, smf, launchd, whatever)
Re: What framework Hadoop uses for daemonizing?
On Thu, Feb 4, 2010 at 1:21 PM, Stas Oskin stas.os...@gmail.com wrote: Hi Todd. Hadoop doesn't daemonize itself. The shell scripts use nohup and a lot of bash code to achieve a similar idea. Was there any design decision behind this approach? It long predates my involvement in the project. In fact, it predates Hadoop itself - it got inherited from Nutch long ago. I vaguely recall a JIRA about using jsvc for Hadoop - if you search around I bet you can turn it up. -Todd
Re: What framework Hadoop uses for daemonizing?
I actually asked this because I'm looking for a good alternative to current bunch of scripts and lsb-redhat dependencies I have today in my own Hadoop client which runs as daemon. So I kinda hoped there is some sauce behind Hadoop I can borrow. While this might be not the most appropriate list, I'd appreciate if someone can say if jsvc can keep the right priorities, or suggest alternative daemon framework. Thanks again. It long predates my involvement in the project. In fact, it predates Hadoop itself - it got inherited from Nutch long ago. I vaguely recall a JIRA about using jsvc for Hadoop - if you search around I bet you can turn it up. -Todd
Re: What framework Hadoop uses for daemonizing?
On Thu, Feb 4, 2010 at 4:39 PM, Stas Oskin stas.os...@gmail.com wrote: I actually asked this because I'm looking for a good alternative to the current bunch of scripts and lsb-redhat dependencies I have today in my own Hadoop client which runs as a daemon. So I kinda hoped there is some sauce behind Hadoop I can borrow. While this might not be the most appropriate list, I'd appreciate it if someone can say whether jsvc can keep the right priorities, or suggest an alternative daemon framework. Thanks again. It long predates my involvement in the project. In fact, it predates Hadoop itself - it got inherited from Nutch long ago. I vaguely recall a JIRA about using jsvc for Hadoop - if you search around I bet you can turn it up. -Todd

Stas, Daemonizing is one of those native bits Java does not do well with by default. jsvc is an option. I have never had a problem with nohup as you have, although it is a bit hackish. Some concepts I was considering:
1) daemontools - manages processes run in the foreground (handles restarts), no need to daemonize
2) linux-ha - much like init scripts but with fancy cluster management capabilities
Personally, I am pretty happy with the Cloudera LSB scripts. Missing 'status', but ps -ef or jps deals with that. Do you just have general problems with 'nohup' or have you unearthed a specific hadoop nohup issue?
Re: What framework Hadoop uses for daemonizing?
Hi Edward. Do you just have general problems with 'nohup' or have you unearthed a specific hadoop nohup issue? Just to clarify that we speak about my own Java Hadoop connector here, not about Hadoop itself, which works just great (with some added pepper from monit for potential crashes). I don't like the fact that for simple init script I need to add a full redhat-lsb package, which just involves having a lot of packages installed. If Cloudera LSB scripts are self-contained, and (most important) can generate PID files, I will be happy to give them a look. Thanks.
Re: Mapper Process Duration
Nevermind, I had set the reuse to the wrong value. It seems that setting the reuse to 0 acts the same way as setting it to -1. On Thu, Feb 4, 2010 at 2:52 PM, Navraj S. Chohan nlak...@gmail.com wrote: Hello, I have a question about mapred.Child processes. Even though a mapper is finished I see that the process (from ps) stays around longer than reported on the hadoop MR webpage. What is the mapper process doing after it has reported that it is finished? To illustrate my question: I see that one mapper reports it finished in 9 seconds but from logging ps output every second, I see it last for 24 seconds before exiting. I essentially see this for each mapper. Lastly, where can I find information on how exactly the map reduce framework reuses JVMs. The reason I'm asking is because I see that with reuse on (mapred.job.reuse.jvm.num.tasks set to -1), the pid's change for each new mapper. How can this be without starting a new JVM? Thanks! -- Navraj S. Chohan nlak...@gmail.com -- Navraj S. Chohan nlak...@gmail.com
Re: configuration file
Hi Gang, You have to load the XML config file in your M/R code. Something like this:

FSDataInputStream inS = fs.open(in);
conf.addResource(inS);

where conf is your Configuration. This will in effect read all the parameters from that XML and override anything that you have previously set with:

conf.set("parameter", "parameterValue");

regards, Eric Arenas

- Original Message From: Gang Luo lgpub...@yahoo.com.cn To: common-user@hadoop.apache.org Sent: Thu, February 4, 2010 6:14:54 AM Subject: Re: configuration file

I give the path to that xml file in that command. Do I need to add that path to the classpath? I tried giving a wrong path and no error was reported. Aren't those parameters all configurable? Like io.sort.mb, mapred.reduce.tasks, io.sort.factor, etc. Thanks. -Gang

----- Original Message ----- From: Amogh Vasekar am...@yahoo-inc.com To: common-user@hadoop.apache.org common-user@hadoop.apache.org Sent: 2010/2/4 (Thu) 6:09:04 AM Subject: Re: configuration file

Hi, A shot in the dark: is the conf file in your classpath? If yes, are the parameters you are trying to override marked final? Amogh

On 2/4/10 3:18 AM, Gang Luo lgpub...@yahoo.com.cn wrote: Hi, I am writing a script to run a whole bunch of jobs automatically, but the configuration file doesn't seem to be working. I think there is something wrong in my command. The command in my script is like: bin/hadoop jar myJarFile myClass -conf myConfigurationFile.xml arg1 arg2 I use conf.get() to show the value of some parameters, but the values are not what I define in that xml file. Is there something wrong? Thanks. -Gang
heap memory
Hi all, I suppose the map function is the only thing that consumes the heap memory assigned to each map task. Since the default heap is 200 MB, I would guess that most of that memory is wasted for a simple map function (e.g. IdentityMapper). So I tried to make use of this memory by buffering the output records, or maintaining a large data structure in memory, but it doesn't work as I expect. For example, I want to build a hash table on a 100 MB table in memory during the lifetime of that map task, but it fails due to lack of heap memory. Don't I get 200 MB of heap memory? What else eats my heap memory? Thanks. -Gang
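A likely culprit for the missing heap, if the job has a reduce phase: the map-side sort buffer, sized by io.sort.mb (default 100 MB), is allocated inside the same child heap, so a 200 MB heap leaves well under 100 MB for your own hash table once the framework's buffers and objects are counted. A sketch of the two knobs involved; the values are only examples:

    import org.apache.hadoop.mapred.JobConf;

    public class HeapTuning {
        public static void tune(JobConf conf) {
            // Give each child task a bigger heap
            conf.set("mapred.child.java.opts", "-Xmx512m");
            // ...or shrink the map-side sort buffer so more heap is left for
            // an in-memory hash table
            conf.setInt("io.sort.mb", 50);
        }
    }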
Is it possible to write each key-value pair emitted by the reducer to a different output file
Hi, I was wondering if it is possible to write each key-value pair produced by the reduce function to a different file. How could I open a new file in the reduce function of the reducer? I know it's possible in the configure function, but then all the output of that reducer goes to that one file. Thanks, Udaya.
Re: Is it possible to write each key-value pair emitted by the reducer to a different output file
See MultipleOutputs at http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html -Amareshwari On 2/5/10 10:41 AM, Udaya Lakshmi udaya...@gmail.com wrote: Hi, I was wondering if it is possible to write each key-value pair produced by the reduce function to a different file. How could I open a new file in the reduce function of the reducer? I know its possible in configure function but it will write all the output that reducer to that file. Thanks, Udaya.
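A minimal sketch of MultipleOutputs with the old mapred API, writing each key's records to its own output; the named output "bykey" and the types are placeholders, and the per-key name must be alphanumeric, so an arbitrary key may need sanitizing first. Strictly this gives one file per key per reducer rather than per key-value pair, which is usually what is wanted anyway:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class SplitReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        private MultipleOutputs mos;

        public void configure(JobConf job) {
            // Job setup must have declared the named output first, e.g.:
            // MultipleOutputs.addMultiNamedOutput(job, "bykey",
            //     TextOutputFormat.class, Text.class, Text.class);
            mos = new MultipleOutputs(job);
        }

        @SuppressWarnings("unchecked")
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            while (values.hasNext()) {
                // write this key's records to an output named from the key
                mos.getCollector("bykey", key.toString(), reporter)
                   .collect(key, values.next());
            }
        }

        public void close() throws IOException {
            mos.close();
        }
    }

Writing literally one file per key-value pair would create an enormous number of small HDFS files and is usually best avoided.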
Re: What framework Hadoop uses for daemonizing?
Hi, these are the most used tools:
- jsvc: http://commons.apache.org/daemon/jsvc.html
- Java Service Wrapper: http://wrapper.tanukisoftware.org/
Windows only:
- JNA windows service: http://wrapper.tanukisoftware.org/doc/english/download.jsp
- Windows service wrapper: http://weblogs.java.net/blog/2008/09/29/winsw-windows-service-wrapper-less-restrictive-license
Regards, Leen

On Thu, Feb 4, 2010 at 10:39 PM, Stas Oskin stas.os...@gmail.com wrote: I actually asked this because I'm looking for a good alternative to current bunch of scripts and lsb-redhat dependencies I have today in my own Hadoop client which runs as daemon. So I kinda hoped there is some sauce behind Hadoop I can borrow. While this might be not the most appropriate list, I'd appreciate if someone can say if jsvc can keep the right priorities, or suggest alternative daemon framework. Thanks again. It long predates my involvement in the project. In fact, it predates Hadoop itself - it got inherited from Nutch long ago. I vaguely recall a JIRA about using jsvc for Hadoop - if you search around I bet you can turn it up. -Todd
Re: EOFException and BadLink, but file descriptors number is ok?
not sure what else I could be checking to see where the problem lies. Should I be looking in the datanode logs? I looked briefly in there and didn't see anything from around the time exceptions started getting reported. lsof during the job execution? Number of open threads? I'm at a loss here. On Thu, Feb 4, 2010 at 2:52 PM, Meng Mao meng...@gmail.com wrote: I wrote a hadoop job that checks for ulimits across the nodes, and every node is reporting: core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 139264 max locked memory (kbytes, -l) 32 max memory size (kbytes, -m) unlimited open files (-n) 65536 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 139264 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited Is anything in there telling about file number limits? From what I understand, a high open files limit like 65536 should be enough. I estimate only a couple thousand part-files on HDFS being written to at once, and around 200 on the filesystem per node. On Wed, Feb 3, 2010 at 4:04 PM, Meng Mao meng...@gmail.com wrote: also, which is the ulimit that's important, the one for the user who is running the job, or the hadoop user that owns the Hadoop processes? On Tue, Feb 2, 2010 at 7:29 PM, Meng Mao meng...@gmail.com wrote: I've been trying to run a fairly small input file (300MB) on Cloudera Hadoop 0.20.1. The job I'm using probably writes to on the order of over 1000 part-files at once, across the whole grid. The grid has 33 nodes in it. I get the following exception in the run logs: 10/01/30 17:24:25 INFO mapred.JobClient: map 100% reduce 12% 10/01/30 17:24:25 INFO mapred.JobClient: Task Id : attempt_201001261532_1137_r_13_0, Status : FAILED java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319) at org.apache.hadoop.io.Text.readString(Text.java:400) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263) lots of EOFExceptions 10/01/30 17:24:25 INFO mapred.JobClient: Task Id : attempt_201001261532_1137_r_19_0, Status : FAILED java.io.IOException: Bad connect ack with firstBadLink 10.2.19.1:50010 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263) 10/01/30 17:24:36 INFO mapred.JobClient: map 100% reduce 11% 10/01/30 17:24:42 INFO mapred.JobClient: map 100% reduce 12% 10/01/30 17:24:49 INFO mapred.JobClient: map 100% reduce 13% 10/01/30 17:24:55 INFO mapred.JobClient: map 100% reduce 14% 10/01/30 17:25:00 INFO mapred.JobClient: map 100% reduce 15% From searching around, it seems like the most common cause of BadLink and EOFExceptions is when the nodes don't have enough file 
descriptors set. But across all the grid machines, the file-max has been set to 1573039. Furthermore, we set ulimit -n to 65536 using hadoop-env.sh. Where else should I be looking for what's causing this?
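One more datanode-side limit worth checking, separate from ulimit -n: the number of concurrent block transceivers. This is only a guess from the symptoms, but if the datanode logs show something like "exceeds the limit of concurrent xcievers", raising dfs.datanode.max.xcievers in each datanode's hdfs-site.xml (the value below is just an example) and restarting the datanodes may help when many part-files are written at once:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>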