Re: Optimized way
Hi Aayush,

Do you want one map to run one command? You can provide an input file with one file name per line and use NLineInputFormat, which makes each input split N lines, i.e. gives N lines to one map for processing. By default, N is one. Your map can then simply run the shell command on its input line (a minimal sketch follows below the quoted message). Would this meet your need?

More details at
http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html

Thanks,
Amareshwari

Aayush Garg wrote:
> Hi,
>
> I have a 5-node cluster for Hadoop. All nodes are multi-core.
> I am running a shell command in the map function of my program, and this
> shell command takes one file as input. Many such files are copied into
> HDFS.
>
> So, in summary, the map function will run a command like ./run on one of
> these files.
>
> Could you please suggest the optimal way to do this, e.g. whether I can
> use the multi-core processing of the nodes and run many such maps in
> parallel.
>
> Thanks,
> Aayush
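A minimal sketch of this approach, written against the old org.apache.hadoop.mapred API that the r0.19.0 link above documents. The ShellPerLine/ShellMapper names and the choice to emit each command's exit code are illustrative assumptions, not anything NLineInputFormat requires; the ./run binary from the original mail would also have to be present on every tasktracker node:

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class ShellPerLine {

      // Each call to map() receives one line of the input file; here the
      // line is treated as the argument to the external command.
      public static class ShellMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
          String fileName = value.toString().trim();
          Process p = Runtime.getRuntime().exec(new String[] {"./run", fileName});
          try {
            // For very long-running commands, call reporter.progress()
            // periodically so the task is not killed as unresponsive.
            int rc = p.waitFor();
            output.collect(new Text(fileName), new Text(Integer.toString(rc)));
          } catch (InterruptedException e) {
            throw new IOException(e.toString());
          }
        }
      }

      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(ShellPerLine.class);
        conf.setJobName("shell-per-line");

        conf.setInputFormat(NLineInputFormat.class);
        // N lines per map; the default is 1, set explicitly for clarity.
        conf.setInt("mapred.line.input.format.linespermap", 1);

        conf.setMapperClass(ShellMapper.class);
        conf.setNumReduceTasks(0);           // no reduce step needed
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }

With one line per split, each file name becomes its own map task, so the commands run in parallel across the cluster up to the per-node task limits discussed elsewhere in this thread.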
Re: Optimized way
Well, Map/Reduce and Hadoop by definition run maps in parallel. I think you're interested in the following two configuration settings:

mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

These go in hadoop-site.xml and set the maximum number of map and reduce tasks run concurrently on each tasktracker (node); an example snippet follows below the quoted message. Learn more here:
http://hadoop.apache.org/core/docs/current/cluster_setup.html#Configuring+the+Hadoop+Daemons

The map maximum plus the reduce maximum should be slightly above the number of cores you have per node. So if you have 8 cores per node, setting the map maximum to 6 and the reduce maximum to 4 would probably be good.

Hope this helps.

Alex

On Thu, Dec 4, 2008 at 6:42 AM, Aayush Garg <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a 5-node cluster for Hadoop. All nodes are multi-core.
> I am running a shell command in the map function of my program, and this
> shell command takes one file as input. Many such files are copied into
> HDFS.
>
> So, in summary, the map function will run a command like ./run on one of
> these files.
>
> Could you please suggest the optimal way to do this, e.g. whether I can
> use the multi-core processing of the nodes and run many such maps in
> parallel.
>
> Thanks,
> Aayush
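For concreteness, here is how those two properties might look in hadoop-site.xml. The values 6 and 4 are the suggestions from the 8-core example above, not defaults; tune them to your own hardware:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>6</value>
      <description>Maximum number of map tasks run simultaneously
      by a tasktracker.</description>
    </property>

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
      <description>Maximum number of reduce tasks run simultaneously
      by a tasktracker.</description>
    </property>

Note that these are per-tasktracker settings, so they must be set on each node (and the tasktrackers restarted) to take effect.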
Optimized way
Hi,

I have a 5-node cluster for Hadoop. All nodes are multi-core.
I am running a shell command in the map function of my program, and this shell command takes one file as input. Many such files are copied into HDFS.

So, in summary, the map function will run a command like ./run on one of these files.

Could you please suggest the optimal way to do this, e.g. whether I can use the multi-core processing of the nodes and run many such maps in parallel.

Thanks,
Aayush