Re: No. of Map and reduce tasks
What if I had multiple files in the input directory? Would Hadoop then fire parallel map tasks?

On Thu, May 26, 2011 at 7:21 PM, jagaran das jagaran_...@yahoo.co.in wrote:
> [earlier thread trimmed]
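On the multiple-files question: in classic Hadoop, FileInputFormat never lets an input split cross a file boundary, so each input file yields at least one map task, and a file larger than the effective split size is cut into several. A rough Python sketch of that arithmetic (the helper names and the 64 MB default are illustrative, not Hadoop's actual code):

```python
# Sketch of FileInputFormat-style split counting. Assumption:
# classic FileInputFormat semantics -- splits never cross file
# boundaries, so N input files yield at least N map tasks.

def split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Effective split size: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))

def count_map_tasks(file_sizes, block_size=64 * 1024 * 1024):
    size = split_size(block_size)
    total = 0
    for fsize in file_sizes:
        # Each file contributes at least one split; a big file is
        # chopped into roughly ceil(fsize / split_size) splits.
        total += max(1, -(-fsize // size))  # ceil division
    return total

# Four copies of the 208 KB excite-small.log -> four map tasks,
# one per file, even though the total is far below one block:
print(count_map_tasks([208348] * 4))          # 4
# One 200 MB file with a 64 MB block size -> 4 splits:
print(count_map_tasks([200 * 1024 * 1024]))   # 4
```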
Re: No. of Map and reduce tasks
Hi Mohit,

No. of maps: it depends on the total file size divided by the block size.
No. of reducers: you can specify that yourself.

Regards,
Jagaran

From: Mohit Anchlia mohitanch...@gmail.com
To: common-user@hadoop.apache.org
Sent: Thu, 26 May, 2011 2:48:20 PM
Subject: No. of Map and reduce tasks

How can I tell how the map and reduce tasks were spread across the cluster? I looked at the JobTracker web page but can't find that info. Also, can I specify how many map or reduce tasks I want launched? From what I understand, it's based on the number of input files passed to Hadoop. So if I have 4 files there will be 4 map tasks launched, and the reducer is dependent on the hash partitioner.
Re: No. of Map and reduce tasks
I ran a simple Pig script on this file:

-rw-r--r-- 1 root root 208348 May 26 13:43 excite-small.log

It orders the contents by name, but it only created one mapper. How can I change this to distribute the work across multiple machines?

On Thu, May 26, 2011 at 3:08 PM, jagaran das jagaran_...@yahoo.co.in wrote:
> [quoted above]
Re: No. of Map and reduce tasks
Have more data for it to process :)

On 2011-05-26, at 4:30 PM, Mohit Anchlia wrote:
> [quoted above]
Re: No. of Map and reduce tasks
I think I understood that from the last 2 replies :) But my question is: can I change this configuration to, say, split the file into 250K chunks so that multiple mappers get invoked?

On Thu, May 26, 2011 at 3:41 PM, James Seigel ja...@tynt.com wrote:
> [quoted above]
Re: No. of Map and reduce tasks
Set the input split size really low and you might get something. I'd rather you fire up some *nix commands, pack that file onto itself a bunch of times, then put it back into HDFS and let 'er rip.

Sent from my mobile. Please excuse the typos.

On 2011-05-26, at 4:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
> [quoted above]
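James's "pack that file onto itself" suggestion could be sketched like this in Python instead of shell (file names are hypothetical; loading the result into HDFS happens outside the script):

```python
# Hypothetical sketch: concatenate a small file onto itself until it
# spans several HDFS blocks, so the input produces more than one split.

def pack_file(src, dst, copies):
    """Write `copies` back-to-back copies of src into dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as out:
        for _ in range(copies):
            out.write(data)

# e.g. pack_file("excite-small.log", "excite-big.log", 1000)
# then: hadoop fs -put excite-big.log /input/   (run outside Python)
```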
Re: No. of Map and reduce tasks
If you feed Hadoop really small files, the benefit of its large block size goes away. Instead, try merging the files.

Hope that helps.

From: James Seigel ja...@tynt.com
To: common-user@hadoop.apache.org
Sent: Thu, 26 May, 2011 6:04:07 PM
Subject: Re: No. of Map and reduce tasks
> [quoted above]
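To see why merging helps, here is a back-of-the-envelope Python comparison under an assumed 64 MB block size and one-split-per-file semantics: a thousand 1 MB files cost a thousand map tasks (one tiny split each), while the same bytes merged into one file need only 16:

```python
# Illustrative numbers only, assuming a 64 MB HDFS block size and
# classic one-split-per-small-file behavior.
import math

BLOCK = 64 * 1024 * 1024
small_files = [1 * 1024 * 1024] * 1000  # a thousand 1 MB files

# Unmerged: every file gets its own (mostly empty) split.
maps_unmerged = sum(max(1, math.ceil(s / BLOCK)) for s in small_files)
# Merged into a single file: splits are block-sized chunks.
maps_merged = math.ceil(sum(small_files) / BLOCK)

print(maps_unmerged, maps_merged)  # 1000 16
```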