Re: No. of Map and reduce tasks
What if I had multiple files in the input directory? Would Hadoop then fire parallel map tasks?

On Thu, May 26, 2011 at 7:21 PM, jagaran das jagaran_...@yahoo.co.in wrote:
> [earlier thread trimmed]
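On the multiple-files question: in classic Hadoop, FileInputFormat never lets an input split cross a file boundary, so each input file yields at least one map task, and a file larger than the effective split size is cut into several. A rough Python sketch of that arithmetic (the helper names and the 64 MB default are illustrative, not Hadoop's actual code):

```python
# Sketch of FileInputFormat-style split counting. Assumption:
# classic FileInputFormat semantics -- splits never cross file
# boundaries, so N input files yield at least N map tasks.

def split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Effective split size: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))

def count_map_tasks(file_sizes, block_size=64 * 1024 * 1024):
    size = split_size(block_size)
    total = 0
    for fsize in file_sizes:
        # Each file contributes at least one split; a big file is
        # chopped into roughly ceil(fsize / split_size) splits.
        total += max(1, -(-fsize // size))  # ceil division
    return total

# Four copies of the 208 KB excite-small.log -> four map tasks,
# one per file, even though the total is far below one block:
print(count_map_tasks([208348] * 4))          # 4
# One 200 MB file with a 64 MB block size -> 4 splits:
print(count_map_tasks([200 * 1024 * 1024]))   # 4
```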
Re: No. of Map and reduce tasks
Hi Mohit,

No. of maps: it depends on the total file size divided by the block size.
No. of reducers: you can specify that yourself.

Regards,
Jagaran

From: Mohit Anchlia mohitanch...@gmail.com
To: common-user@hadoop.apache.org
Sent: Thu, 26 May, 2011 2:48:20 PM
Subject: No. of Map and reduce tasks

How can I tell how the map and reduce tasks were spread across the cluster? I looked at the JobTracker web page but can't find that info. Also, can I specify how many map or reduce tasks I want launched? From what I understand, it's based on the number of input files passed to Hadoop. So if I have 4 files there will be 4 map tasks launched, and the reducer is dependent on the hash partitioner.
Re: No. of Map and reduce tasks
I ran a simple Pig script on this file:

-rw-r--r-- 1 root root 208348 May 26 13:43 excite-small.log

It orders the contents by name, but it only created one mapper. How can I change this to distribute the work across multiple machines?

On Thu, May 26, 2011 at 3:08 PM, jagaran das jagaran_...@yahoo.co.in wrote:
> [quoted above]
Re: No. of Map and reduce tasks
Have more data for it to process :)

On 2011-05-26, at 4:30 PM, Mohit Anchlia wrote:
> [quoted above]
Re: No. of Map and reduce tasks
I think I understood that from the last 2 replies :) But my question is: can I change this configuration to, say, split the file into 250K chunks so that multiple mappers get invoked?

On Thu, May 26, 2011 at 3:41 PM, James Seigel ja...@tynt.com wrote:
> [quoted above]
Re: No. of Map and reduce tasks
Set the input split size really low and you might get something. I'd rather you fire up some *nix commands, pack that file onto itself a bunch of times, then put it back into HDFS and let 'er rip.

Sent from my mobile. Please excuse the typos.

On 2011-05-26, at 4:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
> [quoted above]
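James's "pack that file onto itself" suggestion could be sketched like this in Python instead of shell (file names are hypothetical; loading the result into HDFS happens outside the script):

```python
# Hypothetical sketch: concatenate a small file onto itself until it
# spans several HDFS blocks, so the input produces more than one split.

def pack_file(src, dst, copies):
    """Write `copies` back-to-back copies of src into dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as out:
        for _ in range(copies):
            out.write(data)

# e.g. pack_file("excite-small.log", "excite-big.log", 1000)
# then: hadoop fs -put excite-big.log /input/   (run outside Python)
```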
Re: No. of Map and reduce tasks
If you feed Hadoop really small files, the benefit of its large block size goes away. Instead, try merging the files.

Hope that helps.

From: James Seigel ja...@tynt.com
To: common-user@hadoop.apache.org
Sent: Thu, 26 May, 2011 6:04:07 PM
Subject: Re: No. of Map and reduce tasks
> [quoted above]
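To see why merging helps, here is a back-of-the-envelope Python comparison under an assumed 64 MB block size and one-split-per-file semantics: a thousand 1 MB files cost a thousand map tasks (one tiny split each), while the same bytes merged into one file need only 16:

```python
# Illustrative numbers only, assuming a 64 MB HDFS block size and
# classic one-split-per-small-file behavior.
import math

BLOCK = 64 * 1024 * 1024
small_files = [1 * 1024 * 1024] * 1000  # a thousand 1 MB files

# Unmerged: every file gets its own (mostly empty) split.
maps_unmerged = sum(max(1, math.ceil(s / BLOCK)) for s in small_files)
# Merged into a single file: splits are block-sized chunks.
maps_merged = math.ceil(sum(small_files) / BLOCK)

print(maps_unmerged, maps_merged)  # 1000 16
```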