Bejoy,

What exactly is your use case? I know you said below that you were just 
thinking of a weird design, but it would really help to know exactly what you 
are shooting for, because we might be able to refactor it.

I developed a job that still required sorted input for the reduce but needed 
no transformation or filtering on the map side, so I just used an identity 
mapper, as Robert mentions below, and it works perfectly. I do not think there 
is any way to pass data directly into the shuffle/sort (S/S) phase without 
going through the map phase (if that is what you were hinting at). If you 
don’t require the data to go through S/S, then you can make it a map-only job.

Matt

From: Robert Hafner [mailto:ted...@tedivm.com]
Sent: Wednesday, September 07, 2011 11:34 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: No Mapper but Reducer


You could just have a mapper that emits exactly the values it takes in (i.e., 
output k1,v1 as k2,v2). I think that's the best you'll be able to do here.
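
A pass-through mapper of this kind can be very small. The following is a 
minimal sketch, assuming a Hadoop Streaming job with a Python mapper (the 
thread does not say which API is in use); identity_map is a hypothetical 
name. In the Java API the same effect comes from the stock identity mapper: 
org.apache.hadoop.mapred.lib.IdentityMapper in the old API, or the default 
map() of org.apache.hadoop.mapreduce.Mapper in the new one.

```python
import sys


def identity_map(lines, out):
    """Write every input record to `out` unchanged (k1,v1 becomes k2,v2)."""
    for line in lines:
        out.write(line)


if __name__ == "__main__":
    # Assumption: run as a Hadoop Streaming mapper, records arrive on stdin.
    identity_map(sys.stdin, sys.stdout)
```

The records still pass through the shuffle/sort phase, so the reducer sees 
them grouped and sorted by key even though the mapper did no work on them.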


On Sep 7, 2011, at 4:21 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:
Thank you all. I had noticed this strange behavior some time back too.
My initial concern still remains, though. If I provide an empty input 
directory then, yes, the map tasks won't be executed. But my reducer needs 
input to do its processing/aggregation. In such a scenario, is there an option 
to provide input just to the reducer?

Regards
Bejoy.K.S
On Wed, Sep 7, 2011 at 3:09 PM, Sudharsan Sampath <sudha...@gmail.com> wrote:
This is true, and it took us by surprise in the recent past. It also had quite 
some impact on our job cycles, where the size of the input is totally random 
and can be zero at times.

In one of our cycles, we run a lot of jobs. Say we configure X as the num of 
reducers for a job which does not have any input.

Y -> No of tasktrackers in the cluster

H -> Time Interval for Heartbeat response

With the cdh2 version, the job takes

(X / Y) * H seconds to complete without doing any work, since only one reduce 
task is assigned per heartbeat.
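
To make the estimate concrete, here is a small sketch of the arithmetic. The 
function name and the sample numbers are made up for illustration; they are 
not measurements from any cluster.

```python
def idle_job_seconds(num_reducers, num_tasktrackers, heartbeat_seconds):
    """Estimate (X / Y) * H: wall-clock time for a job with no input.

    Assumes, as described above, that each tasktracker is handed at most
    one reduce task per heartbeat.
    """
    if num_tasktrackers <= 0:
        raise ValueError("need at least one tasktracker")
    return (num_reducers / num_tasktrackers) * heartbeat_seconds


# Hypothetical numbers: X = 100 reducers, Y = 10 tasktrackers, H = 3 s.
print(idle_job_seconds(100, 10, 3))  # 30.0
```

Under the same assumptions, twenty such empty jobs in a cycle would add up to 
about ten minutes of idle time.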


If the number of such jobs in the cycle is large, the total time that the 
cluster spends doing nothing accumulates.

I was thinking of raising this as a JIRA but am not sure. Should we raise and 
fix this as a JIRA request? Could the number of reducers set by the client be 
overridden if the number of mappers is 0?

We have a hack for this, verifying the existence of the input path to the map 
phase ourselves, but thought it would be more intuitive for the framework to 
handle it itself.
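
A client-side check of that kind might look roughly like the sketch below. It 
is not the code we actually run: the function names and the submit callback 
are hypothetical, and it assumes the hadoop command is on the PATH. It relies 
only on "hadoop fs -test -e", which exits with status 0 when the path exists.

```python
import subprocess


def hdfs_path_exists(path, run=subprocess.call):
    """Return True if `hadoop fs -test -e <path>` exits with status 0."""
    return run(["hadoop", "fs", "-test", "-e", path]) == 0


def submit_if_input_exists(path, submit, run=subprocess.call):
    """Call `submit()` only when the input path exists; report whether it ran."""
    if not hdfs_path_exists(path, run):
        return False
    submit()
    return True
```

Note that this only checks that the path exists. An existing but empty 
directory would still pass, so it would need a further check on the 
directory's contents to cover that case.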

-Sudhan S

On Wed, Sep 7, 2011 at 2:25 PM, Harsh J <ha...@cloudera.com> wrote:
Oh boy are you in for a surprise. Reducers _can_ run with 0 mappers in a job ;-)

/me puts his troll-mask on.

➜  ~HADOOP_HOME  hadoop fs -mkdir abc
➜  ~HADOOP_HOME  hadoop jar hadoop-examples-0.20.2-cdh3u1.jar wordcount abc out
11/09/07 14:24:14 INFO input.FileInputFormat: Total input paths to process : 0
11/09/07 14:24:14 INFO mapred.JobClient: Running job: job_201109071413_0001
11/09/07 14:24:15 INFO mapred.JobClient:  map 0% reduce 0%
11/09/07 14:24:21 INFO mapred.JobClient:  map 0% reduce 100%
11/09/07 14:24:22 INFO mapred.JobClient: Job complete: job_201109071413_0001
11/09/07 14:24:22 INFO mapred.JobClient: Counters: 13
11/09/07 14:24:22 INFO mapred.JobClient:   Job Counters
11/09/07 14:24:22 INFO mapred.JobClient:     Launched reduce tasks=1
11/09/07 14:24:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=2209
11/09/07 14:24:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/09/07 14:24:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/09/07 14:24:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=3113
11/09/07 14:24:22 INFO mapred.JobClient:   FileSystemCounters
11/09/07 14:24:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=59220
11/09/07 14:24:22 INFO mapred.JobClient:   Map-Reduce Framework
11/09/07 14:24:22 INFO mapred.JobClient:     Reduce input groups=0
11/09/07 14:24:22 INFO mapred.JobClient:     Combine output records=0
11/09/07 14:24:22 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/09/07 14:24:22 INFO mapred.JobClient:     Reduce output records=0
11/09/07 14:24:22 INFO mapred.JobClient:     Spilled Records=0
11/09/07 14:24:22 INFO mapred.JobClient:     Combine input records=0
11/09/07 14:24:22 INFO mapred.JobClient:     Reduce input records=0

/me takes off troll mask.

On Wed, Sep 7, 2011 at 1:30 PM, Bejoy KS <bejoy.had...@gmail.com> wrote:
> Thanks Sonal. I was just thinking of some weird design and wanted to check
> whether something like that is possible: no maps and all reducers.
>
> On Wed, Sep 7, 2011 at 1:22 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>
>> I don't think that is possible. Can you explain in what scenario you want
>> to have no mappers, only reducers?
>> Best Regards,
>> Sonal
>> Crux: Reporting for HBase
>> Nube Technologies
>>
>> On Wed, Sep 7, 2011 at 1:18 PM, Bejoy KS <bejoy.had...@gmail.com> wrote:
>>>
>>> Hi
>>> I have a question. Is it possible to have no mappers but reducers
>>> alone? AFAIK, if we need to avoid triggering reducers we can set
>>> numReduceTasks to zero, but no such setting works for mappers. So how
>>> can it be achieved, if it is possible?
>>>
>>> Thank You
>>>
>>> Regards
>>> Bejoy.K.S
>>
>
>


--
Harsh J

