Re: No Mapper but Reducer

2011-09-07 Thread Bejoy KS
Exactly Matthew, the weird thought was in that direction. Basically I do have a tilde-separated input which has to undergo some aggregation operation. So I was just giving it a shot to see if there is a possibility to run directly into the sort/shuffle phase and then the reducer, without a mapper.

Re: How to Create an effective chained MapReduce program.

2011-09-07 Thread Lance Norskog
You might find this easier to understand if you use one of the low-level job-scripting languages like Oozie or Hamake. They put the whole assemblage of stuff into one file. On Wed, Sep 7, 2011 at 3:17 PM, David Rosenstrauch wrote: > * open a SequenceFile.Reader on the sequence file > * in a

Re: How to Create an effective chained MapReduce program.

2011-09-07 Thread David Rosenstrauch
* open a SequenceFile.Reader on the sequence file
* in a loop, call next(key, val) on the reader to read the next key/val pair in the file (see: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Reader.html#next(org.apache.hadoop.io.Writable,%20org.apache.hadoop.i
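
For reference, a minimal sketch of that loop against the pre-2.x SequenceFile API; the class name SeqFileDump and the tab-separated println are only illustrative, and the key/value types are discovered from the file itself:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileDump {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]);   // e.g. a part-00000 file written by the first job

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      // Instantiate key/value objects of whatever Writable types the file was written with.
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

      // next() returns false once the end of the file is reached.
      while (reader.next(key, val)) {
        System.out.println(key + "\t" + val);   // or convert/write as text here
      }
    } finally {
      reader.close();
    }
  }
}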

Re: How to Create an effective chained MapReduce program.

2011-09-07 Thread ilyal levin
Can you be more specific on how to perform this? In general, is there a way to convert the binary files I have to text files? On Tue, Sep 6, 2011 at 11:26 PM, David Rosenstrauch wrote: > On 09/06/2011 01:57 AM, Niels Basjes wrote: > >> Hi, >> >> In the past i've had the same situation where I ne

RE: No Mapper but Reducer

2011-09-07 Thread GOEKE, MATTHEW (AG/1000)
Bejoy, what exactly is your use case? I know down below you said you were just thinking of a weird design, but it would really help if we knew exactly what you were shooting for, because we might be able to refactor it. I have a job that I developed that still required the input to be sorted for

Re: No Mapper but Reducer

2011-09-07 Thread Robert Hafner
You could just have a mapper which sends off the exact values it takes in (i.e., output k1,v1 as k2,v2). I think that's the best you'll be able to do here. On Sep 7, 2011, at 4:21 AM, Bejoy KS wrote: > Thank You All. Even I have noticed this strange behavior some time back. > Now my inital conc
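
A sketch of such a pass-through mapper with the new (org.apache.hadoop.mapreduce) API, assuming TextInputFormat's LongWritable/Text input types; note the new-API Mapper base class already behaves this way by default, and the old API ships org.apache.hadoop.mapred.lib.IdentityMapper:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Pass-through mapper: emits every input record unchanged so the framework
// still sorts/shuffles it into the reducers.
public class PassThroughMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(key, value);
  }
}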

Reducer running out of heap before it gets to the reduce() method

2011-09-07 Thread Matrangola, Geoffrey
I think my job is running out of memory before it calls reduce() in the reducer. It's running with large blocks of binary data emitted from the Mapper. Each record emitted from the mappers should be small enough to fit in memory. However, if it tried to somehow keep a bunch of records for one key
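
If the OutOfMemoryError really is happening in the shuffle/merge before reduce() is ever called, one knob people commonly try is shrinking the fraction of reducer heap used to buffer map outputs. A hedged sketch, assuming 0.20-era property names and that the cluster honours per-job overrides:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ShuffleMemoryTuning {
  public static Job configure(Configuration conf) throws Exception {
    // Fraction of the reducer's heap used to buffer map outputs during the
    // copy phase (default 0.70); lowering it makes the shuffle spill to disk sooner.
    conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.30f);
    // Larger child JVM heap if individual records really are big.
    conf.set("mapred.child.java.opts", "-Xmx1024m");
    return new Job(conf, "large-binary-records");
  }
}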

Re: How to pass a parameter across the cluster.

2011-09-07 Thread CHANG Lei
Another method is to store it on a shared store which can be accessed from each node, such as ZooKeeper, HDFS, HBase, a DB, etc. On 2011-09-07 at 20:11, "Yaron Gonen" wrote: > There is no "right way". I think the best thing to do is to ask in the > forums. I thought maybe via the Configuration object, but this

Re: How to pass a parameter across the cluster.

2011-09-07 Thread Yaron Gonen
There is no "right way". I think the best thing to do is to ask in the forums. I thought maybe via the Configuration object, but this is by no means a formal solution. On Wed, Sep 7, 2011 at 2:39 PM, ilyal levin wrote: > Hi > What is the right way to pass a parameter for all mapper and reducers t
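
For small scalar parameters the Configuration route looks roughly like this; a sketch with the new (mapreduce) API, where the property name myapp.threshold and the threshold logic are purely illustrative:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class ParamPassingExample {

  // Any mapper or reducer can read the value back from the job Configuration.
  public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String threshold;

    @Override
    protected void setup(Context context) {
      threshold = context.getConfiguration().get("myapp.threshold", "10");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // ... use 'threshold' in the map logic ...
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("myapp.threshold", args[0]);   // set in the driver before the job is submitted
    Job job = new Job(conf, "param-passing");
    job.setMapperClass(MyMapper.class);
    // ... input/output paths, reducer, output types, etc. ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The value is serialized into the job configuration at submit time, so every map and reduce task on the cluster can read it back in setup().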

Re: No Mapper but Reducer

2011-09-07 Thread Harsh J
Nope. A reducer's input is from the map outputs alone (fetched in by the shuffling code), which would not exist here. What are you looking to do? Why won't a map task suffice for doing that? On Wed, Sep 7, 2011 at 4:51 PM, Bejoy KS wrote: > Thank You All. Even I have noticed this strange behavio

How to pass a parameter across the cluster.

2011-09-07 Thread ilyal levin
Hi What is the right way to pass a parameter for all mappers and reducers to see? Thanks

Re: No Mapper but Reducer

2011-09-07 Thread Bejoy KS
Thank you all. I too have noticed this strange behavior some time back. Now my initial concern still remains: if I provide an empty input directory, yes, the map tasks won't be executed. But my reducer needs input to do the processing/aggregation. In such a scenario, is there an option to p

Re: Multiple Mappers and One Reducer

2011-09-07 Thread Harsh J
Praveenesh, The JIRA https://issues.apache.org/jira/browse/MAPREDUCE-369 introduced it and carries a patch that I think would apply without much trouble on your cluster's sources. You can mail me directly if you need help applying a patch. Alternatively, you can do something like downloading 0.21

Re: Multiple Mappers and One Reducer

2011-09-07 Thread praveenesh kumar
Harsh, can you please tell us how we can use MultipleInputs with the Job object on Hadoop 0.20.2? As you can see, MultipleInputs uses the JobConf object. I want to use the Job object as in the new Hadoop 0.21 API. I remember you talked about pulling things out of the new API and adding them into our proj

Re: Multiple Mappers and One Reducer

2011-09-07 Thread Harsh J
Sahana, yes, this is possible as well. Please take a look at the MultipleInputs API @ http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleInputs.html It will allow you to add each path with its own mapper implementation, and you can then have a common reducer s
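
A sketch of that wiring with the old (mapred) API the link above documents; the two mappers here just tag and pass lines through, standing in for the real per-file column handling:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class TwoMappersOneReducer {

  // Mapper for the first file layout: real logic would parse that file's
  // columns and emit the shared join key instead of a fixed tag.
  public static class FileAMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
      out.collect(new Text("A"), value);
    }
  }

  // Mapper for the second file layout.
  public static class FileBMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
      out.collect(new Text("B"), value);
    }
  }

  // One reducer sees the merged, sorted output of both mappers.
  public static class JoinReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
      while (values.hasNext()) {
        out.collect(key, values.next());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TwoMappersOneReducer.class);
    conf.setJobName("two-mappers-one-reducer");

    // Each input path is bound to its own mapper implementation.
    MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, FileAMapper.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, FileBMapper.class);

    conf.setReducerClass(JoinReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));

    JobClient.runJob(conf);
  }
}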

Re: Multiple Mappers and One Reducer

2011-09-07 Thread Sudharsan Sampath
Hi, it's possible by setting the number of reduce tasks to 1. Based on your example, it looks like you need to group your records based on "Date, counter1 and counter2", so that should go into the logic of building the key for your map output. Thanks Sudhan S On Wed, Sep 7, 2011 at 3:02 PM, Sahana Bhat
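
A sketch of that keying idea with the new API, assuming comma-separated input; the delimiter and column positions are illustrative only:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Builds the map output key from the grouping columns so that rows sharing the
// same Date/counter1/counter2 combination arrive at the same reduce() call.
public class CompositeKeyMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Assumes date, counter1 and counter2 sit in columns 0, 1 and 2.
    String[] cols = value.toString().split(",");
    context.write(new Text(cols[0] + "|" + cols[1] + "|" + cols[2]), value);
  }
}

In the driver, job.setNumReduceTasks(1) then sends every group through the single reducer.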

Re: No Mapper but Reducer

2011-09-07 Thread Sudharsan Sampath
This is true, and it took us by surprise in the recent past. It also had quite an impact on our job cycles, where the size of the input is totally random and could also be zero at times. In one of our cycles, we run a lot of jobs. Say we configure X as the number of reducers for a job which does not hav

Re: Multiple Mappers and One Reducer

2011-09-07 Thread Sahana Bhat
Hi, I understand that given a file, the file is split across 'n' mapper instances, which is the normal case. The scenario I have is: 1. Two files which are not totally identical in terms of number of columns (but have data that is similar in a few columns) need to be processed, and after

Re: Perl Mapper with Java Reducer

2011-09-07 Thread Amareshwari Sri Ramadasu
You can look at Hadoop streaming http://hadoop.apache.org/common/docs/r0.20.0/streaming.html Thanks Amareshwari On 9/7/11 1:38 PM, "Bejoy KS" wrote: Hi Is it possible to have my mapper in Perl and reducer in java. In my existing legacy system some larger process is being handled by Per
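
A hedged sketch of such a streaming invocation; the jar location, paths, script name and reducer class are placeholders (they vary by distribution and job), and a custom Java reducer class would additionally need to be on the task classpath (e.g. via -libjars):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
    -input /path/to/input \
    -output /path/to/output \
    -mapper my_mapper.pl \
    -file my_mapper.pl \
    -reducer com.example.MyJavaReducer

The -file option ships the Perl script to the task nodes, while -reducer can name either a script or a Java class.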

Re: Multiple Mappers and One Reducer

2011-09-07 Thread Harsh J
Sahana, yes. But isn't that how it normally works? What makes you question this capability? On Wed, Sep 7, 2011 at 2:37 PM, Sahana Bhat wrote: > Hi, > Is it possible to have multiple mappers where each mapper is > operating on a different input file and whose result (which is a key value

Multiple Mappers and One Reducer

2011-09-07 Thread Sahana Bhat
Hi, is it possible to have multiple mappers where each mapper operates on a different input file, and whose results (key-value pairs from the different mappers) are processed by a single reducer? Regards, Sahana

Re: No Mapper but Reducer

2011-09-07 Thread Harsh J
Oh boy, are you in for a surprise. Reducers _can_ run with 0 mappers in a job ;-) /me puts his troll-mask on.
➜ ~HADOOP_HOME hadoop fs -mkdir abc
➜ ~HADOOP_HOME hadoop jar hadoop-examples-0.20.2-cdh3u1.jar wordcount abc out
11/09/07 14:24:14 INFO input.FileInputFormat: Total input paths to pro

RE: No Mapper but Reducer

2011-09-07 Thread Devaraj K
Hi Bejoy, it is possible to execute a job with no mappers, only reducers. You can try this by giving an empty directory as input for the job. Devaraj K _ From: Bejoy KS [mailto:bejoy.had...@gmail.com] Sent: Wednesday, September 07, 2011 1:30 PM To: mapreduce-use

Perl Mapper with Java Reducer

2011-09-07 Thread Bejoy KS
Hi, is it possible to have my mapper in Perl and my reducer in Java? In my existing legacy system some larger process is being handled by Perl, and the business logic there is really complex. It is a Herculean task to convert all the Perl to Java. But the reducer business logic, which is agai

Re: No Mapper but Reducer

2011-09-07 Thread Bejoy KS
Thanks Sonal. I was just thinking of some weird design and wanted to make sure whether there is a possibility like that: no maps, only reducers. On Wed, Sep 7, 2011 at 1:22 PM, Sonal Goyal wrote: > I dont think that is possible, can you explain in what scenario you want to > have no mappers, o

Re: No Mapper but Reducer

2011-09-07 Thread Sonal Goyal
I don't think that is possible. Can you explain in what scenario you want to have no mappers, only reducers? Best Regards, Sonal Crux: Reporting for HBase Nube Technologies On Wed, Sep 7, 2011

No Mapper but Reducer

2011-09-07 Thread Bejoy KS
Hi I'm having a query here. Is it possible to have no mappers but reducers alone? AFAIK If we need to avoid the tyriggering of reducers we can set numReduceTasks to zero but such a setting on mapper wont work. So how can it be achieved if possible? Thank You Regards Bejoy.K.S