One more thing:
Will the HashMap that I am generating in the reduce phase be on a single node
or on multiple nodes in the distributed environment? If my dataset is large,
will this approach work? If not, what can I do about it?
The same question applies to the file that I am writing in the run function
(opened with a simple FileStream).



On Thu, Apr 17, 2008 at 6:04 AM, Amar Kamat <[EMAIL PROTECTED]> wrote:

> Ted Dunning wrote:
>
> > The easiest solution is to not worry too much about running an extra MR
> > step.
> >
> > So,
> >
> > - run a first pass to get the counts.  Use word count as the pattern.
> > Store the results in a file.
> >
> > - run the second pass.  You can now read the hash-table from the file you
> > stored in pass 1.
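> >
> > A minimal sketch of that first-pass reducer, assuming the old
> > org.apache.hadoop.mapred API; the class name here is just a placeholder,
> > and the output goes wherever the job's output path points:
> >
> > import java.io.IOException;
> > import java.util.Iterator;
> >
> > import org.apache.hadoop.io.IntWritable;
> > import org.apache.hadoop.io.Text;
> > import org.apache.hadoop.mapred.MapReduceBase;
> > import org.apache.hadoop.mapred.OutputCollector;
> > import org.apache.hadoop.mapred.Reducer;
> > import org.apache.hadoop.mapred.Reporter;
> >
> > // First pass: a plain word count.  The (word, count) pairs end up in the
> > // job's output files on HDFS for the second pass to read.
> > public class CountReducer extends MapReduceBase
> >     implements Reducer<Text, IntWritable, Text, IntWritable> {
> >
> >   public void reduce(Text key, Iterator<IntWritable> values,
> >                      OutputCollector<Text, IntWritable> output,
> >                      Reporter reporter) throws IOException {
> >     int sum = 0;
> >     while (values.hasNext()) {
> >       sum += values.next().get();              // add up the per-map counts
> >     }
> >     output.collect(key, new IntWritable(sum)); // one (word, count) per key
> >   }
> > }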
> >
> > Another approach is to do the counting in your maps as specified and then,
> > before exiting, you can emit special records for each key to suppress.
> > With the correct sort and partition functions, you can make these killer
> > records appear first in the reduce input.  Then, if your reducer sees the
> > kill flag at the front of the values, it can avoid processing any extra
> > data.
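> >
> > A rough sketch of the reducer side of that second idea, assuming the sort
> > and partition setup already places the marker value first for each key;
> > the marker string and value types below are made up:
> >
> > // Reducer method for the kill-record approach (same class shape and
> > // imports as the word-count sketch above, but with Text values).
> > public void reduce(Text key, Iterator<Text> values,
> >                    OutputCollector<Text, Text> output,
> >                    Reporter reporter) throws IOException {
> >   if (!values.hasNext()) {
> >     return;
> >   }
> >   Text first = values.next();
> >   if ("__KILL__".equals(first.toString())) {  // made-up marker value
> >     return;                                   // suppress this key entirely
> >   }
> >   output.collect(key, first);                 // otherwise process as usual
> >   while (values.hasNext()) {
> >     output.collect(key, values.next());
> >   }
> > }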
> >
> >
> >
> Ted,
> Will this work for the case where the cutoff frequency/count requires a
> global picture? I guess not.
>
> > In general, it is better to not try to communicate between map and reduce
> > except via the expected mechanisms.
> >
> >
> > On 4/16/08 1:33 PM, "Aayush Garg" <[EMAIL PROTECTED]> wrote:
> >
> >
> >
> > > We cannot read the HashMap in the configure method of the reducer,
> > > because configure is called before the reduce job runs.
> > > I need to eliminate rows from the HashMap only once all the keys have
> > > been read.
> > > Also, my concern is: if the dataset is large, will this HashMap approach
> > > still work?
> > >
> > >
> > > On Wed, Apr 16, 2008 at 10:07 PM, Ted Dunning <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > >
> > >
> > > > That design is fine.
> > > >
> > > > You should read your map in the configure method of the reducer.
> > > >
> > > > There is a MapFile format supported by Hadoop, but MapFiles tend to be
> > > > pretty slow.  I usually find it better to just load my hash table by
> > > > hand.  If you do this, you should use whatever format you like.
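> > > >
> > > > Loading it by hand in the reducer's configure() could look roughly
> > > > like this, assuming the first pass wrote tab-separated "word<TAB>count"
> > > > text lines; the path and property name below are only placeholders:
> > > >
> > > > // Fields and configure() inside the second job's reducer class
> > > > // (needs java.util, java.io, org.apache.hadoop.fs and
> > > > // org.apache.hadoop.mapred imports).
> > > > private Map<String, Integer> counts = new HashMap<String, Integer>();
> > > >
> > > > public void configure(JobConf job) {
> > > >   try {
> > > >     FileSystem fs = FileSystem.get(job);
> > > >     // Output of the first pass; this path is only illustrative.
> > > >     Path countsFile =
> > > >         new Path(job.get("counts.path", "/tmp/counts/part-00000"));
> > > >     BufferedReader in =
> > > >         new BufferedReader(new InputStreamReader(fs.open(countsFile)));
> > > >     String line;
> > > >     while ((line = in.readLine()) != null) {
> > > >       String[] fields = line.split("\t");  // word<TAB>count per line
> > > >       counts.put(fields[0], Integer.valueOf(fields[1]));
> > > >     }
> > > >     in.close();
> > > >   } catch (IOException e) {
> > > >     throw new RuntimeException("failed to load counts", e);
> > > >   }
> > > > }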
> > > >
> > > >
> > > > On 4/16/08 12:41 PM, "Aayush Garg" <[EMAIL PROTECTED]> wrote:
> > > >
> > > >
> > > >
> > > > > Hi,
> > > > >
> > > > > The current structure of my program is:
> > > > >
> > > > > Upper class {
> > > > >   class Reduce {
> > > > >     reduce function(K1, V1, K2, V2) {
> > > > >       // I count the frequency of each key.
> > > > >       // I add the output to a HashMap(key, value) instead of calling
> > > > >       // output.collect().
> > > > >     }
> > > > >   }
> > > > >
> > > > >   void run() {
> > > > >     runjob();
> > > > >     // Now eliminate the top-frequency keys from the HashMap built in
> > > > >     // the reduce function. This has to happen here, because only now
> > > > >     // is the HashMap complete.
> > > > >     // Write this HashMap to a file in a format such that the next
> > > > >     // MapReduce job can use it, with the HashMap's keys becoming the
> > > > >     // keys of that job's mapper. How, and which format, should I
> > > > >     // choose? Is this design and approach ok?
> > > > >   }
> > > > >
> > > > >   public static void main() {}
> > > > > }
> > > > > I hope you have got my question.
> > > > >
> > > > > Thanks,
> > > > >
> > > > >
> > > > > On Wed, Apr 16, 2008 at 8:33 AM, Amar Kamat <[EMAIL PROTECTED]>
> > > > > wrote:
> > > > > Aayush Garg wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Are you sure that another MR is required for eliminating some
> > > > > > > rows? Can't I just somehow eliminate them from main() when I know
> > > > > > > which keys need to be removed?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > Can you provide some more details on how exactly you are
> > > > > > filtering?
> > > > > > Amar
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
>
>


-- 
Aayush Garg,
Phone: +41 76 482 240
