Don't assume that any variables are shared between reducers, between maps, or between maps and reducers. If you want to share data, put it into HDFS.
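A minimal sketch of that pattern against the old org.apache.hadoop.mapred API (current when this thread took place): pass 1 is an ordinary word count whose output lands in HDFS as word<TAB>count lines, and pass 2's reducer loads that file in configure(), which runs once per task before any reduce() call, so the counts are already complete when they are read. (This also answers the worry quoted below that configure() is "called before the reduce job": a file written by an *earlier* job is finished by then.) The "counts.path" property, the CUTOFF threshold, and the class name are illustrative assumptions, not details from the thread.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    // Second-pass reducer: loads the first pass's counts from HDFS
    // once per task, before any reduce() calls.
    public class FilterReduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        private static final long CUTOFF = 1000;  // illustrative threshold
        private final Map<String, Long> counts = new HashMap<String, Long>();

        public void configure(JobConf job) {
            try {
                // "counts.path" is set by the driver, e.g.
                // conf.set("counts.path", "/user/aayush/counts/part-00000");
                FileSystem fs = FileSystem.get(job);
                BufferedReader in = new BufferedReader(new InputStreamReader(
                        fs.open(new Path(job.get("counts.path")))));
                String line;
                while ((line = in.readLine()) != null) {
                    String[] parts = line.split("\t");  // "word<TAB>count" lines
                    counts.put(parts[0], Long.valueOf(parts[1]));
                }
                in.close();
            } catch (IOException e) {
                throw new RuntimeException("could not load counts from HDFS", e);
            }
        }

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            Long count = counts.get(key.toString());
            if (count != null && count > CUTOFF) {
                return;  // suppress high-frequency keys
            }
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }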
On 4/17/08 4:01 AM, "Aayush Garg" <[EMAIL PROTECTED]> wrote:

> One more thing: will the HashMap that I am generating in the reduce phase
> be on a single node or on multiple nodes in the distributed environment?
> If my dataset is large, will this approach work? If not, what can I do?
> The same question applies to the file that I am writing in the run
> function (opening a simple FileStream).
>
> On Thu, Apr 17, 2008 at 6:04 AM, Amar Kamat <[EMAIL PROTECTED]> wrote:
>
>> Ted Dunning wrote:
>>
>>> The easiest solution is not to worry too much about running an extra
>>> MR step.
>>>
>>> So:
>>>
>>> - Run a first pass to get the counts. Use word count as the pattern.
>>>   Store the results in a file.
>>>
>>> - Run the second pass. You can now read the hash table from the file
>>>   you stored in pass 1.
>>>
>>> Another approach is to do the counting in your maps as specified and
>>> then, before exiting, emit special records for each key to suppress.
>>> With the correct sort and partition functions, you can make these
>>> killer records appear first in the reduce input. Then, if your
>>> reducer sees the kill flag at the front of the values, it can avoid
>>> processing any extra data.
>>
>> Ted,
>> Will this work for the case where the cutoff frequency/count requires
>> a global picture? I guess not.
>>
>>> In general, it is better not to try to communicate between map and
>>> reduce except via the expected mechanisms.
>>>
>>> On 4/16/08 1:33 PM, "Aayush Garg" <[EMAIL PROTECTED]> wrote:
>>>
>>>> We cannot read the HashMap in the configure method of the reducer,
>>>> because configure is called before the reduce job. I need to
>>>> eliminate rows from the HashMap once all the keys have been read.
>>>> Also, my concern is: if the dataset is large, will this HashMap
>>>> approach work?
>>>>
>>>> On Wed, Apr 16, 2008 at 10:07 PM, Ted Dunning <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> That design is fine.
>>>>>
>>>>> You should read your map in the configure method of the reducer.
>>>>>
>>>>> There is a MapFile format supported by Hadoop, but they tend to be
>>>>> pretty slow. I usually find it better to just load my hash table
>>>>> by hand. If you do this, you can use whatever format you like.
>>>>>
>>>>> On 4/16/08 12:41 PM, "Aayush Garg" <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The current structure of my program is:
>>>>>>
>>>>>> Upper class {
>>>>>>     class Reduce {
>>>>>>         reduce function(K1, V1, K2, V2) {
>>>>>>             // Count the frequency for each key.
>>>>>>             // Add output to a HashMap(key, value) instead of
>>>>>>             // calling output.collect().
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     void run() {
>>>>>>         runjob();
>>>>>>         // Now eliminate the top-frequency keys from the HashMap
>>>>>>         // built in the reduce function, because only now is the
>>>>>>         // HashMap complete.
>>>>>>         // Write this HashMap to a file in such a format that the
>>>>>>         // next MapReduce job can take this HashMap's keys as the
>>>>>>         // keys in its mapper function. How and which format
>>>>>>         // should I choose? Is this design and approach OK?
>>>>>>     }
>>>>>>
>>>>>>     public static void main() {}
>>>>>> }
>>>>>>
>>>>>> I hope you have got my question.
>>>>>>
>>>>>> Thanks,
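On the format question quoted just above, one simple option (an assumption here, not something the thread settles) is to drop the side HashMap entirely: let the first job's reducer emit the surviving word/count pairs through output.collect() with the default TextOutputFormat, which writes key, tab, value, and point the second job at that directory with KeyValueTextInputFormat, which splits each line on the first tab and hands the word to the mapper as its input key. A sketch of the second job's driver, with placeholder class and path names:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class SecondPass {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SecondPass.class);
            conf.setJobName("second-pass");

            // Each "word<TAB>count" line from the first job arrives in
            // the mapper split on the first tab: key = word, value = count.
            conf.setInputFormat(KeyValueTextInputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path("/user/aayush/counts"));
            FileOutputFormat.setOutputPath(conf, new Path("/user/aayush/filtered"));

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);
            // conf.setMapperClass(...): plug in the mapper that consumes
            // the counts here.

            JobClient.runJob(conf);
        }
    }

With no mapper or reducer set, Hadoop falls back to its identity classes, so this skeleton runs as-is and simply copies the counts through.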
>>>>>> On Wed, Apr 16, 2008 at 8:33 AM, Amar Kamat <[EMAIL PROTECTED]>
>>>>>> wrote:
>>>>>>
>>>>>>> Aayush Garg wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Are you sure that another MR is required for eliminating some
>>>>>>>> rows? Can't I just somehow eliminate them from main() when I
>>>>>>>> know which keys need to be removed?
>>>>>>>
>>>>>>> Can you provide some more details on how exactly you are
>>>>>>> filtering?
>>>>>>> Amar
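Ted's killer-record suggestion upthread can be made concrete with a composite key and the old API's grouping comparator. The sketch below is one possible reading, not the thread's settled design: the map emits a data record keyed word<TAB>1 per occurrence and, in close(), a killer record keyed word<TAB>0 for any word whose local count passed a threshold; partitioning and grouping on the word alone then guarantees that each reduce() call sees one word, with any killer record's key sorted first. All class names, the flag encoding, and the CUTOFF value are assumptions, and note Amar's caveat that a per-map count is only a local picture, so this variant suppresses words that are frequent within one map's input, not globally.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapred.*;

    public class KillRecordJob {

        // Counts words in memory while mapping; on close(), emits a
        // killer record ("word\t0") for every word whose LOCAL count
        // passed CUTOFF.
        public static class CountingMap extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, Text> {
            private static final int CUTOFF = 1000;  // illustrative
            private final Map<String, Integer> local = new HashMap<String, Integer>();
            private OutputCollector<Text, Text> out;

            public void map(LongWritable offset, Text line,
                            OutputCollector<Text, Text> output,
                            Reporter reporter) throws IOException {
                out = output;  // keep a handle for close()
                for (String word : line.toString().split("\\s+")) {
                    if (word.length() == 0) continue;
                    Integer n = local.get(word);
                    local.put(word, n == null ? 1 : n + 1);
                    output.collect(new Text(word + "\t1"), new Text("1"));
                }
            }

            public void close() throws IOException {
                for (Map.Entry<String, Integer> e : local.entrySet()) {
                    if (e.getValue() > CUTOFF) {
                        out.collect(new Text(e.getKey() + "\t0"), new Text("0"));
                    }
                }
            }
        }

        // Partition on the word alone so killer and data records for
        // the same word reach the same reducer.
        public static class WordPartitioner implements Partitioner<Text, Text> {
            public void configure(JobConf job) {}
            public int getPartition(Text key, Text value, int numPartitions) {
                String word = key.toString().split("\t", 2)[0];
                return (word.hashCode() & Integer.MAX_VALUE) % numPartitions;
            }
        }

        // Group on the word alone so one reduce() call sees the killer
        // record (flag 0, sorted first) and all data records (flag 1).
        public static class WordGroupingComparator extends WritableComparator {
            protected WordGroupingComparator() { super(Text.class, true); }
            public int compare(WritableComparable a, WritableComparable b) {
                return a.toString().split("\t", 2)[0]
                        .compareTo(b.toString().split("\t", 2)[0]);
            }
        }

        public static class KillAwareReduce extends MapReduceBase
                implements Reducer<Text, Text, Text, Text> {
            public void reduce(Text key, Iterator<Text> values,
                               OutputCollector<Text, Text> output,
                               Reporter reporter) throws IOException {
                // reduce() is handed the group's FIRST key; a killer
                // record sorts ahead of the data, so its flag tells us
                // to drop the whole word without touching the values.
                String[] parts = key.toString().split("\t", 2);
                if (parts[1].equals("0")) return;  // suppressed word
                int sum = 0;
                while (values.hasNext()) {
                    sum += Integer.parseInt(values.next().toString());
                }
                output.collect(new Text(parts[0]), new Text(Integer.toString(sum)));
            }
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(KillRecordJob.class);
            conf.setJobName("kill-record-filter");
            conf.setMapperClass(CountingMap.class);
            conf.setReducerClass(KillAwareReduce.class);
            conf.setPartitionerClass(WordPartitioner.class);
            conf.setOutputValueGroupingComparator(WordGroupingComparator.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }

For a cutoff that needs the global picture, the two-pass counts-in-HDFS approach sketched at the top of this thread is the safer choice.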