One more thing: will the HashMap that I am building in the reduce phase live on a single node or on multiple nodes in the distributed environment? If my dataset is large, will this approach still work? If not, what can I do instead? The same question applies to the file that I am writing in the run function (opened simply with a FileStream).
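To make the question concrete, here is a minimal sketch of the two-pass idea Ted describes in the quoted thread below: count first, then filter against the stored counts. It is plain Java that only simulates the two passes in memory; it does not use the Hadoop API. The class name, the method names, and the rule "drop every key whose count is above the cutoff" are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, in-memory stand-in for the two MapReduce passes.
class TwoPassFilterSketch {

    // Pass 1 (word-count pattern): count how often each key occurs.
    static Map<String, Integer> countKeys(List<String> keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Between the passes: with all counts available, pick the keys to drop.
    // Assumption: a key is dropped when its count is strictly above the cutoff.
    static Set<String> keysToSuppress(Map<String, Integer> counts, int cutoff) {
        Set<String> suppressed = new HashSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > cutoff) {
                suppressed.add(entry.getKey());
            }
        }
        return suppressed;
    }

    // Pass 2: re-read the input and skip every suppressed key.
    static List<String> filter(List<String> keys, Set<String> suppressed) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (!suppressed.contains(key)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "a", "b");
        Map<String, Integer> counts = countKeys(input);
        Set<String> suppressed = keysToSuppress(counts, 2);
        System.out.println(filter(input, suppressed)); // prints [b, c, b]
    }
}
```

In a real job the counts from pass 1 would be written to a file and loaded again in pass 2, as suggested in the thread, rather than passed around in memory as they are here.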

On Thu, Apr 17, 2008 at 6:04 AM, Amar Kamat <[EMAIL PROTECTED]> wrote:

> Ted Dunning wrote:
>
> > The easiest solution is to not worry too much about running an extra MR
> > step.
> >
> > So,
> >
> > - run a first pass to get the counts. Use word count as the pattern.
> >   Store the results in a file.
> >
> > - run the second pass. You can now read the hash-table from the file
> >   you stored in pass 1.
> >
> > Another approach is to do the counting in your maps as specified and
> > then before exiting, you can emit special records for each key to
> > suppress. With the correct sort and partition functions, you can make
> > these killer records appear first in the reduce input. Then, if your
> > reducer sees the kill flag in the front of the values, it can avoid
> > processing any extra data.
> >
> Ted,
> Will this work for the case where the cutoff frequency/count requires a
> global picture? I guess not.
>
> > In general, it is better to not try to communicate between map and
> > reduce except via the expected mechanisms.
> >
> > On 4/16/08 1:33 PM, "Aayush Garg" <[EMAIL PROTECTED]> wrote:
> >
> > > We can not read HashMap in the configure method of the reducer
> > > because it is called before reduce job.
> > > I need to eliminate rows from the HashMap when all the keys are read.
> > > Also my concern is if dataset is large will this HashMap thing work??
> > >
> > > On Wed, Apr 16, 2008 at 10:07 PM, Ted Dunning <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > That design is fine.
> > > >
> > > > You should read your map in the configure method of the reducer.
> > > >
> > > > There is a MapFile format supported by Hadoop, but they tend to be
> > > > pretty slow. I usually find it better to just load my hash table by
> > > > hand. If you do this, you should use whatever format you like.
> > > >
> > > > On 4/16/08 12:41 PM, "Aayush Garg" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > HI,
> > > > >
> > > > > The current structure of my program is::
> > > > > Upper class{
> > > > >     class Reduce{
> > > > >         reduce function(K1,V1,K2,V2){
> > > > >             // I count the frequency for each key
> > > > >             // Add output in HashMap(Key,value) instead of
> > > > >             // output.collect()
> > > > >         }
> > > > >     }
> > > > >
> > > > >     void run()
> > > > >     {
> > > > >         runjob();
> > > > >         // Now eliminate top frequency keys in HashMap built in
> > > > >         // reduce function here because only now hashmap is
> > > > >         // complete.
> > > > >         // Write this hashmap to a file in such a format so that
> > > > >         // I can use this hashmap in next MapReduce job and key of
> > > > >         // this hashmap is taken as key in mapper function of that
> > > > >         // Map Reduce. ?? How and which format should I choose???
> > > > >         // Is this design and approach ok?
> > > > >     }
> > > > >
> > > > >     public static void main() {}
> > > > > }
> > > > > I hope you have got my question.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > On Wed, Apr 16, 2008 at 8:33 AM, Amar Kamat <[EMAIL PROTECTED]>
> > > > > wrote:
> > > > >
> > > > > > Aayush Garg wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Are you sure that another MR is required for eliminating
> > > > > > > some rows? Can't I just somehow eliminate from main() when I
> > > > > > > know the keys which are needed to remove?
> > > > > >
> > > > > > Can you provide some more details on how exactly are you
> > > > > > filtering?
> > > > > > Amar

--
Aayush Garg,
Phone: +41 76 482 240