Yeah, I'll write something up and post it on my web site. Definitely not InfoQ material, but simple tips-and-tricks stuff.
-Mike

> Subject: Re: Moving Files to Distributed Cache in MapReduce
> From: a...@apache.org
> Date: Sun, 31 Jul 2011 19:21:14 -0700
> To: common-user@hadoop.apache.org
>
>
> We really need to add a working example to the wiki and link to it from the
> FAQ page. Any volunteers?
>
> On Jul 29, 2011, at 7:49 PM, Michael Segel wrote:
>
> >
> > Here's the meat of my post earlier...
> > Sample code on putting a file on the cache:
> > DistributedCache.addCacheFile(new URI(path+"MyFileName"), conf);
> >
> > Sample code in pulling data off the cache:
> >    Path[] localFiles =
> >        DistributedCache.getLocalCacheFiles(context.getConfiguration());
> >    boolean exitProcess = false;
> >    int i=0;
> >    while (!exitProcess){
> >        String fileName = localFiles[i].getName();
> >        if (fileName.equalsIgnoreCase("model.txt")){
> >            // Build your input file reader on localFiles[i].toString()
> >            exitProcess = true;
> >        }
> >        i++;
> >    }
> >
> >
> > Note that this is SAMPLE code. I didn't trap the exit condition if the file
> > isn't there and you go beyond the size of the array localFiles[].
> > Also I set exitProcess to false because it's easier to read this as "Do this
> > loop until the condition exitProcess is true".
> >
> > When you build your file reader you need the full path, not just the file
> > name. The path will vary when the job runs.
> >
> > HTH
> >
> > -Mike
> >
> >
> >> From: michael_se...@hotmail.com
> >> To: common-user@hadoop.apache.org
> >> Subject: RE: Moving Files to Distributed Cache in MapReduce
> >> Date: Fri, 29 Jul 2011 21:43:37 -0500
> >>
> >>
> >> I could have sworn that I gave an example earlier this week on how to push
> >> and pull stuff from distributed cache.
> >>
> >>
> >>> Date: Fri, 29 Jul 2011 14:51:26 -0700
> >>> Subject: Re: Moving Files to Distributed Cache in MapReduce
> >>> From: rogc...@ucdavis.edu
> >>> To: common-user@hadoop.apache.org
> >>>
> >>> jobConf is deprecated in 0.20.2 I believe; you're supposed to be using
> >>> Configuration for that
> >>>
> >>> On Fri, Jul 29, 2011 at 1:59 PM, Mohit Anchlia
> >>> <mohitanch...@gmail.com> wrote:
> >>>
> >>>> Is this what you are looking for?
> >>>>
> >>>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
> >>>>
> >>>> search for jobConf
> >>>>
> >>>> On Fri, Jul 29, 2011 at 1:51 PM, Roger Chen <rogc...@ucdavis.edu> wrote:
> >>>>> Thanks for the response! However, I'm having an issue with this line
> >>>>>
> >>>>> Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
> >>>>>
> >>>>> because conf has private access in org.apache.hadoop.conf.Configured
> >>>>>
> >>>>> On Fri, Jul 29, 2011 at 11:18 AM, Mapred Learn <mapred.le...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> I hope my previous reply helps...
> >>>>>>
> >>>>>> On Fri, Jul 29, 2011 at 11:11 AM, Roger Chen <rogc...@ucdavis.edu> wrote:
> >>>>>>
> >>>>>>> After moving it to the distributed cache, how would I call it within my
> >>>>>>> MapReduce program?
> >>>>>>>
> >>>>>>> On Fri, Jul 29, 2011 at 11:09 AM, Mapred Learn <mapred.le...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Did you try using -files option in your hadoop jar command as:
> >>>>>>>>
> >>>>>>>> /usr/bin/hadoop jar <jar name> <main class name> -files <absolute path
> >>>>>>>> of file to be added to distributed cache> <input dir> <output dir>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Jul 29, 2011 at 11:05 AM, Roger Chen <rogc...@ucdavis.edu> wrote:
> >>>>>>>>
> >>>>>>>>> Slight modification: I now know how to add files to the distributed file
> >>>>>>>>> cache, which can be done via this command placed in the main or run
> >>>>>>>>> class:
> >>>>>>>>>
> >>>>>>>>>        DistributedCache.addCacheFile(new URI("/user/hadoop/thefile.dat"),
> >>>>>>>>>            conf);
> >>>>>>>>>
> >>>>>>>>> However I am still having trouble locating the file in the distributed
> >>>>>>>>> cache. *How do I call the file path of thefile.dat in the distributed
> >>>>>>>>> cache as a string?* I am using Hadoop 0.20.2
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, Jul 29, 2011 at 10:26 AM, Roger Chen <rogc...@ucdavis.edu> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> Does anybody have examples of how one moves files from the local
> >>>>>>>>>> filestructure/HDFS to the distributed cache in MapReduce? A Google
> >>>>>>>>>> search turned up examples in Pig but not MR.
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Roger Chen
> >>>>>>>>>> UC Davis Genome Center
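
For anyone landing on this thread later: the sample lookup loop quoted above does not guard against the file being absent from the cache, as its author notes. Below is a minimal sketch of a bounds-safe version of that lookup. So that it compiles without Hadoop on the classpath, it uses java.nio.file.Path as a stand-in for org.apache.hadoop.fs.Path; that substitution, the class name CacheFileLookup, the method name findCachedFile, and the example paths are all assumptions made for illustration. In a real job the array would come from DistributedCache.getLocalCacheFiles(context.getConfiguration()), as in the thread, and the name comparison would use Path.getName().

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class CacheFileLookup {

    // Returns the full local path of the first entry whose file name matches
    // `wanted` (case-insensitive). Returns Optional.empty() when the array is
    // null or contains no match, instead of running past the end of the array.
    // NOTE: java.nio.file.Path stands in for Hadoop's Path here (assumption).
    public static Optional<Path> findCachedFile(Path[] localFiles, String wanted) {
        if (localFiles == null) {
            return Optional.empty();
        }
        for (Path candidate : localFiles) {
            Path name = candidate.getFileName();
            if (name != null && name.toString().equalsIgnoreCase(wanted)) {
                // Keep the full path: the reader needs it, not just the name.
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical local cache paths; the real ones vary from run to run.
        Path[] localFiles = {
            Paths.get("/tmp/cache/1234/other.dat"),
            Paths.get("/tmp/cache/5678/Model.txt")
        };

        Optional<Path> model = findCachedFile(localFiles, "model.txt");
        if (model.isPresent()) {
            System.out.println("Open a reader on: " + model.get());
        } else {
            System.out.println("model.txt is not in the cache");
        }
    }
}
```

Returning an Optional rather than an index keeps the "file not found" case explicit at the call site, which is the condition the original sample left untrapped.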