Are you looking to use DistributedCache for better performance?

On Fri, Mar 2, 2012 at 9:42 AM, Geoffry Roberts <geoffry.robe...@gmail.com> wrote:
> This is a tardy response. I'm spread pretty thinly right now.
>
> DistributedCache
> <http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache>
> is apparently deprecated. Is there a replacement? I didn't see anything
> about this in the documentation, but then I am still using 0.21.0. I have
> to for performance reasons. 1.0.1 is too slow and the client won't have
> it.
>
> Also, the DistributedCache
> <http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache>
> approach seems only to work from within a Hadoop job, i.e. from within a
> Mapper or a Reducer, but not from within a Driver. I have libraries that I
> must access from both places. I take it that I am stuck keeping two
> copies of these libraries in sync--correct? It's either that, or copy
> them into HDFS, replacing them all at the beginning of each job run.
>
> Looking for best practices.
>
> Thanks
>
> On 28 February 2012 10:17, Owen O'Malley <omal...@apache.org> wrote:
>
> > On Tue, Feb 28, 2012 at 5:15 PM, Geoffry Roberts
> > <geoffry.robe...@gmail.com> wrote:
> >
> > > If I create an executable jar file that contains all dependencies
> > > required by the MR job, do all said dependencies get distributed to
> > > all nodes?
> >
> > You can make a single jar and that will be distributed to all of the
> > machines that run the task, but it is better in most cases to use the
> > distributed cache.
> >
> > See
> > http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache
> >
> > > If I specify but one reducer, which node in the cluster will the
> > > reducer run on?
> >
> > The scheduling is done by the JobTracker and it isn't possible to
> > control the location of the reducers.
> >
> > -- Owen
>
> --
> Geoffry Roberts

--
"What we are is the universe's gift to us. What we become is our gift to the universe."
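For readers following the thread: the DistributedCache setup Owen recommends looks roughly like the driver sketch below, written against the old Hadoop 1.0.x `org.apache.hadoop.mapred` API. This is an illustrative sketch, not a tested job: the class name `CacheDriver` and the HDFS paths `/libs/mylib.jar` and `/cache/lookup.dat` are made-up placeholders, and the code needs a running cluster (and mapper/reducer/IO configuration, elided here) to actually do anything.

```java
// Illustrative driver sketch for Hadoop 1.0.x (old mapred API).
// All paths and the class name are hypothetical placeholders.
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class CacheDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheDriver.class);

        // Put a jar that already lives in HDFS on every task's classpath,
        // instead of bundling it into one fat job jar.
        DistributedCache.addFileToClassPath(new Path("/libs/mylib.jar"), conf);

        // Ship a read-only side file to every task; the "#lookup.dat"
        // fragment names the symlink tasks see in their working directory.
        DistributedCache.addCacheFile(
            new URI("/cache/lookup.dat#lookup.dat"), conf);
        DistributedCache.createSymlink(conf);

        // ... set mapper, reducer, input/output paths, etc. ...
        JobClient.runJob(conf);
    }
}
```

Note this only populates the task-side classpath; the driver JVM still resolves classes from its own local classpath, which matches Geoffry's observation that the cache doesn't help from within the Driver. The `-libjars` and `-files` options of `GenericOptionsParser` are a command-line route to the same mechanism.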