Hi Karthik,

Thanks a lot for the information! I will look into it and give it a try!
Boyu

2009/11/22 Karthik Kambatla <karthik.shashank.kamba...@gmail.com>:
> Though it is recommended for large files, DistributedCache might be a good
> alternative for you.
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
>
> Karthik Kambatla
>
> 2009/11/22 Gang Luo <lgpub...@yahoo.com.cn>:
> > So, you want to read the sample file in main and add each line to the job
> > by job.set, and then read these lines back in the mapper by job.get?
> >
> > I think it is better to name the data file as the input source for the
> > mapper, read the whole sample file in each mapper instance using the HDFS
> > API, and then compare them. That is actually how a map-side join works.
> >
> > Gang Luo
> > ---------
> > Department of Computer Science
> > Duke University
> > (919)316-0993
> > gang....@duke.edu
> >
> > ----- Original Message ----
> > From: Boyu Zhang <boyuzhan...@gmail.com>
> > To: common-user@hadoop.apache.org
> > Sent: 2009/11/22 (Sun) 3:21:23 PM
> > Subject: Questions About Passing Parameters to Hadoop Job
> >
> > Dear All,
> >
> > I am implementing an algorithm that reads a data file (.txt, approximately
> > 90 MB) and compares each line of the data file with each line of a
> > specific samples file (.txt, approximately 20 MB). To do this, I need to
> > pass each line of the samples file as a parameter to the map-reduce job,
> > and the lines are large, in a sense.
> >
> > My current approach uses job.set and job.get to set and retrieve these
> > lines as configuration values, but it is not efficient at all!
> >
> > Could anyone help me with an alternative solution? Thanks a million!
> >
> > Boyu Zhang
> > University of Delaware
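
For reference, the configuration-based pattern Boyu describes presumably
looks something like the sketch below (the key names, the sampleLines list,
and the CompareJob class are invented for illustration). The whole job
configuration is serialized and shipped to every task, so pushing thousands
of sample lines through it makes each task re-parse all of that data:

    // Driver side: push every sample line into the job configuration.
    // sampleLines is a hypothetical List<String> read in main().
    JobConf conf = new JobConf(CompareJob.class);
    conf.setInt("samples.count", sampleLines.size());
    for (int i = 0; i < sampleLines.size(); i++) {
      conf.set("samples.line." + i, sampleLines.get(i));
    }

    // Mapper side: read the lines back in configure().
    @Override
    public void configure(JobConf job) {
      int n = job.getInt("samples.count", 0);
      for (int i = 0; i < n; i++) {
        samples.add(job.get("samples.line." + i));
      }
    }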
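
A minimal sketch of the DistributedCache route Karthik suggests, written
against the 0.20-era org.apache.hadoop.mapred API. The class name, the HDFS
path /user/boyu/samples.txt, and the plain equality comparison are
assumptions standing in for the real logic:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class CompareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private final List<String> samples = new ArrayList<String>();

      // In the driver, register the samples file once (path hypothetical):
      //   DistributedCache.addCacheFile(new URI("/user/boyu/samples.txt"), conf);
      // The framework copies it to each node's local disk before tasks start.

      @Override
      public void configure(JobConf job) {
        try {
          Path[] cached = DistributedCache.getLocalCacheFiles(job);
          BufferedReader in =
              new BufferedReader(new FileReader(cached[0].toString()));
          String line;
          while ((line = in.readLine()) != null) {
            samples.add(line);
          }
          in.close();
        } catch (IOException e) {
          throw new RuntimeException("Failed to load cached samples file", e);
        }
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // Compare each input line against every sample line; equality here
        // stands in for whatever the real comparison is.
        for (String sample : samples) {
          if (value.toString().equals(sample)) {
            output.collect(new Text("match"), value);
          }
        }
      }
    }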
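
And a sketch of Gang's alternative: keep the 90 MB data file as the job's
input, and have each mapper instance read the 20 MB samples file straight
from HDFS in configure(). Only configure() changes relative to the mapper
above; the path is again hypothetical:

    import java.io.InputStreamReader;
    import org.apache.hadoop.fs.FileSystem;

      @Override
      public void configure(JobConf job) {
        try {
          // Every mapper reads all 20 MB once, before its input split.
          FileSystem fs = FileSystem.get(job);
          Path samplesPath = new Path("/user/boyu/samples.txt");
          BufferedReader in = new BufferedReader(
              new InputStreamReader(fs.open(samplesPath)));
          String line;
          while ((line = in.readLine()) != null) {
            samples.add(line);
          }
          in.close();
        } catch (IOException e) {
          throw new RuntimeException("Could not read samples file from HDFS", e);
        }
      }

The trade-off: DistributedCache localizes the file once per node per job,
while the direct HDFS read hits the namenode and datanodes once per task.
Either way, the sample lines stay out of the job configuration.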