Though it is recommended for large files, DistributedCache might be a good alternative for you:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
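A rough, untested sketch of what that could look like with the old org.apache.hadoop.filecache API from the javadoc above (the HDFS path, class names, and job wiring are just placeholders):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class SampleCompareJob {

  public static class CompareMapper
      extends Mapper<LongWritable, Text, Text, Text> {

    private final List<String> samples = new ArrayList<String>();

    @Override
    protected void setup(Context context) throws IOException {
      // Each task reads its node-local copy of the cached samples file.
      Path[] cached =
          DistributedCache.getLocalCacheFiles(context.getConfiguration());
      BufferedReader in =
          new BufferedReader(new FileReader(cached[0].toString()));
      String line;
      while ((line = in.readLine()) != null) {
        samples.add(line);
      }
      in.close();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String sample : samples) {
        // compare value (one line of the 90MB data file) with sample here
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // placeholder HDFS path to the ~20MB samples file
    DistributedCache.addCacheFile(new URI("/user/boyu/samples.txt"), conf);
    Job job = new Job(conf, "sample-compare");
    job.setJarByClass(SampleCompareJob.class);
    job.setMapperClass(CompareMapper.class);
    // ... set input/output paths and formats, then job.waitForCompletion(true);
  }
}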
Karthik Kambatla

2009/11/22 Gang Luo <lgpub...@yahoo.com.cn>
> So, you want to read the sample file in main and add each line to the job
> with job.set, and then read those lines in the mapper with job.get?
>
> I think it is better to use the data file as the input source for the
> mapper, read the whole samples file in each mapper instance using the HDFS
> API, and then compare them. This is actually how a map-side join works.
>
>
> Gang Luo
> ---------
> Department of Computer Science
> Duke University
> (919)316-0993
> gang....@duke.edu
>
>
>
> ----- Original Message ----
> From: Boyu Zhang <boyuzhan...@gmail.com>
> To: common-user@hadoop.apache.org
> Sent: 2009/11/22 (Sun) 3:21:23 PM
> Subject: Questions About Passing Parameters to Hadoop Job
>
> Dear All,
>
> I am implementing an algorithm that reads a data file (a .txt file of
> approximately 90 MB) and compares each line of it with each line of a
> specific samples file (a .txt file of approximately 20 MB). To do this, I
> need to pass each line of the samples file as a parameter to the map-reduce
> job, and those lines are fairly large.
>
> My current approach is to use job.set and job.get to set and retrieve these
> lines as configuration properties, but it is not efficient at all!
>
> Could anyone suggest an alternative solution? Thanks a million!
>
> Boyu Zhang
> University of Delaware
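For reference, a minimal, untested sketch of the approach Gang Luo describes above: the 90 MB data file stays as the regular job input, and each mapper instance reads the samples file straight from HDFS in setup(). The "samples.path" key and its value are placeholders the driver would have to set, e.g. conf.set("samples.path", "/user/boyu/samples.txt").

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HdfsSamplesMapper
    extends Mapper<LongWritable, Text, Text, Text> {

  private final List<String> samples = new ArrayList<String>();

  @Override
  protected void setup(Context context) throws IOException {
    // "samples.path" is an illustrative key the driver sets in the Configuration.
    Path samplesPath = new Path(context.getConfiguration().get("samples.path"));
    FileSystem fs = FileSystem.get(context.getConfiguration());
    BufferedReader in =
        new BufferedReader(new InputStreamReader(fs.open(samplesPath)));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        samples.add(line);
      }
    } finally {
      in.close();
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String sample : samples) {
      // compare each input line (value) against each sample line here
    }
  }
}

The trade-off versus the DistributedCache sketch earlier is that here every map task opens the samples file on HDFS, whereas the cache copies it to each node's local disk once per job.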