So you want to read the samples file in main(), add each line to the job configuration with job.set, and then read those lines back in the mapper with job.get?
I think it is better to use the data file as the input source for the mappers, have each mapper instance read the whole samples file itself using the HDFS API, and then do the comparison there. That is essentially how a map-side join works. (A rough sketch of what I mean is below the quoted message.)

Gang Luo
---------
Department of Computer Science
Duke University
(919)316-0993
gang....@duke.edu

----- Original Message -----
From: Boyu Zhang <boyuzhan...@gmail.com>
To: common-user@hadoop.apache.org
Sent: 2009/11/22 (Sunday) 3:21:23 PM
Subject: Questions About Passing Parameters to Hadoop Job

Dear All,

I am implementing an algorithm that reads a data file (.txt, approximately 90 MB) and compares each line of the data file with each line of a specific samples file (.txt, approximately 20 MB). To do this, I need to pass each line of the samples file as a parameter to the map-reduce job, and these parameters are large, in a sense. My current approach is to use job.set and job.get to set and retrieve these lines as configuration values, but it is not efficient at all! Could anyone help me with an alternative solution? Thanks a million!

Boyu Zhang
University of Delaware
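
For what it is worth, here is a minimal sketch of the approach I described, assuming the new org.apache.hadoop.mapreduce API. The configuration key "samples.path", the class name CompareMapper, and the contains() comparison are only placeholders for illustration, not something taken from the original job:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CompareMapper extends Mapper<LongWritable, Text, Text, Text> {

  // The ~20 MB samples file, loaded once per mapper instance.
  private final List<String> samples = new ArrayList<String>();

  @Override
  protected void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    // HDFS path of the samples file, e.g. set in the driver with
    // conf.set("samples.path", "..."); the key name is just a placeholder.
    Path samplesPath = new Path(conf.get("samples.path"));
    FileSystem fs = samplesPath.getFileSystem(conf);
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(fs.open(samplesPath)));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        samples.add(line);
      }
    } finally {
      reader.close();
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // The 90 MB data file is the normal job input, so each call gets one
    // data line, which is compared against every sample line in memory.
    String dataLine = value.toString();
    for (String sample : samples) {
      // Placeholder comparison; replace with whatever the algorithm needs.
      if (dataLine.contains(sample)) {
        context.write(new Text(sample), value);
      }
    }
  }
}

Since the samples file is only about 20 MB, each mapper can afford to hold it in memory; if it were much larger, a proper map-side join over two sorted, identically partitioned inputs would be the way to go instead.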