So, you want to read the samples file in main, add each line to the job
configuration with job.set, and then read those lines back in the mapper with
job.get?

I think it is better to name the data file as the input source for the mapper,
read the whole samples file in each mapper instance using the HDFS API, and
then compare them. That is actually how a map-side join works. A rough sketch
of what I mean is below.
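Something like the following (just a sketch, not tested -- the property name
"samples.path", the file location, and the key/value types are placeholders for
whatever your job actually uses). The driver sets one small config value, the
path of the samples file on HDFS, and each mapper loads the file once in
setup():

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CompareMapper extends Mapper<LongWritable, Text, Text, Text> {

  private final List<String> samples = new ArrayList<String>();

  @Override
  protected void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    // In the driver: conf.set("samples.path", "/path/to/samples.txt")
    Path samplesPath = new Path(conf.get("samples.path"));
    FileSystem fs = samplesPath.getFileSystem(conf);
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(fs.open(samplesPath)));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        samples.add(line);   // keep the ~20 MB samples file in memory
      }
    } finally {
      reader.close();
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String dataLine = value.toString();
    for (String sample : samples) {
      // Replace with whatever comparison your algorithm needs.
      if (dataLine.equals(sample)) {
        context.write(new Text(sample), new Text(dataLine));
      }
    }
  }
}

This way only the small path string goes through the configuration, and the
90 MB data file is still split across mappers as the normal job input.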

 
Gang Luo
---------
Department of Computer Science
Duke University
(919)316-0993
gang....@duke.edu



----- Original Message ----
From: Boyu Zhang <boyuzhan...@gmail.com>
To: common-user@hadoop.apache.org
Sent: Sunday, November 22, 2009, 3:21:23 PM
Subject: Questions About Passing Parameters to Hadoop Job

Dear All,

I am implementing an algorithm that reads a data file (a .txt file,
approximately 90 MB) and compares each line of the data file with each line of
a specific samples file (a .txt file, approximately 20 MB). To do this, I need
to pass each line of the samples file as a parameter to the map-reduce job, and
taken together these lines are large.

My current approach is to use job.set and job.get to store and retrieve these
lines as configuration values, but it is not efficient at all!
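Concretely, it looks roughly like this (the property names are illustrative,
not my exact code) -- every sample line becomes its own configuration entry in
the driver, and the mapper reads them all back:

// Driver side: push each sample line into the job configuration.
Configuration conf = job.getConfiguration();
BufferedReader reader = new BufferedReader(new FileReader("samples.txt"));
int count = 0;
String line;
while ((line = reader.readLine()) != null) {
  conf.set("sample." + count, line);
  count++;
}
conf.setInt("sample.count", count);
reader.close();

// Mapper side: rebuild the list from the configuration.
int n = conf.getInt("sample.count", 0);
for (int i = 0; i < n; i++) {
  String sample = conf.get("sample." + i);
  // ... compare against the current input line ...
}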

Could anyone help me with an alternative solution? Thanks a million!

Boyu Zhang
University of Delaware



