Thanks Amogh. For the second part of my question, I actually meant loading blocks separately from HDFS. I don't know whether that is realistic. Anyway, since my goal is to process different divisions of a file separately, doing it at the split level is fine. But even if I can get the splits from the InputFormat, how do I "add only a few splits you need to mapper and discard the others"? (PathFilters only work on files, not blocks, I think.)
Thanks.
-Gang

----- Original Message ----
From: Amogh Vasekar <am...@yahoo-inc.com>
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: 2010/1/27 (Wed) 1:40:26 PM
Subject: Re: fine granularity operation on HDFS

Hi,

>> now that I can get the splits of a file in hadoop, is it possible to name
>> some splits (not all) as the input to mapper?

I'm assuming that when you say "splits of a file in hadoop" you mean the splits generated from the InputFormat, not the blocks stored in HDFS. The [File]InputFormat you use gives you access to splits, their locations, etc. You can use this to add only the few splits you need to the mapper and discard the others (something you can do on whole files using PathFilters).

>> Or can I manually read some of these splits (not the whole file) using HDFS
>> api?

You mean you list these splits somewhere in a file beforehand, so individual mappers can each read one line (split)?

Amogh
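A minimal sketch of what Amogh suggests: subclass an InputFormat and override getSplits() so the job only ever sees the splits you want. Overriding getSplits() is the standard extension point in the new (org.apache.hadoop.mapreduce) API; the class name PartialSplitInputFormat and the "keep.split.indices" configuration key below are my own inventions for illustration, not part of Hadoop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

/**
 * Hypothetical InputFormat that keeps only the splits whose index
 * appears in the (made-up) "keep.split.indices" property, e.g. "0,3,7".
 * Splits discarded here are never handed to any mapper.
 */
public class PartialSplitInputFormat extends TextInputFormat {

    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        String[] wanted =
            job.getConfiguration().get("keep.split.indices", "0").split(",");
        // Let the parent compute the full split list as usual.
        List<InputSplit> all = super.getSplits(job);
        List<InputSplit> kept = new ArrayList<InputSplit>();
        for (String idx : wanted) {
            int i = Integer.parseInt(idx.trim());
            if (i >= 0 && i < all.size()) {
                kept.add(all.get(i));
            }
        }
        return kept;
    }
}
```

For the second half of the question (reading part of a file directly), the HDFS client API does let you read an arbitrary byte range without an InputFormat: open the file with FileSystem.open(path), seek() the returned FSDataInputStream to a split's start offset, and read split-length bytes. You would have to handle records straddling the range boundary yourself, which is exactly what RecordReaders do for you.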