My bible code problem is somewhat similar. I have many small files, and
one mapper needs to process an entire file. So I generate an input
file:

/user/bc/ecapriolo/bible1/grid/10/0,dictionary.txt
/user/bc/ecapriolo/bible1/grid/10/1,dictionary.txt
/user/bc/ecapriolo/bible1/grid/10/2,dictionary.txt
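
The command file itself can be produced with a few lines of plain Java. This is a minimal sketch, assuming the grid files are numbered 0..N-1 under a single directory and every line pairs a grid file with dictionary.txt. The class name CommandFileWriter, the count of 3 and the local output file are hypothetical, and the file still has to be copied into HDFS (for example with hadoop fs -put) before the job runs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CommandFileWriter {

    // Builds one "gridFile,dictionary" line per grid file, following the
    // layout of the command file above. The file count is hypothetical.
    static List<String> buildCommandLines(String gridDir, int count, String dictionary) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            lines.add(gridDir + "/" + i + "," + dictionary);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = buildCommandLines(
                "/user/bc/ecapriolo/bible1/grid/10", 3, "dictionary.txt");
        // Written to the local disk; copying it into HDFS is a separate step.
        Path out = Paths.get("gridsearchcmd.txt");
        Files.write(out, lines, StandardCharsets.UTF_8);
        lines.forEach(System.out::println);
    }
}
```

With NLineInputFormat reading this file, each of these lines becomes the whole input of one map task.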

Then I use NLineInputFormat:

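    JobConf conf = new JobConf(getConf(), GridSearcher.class);
    conf.setJobName("GridSearcher");
    conf.setMapperClass(MapClass.class);
    // NLineInputFormat gives each map N lines of the input file
    // (mapred.line.input.format.linespermap, default 1), so every map
    // receives exactly one "file,dictionary" line, i.e. one file name.
    conf.setInputFormat(NLineInputFormat.class);
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(Text.class);
    // The job input is the list of files to process, not the files themselves.
    FileInputFormat.setInputPaths(conf, new Path("/user/bc/gridsearchcmd.txt"));
    FileOutputFormat.setOutputPath(conf, new Path("/user/bc/gridsearchres"));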

Now each mapper opens and processes its entire file using
FSDataInputStream. It is an anti-pattern, but my map is NOT fed the
data line by line; it is only fed the names of the files to open.
One map, one file.
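
The per-map work can be sketched in the same spirit. This is a minimal sketch, assuming each command line has the form gridFile,dictionary and that the real search can consume a plain InputStream. The names WholeFileTask, parseCommand and processWholeFile are hypothetical, and counting lines only stands in for the actual grid search. Since FSDataInputStream is a java.io.InputStream, the stream returned by FileSystem.open(new Path(...)) inside map() could be handed straight to processWholeFile; that Hadoop wiring appears only as a comment so the sketch compiles without the Hadoop jars.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class WholeFileTask {

    // Splits one command line ("gridFile,dictionary") into its two parts.
    // Splitting on the last comma is an assumption about the format.
    static String[] parseCommand(String line) {
        int comma = line.lastIndexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("expected 'file,dictionary': " + line);
        }
        return new String[] {
            line.substring(0, comma).trim(), line.substring(comma + 1).trim()
        };
    }

    // Reads the whole stream in one pass and closes it. Counting lines is a
    // hypothetical placeholder for the real per-file processing.
    static long processWholeFile(InputStream in) throws IOException {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String[] cmd = parseCommand("/user/bc/ecapriolo/bible1/grid/10/0,dictionary.txt");
        // Inside the real map() this would be roughly:
        //   FileSystem fs = FileSystem.get(job);
        //   FSDataInputStream in = fs.open(new Path(cmd[0]));
        // Here an in-memory stream stands in for the HDFS file.
        InputStream in = new ByteArrayInputStream(
                "row1\nrow2\nrow3\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(cmd[0] + " -> " + processWholeFile(in) + " lines");
    }
}
```

Because the map input is a file name rather than file contents, no input split can cut a grid file in half. The trade-off is that Hadoop no longer schedules the map near the blocks of the grid file it ends up reading.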

On Sat, Jan 23, 2010 at 9:54 AM, Raymond Jennings III
<raymondj...@yahoo.com> wrote:
> Not sure if this solves your problem, but I had a similar case where there was
> unique data at the beginning of the file, and if that file was split between
> maps I would lose that data for the second and subsequent maps. I was able to
> pull the file name from the conf and read the first two lines in every map.
>
> --- On Sat, 1/23/10, stolikp <stol...@o2.pl> wrote:
>
>> From: stolikp <stol...@o2.pl>
>> Subject: Passing whole text file to a single map
>> To: core-u...@hadoop.apache.org
>> Date: Saturday, January 23, 2010, 9:49 AM
>>
>> I've got some text files in my input directory and I want to pass each
>> single text file (the whole file, not just a line) to a map (one file per
>> map). How can I do this? TextInputFormat splits text into lines and I do
>> not want this to happen.
>> I tried:
>> http://hadoop.apache.org/common/docs/r0.20./streaming.html#How+do+I+process+files%2C+one+per+map%3F
>> but it doesn't work for me; the compiler doesn't know what
>> NonSplitableTextInputFormat.class is.
>> I'm using Hadoop 0.20.1.
