That doesn't quite do what the poster requested.  They wanted to pass the
entire file to the mapper.

That requires a custom input format or an indirect approach (a list of
file names as the map input).
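The custom-input-format route could look roughly like the sketch below: an InputFormat that refuses to split files, plus a RecordReader that emits one record per file (key = path, value = raw bytes). This is untested illustration against the classic `org.apache.hadoop.mapred` API; class and method signatures vary between Hadoop versions, and the names `WholeFileInputFormat`/`WholeFileRecordReader` are made up here.

```java
// Hypothetical sketch (untested): deliver each input file, unsplit,
// as a single record to one mapper. Requires Hadoop on the classpath.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

  // Never split: each FileSplit then covers one entire file.
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }

  public RecordReader<Text, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }

  // Emits exactly one record: key = file path, value = raw file bytes.
  static class WholeFileRecordReader
      implements RecordReader<Text, BytesWritable> {
    private final FileSplit split;
    private final JobConf job;
    private boolean done = false;

    WholeFileRecordReader(FileSplit split, JobConf job) {
      this.split = split;
      this.job = job;
    }

    public boolean next(Text key, BytesWritable value) throws IOException {
      if (done) return false;
      Path path = split.getPath();
      FileSystem fs = path.getFileSystem(job);
      byte[] contents = new byte[(int) split.getLength()];
      FSDataInputStream in = fs.open(path);
      try {
        in.readFully(0, contents);
      } finally {
        in.close();
      }
      key.set(path.toString());
      value.set(contents, 0, contents.length);
      done = true;
      return true;
    }

    public Text createKey() { return new Text(); }
    public BytesWritable createValue() { return new BytesWritable(); }
    public long getPos() { return done ? split.getLength() : 0; }
    public float getProgress() { return done ? 1.0f : 0.0f; }
    public void close() throws IOException {}
  }
}
```

Note that the whole file must fit in the mapper's memory for this to work, which is usually the point of wanting it in one piece anyway.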


On 10/15/07 9:57 AM, "Rick Cox" <[EMAIL PROTECTED]> wrote:

> You can also gzip each input file. Hadoop will not split a compressed
> input file (but will automatically decompress it before feeding it to
> your mapper).
> 
> rick
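Rick's gzip route needs no Hadoop-side code at all; you just compress each file before loading it. A minimal standalone sketch of the compression step (file names here are hypothetical):

```java
// Standalone demo: gzip an input file before handing it to Hadoop.
// A .gz input is never split, so one mapper sees the whole file,
// already decompressed by the framework.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipInput {
  public static void compress(String fileName) throws IOException {
    FileInputStream in = new FileInputStream(fileName);
    GZIPOutputStream out =
        new GZIPOutputStream(new FileOutputStream(fileName + ".gz"));
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    in.close();
    out.finish();
    out.close();
    // Then copy the .gz into the job's input directory, e.g.
    //   bin/hadoop dfs -put file.txt.gz input/
  }

  public static void main(String[] args) throws IOException {
    compress(args[0]);
  }
}
```

The trade-off is that a non-splittable input also means no parallelism within a file, which is exactly the behavior being asked for here.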
> 
> On 10/15/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
>> 
>> 
>> Use a list of file names as your map input.  Then your mapper can read each
>> line, use it as a file name, and open and read that file for processing.
>> 
>> This is similar to the problem of web crawling, where the input is a list
>> of URLs.
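Ted's indirect approach could be sketched as the following mapper, again untested and against the classic `org.apache.hadoop.mapred` API (the class name `FileNameMapper` and the echo-the-lines body are placeholders for real processing):

```java
// Hypothetical sketch (untested): each input line is a file path; the
// map task opens that file itself, so one map() call sees one whole file.
// Requires Hadoop on the classpath.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class FileNameMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private JobConf job;

  public void configure(JobConf job) {
    this.job = job;
  }

  public void map(LongWritable offset, Text fileName,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    Path path = new Path(fileName.toString());
    FileSystem fs = path.getFileSystem(job);
    BufferedReader in =
        new BufferedReader(new InputStreamReader(fs.open(path)));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        // Every line of the file is visible in this one map() call, so
        // cross-line dependencies can be resolved here. (Placeholder:
        // this just echoes each line keyed by the file name.)
        output.collect(fileName, new Text(line));
      }
    } finally {
      in.close();
    }
  }
}
```

One caveat with this scheme: if the list of file names sits in a single small input file, Hadoop may give all of it to one mapper, so splitting the list across several input files (or tuning the split size) is needed to get parallelism.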
>> 
>> On 10/15/07 6:57 AM, "Ming Yang" <[EMAIL PROTECTED]> wrote:
>> 
>>> I was writing a test MapReduce program and noticed that the
>>> input file was always broken down into separate lines and fed
>>> to the mapper. However, in my case I need to process the whole
>>> file in the mapper, since there are dependencies between
>>> lines in the input file. Is there any way I can achieve this --
>>> process the whole input file, either text or binary, in the mapper?
>> 
>> 
